Background and Introduction
- David - Motivation for GOCHE
- Chris - how the original GOCHE file was created
- David - steps leading up to today
- GOALS: GOCHE = CHEBI, improve GO using CHEBI, improve CHEBI using GO
- Chris - process of how improvements to GO using CHEBI will be done
- Discuss clarifying the plural terms in CHEBI, we suspect that the plurals represent what we are calling "X and derivative".
- We need a good definition of 'derivative' to use for the '...and derivative' terms. Can you help?
- Derivative: not a good word for dealing with plurals. Means something like common substructure, fiddly things like natural product-y things. Plurals mean: family of structurally related compounds. Skeleton is core to the sense in which chemists group things into families.
- X and derivative => X-containing compound rather than Xs (plural) since the X plural terms are too specific -- they pick out a family where the “derivative” is too small.
- GO may well use ‘X containing compound’ everywhere they currently use ‘X and derivative’.
- Colin would prefer ‘X containing molecule’ because ‘compound’ can imply substance in chemistry. Maybe ‘X containing molecular entity’ (GO won’t use this though). Nico points out that we should use the name for the part rather than the name for the freestanding thing.
- Increase density of has-part defined classes in ChEBI.
- Does modified amino acid mean essentially 'amino acid derivative', in which case shouldn't it have a 'has functional parent' relationship with amino acid?
- Chemists think in terms of chemical skeletons e.g. anything with a phenol aromatic ring is a phenols.
- Derivatives are entities with common substructures, not necessarily actually derived from one another
- The substructure also has to be the main component of the derivative.
- Derivative isn't a very precisely defined term in chemistry
Action for GO: Replace the GOCHE ID for 'benzene and derivative' with the CHEBI ID for 'benzenes' (uncurated)
- Can we classify chemicals on the basis of the presence of a substructure, regardless of what the rest of the molecule is?
- Could CHEBI rename/define 'benzenoid aromatic compound' as 'benzene-containing compound', so that the GOCHE term 'benzene and derivative' can be mapped to this term rather than to 'benzenes'? - Yes
- More generally, GOCHE 'x and derivative' terms should be mapped to the 'x-containing compound' parent term rather than the plural
Action for GO: GO to rename the GOCHE 'x and derivative' terms to be 'x-containing compound' throughout
Action for GO: provide CHEBI with a list of the 'x-containing compound' terms for them to add
Found these remaining in GOCHE on 7/25:
- CHEBI:26373 ! pteridine and derivative
- CHEBI:26401 ! purine and derivatives
- CHEBI:33709 ! amino acid and derivative
- CHEBI:39447 ! pyrimidine and derivatives
- GOCHE:0090038 ! ethanolamine and derivative
- GOCHE:0090044 ! indole and derivative
- GOCHE:0090047 ! thiamine and derivative
- GOCHE:0090050 ! folic acid and derivative
- GOCHE:0090055 ! riboflavin and derivative
- benzene and derivative < removed since last? can't find
- phenol and derivatives < removed since last? can't find
Conclusion: GOCHE 'x and derivative' terms should be mapped to the 'x-containing compound' parent term (CHEBI will call these 'x-containing molecular entity') rather than the plural. CHEBI will add/rename to create these terms where necessary. These terms will have a has_part relationship to the containing compound.
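The renaming agreed above can be sketched as a simple label rewrite. This is a minimal Python sketch, not the script GO used; the function name `rename_derivative_label` is hypothetical, and it assumes every affected label ends in 'and derivative' or 'and derivatives'.

```python
import re

# Matches labels such as "indole and derivative" or "purine and derivatives".
_DERIVATIVE_SUFFIX = re.compile(r"^(?P<core>.+?)\s+and\s+derivatives?$")


def rename_derivative_label(label: str) -> str:
    """Return 'X-containing compound' for 'X and derivative(s)'.

    Labels that do not follow the pattern are returned unchanged.
    """
    match = _DERIVATIVE_SUFFIX.match(label.strip())
    if match is None:
        return label
    return f"{match.group('core')}-containing compound"


if __name__ == "__main__":
    for old in ["indole and derivative", "purine and derivatives", "benzenes"]:
        print(old, "->", rename_derivative_label(old))
```

Plural labels such as 'benzenes' are deliberately left alone, since the notes conclude that the plurals pick out a narrower family than 'x-containing compound'.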
- Discuss the inclusion of small molecules into Chebi. Could we have macromolecule vs. monomer? How can we make the distinction of what biologists call a 'small molecule'. What is the complement of macromolecule?
- Meaning of macromolecule in CHEBI to meaning in GOCHE
- The problem is that there is no definition for small molecule. What is not a macromolecule? What is a polymer? What is a small molecule? A 150000 Dalton biological molecule with no repeating units would be a small molecule.
- ChEBI scope --> wider than just ‘small’ molecules. Too difficult to define small molecule in ChEBI because of sorites-like paradoxes. Substances and molecules and macromolecules need to be sorted out in ChEBI.
Conclusion: keep small molecule only in GO - remove from GOCHE. Ensure all children have an alternative structural path. (DONE)
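The check "ensure all children have an alternative structural path" can be sketched as a graph walk. This is a hedged Python sketch: the function names and the shape of the `parents` mapping (term name to set of is_a parent names) are assumptions for illustration, not the tooling GO used.

```python
from typing import Dict, List, Set


def reaches_root(term: str, parents: Dict[str, Set[str]], root: str, removed: str) -> bool:
    """True if `term` reaches `root` via is_a parents without passing through `removed`."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == root:
            return True
        if current in seen or current == removed:
            continue
        seen.add(current)
        stack.extend(parents.get(current, set()))
    return False


def orphaned_children(parents: Dict[str, Set[str]], root: str, removed: str) -> List[str]:
    """Direct children of `removed` that would lose every path to `root`."""
    children = [term for term, ps in parents.items() if removed in ps]
    return sorted(c for c in children if not reaches_root(c, parents, root, removed))


if __name__ == "__main__":
    # Toy hierarchy for illustration only; not taken from GOCHE or ChEBI.
    toy = {
        "small molecule": {"chemical entity"},
        "carbohydrate": {"chemical entity"},
        "glucose": {"small molecule", "carbohydrate"},
        "toy orphan": {"small molecule"},
    }
    print(orphaned_children(toy, "chemical entity", "small molecule"))
```

An empty result means the term can be removed without stranding any of its children.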
- Protein and its children - some of these in CHEBI. Are they due to be removed? In PRO?
- Same for RNAs
- CHEBI have discussed proteins with PRO - all protein subtypes will move to PRO.
- Owner of glycoproteins etc. yet to be determined. CHEBI might have a term 'protein-chemical complex'.
- peptide hormones are just peptides and will live in CHEBI, what is a peptide vs. protein? It is ok if the precursors were encoded by the genome.
- Action for GO: check through the remaining hormone terms to retain only the peptide hormones, not proteins (did this, we have a couple that are 191 or 198 aa, are these peptides?)
Nucleotides, nucleosides etc.
- We merged nucleotide and nucleoside phosphate. These seem to be exact synonyms of one another.
- There are non-naturally occurring nucleoside phosphates that are not nucleotides.
- nucleoside phosphates don't have the position of the phosphate specified
- Action for GO: untangle the merged nucleotide and nucleoside phosphate (DONE)
- Action for GO: when CHEBI has fixed arrangement of nucleoside phosphate/nucleotide, need to fix GOCHE accordingly (can we write formal definitions for this problem and use this example for the paper?)
- Purine is not currently is_a purines - is this deliberate or an oversight?
- An oversight - CHEBI will fix
- Hypoxanthine - should be a purine base in CHEBI?
Action for GO: Move hypoxanthine up to be a child of 'purine-containing compound' (when we have that term)
- Should we merge pyrimidine with pyrimidine (nucleo)base?
- There is a difference between pyrimidine and pyrimidine (nucleo)base - GOCHE should reflect this
- NADH is_a NAD and NADPH is_a NADP. Seems like is_a isn't quite right here?
- Action for GO: GOCHE needs to change NAD to NAD+
- Pyridine nucleotide - this looks by its children like it should be a ribonucleotide - should it have this parent/name?
- Carbohydrates and nucleotides - should nucleotide have functional parent rather than is_a carbohydrate (N-glycosyl has functional parent carbohydrate)?
- CHEBI will reorganize these terms using has_part e.g. nucleoside phosphate has_part nucleoside
- Created axioms for nucleobase and nucleoside terms. Tagged with GOC:carnegie dbxref. Will pass on to CHEBI and they will do their magic.
- Janna's notes: nucleoside monophosphate is a nucleotide in GO, but not in ChEBI. Looking at the definition of nucleoside monophosphate in ChEBI there is a structure which indicates that it is the nucleotide rather than the nucleoside (broader term). Conclusion: Nico to fiddle with the hierarchy, names and definitions in ChEBI so that nucleoside mono-, di-, and triphosphates are is-a nucleotide. Broadly speaking we can imagine a nucleoside monophosphate that is not a nucleotide but this isn’t the sense of either the ChEBI or the GO hierarchies. Need to move out children which don’t conform to the natural positioning, because currently there ARE children which don’t specify the position. This is not specified in the name: ‘nucleoside monophosphate’ does not specify where the phosphate is attached. So we need a class for the nucleoside monophosphates where the phosphate is attached in the expected biological position and a class for the nucleoside monophosphates where it is not in the expected biological position.
- Action for CHEBI: purine is a purines is missing in ChEBI. Should be there. To be added.
- Pyrimidine/pyrimidine base? These are distinct entities. GO to sort out.
- NADP / NADPH / ? GO to make NAD refer to NAD+, NADP refer to NADP+. Definitions can be improved. To be human curated. NAD(P) is a dinucleotide (nicotinamide adenine dinucleotide (phosphate)). Term used in enzyme nomenclature. ChEBI should add ‘pyridine ribonucleotide’ under ‘ribonucleotide’.
- Nucleotide and carbohydrate. Nucleotide is a carbohydrate phosphate. Nucleoside phosphate has functional parent nucleoside. Better would be to describe everything in terms of has part relationships. Nucleoside phosphate is a phosphoric ester that has part nucleoside and has part carbohydrate. Nucleoside has part pentose. A lot of has part relationships to be asserted!
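The has_part assertions discussed above chain together: if nucleoside phosphate has part nucleoside, and nucleoside has part pentose, then nucleoside phosphate has part pentose. The Python sketch below illustrates this under the assumption that has_part is treated as transitive; the function name and the dictionary layout are hypothetical, and the two assertions are the ones named in these notes, not an export from ChEBI.

```python
from typing import Dict, Set


def has_part_closure(asserted: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Expand asserted has_part links into all direct and indirect parts."""

    def collect(term: str, seen: Set[str]) -> Set[str]:
        for part in asserted.get(term, set()):
            if part not in seen:  # guards against cycles in the input
                seen.add(part)
                collect(part, seen)
        return seen

    return {term: collect(term, set()) for term in asserted}


if __name__ == "__main__":
    asserted = {
        "nucleoside phosphate": {"nucleoside"},
        "nucleoside": {"pentose"},
    }
    for term, parts in has_part_closure(asserted).items():
        print(term, "has_part", sorted(parts))
```

Only a small number of direct has_part links need to be asserted by hand; the indirect ones follow from them.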
- Amino acid families - not in CHEBI?
- We have removed from GOCHE
- We need a conjugate base term for 'amino acid'. There may be others. Does CHEBI have a systematic way of checking the conjugate base term is always added?
- Conjugate base/acid relationships to be asserted higher up in the hierarchy such as between amino acid anion and amino acid. These can be straightforwardly added in ChEBI. To be submitted. Carboxylic acid metabolism in GO = carboxylic acid anion metabolism to biologists. Process: proton transfer. Maybe a parent class in ChEBI for each conj acid/base pair? What would we call it? No -- don’t want this as not normally referred to in chemical context.
- Relevant files:
- Action for GO: Generate a list of missing conjugate bases for submission to CHEBI. CHEBI will check if the entry already exists, and if it does make the appropriate is_conjugate_base relationship and if not, add it.
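Generating the list of missing conjugate bases could look roughly like the Python sketch below. The function name and input shapes are assumptions: `acids` is a list of acid term names, and `conjugate_base_of` maps each known conjugate base to its acid. The example data is illustrative only and does not reflect the actual content of ChEBI.

```python
from typing import Dict, Iterable, List


def missing_conjugate_bases(acids: Iterable[str], conjugate_base_of: Dict[str, str]) -> List[str]:
    """Return the acids for which no conjugate base relationship is recorded."""
    acids_with_base = set(conjugate_base_of.values())
    return sorted(acid for acid in acids if acid not in acids_with_base)


if __name__ == "__main__":
    # Illustrative data only.
    acids = ["amino acid", "carboxylic acid"]
    conjugate_base_of = {"carboxylic acid anion": "carboxylic acid"}
    print(missing_conjugate_bases(acids, conjugate_base_of))
```

The resulting list is what would be submitted to CHEBI, who then either link an existing entry or add a new one.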
- mancude organic heterobicyclic parent CHEBI:35570 - should this term really have 'parent' on the end?
- 48 terms in CHEBI with 'parent' in name.
- Makes a statement about the role of the representation of the molecule in IUPAC (?)
- GOCHE should not directly link to these terms
- In CHEBI sn-glycerol-3-phosphate is_a glycerol-1-phosphate
- An oddity due to IUPAC's numbering system. GOCHE can ignore. (sn = stereospecifically numbered)
- In CHEBI pectin is_a galacturonan. We think this should be pectin has_part galacturonan - we've put this in GOCHE.
- pectin: a mixture of different polysaccharides. Should be is-a substance.
- CHEBI will fix this. This is one of the cases where the plural was changed to a singular, but in this case wasn't quite correct because pectin is a mixture of polysaccharides.
- Is an amide an amine?
- Two types of amide: only one is an amine so no.
- Is 'molybdopterin cofactor' a 'pteridine and derivatives' ?
- 'molybdopterin cofactor' would be an intersection term in CHEBI, of the role and the structure. Actually refers to two chemical entities (union) and an intersection with has-role cofactor. To be better defined in ChEBI.
- Action for GO: Relate this term to the new CHEBI term that refers to the intersection of the CHEBI role and the CHEBI structure
- sulfur metabolism = metabolism of S (element) OR sulfur containing compounds, therefore, do we change 'sulfur metabolism' to 'sulfur and sulfur compound metabolism' - sounds like GO 'sulfur' = CHEBI 'sulfur' = 'sulfur and derivative' and GO 'sulfur compound' = ChEBI 'organosulfur compound'. GO: sulfur = ‘sulfur molecular entity’ and we also want to rename all of these X molecular entity as ‘X-containing molecular entity’.
- CHEBI will change the name of their existing term sulfur molecular entity to term sulfur-containing molecular entity, and add has_part relationship to sulfur.
- Is glyoxylate an aldehyde or not?
- Yes, but GOCHE won't explicitly make it an aldehyde. It will be inferred in a CHEBI relationship.
- adrenocorticotropin, prolactin, somatostatin - do these belong in CHEBI or somewhere else?
- If CHEBI says they are peptides, they will stay under peptides, if not we will put them under proteins and will fall under PRO's scope.
- Would CHEBI consider adding small molecule as a counterpart to macromolecule (see GO def)?
- CHEBI will not add small molecule. GOCHE terms that were children of this term have other parents that trace to chemical already.
- What is the relationship between sphingoid and sphingolipid? Is it upside-down?
- Probably. Will investigate further and fix in CHEBI, if necessary.
- ‘Parent’ term: making a statement about organic nomenclature rules rather than making a statement about chemistry. IUPAC rules / chemical community.
- Whenever ChEBI makes a has-part relationship meaning a covalently bonded part of a molecule, the range of the has-part is the *group*, not a freestanding molecule. So really in ChEBI, ‘benzene containing molecular entity’ will be ‘phenyl containing molecular entity’ and will have part phenyl group.
We will make requests on the appropriate SF trackers. We should create a group in ChEBI's tracker for GO requests and a group in GO's tracker for ChEBI. File: go_nojustifiedbyCHEBI (or something like that)
- Action for GO: sort out the lipoproteins from the lipoprotein particles in GO
- The lipid may be covalently bound (lipoprotein) or non-covalently bound (lipoprotein particle, a complex).
- Action for CHEBI: add a new lipoprotein term for proteins with lipid covalently attached
- Action for GO: sort out what GO means by phosphate and request appropriate terms from CHEBI
- Action for GO: rename phenol M,B,C to be phenol-containing compound M, B, C
- Action for GO: figure out which cobalamin we are talking about? There are several.
- Action for CHEBI: add retinoic acid, folate, thiamin (others also) as a vitamin
- Action for CHEBI: ChEBI to make a link from hydrogenphosphate (phosphate2- ion) to organic anion (Can’t we logically define inorganic? Exception list for carbon-containing molecules which are considered inorganic, plus closure axioms on molecular atomic composition)
- Action for CHEBI: magnesium ion needs to be a metal ion. Ditto for other metals.
- Action for CHEBI: carbohydrate phosphate and children should have_part carbohydrate and have_part phosphate
- Action for GO: rename toluene M,B,C to be toluene-containing compound M, B, C, revisit definition to be clear
- Action for GO: add some vitamins from CHEBI to GO
- Action for GO: move isoflavonoid xx to be a sibling of flavonoid xx
- Action for CHEBI: catechol / phenol / benzene. Can make catechol is a phenol containing compound (molecular entity).
Edit nucleotide version of CHEBI as Sanity check
Tanya has edited her local copy of CHEBI. The GO group has checked the paths using 'metabolism', 'biosynthesis', 'catabolism', 'transport' and 'response to' to make sure that this approach works.
nucleotide- works for metabolism, biosynthesis, catabolism, binding and response to.
Tanya has committed this file to CVS. (/go/scratch/obol_results/chebi_nucleotide.obo)
Colin gave a presentation on his and Janna's work on roles and dispositions. We had a discussion about how these relate to GO. They are interested in this with respect to small molecules. We will want to address this in the paper.
Is this a role or not? Yes.
- Action for GO: remove 'prosthetic group' from GOCHE (DONE)
glycoprotein = is_a information biomolecule that has_part protein AND has_part carbohydrate
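The glycoprotein definition above is a genus plus required parts, which can be sketched as a simple membership test. This Python sketch is illustrative only: the `LogicalDefinition` class and its fields are hypothetical names, and a real ontology reasoner does considerably more than this check.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class LogicalDefinition:
    """A class defined as: is_a <genus> AND has_part <each required part>."""

    genus: str
    required_parts: FrozenSet[str]

    def matches(self, is_a_ancestors: Iterable[str], parts: Iterable[str]) -> bool:
        """True if the entity has the genus as an ancestor and every required part."""
        return self.genus in set(is_a_ancestors) and self.required_parts <= set(parts)


GLYCOPROTEIN = LogicalDefinition(
    genus="information biomolecule",
    required_parts=frozenset({"protein", "carbohydrate"}),
)

if __name__ == "__main__":
    print(GLYCOPROTEIN.matches({"information biomolecule"}, {"protein", "carbohydrate"}))
    print(GLYCOPROTEIN.matches({"information biomolecule"}, {"protein"}))
```

Extra parts on the entity do not prevent a match; only a missing genus or a missing required part does.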
- ChEBI next steps -
- Action for CHEBI: Janna has kept a running list, she will email to Tanya and Tanya will integrate into wiki (DONE)
- GO next steps
- Action for GO: use the CHEBI submission tool to submit everything on Nico's yes list. They will be issued ids immediately. If they're already in CHEBI, CHEBI will merge the duplicate entries so the id won't be lost.
- Procedure for future interactions : email, SF trackers, CHEBI submission tool
- Need to prevent drift
- David will make outline of paper and send it around
- Create GOCHE-GO xps very soon, will transition to CHEBI