Chemical terms in GO: Difference between revisions




Revision as of 08:03, 9 March 2010

Goal: make terms that refer to chemicals internally consistent in GO prior to alignment with ChEBI

Meeting March 5-6, 2010

Chris generated GOCHE, an ontology of the chemicals named in GO terms, with ChEBI IDs. We are using this ontology as the framework for capturing the chemical representation that already exists in GO. The file needed review because GO contains chemicals that could not be mapped to ChEBI, and because some mappings to ChEBI were incorrect owing to different term usage in GO. For example, GO sometimes uses a singular chemical name to represent a class of chemicals that share the same chemical skeleton, and sometimes uses the 'chemical and derivative' nomenclature.

Going through GOCHE: for each chemical 'X', see if GO terms exist for:

  • X metabolic process (M)
  • X biosynthetic process (B)
  • X catabolic process (C)
  • X transport (T)
  • X transporter (R)
  • X binding (I)
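The per-chemical check above can be sketched in a few lines. This is only an illustration, not the procedure used at the meeting: the function name `coverage` and the exact-name matching are assumptions, and real GO term names may carry extra words, in which case the matching would need adjusting.

```python
# Suffixes taken from the six term types listed above; the letters are
# the ones used in these notes.
TERM_TYPES = {
    "M": "metabolic process",
    "B": "biosynthetic process",
    "C": "catabolic process",
    "T": "transport",
    "R": "transporter",
    "I": "binding",
}


def coverage(chemical, go_term_names):
    """Return the letters of the term types that exist in GO for a chemical.

    Assumes a term exists only if its name is exactly '<chemical> <suffix>'
    (case-insensitive). This is a simplification for illustration.
    """
    names = {name.lower() for name in go_term_names}
    return "".join(
        letter
        for letter, suffix in TERM_TYPES.items()
        if f"{chemical} {suffix}".lower() in names
    )


# Illustrative input only; not a statement about what GO actually contains.
example_names = [
    "glycerol metabolic process",
    "glycerol biosynthetic process",
    "glycerol transport",
]
print(coverage("glycerol", example_names))
```

A chemical whose letters fall short of the full set is a candidate for the kind of internal inconsistency the review was looking for.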

Adjust parentage in GOCHE based on GO paths; add GOCHE terms as needed for chemicals that were not found or mapped in ChEBI. When this happened and a ChEBI ID existed for the term, we added a new GOCHE term and added the ChEBI ID as a term dbxref. The set of GOCHE terms will identify those that we need to discuss with ChEBI.

GOCHE represents the union of the M, B, C, T, R and I paths in GO for a given chemical. There are other paths in less comprehensive areas of the ontology, such as 'response to chemical' and its children.

March 5th: got through lipids and CDP-diacylglycerol; resume at monoglyceride.

March 6th: modified approach:

  • Skip noting which types of GO terms exist in GOCHE file. We can mine this later if we need this information. We already have enough mapped to illustrate the point that GO is internally inconsistent.
  • Mark all terms we've looked at with 'GOC:chem_mtg' definition dbxref.
  • Note all parents from GO paths as before.
  • Filled in more chemicals that GO has but ChEBI doesn't; we don't know what they are!

Concentrated on high-level terms by working from the top down, one level at a time, because that's where most problems crop up; easier to sort out more specific terms without meeting face-to-face. Ended Saturday session after getting down to second level of 'heterocycle compound'.

Notes:

  • at present, GO doesn't have paths from nucleoside/tide/base terms to 'aromatic compound' terms, but we will want to add them back in GOCHE
  • The 'response to chemical substance' branch isn't consistent with the rest of GOCHE at all. We need to decide if we are going to retain this representation.
  • Note that we have made a few fixes, but have not consistently fixed all problems we've spotted.


GOCHE changes done Sat 6th:

  • merged 'organic alcohol' into 'alcohol'
  • merged hydroxyproline into 4-hydroxyproline; made L-hydroxyproline is_a hydroxyproline
  • removed redundant links
  • changed namespace for new terms to GOCHE from GO
  • resolved multiple definition dbxrefs (GOC:dphtb, GOC:goche, GOC:chem_mtg) to a single reference = GOC:chem_mtg
  • added all versions of files worked on for the project into cvs for versioning control and easy access for all parties
  • at the end of the two days of work (including the 2 hrs of prep work done by D and T), a total of 406 terms had been reviewed, of which 55 were new terms added to the ontology. Of those 55 new terms, 12 were also found in the ChEBI ontology.


Rules for GOCHE

  • if you are X biosynthesis or X catabolism, only follow is_a paths up the graph via X metabolism.
    • use MF-BP links to capture links between (e.g.) pyruvate and glucose, or create specific subchildren of a term like 'pyruvate metabolic process involved in gluconeogenesis'. The way we have it now is incorrect.
  • use ChEBI '-ic acid' ID for '-ate' GO terms - it's the ionized form that's biologically relevant. Note that in this case we will not be univocal with ChEBI.
  • a part_of link in GO should not translate into an is_a link in GOCHE
  • ignore modification terms for building GOCHE paths
  • every chemical with a path to 'small molecule', 'drug', 'neurotransmitter' or any other role MUST also have a path classifying it based on structure
  • if a GO term is not plural, but the GOCHE term points to the ID for a ChEBI plural, then GO should use an 'and derivative' term. Eventually we will use the definition of the ChEBI plural term for the 'and derivative' terms in GO. We like the idea of the skeletal framework being the criterion for these.
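The rule that every chemical with a role path must also have a structural path lends itself to an automatic check. The sketch below is a rough illustration under stated assumptions: the ontology is reduced to a plain dictionary of is_a parents, the function names are hypothetical, and the choice of 'organic substance' as a structural root is made only for the example.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following is_a links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def missing_structure(parents, role_roots, structure_roots):
    """Terms with a path to a role root but no path to any structure root."""
    flagged = []
    for term in parents:
        ups = ancestors(term, parents)
        if (ups & role_roots) and not (ups & structure_roots):
            flagged.append(term)
    return sorted(flagged)


# Illustrative graph only: term -> list of is_a parents.
example_parents = {
    "dopamine": ["neurotransmitter", "catecholamine"],
    "catecholamine": ["organic substance"],
    "mystery drug": ["drug"],
}
roles = {"small molecule", "drug", "neurotransmitter"}
structures = {"organic substance"}
print(missing_structure(example_parents, roles, structures))
```

Only is_a links are stored in the dictionary, which also keeps part_of links from being treated as is_a paths.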

Action items:

  • divide up remaining GOCHE terms
    • do straightforward ones individually; save up queries/problems for teamwork
  • find chemicals named in GO missing from GOCHE
    • look at children of GO terms that do have corresponding GOCHE entries
  • organic alcohol = alcohol, so fix transport term name - DONE.
  • clean up 'heterocycle' vs 'heterocyclic compound' in both GO & GOCHE
  • clean up hyphenation inconsistency
  • DNA binding & RNA binding
    • separate out hierarchies based on chemistry from those based on SO terms
    • don't forget ncRNA
  • missing synonyms for *acylglycerol B & C
  • merge 'histidine family' terms into 'histidine' terms, since there are no other children
  • rename 'quinone cofactor' terms to just 'quinone'

Steps:

Editing schedule:

Tuesdays

Jane or Midori: till 5 pm UK time

David or Harold: 12 to 4 pm East coast time

Tanya: after David and Harold are done

Tips:

Set default namespace to GOCHE.

Add GOC:chem_mtg as a definition dbxref to every term you look at, either by adding an 'x.' definition to the term or by filling in a missing definition dbxref. You can then use the Term Renderer to see which terms have been reviewed and what is still left to do. Renderer settings: definition dbxref = GOC:chem_mtg, foreground color = color of your choice.
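Outside the editor, the same reviewed/unreviewed split can be approximated with a short script. This sketch assumes the GOCHE file is in OBO flat-file format with one `def:` line per term; the function name `review_status` and the second term in the sample are hypothetical, and the parsing is deliberately minimal (it ignores stanza types other than [Term]).

```python
import re


def review_status(obo_text):
    """Split [Term] stanzas into (reviewed, pending) lists of term IDs.

    A term counts as reviewed when GOC:chem_mtg appears among the
    dbxrefs in square brackets at the end of its def: line.
    """
    reviewed, pending = [], []
    # Everything before the first [Term] is the file header; skip it.
    for stanza in obo_text.split("[Term]")[1:]:
        id_match = re.search(r"^id:\s*(\S+)", stanza, re.MULTILINE)
        if not id_match:
            continue
        def_match = re.search(r"^def:.*\[(.*)\]\s*$", stanza, re.MULTILINE)
        dbxrefs = def_match.group(1) if def_match else ""
        xrefs = [x.strip() for x in dbxrefs.split(",")]
        target = reviewed if "GOC:chem_mtg" in xrefs else pending
        target.append(id_match.group(1))
    return reviewed, pending


# Minimal sample; the second stanza is a made-up placeholder term.
SAMPLE = """
[Term]
id: GOCHE:0000001
name: organic substance
def: "x." [GOC:chem_mtg]

[Term]
id: GOCHE:9999999
name: unreviewed placeholder
"""
done, todo = review_status(SAMPLE)
print(done, todo)
```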

  1. Finish reviewing GOCHE, adding missing compounds
    • During this process, submit issues/new terms to ChEBI
    • When this process is finished, every GOCHE term will have a ChEBI ID
  2. Refine relationships and upper level structure of GOCHE
  3. Create cross products (xps) using GOCHE terms
    • Clean-up GO chem terms
    • Automatically fill in missing chemicals
  4. Align GOCHE with ChEBI

Prose description of plan

We started adding letters after the GOCHE terms to see how internally inconsistent GO is in its representation of chemicals. Tanya and David were already aware of many inconsistencies from doing the F-P links, which is why we thought this project should now take priority. We have stopped adding the letters, but we are convinced that, at the higher levels, GO is inconsistent. We can't just rely on the automatically generated cross products, because GO has chemicals that are not represented in ChEBI, and in some cases we need to look closely to decide whether we agree with the mappings between GO and ChEBI; for example, is 'benzenes' equivalent to 'benzene and derivative'? We have not always been consistent in how we group things. Only by looking through the terms the way we are now can we spot these inconsistencies. For now, we are keeping both the GOCHE term and the ChEBI term. We may at some point merge them, or ask ChEBI to create a new ChEBI term.

The plan is to work only on GOCHE for now. We know that there are areas in it that don't make sense; we will discuss and fix those as we go along. In the end, we will see how well GOCHE aligns with ChEBI, and where it doesn't align we need to find out why. Once we have GOCHE in a state we like, we will use it in the cross-product definitions of every term with a chemical representation in GO. We can then use the reasoner to make sure that GO remains internally consistent. As we work out the bugs between GOCHE and ChEBI, we will be aligning with ChEBI.

So GOCHE is being used as a 'metaontology' to show us how we currently represent chemicals in GO. The text strings of chemical names are univocal with GO for now. We will fix the GOCHE metaontology and then use it to clean up the inconsistencies and errors in GO. This way we can keep our own house in order as we work with ChEBI to align. As we work through GOCHE, we are already sending issues and questions to ChEBI to try to work things out with them.

Notes and questions

Midori 2010-03-08

GOCHE doesn't seem to have a term for 'organic compound' (by that or any other name). ChEBI defines 'organic molecular entity' as 'A molecular entity that contains carbon', so we can say organic compound = carbon compound in GO (and rearrange a few existing links accordingly). (-midori 2010-03-08)

Midori 2010-03-09

Added new root 'organic substance' GOCHE:0000001 (CHEBI:50860). Many, many terms must be moved under it in both GO and GOCHE, but I haven't done that yet in either ontology. I can start on both in parallel any time.

I encountered a case where different GO terms use different names for the same chemical! It's CHEBI:1427. Neither Jane nor I could recall that situation cropping up during the meeting, so we crafted a strategy on the spot:

  • used the ChEBI name in GOCHE
  • added exact synonyms in GOCHE for each variant occurring in GO
  • used GO IDs as synonym dbxrefs
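As a rough sketch of what the three steps above produce, the snippet below builds an OBO-style stanza with the ChEBI name as the term name and one exact synonym per GO variant, each carrying its GO ID as the synonym dbxref. The function name `goche_stanza`, the placeholder names and the GO IDs are all hypothetical; the actual names used for CHEBI:1427 are not recorded in these notes.

```python
def goche_stanza(term_id, chebi_name, go_variants):
    """Build an OBO-style [Term] stanza.

    term_id     -- ID of the GOCHE term
    chebi_name  -- name taken from ChEBI, used as the primary name
    go_variants -- mapping of GO term ID -> chemical name used in that GO term
    """
    lines = ["[Term]", f"id: {term_id}", f"name: {chebi_name}"]
    # One exact synonym per GO variant, with the GO ID as the synonym dbxref.
    for go_id, variant in sorted(go_variants.items()):
        lines.append(f'synonym: "{variant}" EXACT [{go_id}]')
    return "\n".join(lines)


# Placeholder values only; these are not real names or GO IDs.
stanza = goche_stanza(
    "CHEBI:1427",
    "name from ChEBI",
    {"GO:9999991": "variant one", "GO:9999992": "variant two"},
)
print(stanza)
```

Keeping the GO ID on each synonym makes it possible to trace every name variant back to the GO term that uses it.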