Chemical terms in GO

From GO Wiki
Revision as of 09:40, 8 March 2010

Goal: make terms that refer to chemicals internally consistent in GO prior to alignment with ChEBI

Meeting March 5-6, 2010

Chris generated an ontology of the chemicals named in GO terms, with ChEBI IDs: GOCHE

Going through GOCHE, for each chemical 'X', see whether terms exist for:

  • X metabolic process (M)
  • X biosynthetic process (B)
  • X catabolic process (C)
  • X transport (T)
  • X transporter (R)
  • X binding (I)
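The per-chemical check amounts to testing six name patterns against GO's term names. A minimal Python sketch, assuming exact string matches (real GO names have exceptions and synonyms that this ignores); `PATTERNS` and `letter_code` are hypothetical names, and the term names are toy data:

```python
# Hypothetical sketch: derive the M/B/C/T/R/I letter code for a chemical by
# checking which pattern-named terms exist in a set of GO term names.
# Assumes exact string matches; real GO naming has exceptions this ignores.

PATTERNS = [
    ("M", "{} metabolic process"),
    ("B", "{} biosynthetic process"),
    ("C", "{} catabolic process"),
    ("T", "{} transport"),
    ("R", "{} transporter"),
    ("I", "{} binding"),
]


def letter_code(chemical, go_term_names):
    """Return the letters (in M, B, C, T, R, I order) of the terms that exist."""
    return "".join(
        letter
        for letter, pattern in PATTERNS
        if pattern.format(chemical) in go_term_names
    )


if __name__ == "__main__":
    # Toy term names for illustration only, not an extract of GO.
    names = {
        "glucose metabolic process",
        "glucose catabolic process",
        "glucose transport",
        "glucose binding",
    }
    print(letter_code("glucose", names))  # prints MCTI
```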

adjust parentage in GOCHE based on GO paths; add GOCHE terms as needed

GOCHE ends up with the union of all paths
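One way to read "union of all paths": every GO is_a link between two terms built on the same pattern (e.g. 'X transport' is_a 'Y transport') suggests Y as a GOCHE parent of X, and GOCHE keeps all such suggestions. A minimal sketch under that assumption; the edge triples, `split_name` and `union_of_paths` are hypothetical, and part_of links are skipped, in line with the GOCHE rule that a part_of link in GO should not become an is_a link:

```python
# Hypothetical sketch: propose GOCHE parents for each chemical as the union of
# parents implied by GO is_a links between terms built on the same pattern.
# Edges are (child_name, relation, parent_name) triples; names that don't
# follow one of the patterns are skipped.

from collections import defaultdict

SUFFIXES = (
    " metabolic process",
    " biosynthetic process",
    " catabolic process",
    " transporter",
    " transport",
    " binding",
)


def split_name(name):
    """Split 'X <suffix>' into (X, suffix), or return None if no pattern fits."""
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)], suffix
    return None


def union_of_paths(edges):
    """Map each chemical to the set of parent chemicals implied by any pattern."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation != "is_a":
            continue  # part_of and other links are not turned into GOCHE is_a
        c, p = split_name(child), split_name(parent)
        if c and p and c[1] == p[1]:  # same pattern at both ends
            parents[c[0]].add(p[0])
    return dict(parents)


# Toy edges for illustration only, not an extract of GO.
TOY_EDGES = [
    ("glucose metabolic process", "is_a", "hexose metabolic process"),
    ("glucose transport", "is_a", "monosaccharide transport"),
    ("glucose binding", "is_a", "carbohydrate binding"),
    ("glucose catabolic process", "part_of", "hexose catabolic process"),
]

if __name__ == "__main__":
    print(sorted(union_of_paths(TOY_EDGES)["glucose"]))
    # prints ['carbohydrate', 'hexose', 'monosaccharide']
```

Output like this is only a list of candidates to review, not a finished GOCHE hierarchy.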

March 5th: got through CDP-diacylglycerol; resume at monoglyceride

March 6th: modified approach:

  • skip noting which terms exist in GOCHE file
  • mark GOCHE terms we've looked at with a 'chem_mtg' def dbxref
  • note all parents from GO paths as before
  • also filled in more chemicals that GO has but ChEBI doesn't; we don't know what they are!
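Once terms carry the 'chem_mtg' marker, listing what has already been reviewed is a simple scan of the OBO file. A minimal sketch, assuming the marker is stored as a bare `chem_mtg` entry in the def dbxref list (`def: "..." [chem_mtg]`); `reviewed_terms` and the GOCHE IDs below are hypothetical:

```python
# Hypothetical sketch: list GOCHE terms already reviewed at the meeting, i.e.
# terms whose OBO def line carries a 'chem_mtg' dbxref, e.g.
#   def: "Some definition." [chem_mtg]

import re

# A quoted definition (allowing escaped characters) followed by a [dbxref list].
DEF_LINE = re.compile(r'^def:\s*"(?:[^"\\]|\\.)*"\s*\[(.*?)\]')


def reviewed_terms(obo_text, marker="chem_mtg"):
    """Return the ids of stanzas whose def dbxrefs include `marker`."""
    reviewed = set()
    current_id = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a new stanza header such as [Term]
            current_id = None
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        else:
            match = DEF_LINE.match(line)
            if match and current_id:
                xrefs = [x.strip() for x in match.group(1).split(",")]
                if marker in xrefs:
                    reviewed.add(current_id)
    return reviewed


# Toy OBO text; the ids are made up for illustration.
SAMPLE = """\
[Term]
id: GOCHE:0000001
name: glucose
def: "A toy definition." [chem_mtg]

[Term]
id: GOCHE:0000002
name: monoglyceride
def: "Another toy definition." []
"""

if __name__ == "__main__":
    print(reviewed_terms(SAMPLE))  # prints {'GOCHE:0000001'}
```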

Concentrated on high-level terms, because that's where most problems crop up; the more specific terms are easier to sort out without meeting face-to-face

Notes:

  • at present, GO doesn't have paths from nucleoside/nucleotide/base terms to 'aromatic compound' terms, but we will want to add them back in GOCHE
  • 'response to chemical substance' branch isn't consistent with the rest of GOCHE at all
  • Note that we have made a few fixes, but have not consistently fixed all problems we've spotted.


GOCHE changes done Sat 6th:

  • merged 'organic alcohol' into 'alcohol'
  • merged hydroxyproline into 4-hydroxyproline; made L-hydroxyproline is_a hydroxyproline
  • removed redundant links

Rules for GOCHE

  • for X biosynthesis or X catabolism terms, only follow is_a paths up the graph via X metabolism
    • use MF-BP links to capture links between (e.g.) pyruvate and glucose
  • use the ChEBI '-ic acid' ID for '-ate' GO terms; it's the ionized form that's biologically relevant
  • a part_of link in GO should not translate into an is_a link in GOCHE
  • ignore modification terms for building GOCHE paths
  • every chemical with a path to 'small molecule','drug','neurotransmitter' or any other role MUST also have a path classifying it based on structure
  • if a GO term is not plural, but its GOCHE term points to the ID for a ChEBI plural, then GO should use an 'and derivative' term. Eventually we will use the definition of the ChEBI plural term for the 'and derivative' terms in GO. We like the idea of the skeletal framework being the criterion for these.
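The rule that every chemical with a path to a role must also have a structural path can be checked mechanically if each upper-level GOCHE term is tagged as either a role or a structural class. A minimal sketch, assuming such tagging exists (`ROLE_ROOTS`, `STRUCTURE_ROOTS` and `role_without_structure` are hypothetical), run on a deliberately incomplete toy hierarchy:

```python
# Hypothetical sketch of the "role implies structure" rule: any chemical that
# reaches a role term (small molecule, drug, neurotransmitter, ...) must also
# reach at least one structural class. Role/structure tagging is assumed.


def ancestors(term, is_a):
    """All terms reachable from `term` by following is_a links upward."""
    seen = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def role_without_structure(is_a, role_roots, structure_roots):
    """Return chemicals with a path to a role but no path to a structural class."""
    flagged = []
    for term in is_a:
        reached = ancestors(term, is_a)
        if reached & role_roots and not reached & structure_roots:
            flagged.append(term)
    return sorted(flagged)


# Toy hierarchy (child -> direct is_a parents); deliberately incomplete.
ROLE_ROOTS = {"small molecule", "drug", "neurotransmitter"}
STRUCTURE_ROOTS = {"amine"}
TOY_IS_A = {
    "dopamine": ["neurotransmitter", "catecholamine"],
    "catecholamine": ["amine"],
    "serotonin": ["neurotransmitter"],  # structural link missing on purpose
}

if __name__ == "__main__":
    print(role_without_structure(TOY_IS_A, ROLE_ROOTS, STRUCTURE_ROOTS))
    # prints ['serotonin']
```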

Action items:

  • divide up remaining GOCHE terms
    • do straightforward ones individually; save up queries/problems for teamwork
  • find chemicals named in GO missing from GOCHE
    • look at children of GO terms that do have corresponding GOCHE entries
  • organic alcohol = alcohol, so fix transport term name - DONE.
  • clean up 'heterocycle' vs 'heterocyclic compound' in both GO & GOCHE
  • clean up hyphenation inconsistency
  • DNA binding & RNA binding
    • separate out hierarchies based on chemistry from those based on SO terms
    • don't forget ncRNA
  • missing synonyms for *acylglycerol B & C
  • merge 'histidine family' terms into 'histidine' terms, since there are no other children
  • rename 'quinone cofactor' terms to just 'quinone'

Steps:

  1. Finish reviewing GOCHE, adding missing compounds
    • During this process, submit issues/new terms to ChEBI
    • When this process is finished, every GOCHE term will have a ChEBI ID
  2. Refine relationships and upper-level structure of GOCHE
  3. Create cross products (xps) using GOCHE terms
    • Clean up GO chemical terms
    • Automatically fill in missing chemicals
  4. Align GOCHE with ChEBI
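Step 3 could start by splitting each GO term name into a genus (the process or function) and a chemical, then resolving the chemical against GOCHE; chemicals that don't resolve are the "chemicals named in GO missing from GOCHE" from the action items. A minimal sketch, assuming names follow the patterns used at the meeting; `CrossProduct`, `make_xps` and the IDs are hypothetical, and no particular relation name is implied for the differentia:

```python
# Hypothetical sketch of step 3: turn GO term names into genus + chemical
# cross products, resolving the chemical against GOCHE. Chemicals that do
# not resolve are reported so they can be added to GOCHE (or ChEBI).

from typing import NamedTuple

# Genus part of the name; the chemical is whatever precedes it.
GENERA = (
    "metabolic process",
    "biosynthetic process",
    "catabolic process",
    "transporter",
    "transport",
    "binding",
)


class CrossProduct(NamedTuple):
    go_name: str      # the GO term being defined
    genus: str        # e.g. "biosynthetic process"
    chemical_id: str  # GOCHE/ChEBI id used as the differentia filler


def make_xps(go_names, goche_ids):
    """Return (cross products, chemical names missing from GOCHE)."""
    xps, missing = [], set()
    for name in sorted(go_names):
        for genus in GENERA:
            if name.endswith(" " + genus):
                chemical = name[: -len(genus) - 1]
                if chemical in goche_ids:
                    xps.append(CrossProduct(name, genus, goche_ids[chemical]))
                else:
                    missing.add(chemical)
                break  # each name matches at most one genus
    return xps, missing


if __name__ == "__main__":
    # Toy names and made-up ids for illustration only.
    names = {
        "glucose biosynthetic process",
        "glucose transport",
        "monoglyceride binding",
        "cell division",
    }
    xps, missing = make_xps(names, {"glucose": "GOCHE:0000001"})
    for xp in xps:
        print(xp.go_name, "=", xp.genus, "+", xp.chemical_id)
    print("missing from GOCHE:", sorted(missing))
```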

Prose description of plan

We started to add letters after the GOCHE terms to see how internally inconsistent GO was in its representation of chemicals. Tanya and David were already aware, from doing the F-P links, that we had many inconsistencies; that was why we thought this project should now take priority. We have stopped adding the letters, but by then we were convinced that GO is inconsistent at the higher levels. We can't just rely on the automatically generated cross products, because GO has chemicals that are not represented in ChEBI, and in some cases we need to take a good look to decide whether we agree on the mappings between GO and ChEBI; for example, is 'benzenes' equivalent to 'benzene and derivative'? We have not always been consistent in how we group things, and only by looking at the terms the way we are doing can we spot these inconsistencies. For now, we are keeping both the GOCHE term and the ChEBI term. We may at some point merge them, or ask ChEBI to create a new ChEBI term.

The plan is to work only on GOCHE for now. We know that there are areas in it that don't make sense; we will discuss and fix those as we go along. At the end, we will see how well GOCHE aligns with ChEBI, and where it doesn't align, we need to find out why. Once GOCHE is in a state we like, we will use it in the cross-product definitions of every GO term that refers to a chemical. We can then use the reasoner to make sure that GO remains internally consistent. As we work out the discrepancies between GOCHE and ChEBI, we will be aligning with ChEBI.
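The reasoner check in the plan rests on one inference: if chemical A is_a B in GOCHE, then 'A <genus>' should sit below 'B <genus>' in GO. The sketch below does that single inference by hand in Python, as a stand-in for a real OWL reasoner (it only handles same-genus terms and direct GOCHE parents); `missing_go_links` and all the data are hypothetical:

```python
# Hypothetical stand-in for the reasoner check: if chemical A is_a B in GOCHE,
# then "A <genus>" should sit below "B <genus>" in GO. Only direct GOCHE
# parents and same-genus terms are considered; a real OWL reasoner does more.


def go_ancestors(term, go_is_a):
    """All GO terms reachable from `term` via asserted is_a links."""
    seen, stack = set(), list(go_is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(go_is_a.get(parent, ()))
    return seen


def missing_go_links(xps, goche_is_a, go_is_a):
    """Return (child, parent) GO term pairs implied by GOCHE but absent from GO.

    xps:        GO term name -> (genus, chemical)
    goche_is_a: chemical -> direct parent chemicals
    go_is_a:    GO term name -> direct parent GO term names
    """
    term_for = {definition: term for term, definition in xps.items()}
    missing = []
    for term, (genus, chemical) in sorted(xps.items()):
        for parent_chemical in sorted(goche_is_a.get(chemical, ())):
            expected = term_for.get((genus, parent_chemical))
            if expected and expected not in go_ancestors(term, go_is_a):
                missing.append((term, expected))
    return missing


# Toy data for illustration only.
TOY_XPS = {
    "glucose metabolic process": ("metabolic process", "glucose"),
    "hexose metabolic process": ("metabolic process", "hexose"),
    "glucose transport": ("transport", "glucose"),
    "hexose transport": ("transport", "hexose"),
}
TOY_GOCHE = {"glucose": {"hexose"}}
TOY_GO = {"glucose metabolic process": {"hexose metabolic process"}}

if __name__ == "__main__":
    print(missing_go_links(TOY_XPS, TOY_GOCHE, TOY_GO))
    # prints [('glucose transport', 'hexose transport')]
```

A reasoner working over the actual cross-product definitions would cover far more cases; this only illustrates the kind of gap such a check reports.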

So GOCHE is being used as a 'metaontology' to show us how we currently represent chemicals in GO. For now, the text strings of chemical names are univocal with GO. We will fix the GOCHE metaontology and then use it to clean up the inconsistencies and errors in GO. This way we can keep our own house in order while we work with ChEBI on alignment. As we work through GOCHE, we are already sending issues and questions to ChEBI to try to resolve them together.

Notes and questions

  • GOCHE doesn't seem to have a term for 'organic compound' (by that or any other name). ChEBI defines 'organic molecular entity' as 'A molecular entity that contains carbon', so we can say organic compound = carbon compound in GO (and rearrange a few existing links accordingly). (-midori 2010-03-08)