Ontology meeting 2016-02-25

From GO Wiki

Attendees: Paola, DavidH, Chris, Harold, Heiko, Tanya, Melanie, DavidOS

Minutes: Tanya

Follow-up: Blocking GH comment/spam

Thanks Chris for blocking/reporting this one. Should be able to do this on an individual account basis. 
Bring up again if still getting spam.

Follow-up: Discuss strategy for ontology requests

Assign people to action items from last week. Relevant discussion copied below:

We need to have a better set of guidelines for dealing with ontology tickets. Perhaps we can generalize Jane's flow chart of when to request a termgenie term.

http://go.termgenie.org/help/index.html

 Strategies discussed:
 *Improved flow chart - remind people about flow chart's existence.
 *Tickets with insufficient detail will not be dealt with.  Add 'More info needed' label to tickets that 
 fall in this category so that the user will know they have to take action and don't feel ignored.
 *Make and add a template/instructions (contributing.md file) for new tickets to GH.  Will be seen by all 
 people going to tracker (at the top of the form) when they open a new issue. (DONE, see current version)
 *Need a form-type interface for Noctua that gives a simpler gene-term interface, to alleviate concerns about going to 
 full-blown LEGO. This might help with Val's concern about requiring LEGO for people who aren't ready 
 for LEGO.
 *Templated generation of branches of the ontology - filling in terms for classes we know must exist rather 
 than waiting for requests.
   Good example of this is components for membranes.
 Come back to this next week?

Specificity of activity terms

Example: https://github.com/geneontology/go-ontology/issues/12282

TL;DR: deubiquitinases can be either cysteine-type peptidases (i.e., thiol-dependent) or metallopeptidases. They also have specific residues of action. We had GO:0061578 Lys63-specific deubiquitinase activity, which was under cysteine-type peptidase. Based on a term request from Antonia, it could also be a metallopeptidase, so it has been moved under a general deubiquitinase term, with the idea that curators could make 2 annotations: either (Protein A is_a Lys63-specific deubiquitinase activity AND Protein A is_a metallopeptidase) OR (Protein A is_a Lys63-specific deubiquitinase activity AND Protein A is_a cysteine-type peptidase).

However, annotating to 2 terms for one activity seems to be an issue (is it? New terms would in effect concatenate those 2). The solution would be to create specific terms: Lys63-specific cysteine-type peptidase deubiquitinase activity and Lys63-specific metallopeptidase deubiquitinase activity. This quickly becomes a combinatorial explosion and is probably not desired.
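
To make the growth concrete, here is a minimal sketch that counts the terms needed under each approach. The term-name patterns mirror the examples above; the list of linkage types other than Lys63 is an illustrative assumption, not something discussed in the meeting.

```python
from itertools import product

# Illustrative inputs: only Lys63 and the two mechanisms come from the minutes;
# the other linkage types are assumptions added to show the growth.
linkages = ["Lys11", "Lys48", "Lys63"]
mechanisms = ["cysteine-type peptidase", "metallopeptidase"]


def combined_terms(linkages, mechanisms):
    """One pre-composed term per (linkage, mechanism) pair."""
    return [
        f"{linkage}-specific {mechanism} deubiquitinase activity"
        for linkage, mechanism in product(linkages, mechanisms)
    ]


def separate_terms(linkages, mechanisms):
    """One term per linkage plus one per mechanism; curators annotate to two."""
    by_linkage = [f"{linkage}-specific deubiquitinase activity" for linkage in linkages]
    by_mechanism = [f"{mechanism} activity" for mechanism in mechanisms]
    return by_linkage + by_mechanism


n_combined = len(combined_terms(linkages, mechanisms))   # len(linkages) * len(mechanisms)
n_separate = len(separate_terms(linkages, mechanisms))   # len(linkages) + len(mechanisms)
```

The combined approach needs the product of the two list lengths, the two-annotation approach only their sum, and every further axis of specificity multiplies the combined count again.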

Note that I chose this example, but the same situation has arisen in many cases; see https://github.com/geneontology/go-ontology/labels/discuss_term_specificity

 Annotation to the two separate terms, rather than to one term that encompasses both, is recommended.  DavidH will try to 
 create a LEGO model showing an alternative that would represent the 'one term' solution. PMID:26368668

The model is here: http://noctua.berkeleybop.org/editor/graph/gomodel:56cbaef000000102 The left hand side would represent the enzyme as a metallopeptidase if we make the logical def for metallopeptidase in the ontology. I prefer the right-hand side.

ec2GO Clean up

It appears that when a truncated EC number is used as a dbxref, all of the children of the number get mapped to that EC number. As a result, specific enzymes with EC xrefs to the more generic EC numbers get annotated to all of the children of the generic enzyme activity. Here is a snippet from ec2go. I think we need to remove the lines in ec2go for any EC# that has a '-'.

EC:1.-.-.- > GO:N-ethylmaleimide reductase activity ; GO:0008748

EC:1.-.-.- > GO:reduced coenzyme F420 dehydrogenase activity ; GO:0043738

EC:1.-.-.- > GO:sulfur oxygenase reductase activity ; GO:0043826

EC:1.-.-.- > GO:malolactic enzyme activity ; GO:0043883

EC:1.-.-.- > GO:NADPH:sulfur oxidoreductase activity ; GO:0043914

EC:1.-.-.- > GO:epoxyqueuosine reductase activity ; GO:0052693
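
As a rough sketch of the proposed filter, the following drops every mapping line whose EC number has a '-' in any field. It assumes the line layout shown in the snippet above (`EC:... > GO:name ; GO:id`); the function names, the placeholder full-EC line, and the treatment of any other line (kept unchanged) are assumptions, not how the file is actually generated.

```python
def parse_mapping(line):
    """Split 'EC:x > GO:name ; GO:id' into (ec, name, go_id); None if not a mapping."""
    if not line.startswith("EC:") or " > " not in line:
        return None
    ec, rest = line.split(" > ", 1)
    name, _, go_id = rest.rpartition(" ; ")
    if name.startswith("GO:"):
        name = name[3:]  # strip only the leading prefix; names may contain ':'
    return ec.strip(), name.strip(), go_id.strip()


def is_truncated(ec):
    """True if any of the dot-separated EC fields is '-'."""
    return "-" in ec.split(":", 1)[1].split(".")


def drop_truncated(lines):
    """Keep every line except mappings from a truncated EC number."""
    kept = []
    for line in lines:
        parsed = parse_mapping(line)
        if parsed is not None and is_truncated(parsed[0]):
            continue
        kept.append(line)
    return kept


sample = [
    "! non-mapping line, kept as-is (assumed)",
    "EC:1.-.-.- > GO:N-ethylmaleimide reductase activity ; GO:0008748",
    "EC:1.-.-.- > GO:NADPH:sulfur oxidoreductase activity ; GO:0043914",
    "EC:9.9.9.9 > GO:placeholder activity ; GO:0000000",  # made-up full EC line
]
filtered = drop_truncated(sample)
```

The same check could instead be applied where the file is generated, so that the dashed lines are never written in the first place.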

 We generate this file.  This needs fixing.  How do we fix this?  Should the relation be 1:1 for full 
 EC numbers:GO terms? Currently there are 250 cases where there is a many:1 ratio and the EC numbers 
 do not contain the '-'. These need to be checked.  Strip the dashed xrefs from the ontology file OR do
 not generate lines in ec2go that have the EC dashes in them. Can put EC number with dashes into the 
 Definition xrefs (Example: Definition: xxxxxxxxxx [PMID:1234456, EC:2.4.1.-]).
 GO basically follows EC but sometimes has more specificity, not contradictory.
 AI #1:  Chris will remove ALL EC xrefs in the GO file that have a dash in the number. They look like this:
 xref: EC:2.-.-.-
 xref: EC:2.1.-.-
 xref: EC:2.2.3.-
 AI #2: Review 250 cases where there are GO terms with more than one FULL EC number (eyeball whether they are in different
 categories). Harold will look at these; Chris will send them.  (HJD and DPH have 1100 enzymes to add for Peifen.)
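
The two action items could be approached along these lines. This is a hedged sketch only: it assumes xrefs appear in the ontology file as `xref: EC:...` lines (as in the examples above) and that the EC-to-GO pairs are already available as tuples; the function names and the GO IDs in the sample data are placeholders.

```python
from collections import defaultdict


def has_dash(ec_number):
    """True if any dot-separated field of a bare EC number (no 'EC:' prefix) is '-'."""
    return "-" in ec_number.split(".")


def strip_dashed_xrefs(obo_text):
    """AI #1: drop 'xref: EC:...' lines whose EC number contains a dash field."""
    kept = []
    for line in obo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("xref: EC:"):
            parts = stripped[len("xref: EC:"):].split()
            if parts and has_dash(parts[0]):
                continue
        kept.append(line)
    return "\n".join(kept)


def terms_with_multiple_full_ecs(pairs):
    """AI #2: map each GO id to its full EC numbers, keeping ids with more than one."""
    by_term = defaultdict(set)
    for ec, go_id in pairs:
        if not has_dash(ec.split(":", 1)[1]):
            by_term[go_id].add(ec)
    return {go_id: sorted(ecs) for go_id, ecs in by_term.items() if len(ecs) > 1}


def spans_ec_classes(ecs):
    """True if the EC numbers do not all share the same first (class) field."""
    return len({ec.split(":", 1)[1].split(".")[0] for ec in ecs}) > 1


# Placeholder data: the GO ids and pairings below are made up for illustration.
obo_sample = "[Term]\nid: GO:0000001\nxref: EC:2.1.-.-\nxref: EC:2.1.1.1"
pairs_sample = [
    ("EC:1.1.1.1", "GO:0000001"),
    ("EC:2.7.1.1", "GO:0000001"),
    ("EC:1.-.-.-", "GO:0000002"),
    ("EC:3.1.1.1", "GO:0000002"),
]
cleaned = strip_dashed_xrefs(obo_sample)
to_review = terms_with_multiple_full_ecs(pairs_sample)
```

`spans_ec_classes` is only a first-pass flag for the "different categories" check; the cases would still need manual review.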

Extracting info from EC (KEGG) programmatically

Discuss https://github.com/geneontology/ec-extractor

 This is a stop-gap. Need to return to this discussion at some point to separate things into complete automation
 and manual review.
 Called EC extractor but actually goes through KEGG as the intermediate because KEGG has an API.
 Check names for the chemicals:  are they compatible with ChEBI or not?
 Current GO situation: Bidirectional reactions generally have <=> instead of =.
 Side discussion of how to formalize bidirectional reactions. Ramifications for BFO.  Commit to directionality
 for enzymes like kinases. For LEGO models, commit to a direction using inputs and outputs.
 Discussion on incorporation of new MetaCyc reactions into GO (Harold and DavidH). 10 terms reviewed so far; 
 all 10 had to be edited: name modification, or an existing term matched but the match was missed.  
 Some workflow for handling new enzyme activities would be helpful.  Are there patterns that can be documented 
 for the manual work?
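
One way to make the '<=>' versus '=' convention machine-checkable is sketched below. It assumes reaction strings of the form `A + B <=> C + D`, with participants separated by ' + ', and it treats a plain '=' as "not marked bidirectional"; both are assumptions for illustration, and the function name is hypothetical. Reading left-hand participants as inputs and right-hand ones as outputs corresponds to committing to a direction.

```python
def parse_reaction(equation):
    """Return (inputs, outputs, bidirectional) for a reaction string.

    '<=>' marks a bidirectional reaction; a plain '=' is treated as
    not marked bidirectional (an assumption for this sketch).
    """
    if "<=>" in equation:
        left, right = equation.split("<=>", 1)
        bidirectional = True
    elif "=" in equation:
        left, right = equation.split("=", 1)
        bidirectional = False
    else:
        raise ValueError(f"no '=' or '<=>' found in: {equation!r}")

    def participants(side):
        # Split on ' + ' (with spaces) so names such as 'NAD+' stay intact.
        return [name.strip() for name in side.split(" + ") if name.strip()]

    return participants(left), participants(right), bidirectional


# A kinase-style example written left to right.
inputs, outputs, bidirectional = parse_reaction(
    "ATP + D-glucose = ADP + D-glucose 6-phosphate"
)
```

The participant names returned this way could then be compared against ChEBI labels, which is the compatibility question raised above.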