Ontology meeting 2016-02-25
Attendees: Paola, DavidH, Chris, Harold, Heiko, Tanya, Melanie, DavidOS
Minutes: Tanya
Follow-up: Blocking GH comment/spam
Thanks Chris for blocking/reporting this one. Should be able to do this on an individual account basis. Bring up again if still getting spam.
Follow-up: Discuss strategy for ontology requests
Assign people to action items from last week. Relevant discussion copied below:
We need a better set of guidelines for dealing with ontology tickets. Perhaps we can generalize Jane's flow chart of when to request a TermGenie term.
http://go.termgenie.org/help/index.html
Strategies discussed:
* Improved flow chart - remind people about flow chart's existence.
* Tickets with insufficient detail will not be dealt with. Add 'More info needed' label to tickets that fall in this category so that the user will know they have to take action and don't feel ignored.
* Make and add a template/instructions (contributing.md file) for new tickets to GH. Will be seen by all people going to tracker (at the top of the form) when they open a new issue. (DONE, see current version)
* Need form type interface for Noctua for simpler gene-term interface to alleviate concerns for going to full blown LEGO. This might help with Val's concern about need for using LEGO for people who aren't ready for LEGO.
* Templated generation of branches of the ontology - filling in terms for classes we know must exist rather than waiting for requests. Good example of this is components for membranes.
Come back to this next week?
Specificity of activity terms
Example: https://github.com/geneontology/go-ontology/issues/12282
TL;DR: deubiquitinases can be either cysteine-type peptidases (i.e., thiol-dependent) or metallopeptidases. They also have specific residues of action. We had GO:0061578 Lys63-specific deubiquitinase activity, which was under cysteine-type peptidase. Based on a term request from Antonia, it could also be a metallopeptidase, so it has been moved under a general deubiquitinase term, with the idea that curators could make 2 annotations:
Protein A is_a Lys63-specific deubiquitinase activity AND Protein A is_a metallopeptidase
OR
Protein A is_a Lys63-specific deubiquitinase activity AND Protein A is_a cysteine-type peptidase
However, annotating to 2 terms for one activity seems to be an issue (is it? New terms would in effect concatenate those 2). The solution would be to create the specific terms Lys63-specific cysteine-type peptidase deubiquitinase activity and Lys63-specific metallopeptidase deubiquitinase activity. This quickly leads to a combinatorial explosion of terms and is probably not desired.
Note that I chose this example, but the same situation has arisen in many cases; see https://github.com/geneontology/go-ontology/labels/discuss_term_specificity
Annotating to the two separate terms, rather than to one term that encompasses both, is recommended. DavidH will try to create a LEGO model showing an alternative that would represent the 'one term' solution. PMID:26368668
The model is here: http://noctua.berkeleybop.org/editor/graph/gomodel:56cbaef000000102 The left hand side would represent the enzyme as a metallopeptidase if we make the logical def for metallopeptidase in the ontology. I prefer the right-hand side.
ec2GO Clean up
It appears that when a truncated EC number is used as a dbxref, all of the children of the number get mapped to that EC number. As a result, specific enzymes with EC xrefs to the more generic EC numbers get annotated to all of the children of the generic enzyme activity. Here is a snippet from ec2go. I think we need to remove the lines in ec2go for any EC# that has a '-'.
EC:1.-.-.- > GO:N-ethylmaleimide reductase activity ; GO:0008748
EC:1.-.-.- > GO:reduced coenzyme F420 dehydrogenase activity ; GO:0043738
EC:1.-.-.- > GO:sulfur oxygenase reductase activity ; GO:0043826
EC:1.-.-.- > GO:malolactic enzyme activity ; GO:0043883
EC:1.-.-.- > GO:NADPH:sulfur oxidoreductase activity ; GO:0043914
EC:1.-.-.- > GO:epoxyqueuosine reductase activity ; GO:0052693
We generate this file. This needs fixing. How do we fix this? Should the relation be 1:1 for full EC numbers:GO terms? Currently there are 250 cases where there is a many:1 ratio and the EC numbers do not contain the '-'. These need to be checked. Either strip the dashed xrefs from the ontology file OR do not generate lines in ec2go that have the EC dashes in them. EC numbers with dashes can be put into the definition xrefs (Example: Definition: xxxxxxxxxx [PMID:1234456, EC:2.4.1.-]). GO basically follows EC but sometimes has more specificity; the two are not contradictory.
AI #1: Chris will remove ALL EC xrefs in the GO file that have a dash in the number. These look like:
xref: EC:2.-.-.-
xref: EC:2.1.-.-
xref: EC:2.2.3.-
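The other option discussed above, not generating ec2go lines for dashed EC numbers, could be sketched roughly as follows. This is only an illustration, not the script that actually generates ec2go: the function names are made up here, and the line layout is assumed to match the snippet above (EC id, then ' > ', then the GO name and id). Only the EC id is checked for a dash, because GO term names such as 'N-ethylmaleimide reductase activity' contain hyphens too.

```python
def is_partial_ec(ec_id):
    """Return True if an identifier such as 'EC:2.1.-.-' has a dash field."""
    # Look only at the part after 'EC:' so the prefix never interferes.
    return "-" in ec_id.split(":", 1)[-1]


def drop_partial_ec_lines(lines):
    """Keep every line except ec2go mappings whose EC number contains a dash.

    Assumes mapping lines look like 'EC:1.-.-.- > GO:name ; GO:0000000'
    (hypothetical helper; the layout is taken from the snippet in the minutes).
    """
    kept = []
    for line in lines:
        if line.startswith("EC:") and " > " in line:
            ec_id = line.split(" > ", 1)[0].strip()
            if is_partial_ec(ec_id):
                continue  # skip mappings from truncated EC numbers
        kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        "EC:1.-.-.- > GO:N-ethylmaleimide reductase activity ; GO:0008748",
        # Placeholder line, not a real mapping:
        "EC:9.9.9.9 > GO:some hyphen-containing name ; GO:0000000",
    ]
    for line in drop_partial_ec_lines(sample):
        print(line)
```

The same `is_partial_ec` check could be applied to `xref: EC:...` lines in the ontology file if the xrefs are stripped at the source instead.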
AI #2: Review the 250 cases where GO terms have more than one FULL EC number (eyeball whether they are in different categories). Harold will look at these; Chris will send the list. (HJD and DPH have 1100 enzymes to add for Peifen.)
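A list like the one needed for AI #2 could be pulled out of ec2go with something along these lines. This is a rough sketch, not the query Chris will actually use: the function name is invented, and the line layout is assumed to be the one shown in the ec2go snippet above, with the GO id after the last ' ; '.

```python
from collections import defaultdict


def go_terms_with_multiple_full_ecs(lines):
    """Map each GO id to its full EC numbers, keeping only ids with 2 or more.

    Assumes lines look like 'EC:1.2.3.4 > GO:name ; GO:0000000'.
    EC numbers containing a dash are ignored, since only FULL numbers count.
    """
    ecs_by_go = defaultdict(set)
    for line in lines:
        if not line.startswith("EC:") or " > " not in line:
            continue  # comments, headers, blank lines
        ec_id, rest = line.split(" > ", 1)
        ec_id = ec_id.strip()
        if "-" in ec_id:
            continue  # truncated EC number, not part of this review
        go_id = rest.rsplit(" ; ", 1)[-1].strip()
        ecs_by_go[go_id].add(ec_id)
    return {go: sorted(ecs) for go, ecs in ecs_by_go.items() if len(ecs) > 1}


if __name__ == "__main__":
    # Placeholder lines for illustration only, not real mappings.
    sample = [
        "EC:1.1.1.1 > GO:example activity ; GO:0000001",
        "EC:2.2.2.2 > GO:example activity ; GO:0000001",
        "EC:3.3.3.3 > GO:other activity ; GO:0000002",
    ]
    for go_id, ecs in go_terms_with_multiple_full_ecs(sample).items():
        print(go_id, ", ".join(ecs))
```

Comparing the first field of each EC number in a group would be one quick way to flag cases that fall into different top-level EC categories.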
Extracting info from EC (KEGG) programmatically
Discuss https://github.com/geneontology/ec-extractor
This is a stop-gap. Need to return to this discussion at some point to separate things into complete automation and manual review.
It is called EC extractor but actually goes through KEGG as the intermediate, because KEGG has an API. Check the names for the chemicals: are they compatible with ChEBI or not? Current GO situation: bidirectional reactions generally have <=> instead of =.
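To illustrate the <=> versus = convention, here is a small sketch that splits a reaction string into its two sides and records whether it is written as bidirectional. It is not code from ec-extractor; the function name and the assumption that participants are separated by ' + ' are ours, and the example reaction is only for illustration.

```python
def split_reaction(equation):
    """Split 'A + B <=> C + D' into (left side, right side, bidirectional flag).

    Assumes participants are separated by ' + ' (with spaces), so names such
    as 'NAD+' or 'H+' are not split. '<=>' is checked before '=' because the
    former contains the latter.
    """
    if "<=>" in equation:
        separator, bidirectional = "<=>", True
    elif "=" in equation:
        separator, bidirectional = "=", False
    else:
        raise ValueError("no '<=>' or '=' found in: " + equation)
    left, right = equation.split(separator, 1)
    left_side = [name.strip() for name in left.split(" + ")]
    right_side = [name.strip() for name in right.split(" + ")]
    return left_side, right_side, bidirectional


if __name__ == "__main__":
    print(split_reaction("ATP + D-glucose <=> ADP + D-glucose 6-phosphate"))
```

The extracted names could then be compared against ChEBI labels, which is the compatibility check raised above.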
Side discussion of how to formalize bidirectional reactions. Ramifications for BFO. Commit to directionality for enzymes like kinases. For LEGO models, commit to a direction using inputs and outputs.
Discussion on incorporation of new MetaCyc reactions into GO (Harold and DavidH). 10 terms reviewed so far, and all 10 had to be edited. Edits included name modifications and finding existing terms that matched where the match had been missed. Some workflow for handling new enzyme activities would be helpful. Are there patterns that can be documented for the manual work?