Ontology meeting 2013-10-24

From GO Wiki
Latest revision as of 15:58, 1 July 2014

Attendees: Paola, David OS, Heiko, Rebecca, Tanya, Jane, David H., Harold

Minutes: Rebecca

Legitimate GO term?

GO:0003869 4-nitrophenylphosphatase activity

https://sourceforge.net/p/geneontology/ontology-requests/10452/

Substrate appears to be an indicator but the activity is widely described by the GO term name in the literature. Keep?


The question is whether it's an appropriate term to have in GO (does it represent a naturally occurring reaction?). If it doesn't distinguish mechanisms of different phosphatases, we could merge it into the generic phosphatase reaction (keeping 4-nitrophenylphosphatase as a narrow synonym).

Make a general rule: We don't create enzyme terms for non-natural substrates.

Review of Jira tickets

...

skipped


Classification of protein complexes

  • Should we take components and protein families into account? If so, how?
  • Should we add more abstract, functional classes, or just rely on adding capable_of/capable_of_part_of links to molecular function terms?

This agenda item was prompted by the edits and questions described in SF ticket 10443. Briefly, it is clear that the only safe way to write XP definitions for 'ATP-binding cassette (ABC) transporter complex' (GO:0043190) would be with reference to the ABC transporter protein superfamily (perhaps via InterPro?). Formal definition of at least some child terms would require this plus references to components, or perhaps to external resources collecting information on protein complexes (via IntAct?).

More generally: the hierarchy under 'protein complex' (~1500 terms) is currently quite flat and consists mostly of terms for complexes defined in part by their constituent proteins. It contains relatively little abstractly defined classification of complexes based on location and function. About 1/3 have some assertions of function coming from about 100 direct assertions - almost all of which are in XP definitions of the form 'protein complex that capable_of some X'. Given the problems described in SF-10443, I think these need to be reviewed, questioning whether XP defs are safe or should be relaxed to relationships (AKA SubClassing assertions).

  • Should we take components and protein families into account? If so, how?
    • DOS: My instinct is that we should try to leverage external classification systems and reference sources for this, e.g. IntAct & InterPro.
  • Should we add more abstract, functional classes, or just rely on adding capable_of/capable_of_part_of links to molecular function terms?
    • DOS: I think that recording function and location plays to the strengths of GO. We should definitely be doing this as comprehensively and completely as possible. I'm agnostic about whether we should add a layer of abstract classes for complexes defined purely by function and location. These would certainly be useful for grouping, but adding them would be a lot of work and I could imagine purpose built tools that allowed users to construct their own queries for complexes based on their function and location.


Summary: For several protein complexes, the XP only describes the activity, but the textual definition describes a lot more about the complex. So either the textual definition needs broadening to define the complex purely functionally, or we need to be more conservative in making these XPs.

There are going to be multiple protein complexes with the same activity and different components (e.g. transcription factors).

NB: We can use capable_of_part_of_process for a complex.



OWL Modeling Challenges

Problem #1 - transitivity cannot be combined with cardinality

Example: this is actually illegal OWL:

[Term]
id: PR:000036157
name: BCL2/adenovirus E1B 19 kDa protein-interacting protein 3-like homodimer (mouse)
is_a: GO:0043234  ! protein complex
relationship: has_part PR:000036146 {cardinality="2"} ! BCL2/adenovirus E1B 19 kDa protein-interacting protein 3-like isoform 1 (mouse)
relationship: only_in_taxon NCBITaxon:10090 ! Mus musculus

The reason for the illegality is somewhat technical, and it's an annoying restriction. The most common workaround is to introduce an arbitrarily named sub-relation of has_part. In RO we have has_component. I've had various discussions with Darren Alan and others; I don't remember what the status is.

When we MIREOT subsets of PRO for Uberon and CL, we don't really use the complex terms. If we did, we would just do the transform as part of the MIREOT, but really you don't want people to have to rewrite PRO to be able to use it.
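The workaround described above (replacing has_part with a sub-relation wherever a cardinality is attached) could be applied mechanically during import. The following is only a minimal sketch of such a transform on OBO text; the function name `use_has_component` and the regular expression are illustrative assumptions, not part of any existing MIREOT tooling.

```python
import re

# Matches an OBO relationship line that uses has_part together with a
# cardinality qualifier, e.g.
#   relationship: has_part PR:000036146 {cardinality="2"} ! comment
CARDINALITY_HAS_PART = re.compile(
    r'^(relationship:\s+)has_part(\s+\S+\s+\{[^}]*cardinality="\d+"[^}]*\})'
)


def use_has_component(stanza: str) -> str:
    """Swap has_part for has_component on lines that carry a cardinality.

    Lines without a cardinality qualifier are returned unchanged.
    (Hypothetical helper, written for illustration only.)
    """
    rewritten = []
    for line in stanza.splitlines():
        rewritten.append(CARDINALITY_HAS_PART.sub(r"\1has_component\2", line))
    return "\n".join(rewritten)


stanza = """[Term]
id: PR:000036157
is_a: GO:0043234 ! protein complex
relationship: has_part PR:000036146 {cardinality="2"} ! BCL2/adenovirus E1B 19 kDa protein-interacting protein 3-like isoform 1 (mouse)
relationship: only_in_taxon NCBITaxon:10090 ! Mus musculus"""

print(use_has_component(stanza))
```

Only the has_part line with a cardinality is rewritten; the only_in_taxon line is left alone.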

Problem #2 - Implicit closed world definitions

This is actually not a problem for PRO so far, as there are no logical definitions (equivalence axioms, "intersection_of" in obo) for PRO complexes. There are logical relationships (see above), which are valid, but insufficient to perform classifications - for that you need logical definitions.

It's not enough to say

[1] "AB complex" EquivalentTo complex and has_component exactly 1 A and has_component exactly 1 B

The expression on the RHS includes ABC, ABD, ABCD, ... as well as a pure AB pairing. So you can define "AB-containing complex" using this simple pattern but to restrict membership you need to say something like

[2] "AB complex" EquivalentTo complex and has_component exactly 1 A and has_component exactly 1 B and has_component exactly 2 protein

Whether this is a problem is a matter of opinion: it makes things a bit more awkward, but it's doable. It gets harder, however, if you want to define complexes recursively.
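The difference between patterns [1] and [2] can be illustrated with a toy model. This sketch is not OWL reasoning: it assumes each complex is given as a complete list of its components (a closed-world simplification), and the function names `matches_open` and `matches_closed` are made up for the example.

```python
from collections import Counter


def matches_open(components, required):
    """Pattern [1]: each required component occurs exactly the stated
    number of times; additional components are allowed."""
    counts = Counter(components)
    return all(counts[name] == n for name, n in required.items())


def matches_closed(components, required):
    """Pattern [2]: as pattern [1], plus the total number of components
    is fixed, so no extra members can be present."""
    return matches_open(components, required) and len(components) == sum(
        required.values()
    )


ab = {"A": 1, "B": 1}
for complex_ in (["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]):
    print(complex_, matches_open(complex_, ab), matches_closed(complex_, ab))
```

Under these assumptions, ABC and ABCD satisfy the open pattern but only the pure AB pairing satisfies the closed one.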

Problem #3 - cardinality is outside EL++ profile

This means fast reasoners like ELK ignore the cardinality axioms. This may be a problem if PRO increases dramatically in size; it may be less of a problem if hybrid reasoners such as MoRE can be used. Is PRO actively working with the reasoner developer community (in particular Horrocks' group)? They would probably like to have example ontologies here.


Addition of disjointness declarations to TermGenie regulation template?

This seems like a sensible error check to have:

'positive regulation of X' disjointWith 'negative regulation of X'

It would have instantly caught

positive regulation of strand invasion is_a: negative regulation of strand invasion

which I accidentally added last week (now corrected).

Can we add it to the TermGenie template?
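As a rough illustration of what such a check would catch, here is a minimal sketch that walks asserted is_a links and reports any term falling under both members of a declared disjoint pair. It is a stand-in for what an OWL reasoner would do with real disjointness axioms; the data structures and function names are assumptions made for the example.

```python
def ancestors(term, is_a):
    """Return all is_a ancestors of term (excluding term itself)."""
    seen = set()
    stack = list(is_a.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def disjointness_violations(is_a, disjoint_pairs):
    """List (term, a, b) where term is, or descends from, both a and b."""
    violations = []
    for term in is_a:
        lineage = ancestors(term, is_a) | {term}
        for a, b in disjoint_pairs:
            if a in lineage and b in lineage:
                violations.append((term, a, b))
    return violations


# The accidental assertion described above, as a toy is_a graph.
is_a = {
    "positive regulation of strand invasion": [
        "negative regulation of strand invasion"
    ],
    "negative regulation of strand invasion": ["regulation of strand invasion"],
    "regulation of strand invasion": [],
}
disjoint_pairs = [
    (
        "positive regulation of strand invasion",
        "negative regulation of strand invasion",
    )
]

print(disjointness_violations(is_a, disjoint_pairs))
```

In this toy graph the only reported term is 'positive regulation of strand invasion', which sits under its own negative counterpart.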

Comment by Heiko: Could/should this be a (set of) general disjoints? This looks like a good check for the main validation pipeline.
I would like to avoid special checks for a template. A set of axioms in OWL and using the OWL reasoner
would be better than an ad hoc solution in TG.

David OS would like TG to generate an extra disjointness axiom. Heiko says this is currently problematic as you'd have to save out a separate disjoint file in OWL. David will look into simplifying our files.

David OS wants to know if there's a 'test' file he can work on, without breaking it for everyone. The short answer is no.


TG template Protein complex by activity

Chris mentioned that most of the xps for protein complexes defined by their activity are in place. There is a ticket here: https://www.ebi.ac.uk/panda/jira/browse/GO-204

Some things to define/discuss:

  • Name of the template?
  • A generic template for the textual definition
  • Some of the complexes have a comment stanza saying to also look at the activity term; should this also be generated?
  • Do we also want to handle: 'capable_of_part_of'? Right now it would just be 'protein complex' and 'capable_of' some 'molecular function'


Suggested names of template:

  • protein complex by function
  • protein complex by activity
  • Need to take location (and sometimes process) into account.

Need to check whether there are any function terms we'd need for making these XPs which don't have 'activity' at the end.

Question: do we want to say 'capable of' or 'has function'?

Heiko will add this template, and we can test it.
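To make the discussion concrete, the following is a purely hypothetical sketch of what such a template might generate. The naming pattern ('X activity' becomes 'X complex'), the definition wording, and the function `complex_by_activity` are all assumptions for illustration and do not describe the actual TermGenie implementation; the input term is reused from these minutes only as sample data.

```python
def complex_by_activity(activity_id: str, activity_name: str) -> dict:
    """Build a draft 'protein complex by activity' term from a function term.

    Raises ValueError when the function term name does not end in
    ' activity', since the naming pattern assumed here would not apply.
    """
    suffix = " activity"
    if not activity_name.endswith(suffix):
        raise ValueError(f"name does not end in 'activity': {activity_name!r}")
    base = activity_name[: -len(suffix)]
    return {
        "name": f"{base} complex",
        "def": f"A protein complex which is capable of {activity_name}.",
        # Genus 'protein complex' plus a capable_of differentia.
        "intersection_of": [
            "GO:0043234 ! protein complex",
            f"capable_of {activity_id} ! {activity_name}",
        ],
    }


draft = complex_by_activity("GO:0047770", "carboxylate reductase activity")
for key, value in draft.items():
    print(key, "=", value)
```

The ValueError branch corresponds to the open question about function terms that do not end in 'activity'; a real template would need a decision on how to name those.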


RHEA mappings (continued from previous call)

RHEA:13884 is mapped to 3 GO terms:

aldehyde dehydrogenase (FAD-independent) activity ; GO:0033727
EC:1.2.99.7
Catalysis of the reaction: an aldehyde + H2O + acceptor = a carboxylate + reduced acceptor.
Comments: The enzyme from Desulfovibrio sp. does not contain FAD.

aldehyde dehydrogenase (pyrroloquinoline-quinone) activity ; GO:0047113
http://www.chem.qmul.ac.uk/iubmb/enzyme/EC1/2/99/3.html
Catalysis of the reaction: an aldehyde + acceptor + H2O = a carboxylate + reduced acceptor.
Comments: A quinoprotein. Wide specificity; acts on straight-chain aldehydes up to C10, aromatic aldehydes, glyoxylate and glyceraldehyde.

carboxylate reductase activity ; GO:0047770
EC:1.2.99.6
Catalysis of the reaction: an aldehyde + acceptor + H2O = a carboxylate + reduced acceptor.
Comments: A tungsten protein. Methyl viologen can act as acceptor. In the reverse direction, non-activated acids are reduced by reduced viologens to aldehydes, but not to the corresponding alcohols.

Feedback from RHEA:

  • There are three different EC numbers because the substrate specificities are different (difficult to capture this set in GO terms though).
  • In short, Rhea will not create specific reactions with the cofactors (the cofactors cited for each of these EC numbers are not the acceptors mentioned in the reaction).
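Cases like this, where one Rhea reaction maps to several GO terms, could be listed with a simple grouping over the mapping pairs. This is a minimal sketch; the pair-list input format and the function name `one_to_many` are assumptions, and the sample data is just the mapping quoted above.

```python
from collections import defaultdict


def one_to_many(mappings):
    """Return Rhea IDs that map to more than one GO term, with those terms."""
    by_rhea = defaultdict(set)
    for rhea_id, go_id in mappings:
        by_rhea[rhea_id].add(go_id)
    return {r: sorted(g) for r, g in by_rhea.items() if len(g) > 1}


mappings = [
    ("RHEA:13884", "GO:0033727"),
    ("RHEA:13884", "GO:0047113"),
    ("RHEA:13884", "GO:0047770"),
]

print(one_to_many(mappings))
```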