Annotation Extension meeting 2014-06-16
- Date: June 16th 2014
- Start Time: 9:30
- Where: EBI-Duxford room
- Jane Lomax
- David O-S
- Ruth Lovering
- Rebecca Foulger
- Pascale Gaudet
- Aleks (am)
- Rachael Huntley
- Valerie Wood
- Chris Mungall (pm)
Note on Unfolding/Folding
Chris wants to put a mechanism in place that will display a human-readable version of the folded annotation at the point of annotation, so that the curator can immediately determine whether the annotation makes sense.
AI: Chris should think more about how this would work in practice and collaborate with Tony to see if it can be implemented in Protein2GO.
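As a starting point for that discussion, here is a minimal Python sketch of what a human-readable rendering could look like. It is not how Protein2GO works: the `describe` function and the `RELATION_PHRASES` table are hypothetical names invented for this illustration, and the English phrasings are assumptions.

```python
from typing import List, Tuple

# Hypothetical phrase table (an assumption, not taken from the minutes):
# maps a C16 relation name to an English verb phrase.
RELATION_PHRASES = {
    "part_of": "is part of",
    "has_substrate": "has substrate",
    "has_direct_input": "has direct input",
    "has_input": "has input",
    "happens_during": "happens during",
    "occurs_in": "occurs in",
}


def describe(term_label: str, extensions: List[Tuple[str, str]]) -> str:
    """Render a GO term plus its C16 extensions as one English phrase."""
    if not extensions:
        return term_label
    clauses = []
    for relation, filler in extensions:
        # Unknown relations fall back to the raw name with underscores as spaces.
        phrase = RELATION_PHRASES.get(relation, relation.replace("_", " "))
        clauses.append(f"{phrase} {filler}")
    return f"{term_label} that " + " and ".join(clauses)


print(describe(
    "protease activity",
    [("part_of", "membrane protein ectodomain proteolysis"),
     ("has_substrate", "MICA")],
))
```

A curator reading the printed phrase could judge at a glance whether the combination of term and extension says what they intended.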
Proposal for new relations
- Proposal: We should add more specific terms under ‘has_participant’, ‘has_input’ and ‘has_direct_input’:
has_input
--has_direct_input
----binds
----has_substrate
----transports
has_regulation_target
----has_direct_regulation_target
has_participant (not used in annotation extensions)
----has_agent (not useful for annotation extensions)
----has_direct_input
--------binds
--------has_substrate
--------transports
----has_output
--------transports
--------has_product
(draft hierarchy from DOS - some work needed to RO to bring it into line with this)
These should all go into RO. Please post tickets requesting new terms to the [RO tracker](https://code.google.com/p/obo-relations/issues/list). We also need to develop methods to keep RO in sync with the relations used in AE.
- Rule: Use for MF and BP only.
ADAM10 case study for has_input
Background: [How to annotate ADAM10, which acts as a protease on MICA to cleave in the membrane ectodomain.](http://wiki.geneontology.org/index.php/Annotation_Extension_Relation:has_input#Using_examples_.28from_above.29_to_demonstrate_Folding_and_Unfolding_using_the_relationship_has_input) Note that the linked example uses has_input because direct binding of ADAM10 to MICA is not shown. The options below would apply only if direct binding between the enzyme and its substrate were demonstrated.
- 1. annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ (C16: has_direct_input: MICA). => OWL: ‘membrane protein ectodomain proteolysis’ and has_direct_input some MICA
+ annotate ADAM10 to ‘protease activity’ (C16: has_substrate: MICA, part_of membrane protein ectodomain proteolysis). => OWL: protease activity and (part_of some ‘ectodomain proteolysis’) and (has_substrate some MICA)
- 2. annotate ADAM10 to ‘protease activity’ (C16: part_of membrane protein ectodomain proteolysis, has_substrate MICA). => OWL: protease activity and (part_of some ‘ectodomain proteolysis’) and (has_substrate some MICA)
- 3. annotate ADAM10 to a new term requested through TermGenie (MF involved in BP): protease activity involved in membrane protein ectodomain proteolysis. Then use has_substrate:MICA in C16.
=> OWL: (protease activity that part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)
Option 2 vs 3 => subtly different OWL (syntax) translations, but these are semantically equivalent, so folding will be the same (DOS has tested).
Option 1 has apparent redundancy in that MICA is mentioned twice, once as a direct input for the process and a second time as a substrate for the protease activity. However, there is no prospect of inferring one from the other. Inferring has_direct_input for the process is unsafe, because an input to a part of a process can be an intermediate. has_participant MICA (has_participant is a relation not available to curators) would be entailed for the process if there were a has_part relationship between the proteolysis term and protease activity, but there is currently no plan to add has_part relationships to enable this. So this apparent redundancy is justified.
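The claim that Options 2 and 3 unfold to the same thing can be illustrated with a small Python sketch. This is not a reasoner and not the test DOS ran: it only expands a pre-composed term into its conjuncts and compares the resulting sets, relying on the fact that conjunction is order-independent. The `LOGICAL_DEFS` entry is an assumed logical definition for the Option 3 term, and `unfold`, `quote` and `to_owl` are hypothetical helper names.

```python
from typing import FrozenSet, List, Tuple

# A conjunct is (relation, filler); ("is_a", X) stands for the genus class X.
Conjunct = Tuple[str, str]

# Assumed logical definition for the Option 3 pre-composed term
# (illustrative only, not copied from GO).
LOGICAL_DEFS = {
    "protease activity involved in membrane protein ectodomain proteolysis": [
        ("is_a", "protease activity"),
        ("part_of", "membrane protein ectodomain proteolysis"),
    ],
}


def unfold(term: str, extensions: List[Conjunct]) -> FrozenSet[Conjunct]:
    """Expand a term via its logical definition and add the C16 extensions."""
    base = LOGICAL_DEFS.get(term, [("is_a", term)])
    return frozenset(base) | frozenset(extensions)


def quote(label: str) -> str:
    """Quote labels containing spaces, as in Manchester-style syntax."""
    return f"'{label}'" if " " in label else label


def to_owl(conjuncts: FrozenSet[Conjunct]) -> str:
    """Render conjuncts as a class expression, genus first, then restrictions."""
    genus = sorted(quote(f) for r, f in conjuncts if r == "is_a")
    restrictions = sorted(
        f"({r} some {quote(f)})" for r, f in conjuncts if r != "is_a"
    )
    return " and ".join(genus + restrictions)


option2 = unfold(
    "protease activity",
    [("part_of", "membrane protein ectodomain proteolysis"),
     ("has_substrate", "MICA")],
)
option3 = unfold(
    "protease activity involved in membrane protein ectodomain proteolysis",
    [("has_substrate", "MICA")],
)

print(to_owl(option2))
print(option2 == option3)
```

Comparing sets of conjuncts is enough for this simple case; anything involving subclass reasoning or nested expressions would need a real OWL reasoner.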
has_input and 'response to'
- Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
- Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug’. In this example, you have two has_input relationships:
- has_input: drug
- has_input: proteolysis target.
- The has_inputs work in the individual cases but when combined, how do you know which input is which?
- The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
- Conclusion: You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’ because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.
NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI below for Editors).
- Background: Transcription factors may need to be handled in a specific way. At the moment, it’s confusing which relation to use with transcription factors. Their compound nature is causing issues because there are two different functions rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the DOWNSTREAM gene that is being regulated or the PROTEIN that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
- Discussion: Currently, PomBase treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs. For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.
One option is to use more specific relations:
DNA binding TF involved_in regulation of transcription from Pol II
C16: has_regulation_target: some <gene>
C16: has_binding_target some <SO sequence element>
protein binding TF involved_in regulation of transcription from Pol II
C16: has_regulation_target: some <gene>
C16: has_binding_target some <protein ID>
For the process terms, you could use ‘has_regulation_target’. David O-S wants to see these written out in OWL.
Conclusion: This isn't yet resolved. To remove the problem that the TF terms don't have an is_a ancestor to 'binding', Val would prefer that the TF terms are revised to 'x binding involved in regulation of transcription from pol II promoter' etc. See AI below.
The 'localization_dependent_on' relation is suitable for BP annotations where A is localizing B, but it shouldn't be used for CC annotations. See the action item for Val to check her existing CC annotations that use 'localization_dependent_on' in C16.
This needs some further discussion, as this relation is currently only allowed when annotating to CC. We will also need to discuss in_presence_of and dependent_on at the same time.
- Often redundant with occurs_in, especially for GO CC
- Generally used at cell surface, sequence regions (e.g. with SO identifiers)
- We discussed merging occurs_at and occurs_in into one relation: occurs_at_or_within_location. But we decided against this because 'occurs_in' is a relation used in GO at the moment, so it seems wrong to make it less specific.
- BP or MF
Conclusion. We'll use occurs_in and occurs_at in the following ways, and redefine the relationships:
- OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC). NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations.
- OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
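The ranges agreed above could be checked mechanically. The following Python sketch shows one way, under stated assumptions: `ALLOWED_RANGES` simply transcribes the two rules, while `GO_ASPECTS`, `filler_namespace` and `is_allowed` are hypothetical names, and the aspect lookup is a hand-written stand-in for a real ontology query (a GO identifier alone does not say whether a term is CC, BP or MF).

```python
from typing import Dict

# Ranges transcribed from the conclusion above.
ALLOWED_RANGES = {
    "occurs_in": {"CL", "UBERON", "GO-CC"},
    "occurs_at": {"SO", "GO-CC"},
}

# Hypothetical stand-in for an ontology lookup of a GO term's aspect.
GO_ASPECTS: Dict[str, str] = {
    "GO:0005886": "CC",  # plasma membrane
    "GO:0006508": "BP",  # proteolysis
}


def filler_namespace(curie: str, go_aspects: Dict[str, str]) -> str:
    """Return the namespace used in the range rules, e.g. 'CL' or 'GO-CC'."""
    prefix = curie.split(":", 1)[0]
    if prefix == "GO":
        return "GO-" + go_aspects.get(curie, "unknown")
    return prefix


def is_allowed(relation: str, curie: str, go_aspects: Dict[str, str]) -> bool:
    """True if the filler's namespace is in the relation's allowed range."""
    allowed = ALLOWED_RANGES.get(relation, set())
    return filler_namespace(curie, go_aspects) in allowed


print(is_allowed("occurs_in", "CL:0000084", GO_ASPECTS))  # cell type filler
print(is_allowed("occurs_at", "CL:0000084", GO_ASPECTS))  # CL not in occurs_at range
print(is_allowed("occurs_at", "SO:0000167", GO_ASPECTS))  # sequence feature filler
```

Relations that are not in the table are rejected outright, which errs on the side of flagging an extension for a curator to look at.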
- For now, we are going to restrict has_output to BP only. If you find you need to use this for MF, bring up your example with Rachael and Rama, or request a new GO term.
- Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.
- AI: Could suggest a restriction for use with ‘cytokine production’ terms only.
- We need a better example, ideally where the paper shows stronger evidence for a catalytic activity. And where a catalytic activity can create >1 choice of output. For the current prostaglandin-I synthase activity example, the term definition is: Catalysis of the reaction: prostaglandin H(2) = prostaglandin I(2). Therefore the enzyme will always produce prostaglandin I2, and no extension is needed.
- Can use UniProt/Protein ID in C16 with this extension. Not a PRO ID, because for any given species, you don’t need the generic identifier because you’ll know the species-specific (UniProt) one.
- Can use PRO feature chain ID if you can, to be more specific.
DURING, HAPPENS_DURING AND EXISTS_DURING
The current tree stands at:
DURING
--EXISTS DURING (CC terms including (but not restricted to) protein complexes)
--HAPPENS DURING (MF and BP)
- AI: Don't use the grouping term ‘during’ in annotation extensions, and just use the more specific terms. David OS will look at removing the 'during' relationship completely because of issues with its definition.
- Make a new rule: for phase terms, you HAVE to use ‘happens_during’ (not part_of).
- For other GO process terms, use happens_during if you don’t know if it contributes to the process.
AI: Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.
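The phase rule lends itself to an automated check. Below is a minimal Python sketch of the idea, independent of Protein2GO. `CELL_CYCLE_PHASE_TERMS` is a hand-listed, incomplete set used only for illustration (a real check would ask the ontology for every descendant of GO:0022403), and `check_extension` is a hypothetical function name.

```python
from typing import List

# Illustrative, incomplete set of phase terms (an assumption for this sketch).
CELL_CYCLE_PHASE_TERMS = {
    "GO:0022403",  # cell cycle phase
    "GO:0000279",  # M phase
}


def check_extension(relation: str, filler: str) -> List[str]:
    """Return a list of rule violations for one C16 relation/filler pair."""
    problems = []
    if relation == "part_of" and filler in CELL_CYCLE_PHASE_TERMS:
        problems.append(
            f"part_of cannot be used with cell cycle phase term {filler}; "
            "use happens_during instead"
        )
    return problems


print(check_extension("part_of", "GO:0000279"))
print(check_extension("happens_during", "GO:0000279"))
```

Returning a list rather than a boolean leaves room for further rules to report their own messages from the same function.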
NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd. Need to relook at these.
Everyone agrees !!! :o
- The extracellular terms need a bit of work because there are some annotations at the moment to ‘extracellular matrix’ part_of x_cell, where logically you can’t have an extracellular space that is part of a cell.
- Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.
This relation ties into the transcription factor terms.
Useful to have distinction between direct and indirect. So can we just use 'has_indirect_target'? Chris prefers 'has_regulation_target' because it's more specific.
- DNA-binding TF activity: has_regulation_target: some gene
- DNA-binding TF activity: has_input/has_substrate: some DNA (SO ID, which is specific for the motif)
... BUT DNA-binding TF activity doesn't have is_a DNA binding as a parentage, so it's wrong to say has_input:DNA for this term. This comes back to Val's suggestions for the TF terms, to change to: DNA binding involved in negative regulation of transcription....
Some of the issues here are because it seems redundant to have a regulation GO term with 'regulation' in the annotation extension relationship.
It was agreed that we should continue to use the relationship has_regulation_target when extending 'regulation of BP' GO terms. However, it was felt that extensions of MF GO terms such as endopeptidase inhibitor activity should use the relationship 'has_direct_input', as the protein included in the annotation extension should be known to bind the protein annotated as an inhibitor.
Also, an example was identified where the annotation extension was inappropriate: 'negative regulation of intrinsic apoptotic signaling pathway'. This showed that has_regulation_target should not be used to specify a downstream process regulated by a signaling pathway; 'causally_upstream_of' could possibly be used instead. In addition, it was agreed that a multistep process such as 'negative regulation of intrinsic apoptotic signaling pathway' should not specify a protein with has_input.
- Encourage curators who are new to annotation extensions to start with the following relations;
PROPOSED ACTION ITEMS
1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.
2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.
3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.
4. DAVID OS. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).
5. RACHAEL: Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently it says ‘bound, transported, modified, consumed or destroyed’…. DONE 4/7/2014
6. VAL AND EDITORS/DAVID HILL: look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations: one for the binding, and one for the TF activity, because the term would be is_a binding. And doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.
7. DAVID OS: Write out the transcription factor suggestions in OWL, to check they make sense.
8. RACHAEL: Better define the rules for OCCURS_AT and OCCURS_IN (see above)
- OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC)
- OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC) DONE 4/7/2014
9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs. DONE 21/07/2014
10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO. NOT YET DONE; discussion on subsequent annotation call (http://wiki.geneontology.org/index.php/Annotation_Conf._Call,_June_24,_2014) disagreed with this 11/9/2014
11. PASCALE AND VAL: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.
12. RACHAEL? Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship. NOT DONE - this is not currently possible (4/7/2014)
13. RACHAEL: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information. DONE 4/7/2014
14. RUTH: Alter wiki for HAPPENS_DURING so GO-BP but not GO-MF can’t be used in C16. Comment added wrt BPs. I think this item should read 'so GO-BP can be used in C16; and GO-MF can’t be used in C16'. I haven't tried to address the item as it stands, waiting for confirmation that the wiki updates are sufficient. DONE 21/07/2014
15. DAVID OS: Remove ‘during’ from relationships, because it can't be properly defined. Its children 'exists_during' and 'happens_during' will remain. DONE 4/7/2014
16. VAL AND RACHAEL: Look at exists_during relation uses to see if they make sense. Need to confirm whether the usage of exists_during as currently defined, i.e. CC exists_during process/phase, is appropriate. 11/9/2014
17. RUTH: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway. DONE 21/07/2014
18. DAVID OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.
19. CHRIS/TONY: Think more about how the on-the-fly human-readable display of folded annotations would work in practice, and see if it can be implemented in Protein2GO.
20. DAVID OS: add OWL statements for the 2 HAPPENS_DURING examples and also add the parents for the new folded GO term: canonical Wnt signaling pathway during limb morphogenesis on this page.