Annotation Extension meeting 2014-06-16: Difference between revisions

 
(39 intermediate revisions by 4 users not shown)
Line 1: Line 1:
Annotation Extension Meeting
Annotation Extension Meeting
June 16th 2014
* Date: June 16th 2014
EBI-Duxford room
* Start Time: 9:30
* Where: EBI-Duxford room


Present
Present:
*Jane Lomax
*Jane Lomax
*David O-S
*David O-S
Line 9: Line 10:
*Rebecca Foulger
*Rebecca Foulger
*Pascale Gaudet
*Pascale Gaudet
*Aleks
*Aleks (am)
*Rachael Huntley
*Rachael Huntley
*Valerie Wood
*Valerie Wood
*Chris Mungall
*Chris Mungall (pm)


==Note on Unfolding/Folding==


Chris wants to put a mechanism in place that will display a human-readable version of the folded annotation at the point of annotation, so that the curator can immediately determine whether the annotation makes sense.


==<font color="navy">HAS_INPUT</font color>==
AI: Chris should think more about how this would work in practice and collaborate with Tony to see if it can be implemented in Protein2GO.
 
==<font color="navy">Proposal for new relations</font color>==


* Use for MF and BP only.


* Discussion: We should have more specific terms under ‘has_input’ and ‘has_direct_input’:  
* Proposal: We should add more specific terms under has_participant, 'has_input’ and ‘has_direct_input’:  


  has_input
  has_input
Line 27: Line 31:
  ----has_substrate
  ----has_substrate
  ----transports
  ----transports
----has_direct_regulation_target
   has_regulation_target
   has_regulation_target
  ----has_direct_regulation_target
  ----has_direct_regulation_target
   has_participant  
   has_participant (not used in annotation extensions)
  ----has_agent
  ----has_agent (not useful for annotation extensions)
----has_direct_input
--------binds
--------has_substrate
--------transports
  ----has_output
  ----has_output
  ----transports  
  --------transports  
  ----has_product
  --------has_product
 
(draft hierarchy from DOS - some work needed to RO to bring it into line with this)
 
These should all GO into RO.  Please post tickets requesting new terms to [RO tracker](https://code.google.com/p/obo-relations/issues/list).  We also need to develop methods to keep RO in sync with relations used in AE.
 
 
==<font color="navy">HAS_INPUT</font color>==
 
* Rule: Use for MF and BP only.
 
 
===<font color="navy">ADAM10 case study for has_input</font color>===
 
'''Background''': [How to annotate ADAM10, which acts as a protease on MICA to cleave in the membrane ectodomain.](http://wiki.geneontology.org/index.php/Annotation_Extension_Relation:has_input#Using_examples_.28from_above.29_to_demonstrate_Folding_and_Unfolding_using_the_relationship_has_input) Note that in the linked example has_input was used because direct binding of ADAM10 to MICA is not shown. The examples below would be applied only if direct binding between the enzyme and its substrate were demonstrated.


===<font color="navy">Case study for has_input</font color>===
*'''Options''':


*Discussion: How do you annotate ADAM10, which acts as a protease on MICA to cleave in the membrane ecodomain.
* 1. annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ (C16: has_direct_input: MICA). => OWL: ‘membrane protein ectodomain proteolysis’ and (has_direct_input some MICA)
*in OWL: capable_of some ((‘protease activity’) that has_substrate some mica) and (part_of some ‘ecodomain proteolysis’))
+  annotate ADAM10 to ‘protease activity’ (C16: has_substrate: MICA, part_of: membrane protein ectodomain proteolysis). => OWL: ‘protease activity’ and (part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)
*protease activity that part_of some ‘ecodomain proteolysis’ and (has_substrate some mica).


* annotate ADAM10 to ‘membrane protein ecodomain proteolysis’ has_direct_input: MICA.
* annotate ADAM10 to ‘protease activity’ has_substrate: MICA.
OR
OR
* annotate ADAM10 to ‘protease activity’ (C16:part_of: ecodomain proteolysis).
* 2. annotate ADAM10 to ‘protease activity’ (C16: part_of: membrane protein ectodomain proteolysis, has_substrate: MICA). => OWL: ‘protease activity’ and (part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)
OR:
OR:
* Create a new GO term: protease activity involved in ecodomain proteolysis (equivalent to: protease activity that part_of some ‘ecodomain proteolysis’). Then use has_substrate:MICA in C16.
* 3. annotate ADAM10 to a new term requested through TermGenie (MF involved in BP): ‘protease activity involved in membrane protein ectodomain proteolysis’. Then use has_substrate:MICA in C16.
=> OWL: (‘protease activity’ that part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)
 
'''Notes:'''
 
Option 2 vs 3 => subtly different OWL (syntax) translations, but these are semantically equivalent, so folding will be the same (DOS has tested).


Option 1 has apparent redundancy in that MICA is mentioned twice, once as a direct input for the process and a second time as a substrate for the protease activity.  However, there is no prospect of inferring one from the other.  Inferring has_direct_input for the process would actually be unsafe, because an input to a part of a process can be an intermediate.  has_participant ('''a relationship not available to curators''') MICA would be entailed for the process if there were a has_part relationship between the proteolysis term and protease activity.  But there is currently no plan to add has_part relationships to enable this.  So, this apparent redundancy is justified.


===<font color="navy">has_input and response_to</font color>===
===<font color="navy">has_input and 'response to'</font color>===


* Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
* '''Question''': Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
* Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug. In this example, you have two has_input relationships:
* '''Discussion''': Ruth’s example is ‘proteolysis [involved] in cellular response to drug. In this example, you have two has_input relationships:
** has_input: drug
** has_input: drug
** has_input: proteolysis target.  
** has_input: proteolysis target.  
* The has_inputs work in the individual cases but when combined, how do you know which input is which?
* The has_inputs work in the individual cases but when combined, how do you know which input is which?
* David OS: Best way is to see if we can express this in OWL. In this case: proteolysis that is part of some cellular response to drug
* The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
* The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
* CONCLUSION: Therefore, You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’ because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.
* '''Conclusion''': You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’ because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.
 
NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI below for Editors).


NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI for Editors).




==<font color="navy">Transcription Factors</font color>==
==<font color="navy">Transcription Factors</font color>==


* Background: At the moment, it’s confusing what relation to use with transcription factors. Their compound-nature is causing issues because there’s two different functions rolled into one term. E.g. for ‘protein-binding transcription factor activity’,  does has_input mean the DOWNSTREAM gene that is being regulated or the protein that is being bound. Should you use ‘has_regulation_target’ for TF annotations.
* '''Background''': Transcription factors may need to be handled in a specific way. At the moment, it’s confusing what relation to use with transcription factors. Their compound nature is causing issues because there are two different functions rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the DOWNSTREAM gene that is being regulated or the PROTEIN that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
* Discussion: Val treats DNA-binding and protein-binding TF differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs.  For protein-binding TFs, PomBase capture the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.


For protein-binding TFs, put the IDs of the genes on the PROCESS term.
* '''Discussion''': Currently, PomBase treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs.  For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.
 
One option is to use more specific relations:


One option is to use more specific relations:
  DNA binding TF involved_in regulation of transcription from Pol II
  DNA binding TF involved_in regulation of transcription from Pol II
C16: has_regulation_target: some <gene>
C16: has_regulation_target: some <gene>
Line 80: Line 104:
C16: has_binding_target some <protein ID>
C16: has_binding_target some <protein ID>


For the process terms, use ‘has_regulation_target’. David O-S wants to see these written out in OWL.
For the process terms, you could use ‘has_regulation_target’. David O-S wants to see these written out in OWL.


'''Conclusion''': This isn't yet resolved. To remove the problem that the TF terms don't have 'binding' as an is_a ancestor, Val would prefer that the TF terms are revised to 'x binding involved in regulation of transcription from pol II promoter' etc. See AI below.




==<font color="navy">HAS_REGULATION_TARGET vs HAS_TARGET</font color>==
==<font color="navy">LOCALIZATION_DEPENDENT_ON</font color>==


Why do we need ‘has_regulation_target’ when we have regulation in the GO term. Why not just C16: has_target:x
This is suitable for BP annotations where A is localizing B, but it shouldn't be used for CC annotations. See action item for Val to check her existing CC annotations that use the 'localization_dependent_on' C16 relation.


This needs some further discussion as this relation is currently only allowed when annotating to CC. We will also need to discuss in_presence_of and dependent_on at the same time.


==<font color="navy">OCCURS AT</font color>==
==<font color="navy">OCCURS AT</font color>==
Line 93: Line 119:
* Often redundant with occurs_in, especially for GO CC
* Often redundant with occurs_in, especially for GO CC
* Generally used at cell surface, sequence regions (e.g. with SO identifiers)
* Generally used at cell surface, sequence regions (e.g. with SO identifiers)
* Could merge occurs_at and occurs_in into one relation: occurs_at_or_within_location,  
* We discussed merging occurs_at and occurs_in into one relation: occurs_at_or_within_location. But we decided against this because 'occurs_in' is a relation used in GO at the moment, so it seems wrong to make it less specific.
* BP or MF
* BP or MF


Action Items:
'''Conclusion'''. We'll use occurs_in and occurs_at in the following ways, and redefine the relationships:
*OCCURS_IN: All the parts of the process is contained within (CL, UBERON, GO-CC). NB, because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations
 
*OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
*'''OCCURS_IN''': All the parts of the process is contained within (CL, UBERON, GO-CC). NB, because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations
*'''OCCURS_AT:''' Adjacent to or in the vicinity of. (SO or GO-CC)




Line 104: Line 131:
==<font color="navy">HAS_OUTPUT</font color>==
==<font color="navy">HAS_OUTPUT</font color>==


* For now, we are going to restrict this to BP only. If you find you need to use this for MF, bring up your example or make a new GO term. Add this to the rule file.
* For now, we are going to restrict has_output to BP only. If you find you need to use this for MF, bring up your example with Rachael and Rama, or request a new GO term.  


* Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.  
* Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.  
Line 115: Line 142:


* Can use PRO feature chain ID if you can, to be more specific.
* Can use PRO feature chain ID if you can, to be more specific.




==<font color="navy">DURING, HAPPENS_DURING AND EXISTS_DURING</font color>==
==<font color="navy">DURING, HAPPENS_DURING AND EXISTS_DURING</font color>==


DURING
The current tree stands at:
—EXISTS DURING (CC terms including (but not restricted to) protein complexes)
 
—HAPPENS DURING (MF and BP)
DURING
—EXISTS DURING (CC terms including (but not restricted to) protein complexes)
—HAPPENS DURING (MF and BP)
 
* AI: Don't use the grouping term ‘during’ in annotation extensions, and just use the more specific terms. David OS will look at removing the 'during' relationship completely because of issues with its definition.


* AI: Get rid of grouping term, ‘during’.


===<font color="navy">HAPPENS_DURING</font color>===
===<font color="navy">happens_during</font color>===


* For phase terms, you HAVE to use ‘happens_during’ (not part_of).
* Make a new rule: for phase terms, you '''HAVE''' to use ‘happens_during’ (not part_of).
* For other GO process terms, use happens_during if you don’t know if it contributes to the process.
* For other GO process terms, use happens_during if you don’t know if it contributes to the process.


AI: Add a restriction that you can’t use part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16’. This requires a happens_during’ relationship.
AI: Add a restriction that you can’t use part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16’. This requires a happens_during’ relationship.


===<font color="navy">exists_during</font color>===


===<font color="navy">EXISTS_DURING</font color>===
NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd. Need to relook at these.
 
NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd.
 




Line 143: Line 172:


Everyone agrees !!! :o
Everyone agrees !!! :o
almost .....


* The extracellular terms need a bit of work because there’s some annotations at the moment to ‘extracellular matrix’ part_of x_cell. Where logically you can’t have an extracellular space that is part of a cell.
* The extracellular terms need a bit of work because there’s some annotations at the moment to ‘extracellular matrix’ part_of x_cell. Where logically you can’t have an extracellular space that is part of a cell.
Line 150: Line 181:




==<font color="navy">HAS_REGULATION_TARGET</font color>==
==<font color="navy">HAS_REGULATION_TARGET (UNFINISHED)</font color>==
 
This relation ties into the transcription factor terms.
 
Useful to have distinction between direct and indirect. So can we just use 'has_indirect_target'? Chris prefers 'has_regulation_target' because it's more specific.
 
One option:
* DNA-binding TF activity: has_regulation_target: some gene
* DNA-binding TF activity: has_input/has_substrate: some DNA (SO ID, which is specific for the motif)
 
... BUT DNA-binding TF activity doesn't have is_a DNA binding as a parentage, so it's wrong to say has_input:DNA for this term. This comes back to Val's suggestions for the TF terms, to change to: DNA binding involved in negative regulation of transcription....
 
Some of the issues here are because it seems redundant to have a regulation GO term with 'regulation' in the annotation extension relationship.
 
It was agreed that we should continue to use the relationship has_regulation_target when extending 'regulation of BP' GO terms. However it was felt that extension of the MF GO terms such as endopeptidase inhibitor activity should use the relationship 'has_direct_input' as the protein identified included in the annotation extension should be known to bind the protein annotated as an inhibitor.
 
Also an example was identified where the annotation extension was inappropriate: negative regulation of intrinsic apoptotic signaling pathway, this identified that has_regulation_target should not be used to specify a downstream process regulated by a signaling pathway. Possibly instead should use 'causally_upstream_of'.
In addition it was agreed that a multistep process such as 'negative regulation of intrinsic apoptotic signaling pathway' should not specify a protein with has_input.




Line 156: Line 204:
==<font color="navy">OTHER DISCUSSION</font color>==
==<font color="navy">OTHER DISCUSSION</font color>==


Release the following relations to annotators to use
* Encourage curators who are new to annotation extensions to start with the following relations;
part_of
** part_of
occurs_in
** occurs_in
 


==<font color="navy">PROPOSED ACTION ITEMS</font color>==


==<font color="navy">ACTION ITEMS</font color>==
1. <font color="red">EDITORS</font color>: Change children of ‘response to x’ from is_a to part_of, throughout GO.


1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.
2. <font color="red">ANNOTATORS</font color> (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.


2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.
3. <font color="red">ANNOTATORS</font color>: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.


3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.
4. <font color="red">DAVID OS</font color>. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).


4. DAVID O-S. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).
5. <font color="red"> RACHAEL </font color>Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently is says ‘bound, transported, modified, consumed or destroyed’…. <font color="red">DONE</font color> 4/7/2014


5. Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently is says ‘bound, transported, modified, consumed or destroyed’….  
6. <font color="red">VAL AND EDITORS/DAVID HILL</font color>: look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations: one for the binding, and one for the TF activity, because the term would be is_a binding. And doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.


6. VAL and EDITORS (David Hill): look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations: one for the binding, and one for the TF activity, because the term would be is_a binding. And doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.
7. <font color="red">DAVID OS</font color>: Write out the transcription factor suggestions in OWL, to check they make sense.


7. David OS: Write out the transcription factor suggestions in OWL, to check they make sense.
8. <font color="red">Rachael</font color> Better define the rules for OCCURS_AT and OCCURS_IN (see above)
* OCCURS_IN: All the parts of the process is contained within (CL, UBERON, GO-CC)
* OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC) <font color="red">DONE</font color> 4/7/2014


8. Write the rules for OCCURS_AT and OCCURS_IN
9. <font color="red">RUTH</font color>: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs. <font color="red">DONE</font color> 21/07/2014
OCCURS_IN: All the parts of the process is contained within (CL, UBERON, GO-CC)
OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)


9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs.
10. <font color="red">RACHAEL</font color>: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO. <font color="red">NOT YET DONE; discussion on subsequent annotation call (http://wiki.geneontology.org/index.php/Annotation_Conf._Call,_June_24,_2014) disagreed with this</font color> 11/9/2014


10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO.  
11.<font color="red">PASCALE AND VAL</font color>: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.


11. Pascale and Val: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.
12: <font color="red">RACHAEL</font color>? Add a restriction that you can’t use part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16’. This requires a happens_during’ relationship. <font color="red">NOT DONE</font color> - this is not currently possible (4/7/2014)


12: Rachel? Add a restriction that you can’t use part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16’. This requires a happens_during’ relationship.
13. <font color="red">RACHAEL</font color>: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information. <font color="red">DONE</font color> 4/7/2014


13. Rachael: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information.  
14. <font color="red">RUTH</font color> Ruth to alter wiki for HAPPENS_DURING so GO-BP but not GO-MF can’t be used in C16. Comment added wrt BPs. I think this item should read 'so GO-BP can be used in C16; and GO-MF can’t be used in C16'. I haven't tried to address the item as it stands, <font color="red">waiting for confirmation</font color> that the wiki updates are sufficient. <font color="red">DONE</font color> 21/07/2014


14. Ruth to alter wiki for HAPPENS_DURING so GO-BP but not GO-MF can’t be used in C16.
15. <font color="red">DAVID OS</font color>: remove ‘during’ from relationships, because it can't be properly defined. It's children 'exists_during' and 'happens_during' will remain. <font color="red">DONE</font color> 4/7/2014


15. David OS: remove ‘during’ from relationships.
16. <font color="red">VAL and RACHAEL</font color> Look at exists_during relation uses to see if they make sense. <font color="red">Need to confirm if the usage of exists_during as currently defined, i.e. CC exists_during process/phase, is appropriate</font color> 11/9/2014


16. Ruth: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway.
17. <font color="red">RUTH</font color>: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway. <font color="red">DONE</font color> 21/07/2014


17. David OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.
18. <font color="red">DAVID OS</font color>: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.


19. <font color="red">CHRIS/TONY</font color>: should think more about how the on-the-fly human-readable display of folded annotations would work in practice and collaborate with Tony to see if it can be implemented in Protein2GO.


20. <font color="red">DAVID OS</font color>: add OWL statements for the 2 HAPPENS_DURING examples and also add the parents for the new folded GO term: canonical Wnt signaling pathway during limb morphogenesis on this page.


[[Category:Annotation]]
[[Category:Annotation]]

Latest revision as of 13:11, 10 October 2014

Annotation Extension Meeting

  • Date: June 16th 2014
  • Start Time: 9:30
  • Where: EBI-Duxford room

Present:

  • Jane Lomax
  • David O-S
  • Ruth Lovering
  • Rebecca Foulger
  • Pascale Gaudet
  • Aleks (am)
  • Rachael Huntley
  • Valerie Wood
  • Chris Mungall (pm)

Note on Unfolding/Folding

Chris wants to put a mechanism in place that will display a human-readable version of the folded annotation at the point of annotation, so that the curator can immediately determine whether the annotation makes sense.

AI: Chris should think more about how this would work in practice and collaborate with Tony to see if it can be implemented in Protein2GO.

Proposal for new relations

  • Proposal: We should add more specific terms under ‘has_participant’, ‘has_input’ and ‘has_direct_input’:
has_input
----has_direct_input
--------binds
--------has_substrate
--------transports
has_regulation_target
----has_direct_regulation_target
has_participant (not used in annotation extensions)
----has_agent (not useful for annotation extensions)
----has_direct_input
--------binds
--------has_substrate
--------transports
----has_output
--------transports
--------has_product

(draft hierarchy from DOS - some work needed to RO to bring it into line with this)

These should all go into RO. Please post tickets requesting new terms to the [RO tracker](https://code.google.com/p/obo-relations/issues/list). We also need to develop methods to keep RO in sync with the relations used in AE.
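
The draft hierarchy above can be written down as a small lookup table, which makes it easy to ask which broader relations a specific one falls under. This is only a sketch in Python: the dictionary layout and the `ancestors` helper are illustrative assumptions, not part of RO or Protein2GO, and the relation names are the draft ones from these notes.

```python
# Draft hierarchy from the notes, stored as child -> list of parents.
# Some relations appear under more than one parent in the draft.
PARENTS = {
    "has_direct_input": ["has_input", "has_participant"],
    "binds": ["has_direct_input"],
    "has_substrate": ["has_direct_input"],
    "transports": ["has_direct_input", "has_output"],
    "has_direct_regulation_target": ["has_regulation_target"],
    "has_agent": ["has_participant"],
    "has_output": ["has_participant"],
    "has_product": ["has_output"],
}


def ancestors(relation):
    """Return every broader relation reachable from `relation` (hypothetical helper)."""
    seen = set()
    stack = list(PARENTS.get(relation, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


print(sorted(ancestors("has_substrate")))
```

Because `transports` sits under both has_direct_input and has_output in the draft, a lookup for it returns both branches, which is one reason a child-to-parents table was used here rather than a simple tree.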


HAS_INPUT

  • Rule: Use for MF and BP only.


ADAM10 case study for has_input

Background: [How to annotate ADAM10, which acts as a protease on MICA to cleave in the membrane ectodomain.](http://wiki.geneontology.org/index.php/Annotation_Extension_Relation:has_input#Using_examples_.28from_above.29_to_demonstrate_Folding_and_Unfolding_using_the_relationship_has_input) Note that in the linked example has_input was used because direct binding of ADAM10 to MICA is not shown. The examples below would be applied only if direct binding between the enzyme and its substrate were demonstrated.

  • Options:
  • 1. annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ (C16: has_direct_input: MICA). => OWL: ‘membrane protein ectodomain proteolysis’ and (has_direct_input some MICA)

+ annotate ADAM10 to ‘protease activity’ (C16: has_substrate: MICA, part_of: membrane protein ectodomain proteolysis). => OWL: ‘protease activity’ and (part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)

OR

  • 2. annotate ADAM10 to ‘protease activity’ (C16: part_of: membrane protein ectodomain proteolysis, has_substrate: MICA). => OWL: ‘protease activity’ and (part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)

OR:

  • 3. annotate ADAM10 to a new term requested through TermGenie (MF involved in BP): ‘protease activity involved in membrane protein ectodomain proteolysis’. Then use has_substrate:MICA in C16.

=> OWL: (‘protease activity’ that part_of some ‘membrane protein ectodomain proteolysis’) and (has_substrate some MICA)

Notes:

Option 2 vs 3 => subtly different OWL (syntax) translations, but these are semantically equivalent, so folding will be the same (DOS has tested).

Option 1 has apparent redundancy in that MICA is mentioned twice, once as a direct input for the process and a second time as a substrate for the protease activity. However, there is no prospect of inferring one from the other. Inferring has_direct_input for the process would actually be unsafe, because an input to a part of a process can be an intermediate. has_participant (a relationship not available to curators) MICA would be entailed for the process if there were a has_part relationship between the proteolysis term and protease activity. But there is currently no plan to add has_part relationships to enable this. So, this apparent redundancy is justified.
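
To make the "=> OWL" translations in the options above easier to compare, here is a minimal Python sketch that turns a GO term plus its C16 relation/filler pairs into the same kind of class-expression string. The function name `to_class_expression` is hypothetical and the output is plain text in the style used in these notes; it is not the folding/unfolding code discussed at the meeting and does not involve an OWL reasoner.

```python
def to_class_expression(go_term, extensions):
    """Render 'term and (relation some filler) ...' as a plain string.

    `extensions` is a list of (relation, filler) pairs taken from C16.
    Names containing spaces are quoted, single-word names are left bare.
    """
    def quote(name):
        return f"'{name}'" if " " in name else name

    parts = [quote(go_term)]
    parts += [f"({rel} some {quote(filler)})" for rel, filler in extensions]
    return " and ".join(parts)


# Option 2 from the notes: protease activity with two extensions.
option_2 = to_class_expression(
    "protease activity",
    [
        ("part_of", "membrane protein ectodomain proteolysis"),
        ("has_substrate", "MICA"),
    ],
)
print(option_2)
```

Writing options 2 and 3 out this way only shows the syntactic shape; whether two expressions are semantically equivalent still has to be checked with a reasoner, as DOS did.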

has_input and 'response to'

  • Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
  • Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug’. In this example, you have two has_input relationships:
    • has_input: drug
    • has_input: proteolysis target.
  • The has_inputs work in the individual cases but when combined, how do you know which input is which?
  • The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
  • Conclusion: You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’ because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.

NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI below for Editors).


Transcription Factors

  • Background: Transcription factors may need to be handled in a specific way. At the moment, it’s confusing what relation to use with transcription factors. Their compound nature is causing issues because there are two different functions rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the DOWNSTREAM gene that is being regulated or the PROTEIN that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
  • Discussion: Currently, PomBase treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs. For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.

One option is to use more specific relations:

DNA binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target: some <gene>
C16: has_binding_target: some <SO sequence element>

protein binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target: some <gene>
C16: has_binding_target: some <protein ID>

For the process terms, you could use ‘has_regulation_target’. David O-S wants to see these written out in OWL.

Conclusion: This isn't yet resolved. To remove the problem that the TF terms don't have an is_a ancestor to 'binding', Val would prefer that the TF terms are revised to 'x binding involved in regulation of transcription from pol II promoter' etc. See AI below.


LOCALIZATION_DEPENDENT_ON

This is suitable for BP annotations where A is localizing B, but it shouldn't be used for CC annotations. See action item for Val to check her existing CC annotations that use the 'localization_dependent_on' C16 relation.

This needs some further discussion, as this relation is currently only allowed when annotating to CC. We will also need to discuss in_presence_of and dependent_on at the same time.

OCCURS AT

  • Often redundant with occurs_in, especially for GO CC
  • Generally used at cell surface, sequence regions (e.g. with SO identifiers)
  • We discussed merging occurs_at and occurs_in into one relation: occurs_at_or_within_location. But we decided against this because 'occurs_in' is a relation used in GO at the moment, so it seems wrong to make it less specific.
  • BP or MF

Conclusion. We'll use occurs_in and occurs_at in the following ways, and redefine the relationships:

  • OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC). NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations.
  • OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
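The allowed target ontologies for the two relations can be sketched as a simple prefix check. This is only an illustration under stated assumptions: the table and the function name `extension_range_ok` are hypothetical, the identifiers in the usage note are placeholders chosen for their prefix only, and a prefix test cannot tell GO-CC apart from GO-BP or GO-MF, so a real check would also need the GO aspect of the target.

```python
# Allowed identifier prefixes per relation, taken from the conclusion above.
# "GO" stands in for GO-CC; the aspect is NOT verified by this sketch.
ALLOWED_RANGES = {
    "occurs_in": {"CL", "UBERON", "GO"},
    "occurs_at": {"SO", "GO"},
}


def extension_range_ok(relation: str, target_id: str) -> bool:
    """Return True if the prefix of target_id is allowed for the relation."""
    prefix = target_id.split(":", 1)[0]
    # Relations missing from the table are treated as not allowed.
    return prefix in ALLOWED_RANGES.get(relation, set())
```

For example, `extension_range_ok("occurs_in", "CL:0000000")` is accepted, while `extension_range_ok("occurs_in", "SO:0000000")` is rejected because SO identifiers are only listed for occurs_at.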


HAS_OUTPUT

  • For now, we are going to restrict has_output to BP only. If you find you need to use this for MF, bring up your example with Rachael and Rama, or request a new GO term.
  • Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.
  • AI: Could suggest a restriction for use with ‘cytokine production’ terms only.
  • We need a better example, ideally where the paper shows stronger evidence for a catalytic activity. And where a catalytic activity can create >1 choice of output. For the current prostaglandin-I synthase activity example, the term definition is: Catalysis of the reaction: prostaglandin H(2) = prostaglandin I(2). Therefore the enzyme will always produce prostaglandin I2, and no extension is needed.
  • Can use UniProt/Protein ID in C16 with this extension. Not a PRO ID, because for any given species, you don’t need the generic identifier because you’ll know the species-specific (UniProt) one.
  • Can use PRO feature chain ID if you can, to be more specific.
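The provisional "BP only" restriction on has_output could be checked along these lines. This is a minimal sketch, not how Protein2GO or the annotation extension rule file actually implements it: the `Extension` record, the function name `check_has_output` and the aspect codes "BP"/"MF"/"CC" are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Extension(NamedTuple):
    relation: str  # e.g. "has_output"
    target: str    # identifier placed in C16


def check_has_output(aspect: str, extensions: List[Extension]) -> List[str]:
    """Return error messages for has_output used outside GO-BP annotations."""
    errors = []
    for ext in extensions:
        # Only has_output is restricted here; other relations pass through.
        if ext.relation == "has_output" and aspect != "BP":
            errors.append(
                f"has_output({ext.target}) is restricted to BP annotations, got {aspect}"
            )
    return errors
```

An MF annotation carrying a has_output extension would produce one message, which the curator could then raise with Rachael and Rama as described above.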


DURING, HAPPENS_DURING AND EXISTS_DURING

The current tree stands at:

DURING
—EXISTS DURING (CC terms including (but not restricted to) protein complexes)
—HAPPENS DURING (MF and BP)
  • AI: Don't use the grouping term ‘during’ in annotation extensions, and just use the more specific terms. David OS will look at removing the 'during' relationship completely because of issues with its definition.


happens_during

  • Make a new rule: for phase terms, you HAVE to use ‘happens_during’ (not part_of).
  • For other GO process terms, use happens_during if you don’t know if it contributes to the process.

AI: Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16. This requires a ‘happens_during’ relationship.
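The proposed rule can be sketched as a small check. It assumes the caller supplies `phase_terms`, the set of GO IDs for ‘cell cycle phase ; GO:0022403’ and its descendants; computing that set from the ontology is outside this sketch, and the function name `check_phase_relation` is hypothetical.

```python
from typing import Optional, Set

CELL_CYCLE_PHASE = "GO:0022403"  # 'cell cycle phase', as cited in the minutes


def check_phase_relation(
    relation: str, target_id: str, phase_terms: Set[str]
) -> Optional[str]:
    """Return an error message if part_of points at a phase term, else None."""
    if relation == "part_of" and target_id in phase_terms:
        return f"use happens_during instead of part_of for phase term {target_id}"
    return None
```

With `phase_terms = {CELL_CYCLE_PHASE}`, a `part_of` extension to GO:0022403 returns a message, while the same target with `happens_during` returns `None`.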

exists_during

NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd. Need to relook at these.


PART_OF

Everyone agrees !!! :o

almost .....

  • The extracellular terms need a bit of work because there are some annotations at the moment to ‘extracellular matrix’ part_of x_cell, whereas logically you can’t have an extracellular space that is part of a cell.
  • Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.


HAS_REGULATION_TARGET (UNFINISHED)

This relation ties into the transcription factor terms.

Useful to have distinction between direct and indirect. So can we just use 'has_indirect_target'? Chris prefers 'has_regulation_target' because it's more specific.

One option:

  • DNA-binding TF activity: has_regulation_target: some gene
  • DNA-binding TF activity: has_input/has_substrate: some DNA (SO ID, which is specific for the motif)

... BUT DNA-binding TF activity doesn't have is_a DNA binding as a parent, so it's wrong to say has_input:DNA for this term. This comes back to Val's suggestion for the TF terms, to change to: DNA binding involved in negative regulation of transcription....

Some of the issues here are because it seems redundant to have a regulation GO term with 'regulation' in the annotation extension relationship.

It was agreed that we should continue to use the relationship has_regulation_target when extending 'regulation of BP' GO terms. However, it was felt that extensions of MF GO terms such as endopeptidase inhibitor activity should use the relationship 'has_direct_input', as the protein included in the annotation extension should be known to bind the protein annotated as an inhibitor.

An example was also identified where the annotation extension was inappropriate: negative regulation of intrinsic apoptotic signaling pathway. This showed that has_regulation_target should not be used to specify a downstream process regulated by a signaling pathway; 'causally_upstream_of' could possibly be used instead. In addition, it was agreed that a multistep process such as 'negative regulation of intrinsic apoptotic signaling pathway' should not specify a protein with has_input.


OTHER DISCUSSION

  • Encourage curators who are new to annotation extensions to start with the following relations:
    • part_of
    • occurs_in

PROPOSED ACTION ITEMS

1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.

2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.

3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.

4. DAVID OS. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).

5. RACHAEL: Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently it says ‘bound, transported, modified, consumed or destroyed’…. DONE 4/7/2014

6. VAL AND EDITORS/DAVID HILL: look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations: one for the binding, and one for the TF activity, because the term would be is_a binding. And doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.

7. DAVID OS: Write out the transcription factor suggestions in OWL, to check they make sense.

8. RACHAEL: Better define the rules for OCCURS_AT and OCCURS_IN (see above)

  • OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC)
  • OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC) DONE 4/7/2014

9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs. DONE 21/07/2014

10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO. NOT YET DONE; discussion on subsequent annotation call (http://wiki.geneontology.org/index.php/Annotation_Conf._Call,_June_24,_2014) disagreed with this 11/9/2014

11. PASCALE AND VAL: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.

12. RACHAEL? Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16. This requires a ‘happens_during’ relationship. NOT DONE - this is not currently possible (4/7/2014)

13. RACHAEL: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information. DONE 4/7/2014

14. RUTH Ruth to alter wiki for HAPPENS_DURING so GO-BP but not GO-MF can’t be used in C16. Comment added wrt BPs. I think this item should read 'so GO-BP can be used in C16; and GO-MF can’t be used in C16'. I haven't tried to address the item as it stands, waiting for confirmation that the wiki updates are sufficient. DONE 21/07/2014

15. DAVID OS: remove ‘during’ from relationships, because it can't be properly defined. Its children 'exists_during' and 'happens_during' will remain. DONE 4/7/2014

16. VAL and RACHAEL Look at exists_during relation uses to see if they make sense. Need to confirm if the usage of exists_during as currently defined, i.e. CC exists_during process/phase, is appropriate 11/9/2014

17. RUTH: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway. DONE 21/07/2014

18. DAVID OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.

19. CHRIS/TONY: Chris should think more about how the on-the-fly human-readable display of folded annotations would work in practice and collaborate with Tony to see if it can be implemented in Protein2GO.

20. DAVID OS: add OWL statements for the 2 HAPPENS_DURING examples and also add the parents for the new folded GO term: canonical Wnt signaling pathway during limb morphogenesis on this page.