Annotation Extension meeting 2014-06-16

From GO Wiki
Revision as of 06:46, 16 June 2014 by Rfoulger (talk | contribs) (ACTION ITEMS)


Annotation Extension Meeting June 16th 2014 EBI-Duxford room


  • Jane Lomax
  • David O-S
  • Ruth Lovering
  • Rebecca Foulger
  • Pascale Gaudet
  • Aleks (am only)
  • Rachael Huntley
  • Valerie Wood
  • Chris Mungall


  • Use for MF and BP only.
  • Discussion: We should have more specific terms under ‘has_input’ and ‘has_direct_input’:

Case study for has_input

  • Discussion: How do you annotate ADAM10, which acts as a protease on MICA, cleaving it in the membrane ectodomain?
  • in OWL: capable_of some (‘protease activity’ that (has_substrate some MICA) and (part_of some ‘ectodomain proteolysis’))
  • protease activity that part_of some ‘ectodomain proteolysis’ and (has_substrate some MICA).
  • annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ has_direct_input: MICA.
  • annotate ADAM10 to ‘protease activity’ has_substrate: MICA.


  • annotate ADAM10 to ‘protease activity’ (C16:part_of: ectodomain proteolysis).


  • Create a new GO term: protease activity involved in ectodomain proteolysis (equivalent to: protease activity that part_of some ‘ectodomain proteolysis’). Then use has_substrate:MICA in C16.

has_input and response_to

  • Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
  • Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug’. In this example, you have two has_input relationships:
    • has_input: drug
    • has_input: proteolysis target.
  • The has_inputs work in the individual cases but when combined, how do you know which input is which?
  • David OS: Best way is to see if we can express this in OWL. In this case: proteolysis that is part of some cellular response to drug
  • The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
  • CONCLUSION: You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’, because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.

NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI for Editors).

Transcription Factors

  • Background: At the moment, it’s confusing what relation to use with transcription factors. Their compound nature is causing issues because there are two different functions rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the DOWNSTREAM gene that is being regulated or the protein that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
  • Discussion: Val treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs. For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.

For protein-binding TFs, put the IDs of the genes on the PROCESS term.

One option is to use more specific relations:
DNA binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target: some <gene>
C16: has_binding_target: some <SO sequence element>

protein binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target: some <gene>
C16: has_binding_target: some <protein ID>

For the process terms, use ‘has_regulation_target’. David O-S wants to see these written out in OWL.


Why do we need ‘has_regulation_target’ when we have regulation in the GO term? Why not just C16: has_target:x?


  • Often redundant with occurs_in, especially for GO CC
  • Generally used at cell surface, sequence regions (e.g. with SO identifiers)
  • Could merge occurs_at and occurs_in into one relation: occurs_at_or_within_location,
  • BP or MF

Action Items:

  • OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC). NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations.
  • OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
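
The two ranges above could be checked mechanically. The following is a minimal sketch, assuming C16 identifiers are written as PREFIX:local_id; the table and the function name `prefix_allowed` are hypothetical and transcribed from these minutes, not taken from the official rule file. A prefix check cannot tell GO-CC from GO-BP or GO-MF, so a real check would also need the ontology.

```python
# Hypothetical range table transcribed from the minutes above;
# this is NOT the official annotation extension rule file.
ALLOWED_PREFIXES = {
    "occurs_in": {"CL", "UBERON", "GO"},  # GO is meant as GO-CC only
    "occurs_at": {"SO", "GO"},            # GO is meant as GO-CC only
}


def prefix_allowed(relation: str, identifier: str) -> bool:
    """Return True if the identifier's prefix is in the relation's range.

    Only the prefix is inspected; the GO aspect is not checked here.
    """
    prefix, sep, local_id = identifier.partition(":")
    if not sep or not local_id:
        return False  # not of the form PREFIX:local_id
    return prefix in ALLOWED_PREFIXES.get(relation, set())


# Identifiers below are illustrative only.
print(prefix_allowed("occurs_in", "CL:0000576"))
print(prefix_allowed("occurs_at", "UBERON:0002107"))
```

Unknown relations fall through to an empty set, so they are rejected rather than silently accepted.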


  • For now, we are going to restrict this to BP only. If you find you need to use this for MF, bring up your example or make a new GO term. Add this to the rule file.
  • Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.
  • AI: Could suggest a restriction for use with ‘cytokine production’ terms only.
  • We need a better example, ideally where the paper shows stronger evidence for a catalytic activity. And where a catalytic activity can create >1 choice of output. For the current prostaglandin-I synthase activity example, the term definition is: Catalysis of the reaction: prostaglandin H(2) = prostaglandin I(2). Therefore the enzyme will always produce prostaglandin I2, and no extension is needed.
  • Can use UniProt/Protein ID in C16 with this extension. Not a PRO ID, because for any given species, you don’t need the generic identifier because you’ll know the species-specific (UniProt) one.
  • Can use PRO feature chain ID if you can, to be more specific.
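
As a rough illustration of the BP-only restriction and the suggested ‘x production’ restriction, here is a minimal sketch. It assumes the GAF aspect codes P, F and C; the function name `check_has_output`, the `production_only` flag and the label test are hypothetical and are not how Protein2GO implements the rule.

```python
def check_has_output(aspect: str, term_label: str,
                     production_only: bool = False) -> list[str]:
    """Return a list of problems with using has_output on this annotation.

    aspect: GAF aspect code of the annotated GO term ('P', 'F' or 'C').
    term_label: label of the annotated GO term.
    production_only: if True, also apply the suggested (not yet agreed)
        restriction to 'x production' terms, tested naively on the label.
    """
    problems = []
    if aspect != "P":
        problems.append("has_output is restricted to BP annotations for now")
    if production_only and not term_label.endswith(" production"):
        problems.append("has_output is limited to 'x production' terms")
    return problems


print(check_has_output("P", "fibroblast growth factor production"))
print(check_has_output("F", "prostaglandin-I synthase activity"))
```

Matching on the label is only a stand-in; a real check would test whether the term is a descendant of the relevant production terms in the ontology.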


DURING
  • EXISTS DURING (CC terms including (but not restricted to) protein complexes)
  • HAPPENS DURING (MF and BP)

  • AI: Get rid of grouping term, ‘during’.


  • For phase terms, you HAVE to use ‘happens_during’ (not part_of).
  • For other GO process terms, use happens_during if you don’t know if it contributes to the process.

AI: Add a restriction that you can’t use a part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.
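
The restriction above could be sketched as follows. This is a minimal sketch that assumes the caller supplies the set of phase term IDs (GO:0022403 and its descendants) from the ontology; the function name `check_phase_relation` and the message text are hypothetical.

```python
from typing import Optional, Set

CELL_CYCLE_PHASE = "GO:0022403"  # 'cell cycle phase', ID as given in the minutes


def check_phase_relation(relation: str, target_id: str,
                         phase_terms: Set[str]) -> Optional[str]:
    """Return an error message if part_of points at a cell cycle phase term.

    phase_terms: GO IDs of 'cell cycle phase' and its descendants. Building
    this set from the ontology is left to the caller (an assumption here).
    """
    if relation == "part_of" and target_id in phase_terms:
        return "use happens_during, not part_of, with cell cycle phase terms"
    return None


# For illustration the set holds only the root phase term.
phases = {CELL_CYCLE_PHASE}
print(check_phase_relation("part_of", "GO:0022403", phases))
print(check_phase_relation("happens_during", "GO:0022403", phases))
```

In practice the set would include every descendant of GO:0022403, so that specific phase terms are caught as well as the root.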


NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd.


Everyone agrees !!! :o

  • The extracellular terms need a bit of work because there are currently some annotations to ‘extracellular matrix’ part_of x_cell, where logically you can’t have an extracellular space that is part of a cell.
  • Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.



  • Release the following relations to annotators to use
    • part_of
    • occurs_in


1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.

2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.

3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.

4. DAVID OS. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).

5. ?Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently it says ‘bound, transported, modified, consumed or destroyed’….

6. VAL AND EDITORS/DAVID HILL: Look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations (one for the binding and one for the TF activity), because the term would be is_a binding. It also doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.

7. DAVID OS: Write out the transcription factor suggestions in OWL, to check they make sense.

8. ?Write the rules for OCCURS_AT and OCCURS_IN

  • OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC)
  • OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)

9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs.

10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO.

11. PASCALE AND VAL: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.

12. RACHAEL? Add a restriction that you can’t use a part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.

13. RACHAEL: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information.

14. RUTH: Alter the wiki for HAPPENS_DURING so that GO-BP, but not GO-MF, can be used in C16.

15. DAVID OS: remove ‘during’ from relationships.

16. RUTH: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway.

17. DAVID OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.