Annotation Extension meeting 2014-06-16

Annotation Extension Meeting June 16th 2014 EBI-Duxford room

Present

  • Jane Lomax
  • David O-S
  • Ruth Lovering
  • Rebecca Foulger
  • Pascale Gaudet
  • Aleks (am only)
  • Rachael Huntley
  • Valerie Wood
  • Chris Mungall


HAS_INPUT

  • Use for MF and BP only.
  • Discussion: We should have more specific terms under ‘has_input’ and ‘has_direct_input’:
has_input
--has_direct_input
----binds
----has_substrate
----transports
----has_direct_regulation_target
has_regulation_target
----has_direct_regulation_target
has_participant
----has_agent
----has_output
----transports
----has_product

Case study for has_input

  • Discussion: How do you annotate ADAM10, which acts as a protease on MICA, cleaving it in the membrane ectodomain?
  • in OWL: capable_of some (‘protease activity’ that (has_substrate some MICA) and (part_of some ‘ectodomain proteolysis’))
  • protease activity that part_of some ‘ectodomain proteolysis’ and (has_substrate some MICA).
  • annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ has_direct_input: MICA.
  • annotate ADAM10 to ‘protease activity’ has_substrate: MICA.

OR

  • annotate ADAM10 to ‘protease activity’ (C16: part_of: ectodomain proteolysis).

OR:

  • Create a new GO term: protease activity involved in ectodomain proteolysis (equivalent to: protease activity that part_of some ‘ectodomain proteolysis’). Then use has_substrate:MICA in C16.
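
For illustration, the first option (two separate annotations for ADAM10) could be serialised as C16 entries roughly as follows. This is only a minimal sketch, assuming the usual relation(ID) extension syntax with commas joining multiple extensions. The helper `format_c16` and the `MICA_ID` placeholder are hypothetical and stand in for a real protein identifier.

```python
# Hypothetical placeholder; substitute the real protein identifier for MICA.
MICA_ID = "UniProtKB:MICA_PLACEHOLDER"


def format_c16(extensions):
    """Join (relation, target_id) pairs into one C16 string: rel(ID),rel(ID)."""
    return ",".join(f"{relation}({target})" for relation, target in extensions)


# Option 1: two annotations for ADAM10, one per GO term.
annotations = [
    ("membrane protein ectodomain proteolysis",
     format_c16([("has_direct_input", MICA_ID)])),
    ("protease activity",
     format_c16([("has_substrate", MICA_ID)])),
]

for go_term, c16 in annotations:
    print(f"ADAM10\t{go_term}\t{c16}")
```

Note that nothing in these two rows links the protease activity to the proteolysis process, which is the gap the other two options try to close.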


has_input and response_to

  • Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
  • Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug’. In this example, you have two has_input relationships:
    • has_input: drug
    • has_input: proteolysis target.
  • The has_inputs work in the individual cases but when combined, how do you know which input is which?
  • David OS: Best way is to see if we can express this in OWL. In this case: proteolysis that is part of some cellular response to drug
  • The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
  • CONCLUSION: Therefore, you can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’ because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.

NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI for Editors).


Transcription Factors

  • Background: At the moment, it’s unclear which relation to use with transcription factors. Their compound nature causes problems because two different functions are rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the downstream gene that is being regulated, or the protein that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
  • Discussion: Val treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs. For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.

For protein-binding TFs, put the IDs of the genes on the PROCESS term.

One option is to use more specific relations:
DNA binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target some <gene>
C16: has_binding_target some <SO sequence element>

protein binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target some <gene>
C16: has_binding_target some <protein ID>

For the process terms, use ‘has_regulation_target’. David O-S wants to see these written out in OWL.


HAS_REGULATION_TARGET vs HAS_TARGET

Why do we need ‘has_regulation_target’ when we already have regulation in the GO term? Why not just use C16: has_target:x?


OCCURS AT

  • Often redundant with occurs_in, especially for GO CC
  • Generally used at cell surface, sequence regions (e.g. with SO identifiers)
  • Could merge occurs_at and occurs_in into one relation: occurs_at_or_within_location.
  • BP or MF

Action Items:

  • OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC). NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations.
  • OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
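
The two ranges above lend themselves to a simple automated check. The following is a rough sketch only, assuming targets are written as PREFIX:ID. The names `ALLOWED_PREFIXES` and `range_ok` are hypothetical and not part of any existing rule file or tool.

```python
# Allowed identifier prefixes per relation, taken from the definitions above.
# Note: a prefix check cannot tell GO-CC apart from GO-BP or GO-MF; that
# would need an ontology lookup, which is out of scope for this sketch.
ALLOWED_PREFIXES = {
    "occurs_in": {"CL", "UBERON", "GO"},
    "occurs_at": {"SO", "GO"},
}


def range_ok(relation, target_id):
    """Return True if the prefix of target_id is permitted for the relation."""
    prefix = target_id.split(":", 1)[0]
    return prefix in ALLOWED_PREFIXES.get(relation, set())


# Identifiers here are used only as syntactic examples.
print(range_ok("occurs_in", "CL:0000000"))   # CL is in range for occurs_in
print(range_ok("occurs_at", "CL:0000000"))   # CL is not in range for occurs_at
```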


HAS_OUTPUT

  • For now, we are going to restrict this to BP only. If you find you need to use this for MF, bring up your example or make a new GO term. Add this to the rule file.
  • Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.
  • AI: Could suggest a restriction for use with ‘cytokine production’ terms only.
  • We need a better example, ideally one where the paper shows stronger evidence for a catalytic activity, and where the catalytic activity can produce more than one possible output. For the current prostaglandin-I synthase activity example, the term definition is: Catalysis of the reaction: prostaglandin H(2) = prostaglandin I(2). Therefore the enzyme will always produce prostaglandin I2, and no extension is needed.
  • Can use UniProt/Protein ID in C16 with this extension. Not a PRO ID, because for any given species, you don’t need the generic identifier because you’ll know the species-specific (UniProt) one.
  • Can use PRO feature chain ID if you can, to be more specific.
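
As a sketch of how the aspect restrictions agreed so far (has_input for MF and BP only, has_output for BP only) might be enforced in a rule check, assuming the standard GAF aspect codes F, P and C. The table `ALLOWED_ASPECTS` and the function `relation_allowed` are hypothetical names, and relations with no rule recorded here are simply let through.

```python
# GAF aspect codes: F = molecular function, P = biological process,
# C = cellular component.
ALLOWED_ASPECTS = {
    "has_input": {"F", "P"},   # "Use for MF and BP only"
    "has_output": {"P"},       # restricted to BP for now
}


def relation_allowed(relation, aspect):
    """Return True if the relation may extend an annotation of this aspect."""
    allowed = ALLOWED_ASPECTS.get(relation)
    if allowed is None:
        return True  # no rule recorded in this sketch, so do not block it
    return aspect in allowed
```

Keeping the rules in a table rather than in code would make it easy to broaden has_output later if a convincing MF example turns up.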


DURING, HAPPENS_DURING AND EXISTS_DURING

DURING
--EXISTS_DURING (CC terms, including but not restricted to protein complexes)
--HAPPENS_DURING (MF and BP)

  • AI: Get rid of grouping term, ‘during’.


happens_during

  • For phase terms, you HAVE to use ‘happens_during’ (not part_of).
  • For other GO process terms, use happens_during if you don’t know if it contributes to the process.

AI: Add a restriction that you can’t use a part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.
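
A minimal sketch of what such a restriction could look like as a check. It assumes the caller supplies `phase_terms`, the set containing GO:0022403 and all of its descendants (in practice this would be computed from the ontology). The function name `check_phase_relation` is hypothetical.

```python
CELL_CYCLE_PHASE_ROOT = "GO:0022403"  # 'cell cycle phase'


def check_phase_relation(relation, target_id, phase_terms):
    """Return an error message if part_of targets a phase term, else None."""
    if relation == "part_of" and target_id in phase_terms:
        return (f"part_of({target_id}) is not allowed for a phase term; "
                f"use happens_during({target_id}) instead")
    return None


# Only the root is listed here; a real check would include every descendant.
phase_terms = {CELL_CYCLE_PHASE_ROOT}
print(check_phase_relation("part_of", "GO:0022403", phase_terms))
print(check_phase_relation("happens_during", "GO:0022403", phase_terms))
```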


exists_during

NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd.

PART_OF

Everyone agrees !!! :o

  • The extracellular terms need a bit of work because there are some annotations at the moment to ‘extracellular matrix’ part_of x_cell, whereas logically you can’t have an extracellular space that is part of a cell.
  • Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.


HAS_REGULATION_TARGET

OTHER DISCUSSION

Release the following relations for annotators to use: part_of, occurs_in.


ACTION ITEMS

1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.

2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.

3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.

4. DAVID O-S. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).

5. Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently it says ‘bound, transported, modified, consumed or destroyed’….

6. VAL and EDITORS (David Hill): look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations, one for the binding and one for the TF activity, because the term would be is_a binding. It also doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.

7. David OS: Write out the transcription factor suggestions in OWL, to check they make sense.

8. Write the rules for OCCURS_AT and OCCURS_IN:
  • OCCURS_IN: All the parts of the process are contained within (CL, UBERON, GO-CC).
  • OCCURS_AT: Adjacent to or in the vicinity of (SO or GO-CC).

9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs.

10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO.

11. Pascale and Val: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.

12. Rachael? Add a restriction that you can’t use a part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.

13. Rachael: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information.

14. Ruth to alter wiki for HAPPENS_DURING so that GO-BP, but not GO-MF, can be used in C16.

15. David OS: remove ‘during’ from relationships.

16. Ruth: In the part_of example on the wiki page, make it clear (in a footnote or similar) that Wnt-activated receptor activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated receptor activity is part_of a CANONICAL Wnt signaling pathway.

17. David OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.