Difference between revisions of "Annotation Extension meeting 2014-06-16"

Revision as of 00:25, 17 June 2014

Annotation Extension Meeting

  • Date: June 16th 2014
  • Start Time: 9:30
  • Where: EBI-Duxford room

Present:

  • Jane Lomax
  • David O-S
  • Ruth Lovering
  • Rebecca Foulger
  • Pascale Gaudet
  • Aleks (am)
  • Rachael Huntley
  • Valerie Wood
  • Chris Mungall (pm)


HAS_INPUT

  • Rule: Use for MF and BP only.
  • Proposal: We should add more specific terms under ‘has_input’ and ‘has_direct_input’:
has_input
--has_direct_input
----binds
----has_substrate
----transports
----has_direct_regulation_target
has_regulation_target
----has_direct_regulation_target
has_participant
----has_agent (not useful for annotation extensions)
----has_output
----transports 
----has_product
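
As a rough illustration of the proposed tree, the sketch below stores each relation with its parents and walks upward to find every ancestor. It assumes the dashed entries are direct children of the relation they sit under, and that a relation listed in two places (for example ‘transports’) has two parents. The names PARENTS, ancestors and is_subrelation_of are hypothetical and are not part of any GO tooling.

```python
# Parent links transcribed from the proposed tree in these minutes.
# Assumption: dashed entries are direct children of the relation they sit under.
PARENTS = {
    "has_direct_input": ["has_input"],
    "binds": ["has_direct_input"],
    "has_substrate": ["has_direct_input"],
    "transports": ["has_direct_input", "has_participant"],
    "has_direct_regulation_target": ["has_direct_input", "has_regulation_target"],
    "has_agent": ["has_participant"],
    "has_output": ["has_participant"],
    "has_product": ["has_participant"],
}


def ancestors(relation):
    """Return every relation reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(relation, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def is_subrelation_of(relation, candidate):
    """True if `relation` is `candidate` itself or one of its descendants."""
    return relation == candidate or candidate in ancestors(relation)
```

With this layout, an annotation made with ‘has_substrate’ could still be retrieved by a query on ‘has_input’, which is the point of adding the more specific relations beneath it.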


ADAM10 case study for has_input

Background: How to annotate ADAM10, which acts as a protease on MICA, cleaving it in the membrane ectodomain.

  • in OWL: capable_of some ((‘protease activity’ that has_substrate some MICA) and (part_of some ‘ectodomain proteolysis’))
  • Options:
  • 1. annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ has_direct_input: MICA.
  • 1. annotate ADAM10 to ‘protease activity’ has_substrate: MICA.

OR

  • 2. annotate ADAM10 to ‘protease activity’ (C16: part_of: ectodomain proteolysis).

OR:

  • 3. Create a new GO term: protease activity involved in ectodomain proteolysis (equivalent to: protease activity that part_of some ‘ectodomain proteolysis’). Then use has_substrate: MICA in C16.


has_input and 'response to'

  • Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
  • Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug’. In this example, you have two has_input relationships:
    • has_input: drug
    • has_input: proteolysis target.
  • The has_inputs work in the individual cases but when combined, how do you know which input is which?
  • The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
  • Conclusion: You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’ because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal because you’ve lost the link that the proteolysis is occurring in response to drug x.

NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI below for Editors).


Transcription Factors

  • Background: Transcription factors may need to be handled in a specific way. At the moment, it’s unclear which relation to use with transcription factors. Their compound nature is causing issues because there are two different functions rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the DOWNSTREAM gene that is being regulated or the PROTEIN that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
  • Discussion: Currently, PomBase treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs. For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.

One option is to use more specific relations:

DNA binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target: some <gene>
C16: has_binding_target: some <SO sequence element>

protein binding TF involved_in regulation of transcription from Pol II

C16: has_regulation_target: some <gene>
C16: has_binding_target: some <protein ID>

For the process terms, you could use ‘has_regulation_target’. David O-S wants to see these written out in OWL.

Conclusion: This isn't yet resolved. To remove the problem that the TF terms don't have an is_a ancestor to 'binding', Val would prefer that the TF terms are revised to 'x binding involved in regulation of transcription from pol II promoter' etc. See AI below.


OCCURS AT

  • Often redundant with occurs_in, especially for GO CC
  • Generally used at cell surface, sequence regions (e.g. with SO identifiers)
  • We discussed merging occurs_at and occurs_in into one relation: occurs_at_or_within_location. But we decided against this because 'occurs_in' is a relation used in GO at the moment, so it seems wrong to make it less specific.
  • BP or MF

Conclusion: We'll use occurs_in and occurs_at in the following ways, and redefine the relationships:

  • OCCURS_IN: All the parts of the process are contained within the location (CL, UBERON, GO-CC). NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations.
  • OCCURS_AT: Adjacent to or in the vicinity of the location (SO or GO-CC).
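
The range rules above could be checked mechanically along these lines. This is only a sketch: it tests the ID prefix of the C16 target, so it cannot tell a GO-CC term from a GO-BP or GO-MF term (that would need an ontology lookup, which is not attempted here). ALLOWED_PREFIXES and range_ok are hypothetical names, and the IDs in the usage note are illustrative.

```python
# Allowed target namespaces per relation, as agreed in the conclusion above.
# Assumption: "GO" stands in for GO-CC; the prefix alone cannot confirm the aspect.
ALLOWED_PREFIXES = {
    "occurs_in": {"CL", "UBERON", "GO"},
    "occurs_at": {"SO", "GO"},
}


def range_ok(relation, target_id):
    """Return True if the target's ID prefix is in the relation's allowed range."""
    prefix = target_id.split(":", 1)[0]
    return prefix in ALLOWED_PREFIXES.get(relation, set())
```

For example, range_ok("occurs_in", "CL:0000576") passes, while range_ok("occurs_in", "SO:0000167") does not, because sequence regions are only in range for occurs_at.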


HAS_OUTPUT

  • For now, we are going to restrict has_output to BP only. If you find you need to use this for MF, bring up your example with Rachael and Rama, or request a new GO term.
  • Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.
  • AI: Could suggest a restriction for use with ‘cytokine production’ terms only.
  • We need a better example, ideally one where the paper shows stronger evidence for a catalytic activity, and where the catalytic activity can produce more than one possible output. For the current prostaglandin-I synthase activity example, the term definition is: Catalysis of the reaction: prostaglandin H(2) = prostaglandin I(2). Therefore the enzyme will always produce prostaglandin I2, and no extension is needed.
  • Can use UniProt/Protein ID in C16 with this extension. Not a PRO ID, because for any given species, you don’t need the generic identifier because you’ll know the species-specific (UniProt) one.
  • Can use PRO feature chain ID if you can, to be more specific.
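
A minimal sketch of how the aspect restrictions recorded in these minutes (has_input for MF and BP only; has_output for BP only, for now) might be enforced in a curation tool. The table and function names are hypothetical, and relations with no recorded restriction are simply allowed through.

```python
# Aspect restrictions taken from these minutes; "MF", "BP", "CC" are the GO aspects.
ALLOWED_ASPECTS = {
    "has_input": {"MF", "BP"},
    "has_output": {"BP"},
}


def aspect_ok(relation, aspect):
    """Return True if the relation may be used on a GO term of the given aspect.

    Relations without an entry in ALLOWED_ASPECTS are treated as unrestricted.
    """
    allowed = ALLOWED_ASPECTS.get(relation)
    return allowed is None or aspect in allowed
```

Keeping the rule in a table means that broadening has_output to MF later, if a convincing example turns up, is a one-line change.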


DURING, HAPPENS_DURING AND EXISTS_DURING

The current tree stands at:

DURING
—EXISTS DURING (CC terms including (but not restricted to) protein complexes)
—HAPPENS DURING (MF and BP)
  • AI: Don't use the grouping term ‘during’ in annotation extensions, and just use the more specific terms. David OS will look at removing the 'during' relationship completely because of issues with its definition.


happens_during

  • Make a new rule: for phase terms, you HAVE to use ‘happens_during’ (not part_of).
  • For other GO process terms, use happens_during if you don’t know if it contributes to the process.

AI: Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16. This requires a ‘happens_during’ relationship.
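
The restriction could look something like the sketch below. It assumes the set of phase terms is supplied by the caller; here it holds only the root term GO:0022403 named above, whereas a real check would also need its descendants from the ontology. CELL_CYCLE_PHASE_TERMS and check_extension are hypothetical names.

```python
# Assumption: only the root 'cell cycle phase' term is listed here; a real check
# would expand this set to include all of its descendants from the ontology.
CELL_CYCLE_PHASE_TERMS = frozenset({"GO:0022403"})


def check_extension(relation, target_id, phase_terms=CELL_CYCLE_PHASE_TERMS):
    """Return an error message if part_of points at a phase term, else None."""
    if relation == "part_of" and target_id in phase_terms:
        return "phase terms require happens_during, not part_of"
    return None
```

The same extension written with happens_during returns None, so the check only blocks the disallowed combination.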

exists_during

NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd. Need to relook at these.

PART_OF

Everyone agrees !!! :o

almost .....

  • The extracellular terms need a bit of work because there are some annotations at the moment to ‘extracellular matrix’ part_of x_cell, where logically you can’t have an extracellular space that is part of a cell.
  • Allow use of the RO relation ‘adjacent_to’ for annotation extensions for CC extracellular annotations. When this is done, MGI will need to relook at their ‘extracellular space’ part_of ‘x-cell’ annotations.


HAS_REGULATION_TARGET (UNFINISHED)

This relation ties into the transcription factor terms.

Useful to have distinction between direct and indirect. So can we just use 'has_indirect_target'? Chris prefers 'has_regulation_target' because it's more specific.

One option:

  • DNA-binding TF activity: has_regulation_target: some gene
  • DNA-binding TF activity: has_input/has_substrate: some DNA (SO ID, which is specific for the motif)

... BUT DNA-binding TF activity doesn't have is_a DNA binding as a parentage, so it's wrong to say has_input:DNA for this term. This comes back to Val's suggestions for the TF terms, to change to: DNA binding involved in negative regulation of transcription....

Some of the issues here are because it seems redundant to have a regulation GO term with 'regulation' in the annotation extension relationship.


OTHER DISCUSSION

  • Release the following relations to annotators to use
    • part_of
    • occurs_in


ACTION ITEMS

1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.

2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.

3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.

4. DAVID OS. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).

5. ? Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently it says ‘bound, transported, modified, consumed or destroyed’….

6. VAL AND EDITORS/DAVID HILL: Look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations, one for the binding and one for the TF activity, because the term would be is_a binding. It also doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.

7. DAVID OS: Write out the transcription factor suggestions in OWL, to check they make sense.

8. RACHAEL: Better define the rules for OCCURS_AT and OCCURS_IN (see above)

  • OCCURS_IN: All the parts of the process are contained within the location (CL, UBERON, GO-CC)
  • OCCURS_AT: Adjacent to or in the vicinity of the location (SO or GO-CC)

9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs.

10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO.

11. PASCALE AND VAL: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.

12. RACHAEL? Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16. This requires a ‘happens_during’ relationship.

13. RACHAEL: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information.

14. RUTH: Alter the wiki for HAPPENS_DURING so that GO-BP, but not GO-MF, can be used in C16.

15. DAVID OS: Remove ‘during’ from relationships, because it can't be properly defined. Its children 'exists_during' and 'happens_during' will remain.

16. VAL and RACHAEL: Look at exists_during relation uses to see if they make sense.

17. RUTH: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway.

18. DAVID OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.

19. RACHAEL: Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ in C16. This requires a ‘happens_during’ relationship.