Annotation Extension meeting 2014-06-16
Annotation Extension Meeting June 16th 2014 EBI-Duxford room
- Jane Lomax
- David O-S
- Ruth Lovering
- Rebecca Foulger
- Pascale Gaudet
- Aleks (am only)
- Rachael Huntley
- Valerie Wood
- Chris Mungall
- 1 HAS_INPUT
- 2 Transcription Factors
- 3 HAS_REGULATION_TARGET vs HAS_TARGET
- 4 OCCURS AT
- 5 HAS_OUTPUT
- 6 DURING, HAPPENS_DURING AND EXISTS_DURING
- 7 PART_OF
- 8 HAS_REGULATION_TARGET
- 9 OTHER DISCUSSION
- 10 ACTION ITEMS
- Use for MF and BP only.
- Discussion: We should have more specific terms under ‘has_input’ and ‘has_direct_input’:
has_input
--has_direct_input
----binds
----has_substrate
----transports
----has_direct_regulation_target
has_regulation_target
----has_direct_regulation_target
has_participant
----has_agent
----has_output
----transports
----has_product
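To make the proposed hierarchy easier to query, the outline above can be read into a parent map. This is only a sketch: it assumes that the number of leading dashes gives the nesting depth and that a relation listed twice (e.g. transports) simply has two parents. The names `OUTLINE`, `parse_outline` and `ancestors` are illustrative and not part of any GO tool.

```python
from collections import defaultdict

# The outline as recorded in the minutes; leading dashes are read as depth.
OUTLINE = """\
has_input
--has_direct_input
----binds
----has_substrate
----transports
----has_direct_regulation_target
has_regulation_target
----has_direct_regulation_target
has_participant
----has_agent
----has_output
----transports
----has_product
"""

def parse_outline(text):
    """Return {relation: set of direct parents} from a dash-indented outline."""
    parents = defaultdict(set)
    stack = []  # (depth, name) pairs for the current chain of ancestors
    for line in text.splitlines():
        name = line.lstrip("-")
        depth = len(line) - len(name)
        # Drop entries that are not shallower than the current line.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parents[name]  # make sure top-level relations appear as keys
        if stack:
            parents[name].add(stack[-1][1])
        stack.append((depth, name))
    return dict(parents)

def ancestors(relation, parents):
    """Return every relation reachable by following parent links."""
    seen = set()
    todo = list(parents.get(relation, ()))
    while todo:
        parent = todo.pop()
        if parent not in seen:
            seen.add(parent)
            todo.extend(parents.get(parent, ()))
    return seen

PARENTS = parse_outline(OUTLINE)
```

With this reading, `ancestors("has_substrate", PARENTS)` contains both has_direct_input and has_input, so an annotation made with the specific relation would still be found by a query on the general one.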
Case study for has_input
- Discussion: How do you annotate ADAM10, which acts as a protease on MICA, cleaving it in the membrane ectodomain?
- in OWL: capable_of some (‘protease activity’ that (has_substrate some MICA) and (part_of some ‘ectodomain proteolysis’))
- protease activity that part_of some ‘ectodomain proteolysis’ and (has_substrate some MICA).
- annotate ADAM10 to ‘membrane protein ectodomain proteolysis’ has_direct_input: MICA.
- annotate ADAM10 to ‘protease activity’ has_substrate: MICA.
- annotate ADAM10 to ‘protease activity’ (C16: part_of: ectodomain proteolysis).
- Create a new GO term: protease activity involved in ectodomain proteolysis (equivalent to: protease activity that part_of some ‘ectodomain proteolysis’). Then use has_substrate:MICA in C16.
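The annotation options discussed for ADAM10 can be written side by side as data. The sketch below assumes the relation(target) syntax of GAF column 16, where comma-joined extensions all apply to the same annotation. The `Annotation` class is hypothetical, and term labels and the bare name MICA stand in for the GO and protein identifiers a real annotation would carry.

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    gene_product: str
    go_term: str  # a label is used here for readability; real data would use a GO ID
    extensions: list = field(default_factory=list)  # (relation, target) pairs

    def c16(self):
        """Render the extensions as relation(target), joined by commas."""
        return ",".join(f"{rel}({target})" for rel, target in self.extensions)

# Option: annotate to the process term, with MICA as the direct input.
process_option = Annotation(
    "ADAM10",
    "membrane protein ectodomain proteolysis",
    [("has_direct_input", "MICA")],
)

# Option: annotate to the function term, stating substrate and process context.
function_option = Annotation(
    "ADAM10",
    "protease activity",
    [("has_substrate", "MICA"),
     ("part_of", "membrane protein ectodomain proteolysis")],
)

for option in (process_option, function_option):
    print(option.go_term, "|", option.c16())
```

The second option shows why a single pre-composed term was suggested: without it, the substrate and the process context both have to travel in C16.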
has_input and response_to
- Question: Can you use ‘has_input’ with ‘response to x’, recording what ‘x’ is in C16?
- Discussion: Ruth’s example is ‘proteolysis [involved] in cellular response to drug’. In this example, you have two has_input relationships:
- has_input: drug
- has_input: proteolysis target.
- The has_inputs work in the individual cases but when combined, how do you know which input is which?
- David OS: Best way is to see if we can express this in OWL. In this case: proteolysis that is part of some cellular response to drug
- The drug isn’t an input to the proteolysis. The proteolysis is part of the cellular response to drug.
- CONCLUSION: You can’t put the drug in the annotation extension for the term ‘proteolysis involved in cellular response to drug’, because the proteolysis is part_of the cellular response. You would use has_input: protein in the combined term. You would need to make an additional annotation to the generic ‘cellular response to drug’ term, using has_input:drug. It’s not ideal, because you’ve lost the link that the proteolysis is occurring in response to drug x.
NB: Decided that if we change the is_a ‘response to x’ to part_of relationships, then we can use the GO term part_of ‘response to x’ in the extension. (see AI for Editors).
- Background: At the moment, it’s confusing which relation to use with transcription factors. Their compound nature causes problems because two different functions are rolled into one term. E.g. for ‘protein-binding transcription factor activity’, does has_input mean the DOWNSTREAM gene that is being regulated, or the protein that is being bound? Should you use ‘has_regulation_target’ for TF annotations?
- Discussion: Val treats DNA-binding and protein-binding TFs differently. PomBase allows ‘has_regulation_target’ to record the gene targets for sequence-specific TFs. For protein-binding TFs, PomBase captures the gene targets in the BP terms but NOT the TF terms, with the logic that the protein-binding TFs aren’t directly binding to the promoter.
For protein-binding TFs, put the IDs of the genes on the PROCESS term.
One option is to use more specific relations:
DNA binding TF involved_in regulation of transcription from Pol II
C16: has_regulation_target some <gene>
C16: has_binding_target some <SO sequence element>
protein binding TF involved_in regulation of transcription from Pol II
C16: has_regulation_target some <gene>
C16: has_binding_target some <protein ID>
For the process terms, use ‘has_regulation_target’. David O-S wants to see these written out in OWL.
HAS_REGULATION_TARGET vs HAS_TARGET
Why do we need ‘has_regulation_target’ when we already have regulation in the GO term? Why not just C16: has_target:x?
- Often redundant with occurs_in, especially for GO CC
- Generally used at cell surface, sequence regions (e.g. with SO identifiers)
- Could merge occurs_at and occurs_in into one relation: occurs_at_or_within_location,
- BP or MF
- OCCURS_IN: All the parts of the process are contained within the location (CL, UBERON, GO-CC). NB: because the definition of membrane includes the intrinsic and extrinsic components, you would use ‘occurs_in’ for membrane annotations.
- OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
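The two range rules above could be checked mechanically from the identifier prefix. This is a minimal sketch, not the Protein2GO implementation: `ALLOWED_RANGE` and `range_ok` are made-up names, the identifiers in the usage lines are placeholders, and a GO prefix alone cannot confirm that a term is a cellular component, so that check would still be needed separately.

```python
# Ontology prefixes permitted as the target of each relation,
# following the OCCURS_IN / OCCURS_AT rules in the minutes.
ALLOWED_RANGE = {
    "occurs_in": {"CL", "UBERON", "GO"},  # GO here means GO-CC only
    "occurs_at": {"SO", "GO"},            # GO here means GO-CC only
}

def range_ok(relation, target_id):
    """Return True if the target's prefix is allowed for the relation."""
    prefix = target_id.split(":", 1)[0]
    return prefix in ALLOWED_RANGE.get(relation, set())

# Placeholder identifiers, used only to exercise the prefixes.
print(range_ok("occurs_in", "UBERON:0000000"))  # True
print(range_ok("occurs_in", "SO:0000000"))      # False
print(range_ok("occurs_at", "SO:0000000"))      # True
```

If the two relations were merged into occurs_at_or_within_location, as suggested, the table would collapse to a single entry covering all four prefixes.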
- For now, we are going to restrict this to BP only. If you find you need to use this for MF, bring up your example or make a new GO term. Add this to the rule file.
- Discussion: Should it be used for MF? In some cases, the has_output would be what was assayed in the reaction.
- AI: Could suggest a restriction for use with ‘cytokine production’ terms only.
- We need a better example, ideally where the paper shows stronger evidence for a catalytic activity. And where a catalytic activity can create >1 choice of output. For the current prostaglandin-I synthase activity example, the term definition is: Catalysis of the reaction: prostaglandin H(2) = prostaglandin I(2). Therefore the enzyme will always produce prostaglandin I2, and no extension is needed.
- Can use UniProt/Protein ID in C16 with this extension. Not a PRO ID, because for any given species, you don’t need the generic identifier because you’ll know the species-specific (UniProt) one.
- Use a PRO feature chain ID where possible, to be more specific.
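The interim has_output restrictions discussed above (BP annotations only, with a possible further restriction to ‘x production’ terms) could be expressed as a small check. This sketch assumes the aspect is given as the single letter used in GAF column 9 (P, F or C); the function name and the `production_only` switch are hypothetical, and matching on the term label is a simplification of whatever rule is finally written into the rule file.

```python
def has_output_allowed(aspect, term_label="", production_only=False):
    """Return True if has_output may be used on an annotation to this term.

    aspect: 'P' (process), 'F' (function) or 'C' (component).
    production_only: also require an 'x production' term, as proposed.
    """
    if aspect != "P":
        return False  # restricted to BP for now
    if production_only:
        return term_label.endswith("production")
    return True

print(has_output_allowed("P", "cytokine production", production_only=True))
print(has_output_allowed("F", "prostaglandin-I synthase activity"))
```

Keeping the stricter rule behind a switch mirrors the minutes: the BP-only rule is agreed, while the ‘production’ restriction is still to be investigated.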
DURING, HAPPENS_DURING AND EXISTS_DURING
DURING
--EXISTS_DURING (CC terms, including but not restricted to protein complexes)
--HAPPENS_DURING (MF and BP)
- AI: Get rid of grouping term, ‘during’.
- For phase terms, you HAVE to use ‘happens_during’ (not part_of).
- For other GO process terms, use happens_during if you don’t know if it contributes to the process.
AI: Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.
NB: Some of the existing ‘during’ C16 relations in Protein2GO at the moment look slightly odd.
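The phase restriction above lends itself to an automated check. The sketch below is an illustration only: `check_phase_extension` is a hypothetical name, and the set of phase terms (GO:0022403 and its descendants) is assumed to be supplied by the caller from the ontology rather than computed here.

```python
CELL_CYCLE_PHASE = "GO:0022403"  # 'cell cycle phase', as cited in the minutes

def check_phase_extension(relation, target_id, phase_terms):
    """Return an error message if the extension breaks the phase rule, else None.

    phase_terms: set of GO IDs that are 'cell cycle phase' or its descendants.
    """
    if relation == "part_of" and target_id in phase_terms:
        return f"use happens_during, not part_of, with phase term {target_id}"
    return None

# Only the root phase term is listed here; a real check would load all descendants.
phase_terms = {CELL_CYCLE_PHASE}
print(check_phase_extension("part_of", "GO:0022403", phase_terms))
print(check_phase_extension("happens_during", "GO:0022403", phase_terms))
```

Returning a message rather than raising keeps the check usable both inside a curation tool and as a batch report over existing annotations.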
Everyone agrees !!! :o
- The extracellular terms need a bit of work because there are currently some annotations to ‘extracellular matrix’ part_of x_cell, whereas logically you can’t have an extracellular space that is part of a cell.
- Allow use of the RO relation ‘adjacent_to’ in annotation extensions for CC extracellular annotations. When this is done, MGI will need to re-examine their ‘extracellular space’ part_of ‘x-cell’ annotations.
- Release the following relations to annotators to use
1. EDITORS: Change children of ‘response to x’ from is_a to part_of, throughout GO.
2. ANNOTATORS (Rachael and Val): check cellular component annotations that have ‘localization_dependent_on’ in C16. These are wrong. ‘localization_dependent_on’ makes sense for BP terms when A is controlling the localisation of protein B. But it doesn’t make sense for CC. Consider changing to ‘in_presence_of’.
3. ANNOTATORS: Decide if we want to see the hierarchy when we’re choosing our annotation extension in Protein2GO.
4. DAVID OS. Create new more specific has_input relations for: has_substrate, has_transport_target (transports), has_binding_target (binds).
5. ?Change definition of ‘has_input’ to allow for its use with ‘cellular response’ terms? Currently it says ‘bound, transported, modified, consumed or destroyed’….
6. VAL AND EDITORS/DAVID HILL: Look at the transcription terms. Val would like a term ‘DNA binding involved in negative regulation of transcription from RNA pol II promoter’, etc. This term would be is_a DNA binding. The advantage of this term is that you wouldn’t need to make two annotations, one for the binding and one for the TF activity, because the term would be is_a binding. It also doesn’t squeeze a process into a function term (so much). Val to submit a SourceForge item as a placeholder.
7. DAVID OS: Write out the transcription factor suggestions in OWL, to check they make sense.
8. ?Write the rules for OCCURS_AT and OCCURS_IN
- OCCURS_IN: All the parts of the process are contained within the location (CL, UBERON, GO-CC)
- OCCURS_AT: Adjacent to or in the vicinity of. (SO or GO-CC)
9. RUTH: Update example for BP HAS_OUTPUT:x. Could use ‘fibroblast growth factor production ; GO:0090269’ with one of the FGF1-10 IDs.
10. RACHAEL: Edit the annotation extension file to make rule that has_output can (for the moment) only be used for GO-BP annotations. Enforce this rule in Protein2GO, and add to rule file for curators not using Protein2GO.
11. PASCALE AND VAL: Look to see if we can add a restriction that has_output can be used with ‘x production’ terms only, for now. We can broaden/change if necessary. E.g. cytokine production + cell adhesion molecule production.
12. RACHAEL?: Add a restriction that you can’t use the part_of relation between a GO process and a ‘cell cycle phase ; GO:0022403’ term in C16. This requires a ‘happens_during’ relationship.
13. RACHAEL: Alter local_range for HAPPENS_DURING and EXISTS_DURING to remove GO-MF information.
14. RUTH: Alter the wiki for HAPPENS_DURING so that GO-BP, but not GO-MF, terms can be used in C16.
15. DAVID OS: remove ‘during’ from relationships.
16. RUTH: In the part_of example on the wiki page, make it clear in a footnote or something, that Wnt-activated R activity is already part_of Wnt signaling pathway. Here we’re making a more specific statement that the Wnt-activated R activity is part_of a CANONICAL Wnt signaling pathway.
17. DAVID OS: make a new relation: ‘adjacent_to’ to describe extracellular regions that are next to a cell.