Guidance for updating deprecated Annotation Extension Relations

From GO Wiki

General Guidance

This page provides general recommendations for updating annotations that use a deprecated relation in the annotation extension. Any deprecated relation that remains may affect the release of the whole annotation row, so everyone should be working towards changing the relation used, or deleting the extension and, where appropriate, replacing the deleted annotation extension with a new annotation.

When updating annotations, keep in mind that the statement in the annotation extension extends the primary GO term annotated (column 5 of the GAF), and that any information in the extension should be physiologically relevant.

 The combination of GO term plus annotation extension needs to make sense as a class on its own, ignoring the gene product.
 The result should be a subclass (i.e. effectively a child term) of the primary GO term used. 

Some of the actions to be taken are clear-cut, and the following rules apply:

 IF primary GO ID = GO:CC
 AND extension relation is one of dependent_on, in_presence_of, in_absence_of
 AND extension entity is ChEBI ID
 then delete extension
 IF primary GO ID = GO:CC
 AND extension relation is dependent_on
 AND extension entity is GO:BP
 then change extension relation to exists_during
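Because these two rules need no reference to the paper, they can be applied mechanically. The Python sketch below illustrates them under stated assumptions: the `Annotation` record, its field names and the `go_aspect` lookup (GO ID to "C", "P" or "F", the letters used in the GAF aspect column) are hypothetical and not part of any GOC script; a real tool would have to fill them in from the GAF and the ontology.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical, simplified annotation record holding only what the rules need.
@dataclass(frozen=True)
class Annotation:
    go_id: str                      # primary GO term
    aspect: str                     # "C", "P" or "F" for the primary term
    relation: Optional[str] = None  # extension relation, e.g. "dependent_on"
    entity: Optional[str] = None    # extension entity, e.g. "CHEBI:15422"

CHEBI_RELATIONS = {"dependent_on", "in_presence_of", "in_absence_of"}

def apply_clear_cut_rules(ann: Annotation, go_aspect: Mapping[str, str]) -> Annotation:
    """Return the annotation after applying the two clear-cut rules."""
    if ann.aspect != "C" or ann.relation is None or ann.entity is None:
        return ann  # neither rule applies
    # Rule 1: CC term + deprecated relation + ChEBI entity -> delete the extension.
    if ann.relation in CHEBI_RELATIONS and ann.entity.startswith("CHEBI:"):
        return Annotation(ann.go_id, ann.aspect)
    # Rule 2: CC term + dependent_on + GO BP entity -> relation becomes exists_during.
    if ann.relation == "dependent_on" and go_aspect.get(ann.entity) == "P":
        return Annotation(ann.go_id, ann.aspect, "exists_during", ann.entity)
    return ann  # not clear-cut: needs curator review

# cell cortex dependent_on(smooth muscle contraction)
cortex = Annotation("GO:0005938", "C", "dependent_on", "GO:0006939")
print(apply_clear_cut_rules(cortex, {"GO:0006939": "P"}).relation)  # exists_during
```

Anything that falls through to the final `return ann` is deliberately left unchanged, since the remaining cases need a curator's judgment.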


Most of the remaining corrections will require the curator to go back to the paper to determine how the entity in the extension relates to the primary GO term annotated. This need shows that the meaning of the existing annotation is unclear and that it should be modified.

If a curator comes across an example that is not covered by this guidance, and it is not clear how the annotation should be updated, the example should be brought to an annotation call to be discussed and resolved.

Additional recommendations for specific annotations were made on previous annotation calls:

http://wiki.geneontology.org/index.php/Annotation_Conf._Call,_June_23,_2015

http://wiki.geneontology.org/index.php/Annotation_Conf._Call,_July_28,_2015

See the Excel spreadsheet containing details of annotations that use deprecated relations, with assigned_by information.

Dependent_on

ChEBI identifiers

When dependent_on is used with a ChEBI ID, consider the following:

  • If the chemical in the annotation extension is already mentioned in the GO term definition, there is no reason to add it to the extension; delete the extension.
  • If the definition does not mention the chemical, should the definition be updated to include it, i.e. is the chemical always required?
  • If the primary GO term used is a Cellular Component (i.e. the curator was trying to express that a gene product is at a particular location when a specific chemical is present), this is out of scope for GO; delete the extension.
  • If the chemical is physiologically relevant (e.g. a chemical used to induce a physiological response, such as hydrogen peroxide used to induce oxidative stress), consider creating a new direct annotation to the relevant Biological Process GO term (e.g. response to oxidative stress), and also consider including part_of that Biological Process term in the annotation extension (e.g. part_of response to oxidative stress).
  • If the chemical only represents an assay condition, delete the extension.
  • If none of the above applies, consider using a different relation, e.g. “activated_by”.

Example 1

 Gene product: UniProtKB:P25632
 GO ID:		GO:0015616 DNA translocase activity
 Extension:	dependent_on(CHEBI:15422 ATP)

Definition of GO:0015616 DNA translocase activity: "Catalysis of the reaction: ATP + H2O = ADP + phosphate, to drive movement along a single- or double-stranded DNA molecule".

Action: ATP is already in the definition of the GO term, so there is no need to put ATP in the extension; delete the extension.

Example 2

 Gene product: UniProtKB:A0A060VXQ0
 GO ID:	GO:0090632 N-glycolylneuraminic acid (Neu5Gc) cytidylyltransferase activity
 Extension:	dependent_on(CHEBI:18420 magnesium(2+))

Definition of GO:0090632 N-glycolylneuraminic acid (Neu5Gc) cytidylyltransferase activity: "Catalysis of the reaction: CTP + Neu5Gc = diphosphate + CMP-Neu5Gc."

Action: Mg2+ is not mentioned in the definition of the GO term, but it appears to be essential for this activity. Consider either rewriting the GO term definition to include it, or using the “activated_by” extension relation instead.
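The first check in both examples (is the chemical already named in the GO term definition?) can be pre-screened with a plain text search before the curator reads the definition. This is a minimal sketch, assuming the curator supplies the chemical's name and any synonyms by hand (e.g. both "magnesium(2+)" and "Mg2+"); the function name is hypothetical. A match only flags a candidate for deleting the extension, and a miss does not prove the chemical is absent, since a definition may refer to it by a different wording.

```python
import re
from typing import Iterable

def mentioned_in_definition(definition: str, names: Iterable[str]) -> bool:
    """True if any supplied chemical name occurs in the definition as a whole token."""
    for name in names:
        # Lookarounds instead of \b, so names that begin or end with
        # punctuation, such as "magnesium(2+)", are still matched correctly.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, definition):
            return True
    return False

translocase = ("Catalysis of the reaction: ATP + H2O = ADP + phosphate, to drive "
               "movement along a single- or double-stranded DNA molecule")
cytidylyltransferase = "Catalysis of the reaction: CTP + Neu5Gc = diphosphate + CMP-Neu5Gc."

print(mentioned_in_definition(translocase, ["ATP"]))                               # True
print(mentioned_in_definition(cytidylyltransferase, ["magnesium(2+)", "Mg2+"]))  # False
```

Whole-token matching avoids false hits such as "ATP" inside "ATPase".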

GO identifiers

When dependent_on is used with a GO ID, consider the following:

  • If the GO ID in the extension is a Biological Process or Molecular Function and the primary GO term is part of it, consider changing the relation to part_of (see Example 1 below).
  • Alternatively, if the GO ID in the extension is a Biological Process or Molecular Function, delete the extension and consider reversing the annotation, i.e. make the primary annotation to the GO term from the extension, and put the original primary GO term in the extension with either “part_of” or “causally_upstream_of” (see Examples 2 and 3 below).
  • If the primary GO term used is a Cellular Component and the GO ID in the extension is a Biological Process, consider changing the relation to exists_during (see Example 4 below).

Example 1

 Gene product: UniProtKB:O43474
 GO ID:	GO:0045944 positive regulation of transcription from RNA polymerase II promoter
 Extension:	dependent_on(GO:0000165 MAPK cascade)

Action: Consider changing the relation to part_of.

Example 2

 Gene product: UniProtKB:Q60855
 GO ID:	GO:0097527 necroptotic signaling pathway 
 Extension:	dependent_on(GO:0004672 protein kinase activity)

Action: This is a 'reverse annotation'. Delete the extension and consider making the primary annotation to the 'deleted' GO ID and including an extension with the relation part_of, as follows:

 Gene product: UniProtKB:Q60855
 GO ID:	GO:0004672 protein kinase activity
 Extension:	part_of(GO:0097527 necroptotic signaling pathway)

Example 3

 Gene product: UniProtKB:Q15485
 GO ID:	GO:0001867 complement activation, lectin pathway
 Extension:	dependent_on(GO:0003823 antigen binding)

Action: Consider reversing the GO IDs and changing the relation to causally_upstream_of:

 Gene product: UniProtKB:Q15485
 GO ID:	GO:0003823 antigen binding
 Extension:	causally_upstream_of(GO:0001867 complement activation, lectin pathway)

Note: the above annotation will be folded by the GOC script to automatically create:

 Gene product: UniProtKB:Q15485
 GO ID:	GO:0001867 complement activation, lectin pathway
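The reversal used in Examples 2 and 3 is the same mechanical swap each time: the GO term in the extension becomes the primary term, and the old primary term moves into the extension under a new relation. The sketch below is illustrative only; the `Annotation` record and `reverse_annotation` function are hypothetical, and choosing between part_of and causally_upstream_of is left to the curator, who passes it in.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified annotation record.
@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    relation: Optional[str] = None
    extension: Optional[str] = None

def reverse_annotation(ann: Annotation, new_relation: str) -> Annotation:
    """Make the extension GO term primary and move the old primary term into
    the extension with new_relation ("part_of" or "causally_upstream_of")."""
    if ann.extension is None or not ann.extension.startswith("GO:"):
        raise ValueError("reversal needs a GO term in the extension")
    return Annotation(ann.gene_product, ann.extension, new_relation, ann.go_id)

# Example 2: necroptotic signaling pathway dependent_on(protein kinase activity)
old = Annotation("UniProtKB:Q60855", "GO:0097527", "dependent_on", "GO:0004672")
new = reverse_annotation(old, "part_of")
print(new.go_id, new.relation, new.extension)  # GO:0004672 part_of GO:0097527
```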

Example 4

 Gene product: UniProtKB:P07737
 GO ID:	GO:0005938 cell cortex
 Extension:	dependent_on(GO:0006939 smooth muscle contraction)

Action: Consider changing the relation to exists_during.

Protein identifiers

When dependent_on is used with a protein ID, consider the following:

  • If the protein stated in the extension is required for the Biological Process or Molecular Function described by the primary GO term, this is a 'reverse' annotation: the protein in the extension should instead be annotated directly to that BP or MF:
  1. Delete the protein from the extension.
  2. If the primary GO term is an MF, add the qualifier 'contributes_to' to the annotation.
  3. Annotate the 'deleted' protein directly to the BP or MF GO term (to create the primary rather than reverse annotation); remember to add the qualifier 'contributes_to' to an MF annotation.
  4. Consider annotating the Complex Portal ID for the complex directly with the BP or MF.
  • If the primary GO term that is annotated is a Cellular Component:
  1. Delete the protein from the extension.
  2. Consider annotating the 'deleted' protein directly to a “protein localization to [component]” BP GO term (to create the primary rather than reverse annotation).

Don’t attempt to put all proteins involved in a pathway into the extension.

Example 1

 Gene product: UniProtKB:Q9C0C7
 GO ID:	GO:0043552 positive regulation of phosphatidylinositol 3-kinase activity
 Extension:	dependent_on(UniProtKB:O60260 Parkin)

Action: This is a 'reverse' annotation, delete the extension and consider whether Parkin should be annotated to GO:0043552 directly.

Example 2

 Gene product: UniProtKB:Q9ESD1 (Prostasin)
 GO ID:	GO:0005615 extracellular space
 Extension:	dependent_on(UniProtKB:O70362 Phospholipase D)

Action: Here it appears that phospholipase D is responsible for the localization of Q9ESD1 (Prostasin). Delete the extension, and add a primary annotation for the deleted protein:

 Gene product: UniProtKB:O70362 (Phospholipase D)
 GO ID:	GO:0071692 protein localization to extracellular region
 Extension:	transports_or_maintains_localization_of(UniProtKB:Q9ESD1 Prostasin)
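The protein-ID steps, and the outcomes of Examples 1 and 2, can be summarised as a small transformation that returns the corrected annotation plus a new annotation for the 'deleted' protein. This is a sketch under assumptions: the `Annotation` record and function name are hypothetical, the "protein localization to [component]" term must be chosen by the curator and passed in, the transports_or_maintains_localization_of extension simply mirrors Example 2, and step 4 (annotating the Complex Portal ID) is not automated here.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical, simplified annotation record.
@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    aspect: str                      # "C", "P" or "F"
    qualifier: Optional[str] = None
    relation: Optional[str] = None
    extension: Optional[str] = None

def fix_protein_extension(ann: Annotation,
                          localization_term: Optional[str] = None) -> List[Annotation]:
    """Return the corrected annotation plus a new one for the 'deleted' protein."""
    partner = ann.extension
    if partner is None:
        return [ann]
    fixed = replace(ann, relation=None, extension=None)  # step 1: delete the protein
    if ann.aspect == "C":
        if localization_term is None:
            return [fixed]  # curator has not chosen a localization term yet
        # CC case: the partner gets a "protein localization to [component]" BP annotation.
        return [fixed, Annotation(partner, localization_term, "P",
                                  relation="transports_or_maintains_localization_of",
                                  extension=ann.gene_product)]
    # Steps 2 and 3: MF annotations on both proteins carry contributes_to.
    mf_qualifier = "contributes_to" if ann.aspect == "F" else None
    return [replace(fixed, qualifier=mf_qualifier or ann.qualifier),
            Annotation(partner, ann.go_id, ann.aspect, qualifier=mf_qualifier)]

# Example 1: positive regulation of PI3K activity dependent_on(Parkin)
pi3k = Annotation("UniProtKB:Q9C0C7", "GO:0043552", "P",
                  relation="dependent_on", extension="UniProtKB:O60260")
for a in fix_protein_extension(pi3k):
    print(a.gene_product, a.go_id, a.extension)
# UniProtKB:Q9C0C7 GO:0043552 None
# UniProtKB:O60260 GO:0043552 None
```

Whether the partner protein really belongs on the same BP term (as asked in Example 1) still has to be confirmed from the paper.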

Pfam identifiers

When dependent_on is used with a Pfam ID, consider the following:

  • Is the Pfam domain just indicating the region of the protein being bound? If so, this could be captured using a more specific protein binding term.

Example 1

 Gene product: UniProtKB:O60132 (Tea4)
 GO ID:	GO:0005515 protein binding
 With/from:	PomBase:SPBC776.02c (Dis2)
 Extension:	dependent_on(Pfam:PF00018 SH3 domain)

Action: In this case the paper (PMID:24554432) describes how Tea4 requires its SH3 domain in order to bind Dis2. Delete the extension: this information should not be captured in the annotation for Tea4. Instead, the annotation for Dis2 should use the GO term GO:0017124 SH3 domain binding, with the Tea4 identifier in the with/from field, as follows:

 Gene product: PomBase:SPBC776.02c (Dis2)
 GO ID:	GO:0017124 SH3 domain binding
 With/from:	UniProtKB:O60132 (Tea4)

Sequence Ontology identifiers

When dependent_on is used with a Sequence Ontology ID, consider the following:

  • If the SO ID is referring to a region that is being bound by the gene product being annotated, consider using the occurs_at relation instead.

Example 1

 Gene product: UniProtKB:Q9UIF9
 GO ID:	GO:0042393 histone binding
 With/from:	UniProtKB:P62805 (Histone H4)
 Extension:	dependent_on(SO:0001729 H4K16 acylation site)

Action: As the extension describes the region of Histone H4 that is being bound, change the relation to occurs_at:

 Gene product: UniProtKB:Q9UIF9
 GO ID:	GO:0042393 histone binding
 With/from:	UniProtKB:P62805 (Histone H4)
 Extension:	occurs_at(SO:0001729 H4K16 acylation site)
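Relation swaps like this one (and the exists_during change for Cellular Component terms) are simple rewrites of the extension text, restricted to entities of one ID type. The sketch below assumes the GAF annotation-extension column syntax as we understand it: units written as relation(ID), commas joining units that apply together, and pipes separating alternative groups. The helper names are hypothetical, and the parser ignores labels such as "H4K16 acylation site", which do not appear in the GAF column itself.

```python
import re
from typing import List, Tuple

# One extension unit such as "dependent_on(SO:0001729)".
UNIT = re.compile(r"([a-z_]+)\(([^()]+)\)")

def parse_extension(column: str) -> List[List[Tuple[str, str]]]:
    """Split an extension column into groups ('|') of units (',')."""
    groups = []
    for group in filter(None, column.split("|")):
        units = []
        for unit in group.split(","):
            m = UNIT.fullmatch(unit.strip())
            if m is None:
                raise ValueError(f"cannot parse extension unit {unit!r}")
            units.append((m.group(1), m.group(2)))
        groups.append(units)
    return groups

def rename_relation(column: str, old: str, new: str, id_prefix: str) -> str:
    """Replace relation `old` with `new`, only where the entity starts with id_prefix."""
    return "|".join(
        ",".join(f"{new if rel == old and ent.startswith(id_prefix) else rel}({ent})"
                 for rel, ent in units)
        for units in parse_extension(column)
    )

print(rename_relation("dependent_on(SO:0001729)", "dependent_on", "occurs_at", "SO:"))
# occurs_at(SO:0001729)
```

Restricting the rewrite by ID prefix keeps, for example, a dependent_on(CHEBI:...) unit untouched, since ChEBI entities need a different decision.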

In_presence_of

ChEBI identifiers

When in_presence_of is used with a ChEBI ID, consider the following:

  • If the chemical in the annotation extension is already mentioned in the GO term definition, there is no reason to add it to the extension; delete the extension.
  • If the definition does not mention the chemical, should the definition be updated to include it, i.e. is the chemical always required?
  • If the primary GO term used is a Cellular Component (i.e. the curator was trying to express that a gene product is at a particular location when a specific chemical is present), this is out of scope for GO; delete the extension.
  • If the chemical is physiologically relevant (e.g. a chemical used to induce a physiological response, such as hydrogen peroxide used to induce oxidative stress), consider creating a new direct annotation to the relevant Biological Process GO term (e.g. response to oxidative stress), and also consider including part_of that Biological Process term in the annotation extension (e.g. part_of response to oxidative stress).
  • If the chemical only represents an assay condition, delete the extension.
  • If none of the above applies, consider using a different relation, e.g. “activated_by”.

Example 1

 Gene product: UniProtKB:Q12510 (Cmr1)
 GO ID:	GO:0034399 nuclear periphery
 Extension:	in_presence_of(CHEBI:25255 methyl methanesulfonate)

Action: In this case the paper (PMID:25817432) states that Cmr1 localizes to the nuclear periphery in response to methyl methanesulfonate, a DNA-damaging agent. This type of information is out of scope for GO, so delete the extension. Consider creating a new annotation using a 'response to ...' GO term instead; here, the term GO:0072702 response to methyl methanesulfonate can be used to create a direct annotation for Cmr1:

 Gene product: UniProtKB:Q12510 (Cmr1)
 GO ID:	GO:0072702 response to methyl methanesulfonate

GO identifiers

When in_presence_of is used with a GO ID (only GO complex terms are allowed), consider the following:

  • Consider annotating the Complex Portal ID with the GO term instead.
  • Is the GO complex included only as an assay condition? If so, it should not be captured; delete the extension.

Example 1

 Gene product: UniProtKB:P49757
 GO ID:	GO:0030335 positive regulation of cell migration
 Extension:	in_presence_of(GO:0005587 collagen type IV trimer)

Action: From this paper (PMID:19581412), it appears that the type IV collagen is included only as a substrate for cell migration, which requires a suitable surface for the cell to migrate on or through. The collagen is added to the experimental assay as an 'assay condition', so delete the extension. In addition, this resembles a 'reverse' annotation, and a full reading of the paper may support annotating specific collagen IDs.

Example 2

 Gene product: UniProtKB:Q96J84 (Neph1)
 GO ID:	GO:0005886 plasma membrane
 Extension:	in_presence_of(GO:0031941 filamentous actin)

Action: In this paper (PMID:21402783), an intact actin cytoskeleton is required for localization of Neph1 to the plasma membrane. Unfortunately, this is out of scope for GO annotation, so delete the extension.

Protein identifiers

When in_presence_of is used with a protein ID, consider the following:

  • If the protein stated in the extension is required for the Biological Process or Molecular Function described by the primary GO term, this is a 'reverse' annotation: the protein in the extension should instead be annotated directly to that BP or MF, and the following edits should be considered:
  1. Delete the protein from the extension.
  2. If the primary GO term is an MF, add the qualifier 'contributes_to' to the annotation.
  3. Annotate the 'deleted' protein directly to the BP or MF GO term (to create the primary rather than reverse annotation); remember to add the qualifier 'contributes_to' to an MF annotation, and perhaps make a 'regulation of BP' annotation.
  4. Consider annotating the Complex Portal ID for the complex directly with the BP or MF.
  • If the primary GO term that is annotated is a Cellular Component:
  1. Delete the protein from the extension.
  2. Consider annotating the 'deleted' protein directly to a “protein localization to [component]” BP GO term (to create the primary rather than reverse annotation).

Don’t attempt to put all proteins involved in a pathway into the extension.

Example 1

 Gene product: UniProtKB:P40096 (Bur6)
 GO ID:	GO:0003713 transcription coactivator activity
 Extension:	in_presence_of(UniProtKB:Q92317 Ydr1)

Action: The example above is a 'reverse' annotation. If these proteins form a complex to perform this activity then each of them should be annotated to the GO term. As this is a Molecular Function it is also necessary to include contributes_to as the qualifier. Consider also annotating the Complex Portal ID for the complex with this GO term.

Bur6 and Ydr1 are subunits of the NC2 complex, which regulates transcription by binding SPT15/TBP. In Fig 4D of PMID:12237409, in vitro transcription assays were conducted with wild-type or bur6-1 whole-cell extract, to which increasing amounts of purified recombinant Bur6, Ydr1, or Bur6 + Ydr1 (NC2) were added. Unexpectedly, adding recombinant Bur6p alone had no effect on transcription in the bur6-1 extract; possibly Ydr1 in the bur6-1 extract is unable to bind recombinant Bur6p added after cell lysis. However, adding both recombinant proteins (NC2) did restore transcription.

Because neither protein on its own is capable of activating transcription, the contributes_to qualifier should be used, and the WITH field should capture the information that these two proteins are subunits of a protein complex. This requires the IGI evidence code, with Ydr1 (NCB2) in the WITH field of the Bur6 annotation and Bur6 in the WITH field of the Ydr1 annotation:

 Gene product: UniProtKB:P40096 (Bur6)
 Qualifier:	contributes_to
 GO ID:	GO:0003713 transcription coactivator activity
 Evidence:	IGI
 With/from:	UniProtKB:Q92317 (Ydr1)

 Gene product: UniProtKB:Q92317 (Ydr1)
 Qualifier:	contributes_to
 GO ID:	GO:0003713 transcription coactivator activity
 Evidence:	IGI
 With/from:	UniProtKB:P40096 (Bur6)
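The pair of annotations above follows a regular pattern: every subunit gets the same MF term with contributes_to, and its WITH field lists the other subunit(s). A minimal sketch of that pattern, assuming a hypothetical `Annotation` record and helper name; it only builds the records and does not write GAF lines.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical, simplified annotation record.
@dataclass(frozen=True)
class Annotation:
    gene_product: str
    qualifier: str
    go_id: str
    evidence: str
    with_from: Tuple[str, ...]

def subunit_annotations(subunits: Sequence[str], go_id: str,
                        evidence: str = "IGI") -> List[Annotation]:
    """One contributes_to annotation per subunit, with the other subunit(s) in WITH."""
    return [
        Annotation(sub, "contributes_to", go_id, evidence,
                   tuple(other for other in subunits if other != sub))
        for sub in subunits
    ]

# NC2 subunits Bur6 (P40096) and Ydr1 (Q92317), transcription coactivator activity
for a in subunit_annotations(["UniProtKB:P40096", "UniProtKB:Q92317"], "GO:0003713"):
    print(a.gene_product, a.qualifier, a.go_id, a.evidence, a.with_from)
```

Generating both records from one list keeps the reciprocal WITH entries consistent, which is easy to get wrong when the two annotations are entered by hand.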

In_absence_of

Protein identifiers

When in_absence_of is used with a protein ID, consider the following:

  • Any process or activity occurs while many proteins are absent; is the statement in the extension physiologically relevant?
  • If so, is this a 'reverse' annotation? Can the protein ID in the extension instead be annotated directly with a negative regulation GO term (or positive regulation if the primary GO term is a negative regulation)?
  • Can a new GO term be requested that covers the concept, e.g. some existing terms include "chaperone mediated protein folding independent of cofactor" and "non-canonical Wnt signaling pathway", the latter of which describes a beta-catenin-independent pathway.

Example 1

 Gene product: UniProtKB:O15455 (TLR3)
 GO ID:	GO:0097527 necroptotic signaling pathway
 Extension:	in_absence_of(UniProtKB:Q13489 cIAP1)

Action: In this paper (PMID:21737330), cIAP1 blocks the effect of TLR3 on necroptosis, so including cIAP1 here creates a 'reverse' annotation. Instead, cIAP1 should be annotated directly to negative regulation of necroptotic signaling pathway. Delete the extension, as it is appropriate to annotate TLR3 to necroptotic signaling pathway without any extension:

 Gene product: UniProtKB:O15455 (TLR3)
 GO ID:	GO:0097527 necroptotic signaling pathway

 Gene product: UniProtKB:Q13489 (cIAP1)
 GO ID:	negative regulation of necroptotic signaling pathway

ChEBI identifiers

When in_absence_of is used with a ChEBI ID, consider the following:

  • If the chemical only represents an assay condition, delete the extension.
  • If the chemical inhibits the MF or BP, consider using the “inhibited_by” relation instead.
  • If the primary GO term used is a Cellular Component (i.e. the curator was trying to express that a gene product is at a particular location when a specific chemical is absent), this is out of scope for GO; delete the extension.

Example 1

 Gene product: UniProtKB:P10080 (SBP1)
 GO ID:	GO:0010494 cytoplasmic stress granule
 Extension:	in_absence_of(CHEBI:17234 glucose)

Action: In this paper (PMID:23222640), the authors performed the experiments under glucose deprivation because they wanted the cells to be in a defined stressed state. This therefore represents an assay condition; delete the extension.

Example 2

 Gene product: UniProtKB:P10503 (PutA)
 GO ID:	GO:0043565 sequence-specific DNA binding
 Extension:	in_absence_of(CHEBI:26271 proline)

Action: In this paper (PMID:8483946), proline prevents binding of PutA to the put operator sites, so the relation "inhibited_by" should be used instead.
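The three in_absence_of/ChEBI checks can be read as a short decision procedure. The sketch below is a hypothetical helper, not an existing tool: the two booleans stand for the curator's conclusions after reading the paper, and the fallback of bringing the case to an annotation call follows the General Guidance above for examples this page does not cover.

```python
from enum import Enum

class Action(Enum):
    DELETE_EXTENSION = "delete the extension"
    USE_INHIBITED_BY = "change the relation to inhibited_by"
    DISCUSS = "bring the example to an annotation call"

def in_absence_of_chebi(primary_aspect: str, assay_condition: bool,
                        chemical_inhibits: bool) -> Action:
    """Suggested action for in_absence_of(CHEBI:...); the booleans are the
    curator's conclusions from reading the paper."""
    if assay_condition or primary_aspect == "C":
        return Action.DELETE_EXTENSION  # assay condition, or CC: out of scope
    if chemical_inhibits:
        return Action.USE_INHIBITED_BY
    return Action.DISCUSS

# Example 1 (SBP1, glucose deprivation) and Example 2 (PutA, proline)
print(in_absence_of_chebi("C", assay_condition=True, chemical_inhibits=False).value)
print(in_absence_of_chebi("F", assay_condition=False, chemical_inhibits=True).value)
```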