Annotating binding: Difference between revisions

From GO Wiki

Latest revision as of 05:23, 4 August 2021

PROPOSAL (2020-05-04) TO BE REVIEWED

See also: https://github.com/geneontology/go-annotation/issues/1466

Binding annotations in the Gene Ontology

  • Molecular Function (MF) 'binding' terms are used to capture interactions such as protein-protein, protein-nucleic acid and protein-lipid interactions.
  • While GO annotations are not considered to be a repository of all protein-protein interactions, many gene products are annotated to 'protein binding' (GO:0005515) or one of its descendant terms. The binding annotations inform what is believed to be the normal biological role of a gene product, i.e. the protein-protein interactions support an author's hypothesis about how the gene product is thought to execute its molecular function in the context of a normal biological process. Protein-protein interactions for which there is not yet sufficient biological context are out of scope for GO MF annotations.
  • TBC: Annotations using binding terms are directional: the annotated gene product enables a binding activity whose input is the bound entity.
    • For some binding terms, the input is captured in the ontology term, e.g. GO:0005509 calcium ion binding, GO:0008134 transcription factor binding.
    • If the input is not represented in the ontology, a higher level class such as GO:0005515 protein binding is annotated, with the interactor captured using the has input relation and the identifier of the interactor.
    • Specific interactors may be captured with any descendant of a binding term. Note that in certain cases, such as GO:0005509 calcium ion binding, this is redundant and unnecessary.
 Relevant top-level GO terms
 GO:0005488 binding 
 GO:0005515 protein binding
 GO:0044877 protein-containing complex binding

Evidence for binding annotations

  • Binding annotations use the ECO:0000353 Inferred from Physical Interaction (IPI) evidence code.
    • Annotation to GO:0005515 protein binding requires a protein ID in the with field, which represents the target protein with which the experiment was done. For annotation to children of GO:0005515 protein binding, a protein ID in the with field is recommended but not required.
    • For example, an experiment may test the ability of a mouse receptor to bind a human ligand to support the inference that the mouse receptor binds the orthologous mouse ligand. For this annotation, the identifier for the human ligand is captured in the With/From field, while the identifier for the mouse ligand is captured with the 'has input' relation.
    • If a term like 'actin binding' is used for annotation, the ontology term refers to any actin. In this case, curators must decide whether the annotation can be made more specific by capturing a specific, physiologically relevant actin with the 'has input' relation.
    • Ideally, the has input extension field holds a protein ID that corresponds to the entry of the actual physiological target. For example, an annotation of a human-to-human interaction should involve two human proteins, (1) the enabler and (2) the input. If the experiment was done with a mouse protein, that mouse protein ID should be captured in the with field and the orthologous human protein in the has input extension field.
  • Exceptions to this practice include terms such as actin binding, tubulin binding and histone binding. The reason for these exceptions is that many species have many identical copies, and it is often not clear which copy of "actin" was used, so it is difficult to specify in the with field.
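
The with-field requirement described above can be expressed as a simple check. The following is a minimal sketch only, not the GO consortium's own quality-control code: the function name check_with_field and the dictionary keys go_id, evidence and with are assumptions made for this illustration.

```python
from typing import Dict, List

# Term for which the guideline makes the with field mandatory.
PROTEIN_BINDING = "GO:0005515"


def check_with_field(annotation: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one binding annotation.

    The annotation is a plain dict with the hypothetical keys
    'go_id', 'evidence' and 'with'.
    """
    problems: List[str] = []
    go_id = annotation.get("go_id", "")
    evidence = annotation.get("evidence", "")
    with_field = annotation.get("with", "").strip()

    # IPI annotations directly to 'protein binding' need the ID of the
    # protein the experiment was done with. For child terms the with
    # field is only recommended, so they are not flagged here.
    if go_id == PROTEIN_BINDING and evidence == "IPI" and not with_field:
        problems.append(
            "GO:0005515 protein binding with IPI requires a protein ID "
            "in the with field"
        )
    return problems
```

An annotation to GO:0005515 with IPI and an empty with field returns one problem; the same annotation with a protein ID in the with field returns an empty list.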

Annotation Examples

INFORMATION BELOW STILL TO BE REVIEWED

Choosing more descriptive terms than 'protein binding'

  • Child terms that describe a particular class of protein binding (e.g. GO:0030971 receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions, and the precise identity of the interacting protein should be captured in the 'with' column (8). At present a variety of identifiers can be used in the 'with' column (8) or the annotation extension column (16); see the GO Annotation File Format 2.0 Guide.

Identifying binding partners using columns 8 and 16

  • When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS. Column 8 cannot be used with an IDA evidence code; see the evidence code documentation.
  • Examples of using the 'with' column (8)
    • The annotation of Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) makes the statement that Protein A has the binding activity defined by the GO term and that this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.
    • Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of Protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation makes the statement that C. elegans Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin.
    • Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. Annotating Protein A to GO:0005524:ATP binding with an IPI evidence code and ATP-gamma-S in column 8 captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.
  • Examples of using the annotation extension column (16)
    • The annotation of Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.
    • The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in PMID:19668196. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity, adding has_input UniProtKB:O93236 in the annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.
    • The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in PMID:17408620 demonstrate that the target is 7-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16), which post-composes the GO term 7-hydroxycholesterol-transporting ATPase activity.
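
To make the column numbering concrete, the sketch below assembles tab-separated rows laid out like a 17-column GAF 2.x line for two of the examples above. It is an illustration under stated assumptions rather than an official tool: the helpers gaf_row and extension are hypothetical names, and any column the text does not specify (for example the Unc-115 accession or the evidence code of the Lnx2b annotation) is simply left empty.

```python
from typing import Dict

GAF_COLUMNS = 17  # number of columns in a GAF 2.x line

# Evidence codes that, per the guideline above, may carry a 'with' value.
WITH_ALLOWED = {"IPI", "IC", "IEA", "IGI", "IMP", "ISS"}


def extension(relation: str, identifier: str) -> str:
    """Format one annotation extension entry, e.g. has_input(UniProtKB:O93236)."""
    return f"{relation}({identifier})"


def gaf_row(values: Dict[int, str]) -> str:
    """Build one tab-separated row from a {column number: value} mapping.

    Column numbers are 1-based, matching the text: 5 is the GO ID,
    7 the evidence code, 8 the 'with' column, 16 the annotation extension.
    """
    evidence = values.get(7, "")
    if values.get(8) and evidence not in WITH_ALLOWED:
        raise ValueError(f"column 8 cannot be used with evidence code {evidence!r}")
    cells = [values.get(n, "") for n in range(1, GAF_COLUMNS + 1)]
    return "\t".join(cells)


# Unc-115: actin filament binding inferred from binding to rabbit actin.
unc115 = gaf_row({5: "GO:0051015", 7: "IPI", 8: "UniProtKB:P68135"})

# Lnx2b: Dharma recorded as the substrate in the extension column.
lnx2b = gaf_row({
    1: "UniProtKB",
    2: "A4VCF7",
    5: "GO:0004842",
    6: "PMID:19668196",
    16: extension("has_input", "UniProtKB:O93236"),
})
```

The first row carries the rabbit actin accession in column 8 as supporting evidence, while the second leaves column 8 empty and uses column 16 to refine the GO term. Passing an IDA evidence code together with a column 8 value raises a ValueError.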


Do not annotate substrate and cofactor binding

  • Binding to substrates and cofactors is already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. Annotation to these compounds is discouraged, since it adds noise to the annotation corpus.