Annotating binding

From GO Wiki

Revision as of 16:39, 4 May 2020

PROPOSAL (2020-05-04) TO BE REVIEWED

Binding annotations in the Gene Ontology

  • Binding terms in the Molecular Function (MF) ontology can be used to capture macromolecular interactions, such as protein-protein, protein-nucleic acid, protein-lipid, etc.
  • Annotations using binding terms are directional: the annotated gene product enables the binding activity, and the bound entity is captured as its input.
    • For some binding terms, the desired annotation specificity of the input is already captured in the ontology term, e.g. 'calcium ion binding'.
    • For other binding terms, such as 'protein binding', the desired specificity of the input is not represented in the ontology. In these cases, the physiologically relevant input needs to be captured using the 'has input' relation.
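The directional reading above can be sketched in Python. This is a minimal, hypothetical model, not part of any GO software: the `BindingAnnotation` class, the `TERMS_NEEDING_INPUT` set (here holding only GO:0005515 protein binding) and the `EXAMPLE` identifiers are illustrative assumptions. It flags a 'protein binding' annotation that lacks a 'has input' and renders the input in the has_input(...) extension form used in the examples further down this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: binding terms whose label does not name the desired input,
# so the bound entity must be supplied with 'has input'.
TERMS_NEEDING_INPUT = {"GO:0005515"}  # protein binding


@dataclass
class BindingAnnotation:
    gene_product: str                 # the entity that enables the binding
    go_id: str                        # the MF binding term
    has_input: Optional[str] = None   # the bound entity, when the term does not imply it

    def problems(self) -> List[str]:
        """Return reasons this annotation is under-specified (empty if none)."""
        if self.go_id in TERMS_NEEDING_INPUT and self.has_input is None:
            return [f"{self.go_id} needs a 'has input' naming the bound entity"]
        return []

    def extension(self) -> str:
        """Render the input in annotation extension form, or '' if there is none."""
        return f"has_input({self.has_input})" if self.has_input else ""


# Placeholder identifiers; not real accessions.
specific = BindingAnnotation("UniProtKB:EXAMPLE_A", "GO:0005515", "UniProtKB:EXAMPLE_B")
vague = BindingAnnotation("UniProtKB:EXAMPLE_A", "GO:0005515")
print(specific.problems(), specific.extension())
print(vague.problems())
```

A term such as 'calcium ion binding' would simply be left out of `TERMS_NEEDING_INPUT`, since its label already names the input.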

Evidence for binding annotations

  • Binding annotations use the Inferred from Physical Interaction (IPI, ECO:0000353) evidence code.
  • If the binding annotation is supported by an experiment that uses the physiologically relevant interactor, then there is no need to populate the With/From field. The interactor is captured in the ontology term or with the 'has input' relation (see above).
  • If the binding annotation is supported by an experiment that uses a 'surrogate' interactor, then the surrogate is captured in the With/From field.
    • For example, an experiment may test the ability of a mouse receptor to bind a human ligand to support the inference that the mouse receptor binds the orthologous mouse ligand. For this annotation, the identifier for the human ligand is captured in the With/From field, while the identifier for the mouse ligand is captured with the 'has input' relation.
    • If a term like 'actin binding' is used for annotation, the ontology term refers to any actin. In this case, curators must decide whether the annotation can be made more specific by capturing a specific, physiologically relevant actin with the 'has input' relation.
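The With/From rule above reduces to a comparison between the interactor actually tested and the physiologically relevant one. The Python sketch below is a hypothetical illustration of that rule only; the function names, the `term_names_input` flag and the HUMAN_LIGAND/MOUSE_LIGAND identifiers are placeholders, not real accessions or an existing tool.

```python
from typing import Dict, Optional


def with_from(tested: str, relevant: str) -> Optional[str]:
    """With/From value for an IPI binding annotation.

    Empty when the experiment used the physiologically relevant interactor;
    otherwise the surrogate that was actually tested.
    """
    return None if tested == relevant else tested


def binding_fields(tested: str, relevant: str, term_names_input: bool = False) -> Dict[str, str]:
    """Evidence, With/From and 'has input' fields for one binding annotation."""
    return {
        "evidence": "IPI",
        "with_from": with_from(tested, relevant) or "",
        # The relevant interactor goes in 'has input' unless the term already names it.
        "has_input": "" if term_names_input else relevant,
    }


# Mouse receptor tested against a human ligand (placeholder identifiers).
print(binding_fields("UniProtKB:HUMAN_LIGAND", "UniProtKB:MOUSE_LIGAND"))
```

For the mouse receptor example, the human ligand ends up in With/From and the mouse ligand in 'has input', matching the description above.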

Annotation Examples

Protein binding annotations in the Gene Ontology

  • While GO annotations are not considered to be a repository of all protein-protein interactions, many gene products are annotated to 'protein binding' (GO:0005515) or one of its child terms. In making these annotations, contributing groups may follow slightly different practices with respect to the types of experimental evidence used to support these inferences; e.g. some groups may use co-immunoprecipitation as supporting evidence for a protein binding annotation between two gene products, while others do not. However, all groups generally adhere to the principle that, when annotated, protein binding interactions inform what is believed to be the normal biological role of a gene product, i.e. the protein-protein interactions support an author's hypothesis about how the gene product is thought to execute its molecular function in the context of a normal biological process. Protein-protein interactions for which there is not yet sufficient biological context are discouraged as sources of GO MF annotations.

Choosing more descriptive terms than 'protein binding'

  • Child terms that describe a particular class of protein binding (e.g. GO:0030971 receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions, and the precise identity of the interacting protein should be captured in the 'with' column (8). At present a variety of identifiers can be used in the 'with' column (8) or the annotation extension column (16); see the GO Annotation File Format 2.0 Guide.

Identifying binding partners using columns 8 and 16

  • When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term in the GO_ID column (5). The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code (see the evidence code documentation).
  • Examples of using the 'with' column (8)
    • The annotation of Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) states that Protein A has the binding activity defined by the GO term and that this function was inferred from interaction with Protein B; binding to Protein B is not necessarily the in vivo function of Protein A.
    • Column 8 can be used to make annotations where the evidence that Protein A from species X binds Protein B comes from experiments using Protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated to GO:0051015:actin filament binding using the IPI evidence code, with the accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation states that C. elegans Unc-115 has the molecular function of actin filament binding, inferred from experiments using rabbit actin.
    • Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. An annotation of Protein A to GO:0005524:ATP binding with the IPI evidence code and ATP-gamma-S in column 8 captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.
  • Examples of using the annotation extension column (16)
    • The annotation of Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.
    • The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in PMID:19668196. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity, adding has_input(UniProtKB:O93236) in the annotation extension column (16). This annotation states that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.
    • The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in PMID:17408620 demonstrate that the target is 7-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16), thereby post-composing the term '7-hydroxycholesterol-transporting ATPase activity'.
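The split between column 8 and column 16 can be sketched as a small Python data structure. This is an illustrative assumption, not a GAF parser: only the columns discussed here are modelled, `WITH_ALLOWED` copies the evidence codes listed above, the Unc-115 gene identifier is a placeholder, and the IDA evidence code on the Lnx2b annotation is assumed because the example does not state one.

```python
from dataclasses import dataclass

# Evidence codes listed above as allowing an entry in the 'with' column (8).
WITH_ALLOWED = {"IPI", "IC", "IEA", "IGI", "IMP", "ISS"}


@dataclass
class Annotation:
    gene_product: str
    go_id: str           # column 5: the GO term
    evidence: str        # column 7: the evidence code
    with_from: str = ""  # column 8: supports the evidence used to infer the function
    extension: str = ""  # column 16: modifies (post-composes) the GO term

    def validate(self) -> None:
        """Raise ValueError if column 8 is filled for a code that does not allow it."""
        if self.with_from and self.evidence not in WITH_ALLOWED:
            raise ValueError(f"'with' column (8) cannot be used with {self.evidence}")


def has_input(target: str) -> str:
    """Format a column 16 entry such as has_input(UniProtKB:O93236)."""
    return f"has_input({target})"


# Unc-115: rabbit actin used in the experiment goes in column 8 (gene ID is a placeholder).
unc115 = Annotation("WB:UNC115_PLACEHOLDER", "GO:0051015", "IPI", with_from="UniProtKB:P68135")
# Lnx2b: its substrate Dharma goes in column 16 (evidence code IDA is assumed here).
lnx2b = Annotation("UniProtKB:A4VCF7", "GO:0004842", "IDA", extension=has_input("UniProtKB:O93236"))

for ann in (unc115, lnx2b):
    ann.validate()  # both pass

try:
    Annotation("UniProtKB:A4VCF7", "GO:0004842", "IDA", with_from="UniProtKB:O93236").validate()
except ValueError as err:
    print(err)
```

The last call fails because it puts an entry in column 8 of an IDA annotation, which the rule above forbids.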

Ontology development for protein binding

Future ontology development is expected to improve searching for users who are specifically interested in gene products that carry out a particular type of substrate/product binding. The GOC is developing 'has_part' relationships that link terms to the substrate binding they imply. Existing GO terms will follow this new format; e.g. transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.
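The kind of search that 'has_part' links are meant to support can be sketched as a graph walk. The two-edge ontology fragment, the geneA/geneB/geneC annotations and the functions below are hypothetical, and following both is_a and has_part edges is an assumption about how such a search might work, not a description of any existing GO tool.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical ontology fragment: term -> [(relation, target term)].
EDGES: Dict[str, List[Tuple[str, str]]] = {
    "transcription factor activity": [("has_part", "DNA binding")],
    "DNA binding": [("is_a", "nucleic acid binding")],
}

# Hypothetical direct annotations: gene product -> MF terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "geneA": {"transcription factor activity"},
    "geneB": {"DNA binding"},
    "geneC": {"calcium ion binding"},
}


def implied_terms(term: str) -> Set[str]:
    """The term plus every term reachable from it over is_a or has_part edges."""
    seen, queue = {term}, deque([term])
    while queue:
        for _relation, target in EDGES.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def genes_with(term: str) -> Set[str]:
    """Gene products annotated to `term` directly or through a term that implies it."""
    return {gene for gene, terms in ANNOTATIONS.items()
            if any(term in implied_terms(t) for t in terms)}


print(sorted(genes_with("DNA binding")))
```

Here geneA is found by a search for DNA binding without being co-annotated to it, because its transcription factor activity term has_part DNA binding.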

Using terms that imply binding of substrates

  • As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should capture the specifics as far as feasible and avoid redundant annotations. Annotate to a binding term when an experiment shows binding but not catalysis or transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.
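The redundancy rule above can be sketched as a small decision function. Everything here is a hypothetical simplification: the `IMPLIED_BINDING` table (one entry, for sterol-transporting ATPase activity implying ATP and sterol binding) stands in for what the term definitions imply, and entities are matched by plain label rather than by ChEBI or GO identifiers.

```python
from typing import Dict, Set

# Hypothetical: entities whose binding is already implied by an MF term's definition.
IMPLIED_BINDING: Dict[str, Set[str]] = {
    "sterol-transporting ATPase activity": {"ATP", "sterol"},
}


def binding_term_needed(existing_terms: Set[str], bound_entity: str, shows_binding: bool) -> bool:
    """Decide whether a separate binding annotation would add information.

    False if the experiment did not show binding, or if an existing catalytic or
    transporter annotation already implies binding of this entity.
    """
    if not shows_binding:
        return False
    implied: Set[str] = set()
    for term in existing_terms:
        implied |= IMPLIED_BINDING.get(term, set())
    return bound_entity not in implied


# Binding of ATP adds nothing once the ATPase activity is annotated...
print(binding_term_needed({"sterol-transporting ATPase activity"}, "ATP", True))
# ...but is worth annotating when binding is all the experiment showed.
print(binding_term_needed(set(), "ATP", True))
```

The judgment about physiological relevance stays with the curator; the function only captures the "no redundant binding annotation" part of the rule.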

