Talk:2010 GO camp binding documentation issues: Difference between revisions

From GO Wiki
4. Annotations to 'protein binding' should not use the ISS evidence code  
This rule only applies to GO:0005515; it is not such a problem with child terms of protein binding where the type of protein is identified in the GO term name.
== [[User:Siegele|Debby]] 12 Jul 2010  edits after today's binding call ==
'''Please feel free to edit what I've written and add anything that I overlooked.'''
'''Proposed Guidelines for GOC website'''
As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations.  Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport <s>; there is no need to annotate to a binding term if the experiment shows catalysis/transport</s>.  Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the ''in vivo'' situation.
<s>Annotations to protein binding terms should be maximally informative.</s> Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions and the precise identity of the interacting protein should be captured in the ‘with’ column (8) <s>or the annotation extension column (16)</s> (see following section). At present a variety of identifiers can be used in the ‘with’ column (8) or the annotation extension column (16) including ChEBI IDs (small molecule), UniProt IDs (protein) and ....IDs [<font color="red"> make a page where each MOD can add to this list, also add a link to this list on the GO users' documentation</font>].
When a gene product is being annotated to a binding activity term, the with/from column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated.  To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 modify the '''evidence''' used to infer the function, while entries in column 16 modify the '''GO term''' used in the GO_ID column (5).  The curator also needs to remember that the with/from column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, or ISS; column 8 cannot be used with an IDA evidence code. [add link to evidence code documentation].
'''Examples of using the 'with' column (8)'''
The annotation of '''Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8)''' makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.
* 1) Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC87051/ example], human p300 was shown to bind specifically to ''Drosophila'' histone H3. This would be annotated as GO:0042393:histone binding using an IPI evidence code and putting an accession for ''Drosophila'' histone H3, UniProtKB:P02299, in the 'with' column (8). This annotation makes the statement that human p300 has the molecular function of histone binding inferred from experiments using ''Drosophila'' histone H3.
* 2) Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog.  The annotation '''Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S''' captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.
'''Examples of using the annotation extension column (16)'''
The annotation of '''Protein A to a GO binding term with Protein B and the relationship has_participant/has_input/has_output in the annotation extension column (16)''' makes the statement that an in vivo molecular function of Protein A is binding to Protein B.  This is equivalent to the post-compositional creation of a new child term.
* 3) If the experiments described in [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC87051/ example 1] above had shown that p300 binds to human H3.1 protein, the annotation GO:0042393:histone binding with the accession for human H3, UniProtKB:P68431, in column 16 (along with a has_participant/has_input/has_output relationship) would make the statement that p300 has the molecular function of binding histone H3.1, essentially post-composing a GO term for binding to histone H3.1.  In this example, since an IPI evidence code requires an entry in the 'with' column, the accession UniProtKB:P68431 would also be entered in column 8.
* 4) The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in the [http://gowiki.tamu.edu/wiki/index.php/PMID:17408620 paper] demonstrate that the target is 7β-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7β-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16).
The with/from column (8) and the annotation extension column (16) should be used '''only''' for direct interactions and '''only''' when the binding relationship is not already included in the GO term and/or definition.  See [http://wiki.geneontology.org/index.php/Annotation_Cross_Products column16 documentation] for relationship types to use when adding IDs in the annotation extension column (16).
Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing relevant ontology development of 'has_part' relationships will provide links to implied substrate binding (the GOC are developing 'has_part' relationships to imply substrate binding). The existing GO will follow this new format, e.g. Transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators can request new 'has_part' relationships (and terms) if these do not exist.
'''Quality control checks''' 
[Ruth] ideally there will be a GOC webpage of all QCs, and therefore this information will not appear in the 'binding' guidelines.
1. No use of the 'NOT' qualifier with 'protein binding'; GO:0005515.
This rule only applies to GO:0005515; children of this term can be qualified with NOT, as further information on the type of binding is then supplied in the GO term, e.g. NOT + 'GO:0051529 NFAT4 protein binding' would be fine, as the negative binding statement only applies to the NFAT4 protein.
2. Annotations to 'protein binding'; GO:0005515, should only be supplied with an evidence code where the interactor can be identified in the 'with' field.
This rule only applies to GO:0005515; it is not such a problem with child terms of protein binding where the type of protein is identified in the GO term name.
3. Reciprocal annotations for protein binding should be made
This rule applies to GO:0005515 and its descendants when the IPI evidence code is used
4. Annotations to 'protein binding' should not use the ISS evidence code
This rule only applies to GO:0005515; it is not such a problem with child terms of protein binding where the type of protein is identified in the GO term name.


== '''Links''' ==

Revision as of 13:22, 12 July 2010

Siegele 00:30, 1 July 2010 (UTC)

As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, an enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, as binding is implied, curators should avoid making redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible: use the binding term if the experiment shows binding, but not catalysis/transport; don’t use the binding term if the experiment does show catalysis/transport.

The curator may come across Molecular Function terms where the definition doesn't adequately describe the specific substrate/target being bound, and where requesting a more specific Molecular Function term would be considered inappropriate. In such cases, the identity of the substrate/target being bound can be captured in the [‘with’ column (8)] or annotation extension column (16) using identifiers from other ontologies or databases. Small molecule substrate/targets should be identified with accessions from ChEBI, and protein substrate/targets should be identified with accessions from UniProtKB (others?). Enzyme or transporter substrate/target information should be entered in [column 8 or] column 16 in the form ChEBI:xxxx or UniProtKB:xxxxx. Multiple entries should be separated by pipes (add example). Keep in mind that [column 8 or] the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition.

Annotations to protein binding terms should be maximally informative. Child terms that describe a particular class of protein binding (e.g. GO:0030971 ‘receptor tyrosine kinase binding’) should be used in preference to the parent term 'protein binding'; GO:0005515. Where possible the precise identity of the interacting protein should be captured in [the ‘with’ field or] the annotation extension column (16) of an annotation. The IPI evidence code should be used for annotation of all protein-protein interactions rather than IDA.

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing relevant ontology development of 'has_part' relationships will provide links to implied substrate binding (Chris and Jane are developing 'has_part' relationships to imply substrate binding). [something missing here] existing GO to follow this new format, e.g. Transcription factor activity has_part DNA binding. Curators can request new 'has_part' relationships (and terms) if these do not exist.


  • Not sure where the following statement belongs:

Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation, not artificial substrates.

  • Not sure whether I captured what was meant by the following statement:

The annotation extension (column 16) should only be used for direct (target of catalytic activity (using relationship ontology).

Usage of the With/From Column for IPI

We strongly recommend making an entry in the with/from column when using this evidence code to include an identifier for the other protein or other macromolecule or other chemical involved in the interaction. When multiple entries are placed in the with/from field, they are separated by pipes. Consider using IDA when no identifier can be entered in the with/from column.
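
The pipe-separated with/from convention described above can be sketched in a few lines of Python. This is only an illustration, not an agreed tool: the function names are hypothetical, and the two identifiers are simply borrowed from examples elsewhere on this page.

```python
def split_with_field(with_field):
    """Split a with/from (column 8) value on pipes, dropping empty entries."""
    return [entry.strip() for entry in with_field.split("|") if entry.strip()]


def suggest_evidence_code(with_field):
    """Suggest IPI when at least one interactor ID is available, otherwise IDA.

    This mirrors the recommendation in the text above; it is a hint,
    not a rule enforced by any GO software.
    """
    return "IPI" if split_with_field(with_field) else "IDA"


# Two identifiers in one with/from field, separated by a pipe.
print(split_with_field("UniProtKB:P02299|CHEBI:42989"))
# No identifier available, so IDA is suggested instead.
print(suggest_evidence_code(""))
```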

Ruth edits 1 July 2010

changes highlighted in red

As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, an enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, as binding is implied, curators should avoid making redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible: use the binding term if the experiment shows binding, but not catalysis/transport; don’t use the binding term if the experiment does show catalysis/transport. (added this orphan sentence:) Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation, not artificial substrates.

The curator may come across Molecular Function terms where the definition doesn't adequately describe the specific substrate/target being bound, and where requesting a more specific Molecular Function term would be considered inappropriate. In such cases, the identity of the substrate/target being bound can be captured in the ‘with’ column (8) or annotation extension column (16) using identifiers from other ontologies or databases. Small molecule substrate/targets should be identified with accessions from ChEBI, and protein substrate/targets can be identified with accessions from UniProtKB (others?). (deleted sentence) Multiple entries should be separated by pipes (add example). (deleted keep in mind) The 'with' column (8) or the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See column16 documentation for relationship types to use when adding IDs in the annotation extension column (16).

Annotations to protein binding terms should be maximally informative. Child terms that describe a particular class of protein binding (e.g. GO:0030971 receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. Where possible the precise identity of the interacting protein should be captured in the ‘with’ column (8) or the annotation extension column (16) of an annotation. The IPI evidence code should be used in preference to IDA for annotation of all protein-protein interactions.

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing relevant ontology development of 'has_part' relationships will provide links to implied substrate binding (Chris and Jane are developing 'has_part' relationships to imply substrate binding). The existing GO will follow this new format, e.g. Transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators can request new 'has_part' relationships (and terms) if these do not exist.

Other comments

Pascale: "The GO is committed to ‘annotating to the experiment’." I just worry that this misleads curators into doing 'text mining' rather than interpreting experiments.

One example that comes to mind (which is unrelated to protein binding) is the mouse gene Hnf1a (MGI:98504), a homeobox protein annotated to 'insulin secretion' based on the observation that a mouse lacking that gene has impaired insulin secretion following certain stimulation (PMID: 9733737). In this case, I agree that the experiment was annotated; however the authors seem to suggest a role in beta-cell glycolytic signaling rather than in insulin secretion. I am not sure what would be an experiment that directly tests signaling.

Serenella: Is it worthwhile to be more precise? "For instance, a catalytic subunit of an enzyme MUST bind all of the substrates and products of the reaction it catalyzes." instead of "For instance, an enzyme MUST bind all of the substrates and products of the reaction it catalyzes."

Ruth: this may be particularly relevant when annotating transporters, which is an issue Emily raised previously.

Debby: I know I haven't dealt with the question of propagation by ISS--but wasn't sure whether that was meant to be done using column 8 or column 16. Ruth wrote column 16, but referred to an example that I think is about the use of column 8. [Ruth>] the ISS issue hasn't been fully resolved. Currently ISS annotations lead to the replacement of the protein ID in column 8 with the ID of the protein orthologous to the annotated protein, e.g. if human proteins A and B have mouse orthologs, then the annotation:

  • human Protein A GO:0030971:receptor tyrosine kinase binding IPI WITH human Protein B; if this annotation were transferred to mouse Protein A by ISS, this would create
  • mouse Protein A GO:0030971:receptor tyrosine kinase binding ISS WITH human Protein A.

So the identity of protein B is lost. Discussions so far have not resolved whether or not column 16 information would transfer and if so whether or not the in vivo information or the experimental information would transfer. We do not have time to discuss this issue before getting the first version of the guidelines to Rama and Rachael.
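
The loss of Protein B's identity on ISS transfer can be reproduced with a small Python sketch. The `Annotation` record and `transfer_by_iss` function are hypothetical stand-ins for whatever pipeline performs the transfer; only the behaviour described above is modelled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # the annotated protein
    go_id: str         # column 5
    evidence: str      # evidence code
    with_from: str     # column 8


def transfer_by_iss(source, ortholog):
    """Copy an annotation to an ortholog with ISS evidence.

    Column 8 is overwritten with the source protein, so the original
    binding partner recorded there is no longer present.
    """
    return replace(
        source,
        gene_product=ortholog,
        evidence="ISS",
        with_from=source.gene_product,
    )


human = Annotation("human Protein A", "GO:0030971", "IPI", "human Protein B")
mouse = transfer_by_iss(human, "mouse Protein A")
print(mouse)  # with_from now names human Protein A, not Protein B
```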

Jim 2010-07-07

I promised I'd write this and then forgot... here's a draft of what I was trying to say during the call:

Column 8 (with/from) and column 16 (annotation extension) are both used to identify the binding partner of a gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that column 8 modifies the evidence used to infer the function, while column 16 modifies the GO term used in column 5 (GO_ID). Thus,

Protein A GO:0005515 protein binding IPI with Protein B

only says that Protein A has the molecular function of "Interacting selectively and non-covalently with any protein or protein complex...". The function being annotated is not selective binding of protein B; binding protein B is evidence that Protein A binds some other protein or complex selectively. By contrast, using protein B in column 16 (question: what relationship?) is equivalent to postcomposition of a new GO term for binding specifically to protein B.

This distinction allows us to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, Drosophila chromatin was used to show that human p300 binds specifically to histone H3. This could be annotated to GO:0031493:nucleosomal histone binding using the Drosophila H3 in the with column. A more specific functional annotation could be made using H3 (not Drosophila) in column 16.

This raises the question of propagation by ISS using column 16. In Geneva, Judy suggested that column 16 should be used for classes/families of proteins rather than the forms from specific species. In the example above, column 16 would have an identifier for a generic histone H3 rather than a human or Drosophila H3.

Ruth 2010-07-08 possible modifications to Jim's statements

Column 8 (with) and column 16 (annotation extension) are both used to identify the binding partner of a gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that column 8 modifies the evidence used to infer the function, while column 16 modifies the GO term used in column 5 (GO_ID). Furthermore, interacting molecules can only be included in column 8 (with) when using the IPI evidence code.

Thus, the annotation: Protein A GO:0005515 protein binding IPI column 8 (with) Protein B makes the statement that Protein A has the molecular function of "Interacting selectively and non-covalently with any protein or protein complex...". The function being annotated is not selective binding of protein B (for example protein B may not be an in vivo substrate); binding protein B is evidence that Protein A binds some other protein or complex selectively.

In contrast, using protein B in column 16 (annotation extension) and the relationship has_participant/has_input/has_output is equivalent to postcomposition of a new GO term for binding specifically to protein B, i.e. the annotation Protein A GO:0016301 kinase activity IDA column 16 (annotation extension) has_input Protein B makes the statement that Protein A has the molecular function "kinase activity" and that Protein B is a substrate of this activity.
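
As a rough illustration of the column 8 versus column 16 distinction, the following Python sketch builds a column 16 entry and spells out the statement an annotation makes. The `relation(ID)` string layout is an assumption here (the column 16 documentation defines the actual syntax), and both function names are hypothetical.

```python
# Relationship types mentioned in this discussion for column 16.
ALLOWED_RELATIONS = ("has_participant", "has_input", "has_output")


def extension_entry(relation, target_id):
    """Format a column 16 entry; the relation(ID) layout is assumed."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unexpected relation: {relation}")
    return f"{relation}({target_id})"


def describe(gene_product, go_term, evidence, with_from="", extension=""):
    """Spell out what an annotation states.

    Column 8 qualifies the evidence; column 16 qualifies the GO term.
    """
    statement = f"{gene_product} has the function '{go_term}' ({evidence})"
    if with_from:
        statement += f"; evidence: interaction with {with_from}"
    if extension:
        statement += f"; term refined by {extension}"
    return statement


print(describe("Protein A", "protein binding", "IPI", with_from="Protein B"))
print(describe("Protein A", "kinase activity", "IDA",
               extension=extension_entry("has_input", "Protein B")))
```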

I think we should stick with trying to agree on the wording of statements above. The remainder of the text makes sense but this guideline was proposed in April and was met with considerable disagreement.

Cross species experiments: Emily suggested that when cross-species experiments are annotated and the direct binding protein is added to column 8 (with column), then the orthologous gene (in vivo participant) in the same species as the annotated protein should be added to column 16, e.g. human protein A; GO:0005515 protein binding; [with column] mouse protein B; [column 16] human ortholog protein B.

This distinction allows us to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, Drosophila chromatin was used to show that human p300 binds specifically to histone H3. This could be annotated to GO:0031493:nucleosomal histone binding using the Drosophila H3 in the with column. A more specific functional annotation could be made using H3 (not Drosophila) in column 16.

This raises the question of propagation by ISS using column 16. In Geneva, Judy suggested that column 16 should be used for classes/families of proteins rather than the forms from specific species. In the example above, column 16 would have an identifier for a generic histone H3 rather than a human or Drosophila H3.


Debby 8 Jul 2010 draft of Binding Policy combining Jim and Ruth's edits and last week's discussion

Please feel free to edit what I've written and add anything that I overlooked.

Proposed Guidelines for GOC website

As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations. Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport; there is no need to annotate to a binding term if the experiment shows catalysis/transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.

Annotations to protein binding terms should be maximally informative. Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used in preference to IDA for annotation of all protein-protein interactions and where possible the precise identity of the interacting protein should be captured in the ‘with’ column (8) or the annotation extension column (16) (see following section). At present a variety of identifiers can be used in the ‘with’ column (8) or the annotation extension column (16) including ChEBI IDs (small molecule), UniProt IDs (protein) and ....IDs [each MOD to add to this list].

When a gene product is being annotated to a binding activity term, the with/from column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 modify the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the with/from column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, or ISS [add link to evidence code documentation]. Column 8 cannot be used with an IDA evidence code.

Thus, the annotation of Protein A to GO:0005515:protein binding with evidence code IPI and Protein B in the with/from column (8) makes the statement that Protein A has the molecular function of "interacting selectively and non-covalently with any protein or protein complex…". The function being annotated is not selective binding of Protein B.

In contrast, entering Protein B and the relationship has_participant/has_input/has_output in the annotation extension column (16) is equivalent to the postcompositional creation of a new GO term for binding Protein B. That is, the annotation Protein A GO:0005515:protein binding IPI column 16 has_input Protein B makes the statement that Protein A has the molecular function of 'interacting selectively and non-covalently with Protein B.'

Examples of using the with/from column (8) to add information about the evidence used to infer the annotation of a gene product to a particular binding term. Note that the definition of the GO term remains unchanged.

  • Example 1: Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, human p300 was shown to bind specifically to Drosophila histone H3. This could be annotated to GO:0031493:nucleosomal histone binding using an accession for Drosophila H3 in the with/from (8) column.
  • Example 2: Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.

Examples of using the annotation extension column (16) to modify a GO binding term.

  • Example 1: The S. cerevisiae CHZ1 protein has been annotated to GO:0042393:histone binding with an IDA evidence code. The experiments in the paper also demonstrate that CHZ1 binds specifically to histone variant H2AZ. This information could be captured by entering the accession for yeast H2AZ in the annotation extension column (16)--essentially creating a GO term for binding to Histone H2AZ.
  • Example 2: The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in the paper demonstrate that the target is 7β-hydroxycholesterol; the ChEBI ID for 7β-hydroxycholesterol, CHEBI:42989, can be included in the annotation extension column (16).

The with/from column (8) and the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See column 16 documentation for relationship types to use when adding IDs in the annotation extension column (16).

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing relevant ontology development of 'has_part' relationships will provide links to implied substrate binding (the GOC are developing 'has_part' relationships to imply substrate binding). The existing GO will follow this new format, e.g. Transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators can request new 'has_part' relationships (and terms) if these do not exist.

Quality control checks

[Ruth] ideally there will be a GOC webpage of all QCs, and therefore this information will not appear in the 'binding' guidelines.

1. No use of the 'NOT' qualifier with 'protein binding'; GO:0005515. This rule only applies to GO:0005515; children of this term can be qualified with NOT, as further information on the type of binding is then supplied in the GO term, e.g. NOT + 'GO:0051529 NFAT4 protein binding' would be fine, as the negative binding statement only applies to the NFAT4 protein.

2. Annotations to 'protein binding'; GO:0005515, should only be supplied with an evidence code where the interactor can be identified in the 'with' field. This rule only applies to GO:0005515; it is not such a problem with child terms of protein binding where the type of protein is identified in the GO term name.

3. Reciprocal annotations for protein binding should be made. This rule applies to GO:0005515 and its descendants when the IPI evidence code is used.

4. Annotations to 'protein binding' should not use the ISS evidence code. This rule only applies to GO:0005515; it is not such a problem with child terms of protein binding where the type of protein is identified in the GO term name.
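
The four checks above could be automated along the following lines. This Python sketch is only one possible reading of the rules: the dictionary keys, function names and placeholder protein names are hypothetical, and the reciprocal helper handles only the parent term GO:0005515, since covering its descendants would need the ontology itself.

```python
PROTEIN_BINDING = "GO:0005515"


def qc_problems(annotation):
    """Return the rules (1, 2, 4) that an annotation to GO:0005515 breaks."""
    if annotation["go_id"] != PROTEIN_BINDING:
        return []  # rules 1, 2 and 4 apply only to the parent term
    problems = []
    if annotation.get("qualifier") == "NOT":
        problems.append("rule 1: NOT qualifier used with protein binding")
    if not annotation.get("with_from"):
        problems.append("rule 2: no interactor in the 'with' field")
    if annotation["evidence"] == "ISS":
        problems.append("rule 4: ISS evidence code used with protein binding")
    return problems


def reciprocal(annotation):
    """Rule 3: build the partner's annotation by swapping the two proteins."""
    if annotation["go_id"] != PROTEIN_BINDING or annotation["evidence"] != "IPI":
        raise ValueError("sketch only handles IPI annotations to GO:0005515")
    return {
        **annotation,
        "gene_product": annotation["with_from"],
        "with_from": annotation["gene_product"],
    }


good = {"gene_product": "ProteinA", "go_id": PROTEIN_BINDING,
        "evidence": "IPI", "with_from": "ProteinB", "qualifier": ""}
bad = {"gene_product": "ProteinA", "go_id": PROTEIN_BINDING,
       "evidence": "ISS", "with_from": "", "qualifier": "NOT"}

print(qc_problems(good))
print(qc_problems(bad))
print(reciprocal(good))
```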

Debby 12 Jul 2010 edits after today's binding call

Please feel free to edit what I've written and add anything that I overlooked.

Proposed Guidelines for GOC website

As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations. Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport ; there is no need to annotate to a binding term if the experiment shows catalysis/transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.

Annotations to protein binding terms should be maximally informative. Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions, and the precise identity of the interacting protein should be captured in the ‘with’ column (8) or the annotation extension column (16) (see following section). At present a variety of identifiers can be used in the ‘with’ column (8) or the annotation extension column (16), including ChEBI IDs (small molecule), UniProt IDs (protein) and ....IDs [make a page where each MOD can add to this list, also add a link to this list on the GO users' documentation].

When a gene product is being annotated to a binding activity term, the with/from column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 modify the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the with/from column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, or ISS; column 8 cannot be used with an IDA evidence code. [add link to evidence code documentation].
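The evidence-code restriction on column 8 can be expressed as a small check. This is only a sketch: the set of allowed codes is copied from the paragraph above, and the function name and return convention are hypothetical, not part of any GOC tool.

```python
# Evidence codes that, per the paragraph above, may carry a with/from (column 8) entry.
WITH_ALLOWED = {"IPI", "IC", "IEA", "IGI", "ISS"}


def with_column_problem(evidence_code, with_value):
    """Return a message if column 8 is filled for a disallowed code, else None."""
    if with_value and evidence_code not in WITH_ALLOWED:
        return f"column 8 must be empty for evidence code {evidence_code}"
    return None


# IDA with a column 8 entry is flagged; IPI with the same entry is not.
print(with_column_problem("IDA", "UniProtKB:P02299"))
print(with_column_problem("IPI", "UniProtKB:P02299"))
```

The check only tests whether column 8 is permitted for a code; it does not test whether an entry is required, which is a separate rule.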


Examples of using the 'with' column (8)

The annotation of Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.

  • 1) Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of Protein B from species Y. For example, human p300 was shown to bind specifically to Drosophila histone H3. This would be annotated as GO:0042393:histone binding using an IPI evidence code and putting an accession for Drosophila histone H3, UniProtKB:P02299, in the 'with' column (8). This annotation makes the statement that human p300 has the molecular function of histone binding inferred from experiments using Drosophila histone H3.
  • 2) Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation of Protein A to GO:0005524:ATP binding with an IPI evidence code and ATP-gamma-S in column 8 captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.

Examples of using the annotation extension column (16)

The annotation of Protein A to a GO binding term with Protein B and the relationship has_participant/has_input/has_output in the annotation extension column (16) makes the statement that an in vivo molecular function of Protein A is binding to Protein B. This is equivalent to the post-compositional creation of a new child term.

  • 3) If the experiments described in example 1 above had shown that p300 binds to human H3.1 protein, the annotation GO:0042393:histone binding with the accession for human H3, UniProtKB:P68431, in column 16 (along with a has_participant/has_input/has_output relationship) makes the statement that p300 has the molecular function of binding histone H3.1, essentially post-composing a GO term for binding to histone H3.1. In this example, since an IPI evidence code requires an entry in the 'with' column, the accession UniProtKB:P68431 would also be entered in column 8.
  • 4) The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in the paper demonstrate that the target is 7β-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7β-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16).
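To make the difference between the two columns concrete, examples 1, 3 and 4 can be written out as simple records. This is an illustrative sketch only: the class, its field names and the helper are hypothetical, the gene product is given as a plain label rather than an accession, and has_input is picked arbitrarily from the has_participant/has_input/has_output options mentioned above, since the choice of relationship has not been settled here.

```python
from dataclasses import dataclass


@dataclass
class BindingAnnotation:
    gene_product: str    # plain label for illustration, not an accession
    go_id: str           # column 5
    evidence: str        # column 7
    with_from: str = ""  # column 8: qualifies the evidence
    extension: str = ""  # column 16: refines the GO term


def extension(relation, target_id):
    """Format a column 16 entry as relation(ID)."""
    return f"{relation}({target_id})"


# Example 1: histone binding inferred from an experiment with Drosophila H3.
example_1 = BindingAnnotation("human p300", "GO:0042393", "IPI",
                              with_from="UniProtKB:P02299")

# Example 3: in vivo partner is human H3.1, so the accession goes in both columns.
example_3 = BindingAnnotation("human p300", "GO:0042393", "IPI",
                              with_from="UniProtKB:P68431",
                              extension=extension("has_input", "UniProtKB:P68431"))

# Example 4: IDA annotation, so column 8 stays empty and only column 16 is used.
example_4 = BindingAnnotation("human ABCG1", "GO:0034041", "IDA",
                              extension=extension("has_input", "CHEBI:42989"))

for ann in (example_1, example_3, example_4):
    print(ann)
```

Example 1 leaves column 16 empty because binding to the Drosophila protein is evidence for histone binding, not a statement about the in vivo partner.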

The with/from column (8) and the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See the column 16 documentation for relationship types to use when adding IDs in the annotation extension column (16).

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing ontology development of 'has_part' relationships will provide links to implied substrate binding (the GOC is developing 'has_part' relationships to imply substrate binding). The existing GO will follow this new format, e.g. transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators can request new 'has_part' relationships (and terms) if these do not exist.


Quality control checks

[Ruth] ideally there will be a GOC webpage of all QCs, and therefore this information will not appear in the 'binding' guidelines.

1. No use of the 'NOT' qualifier with 'protein binding'; GO:0005515. This rule applies only to GO:0005515. Children of this term can be qualified with NOT, because the GO term then supplies further information on the type of binding; e.g. NOT + 'GO:0051529 NFAT4 protein binding' would be fine, as the negative binding statement applies only to the NFAT4 protein.

2. Annotations to 'protein binding'; GO:0005515 should only be supplied with an evidence code where the interactor can be identified in the 'with' field. This rule applies only to GO:0005515; it is less of a problem with child terms of protein binding, where the type of protein is identified in the GO term name.

3. Reciprocal annotations for protein binding should be made. This rule applies to GO:0005515 and its descendants when the IPI evidence code is used.

4. Annotations to 'protein binding' should not use the ISS evidence code. This rule applies only to GO:0005515; it is less of a problem with child terms of protein binding, where the type of protein is identified in the GO term name.
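As a rough illustration of how the four checks above might be automated, the sketch below runs them over a list of annotations held as dictionaries. The dictionary keys, the function name, the message wording and the EX: identifiers are all made up for this example; the set of protein binding descendants is passed in by the caller and defaults to GO:0005515 alone, because no ontology is loaded here.

```python
PROTEIN_BINDING = "GO:0005515"


def qc_problems(annotations, binding_terms=frozenset({PROTEIN_BINDING})):
    """Return (index, message) pairs for annotations that fail the four checks."""
    problems = []
    # (object, partner) pairs asserted with IPI, used for the reciprocity check.
    ipi_pairs = {
        (a["object"], a["with"])
        for a in annotations
        if a["evidence"] == "IPI" and a["go_id"] in binding_terms and a["with"]
    }
    for i, a in enumerate(annotations):
        is_root = a["go_id"] == PROTEIN_BINDING
        if is_root and a["qualifier"] == "NOT":
            problems.append((i, "rule 1: NOT used with GO:0005515"))
        if is_root and not a["with"]:
            problems.append((i, "rule 2: no interactor in the 'with' field"))
        if (a["evidence"] == "IPI" and a["go_id"] in binding_terms and a["with"]
                and (a["with"], a["object"]) not in ipi_pairs):
            problems.append((i, "rule 3: reciprocal annotation missing"))
        if is_root and a["evidence"] == "ISS":
            problems.append((i, "rule 4: ISS used with GO:0005515"))
    return problems


# EX:A, EX:B and EX:C are placeholder identifiers, not real accessions.
sample = [
    {"object": "EX:A", "qualifier": "", "go_id": PROTEIN_BINDING,
     "evidence": "IPI", "with": "EX:B"},
    {"object": "EX:B", "qualifier": "", "go_id": PROTEIN_BINDING,
     "evidence": "IPI", "with": "EX:A"},
    {"object": "EX:C", "qualifier": "NOT", "go_id": PROTEIN_BINDING,
     "evidence": "ISS", "with": ""},
]

for index, message in qc_problems(sample):
    print(index, message)
```

The first two sample annotations are reciprocal and pass; the third fails rules 1, 2 and 4. Rule 3 treats a reciprocal annotation as present only if the partner is annotated back with IPI to a term in the same supplied set, which is a simplification.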


Links