Talk:2010 GO camp binding documentation issues: Difference between revisions

From GO Wiki
(67 intermediate revisions by 4 users not shown)
[[User:Siegele|Siegele]] 21:54, 29 June 2010 (UTC)
== [[User:Siegele|Debby]] 12 Jul 2010 edits after today's binding call ==


As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term.
'''[http://gocwiki.geneontology.org/index.php/Guidelines_from_Annotation_Camp#Binding_guidelines Agreed Guidelines for GOC website 19 July 2010]'''


For instance, an enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, as binding is implied, curators should avoid making redundant annotations. 


There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.
As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations.  Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport.  Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the ''in vivo'' situation.


The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible: use the binding term if the experiment shows binding, but not catalysis/transport; don’t use the binding term if the experiment shows catalysis or transport.
Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions and the precise identity of the interacting protein should be captured in the ‘with’ column (8). At present a variety of identifiers can be used in the ‘with’ column (8) or the annotation extension column (16), see [http://www.geneontology.org/GO.format.gaf-2_0.shtml GO Annotation File Format 2.0 Guide].


The curator may come across Molecular Function terms where the definition doesn't adequately describe the specific substrate/target being bound, and where the request of a more specific Molecular Function would be considered inappropriate. In such cases, the annotation extension column (column 16) can be used to capture this information using the accession from ChEBI (add link) for small molecules or from UniProt (others?) for proteins. Enter the substrate/target information in column 16 in the form ChEBI:xxxxx or UniProtKB:xxxxxx. Remember, use the annotation extension column (16) only if this information is not already included in the GO term and/or definition.
When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated.  To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the '''evidence''' used to infer the function, while entries in column 16 modify the '''GO term''' used in the GO_ID column (5).  The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code, see [http://www.geneontology.org/GO.evidence.shtml evidence code documentation].
 
'''Examples of using the 'with' column (8)'''
 
The annotation of '''Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8)''' makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the ''in vivo'' function of Protein A.
 
* 1) Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For [http://dev.biologists.org/content/130/4/693.long example], the ''C. elegans'' Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle.  This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8).  This annotation makes the statement that ''C. elegans'' Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin. 
 
* 2) Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation '''Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S''' captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.
 
'''Examples of using the annotation extension column (16)'''
 
The annotation of '''Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16)''' makes the statement that an ''in vivo'' target of Protein A is Protein B.  This is equivalent to the post-compositional creation of a new child term.
 
* 3) The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2759713/?tool=pubmed PMID:19668196]. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity adding has_input UniProtKB:O93236 in annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.
 
* 4) The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in the [http://gowiki.tamu.edu/wiki/index.php/PMID:17408620 paper] demonstrate that the target is 7β-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7β-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16), in effect post-composing the GO term 7β-hydroxycholesterol-transporting ATPase activity.
 
The 'with' column (8) and the annotation extension column (16) should be used '''only''' for direct interactions and '''only''' when the binding relationship is not already included in the GO term and/or definition.  See [http://wiki.geneontology.org/index.php/Annotation_Cross_Products column16 documentation] for relationship types to use when adding IDs in the annotation extension column (16).
 
Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. The GOC is developing 'has_part' relationships that will provide links to implied substrate binding. The existing GO will follow this new format, e.g. transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.
 
'''Quality control checks''' 
 
1. No use of the 'NOT' qualifier with 'protein binding'; GO:0005515.
This rule only applies to GO:0005515. Children of this term can be qualified with NOT, as further information on the type of binding is then supplied in the GO term, e.g. NOT + 'GO:0051529 NFAT4 protein binding' would be fine, as the negative binding statement only applies to the NFAT4 protein.
 
2. Annotations to 'protein binding'; GO:0005515, should only be supplied with an evidence code where the interactor can be identified in the 'with' field.
This rule only applies to GO:0005515; it is less of a concern for child terms of protein binding, where the type of protein is identified in the GO term name.
 
3. Annotations to 'protein binding' should not use the ISS evidence code.
This rule only applies to GO:0005515; it is less of a concern for child terms of protein binding, where the type of protein is identified in the GO term name.
 
== '''Links''' ==
 
* [http://wiki.geneontology.org/index.php/2010_GO_camp_binding_documentation_issues Binding wiki]
 
* [http://wiki.geneontology.org/index.php/2010_GO_camp_binding_documentation_issues#Agenda_for_discussion_02-07-2010 Agenda 2 July]
* [http://wiki.geneontology.org/index.php/2010_GO_camp_binding_documentation_issues#Agenda_for_discussion_12-07-2010 Agenda 12 July]

Latest revision as of 07:06, 19 July 2010

Debby 12 Jul 2010 edits after today's binding call

Agreed Guidelines for GOC website 19 July 2010


As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations. Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.

Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions and the precise identity of the interacting protein should be captured in the ‘with’ column (8). At present a variety of identifiers can be used in the ‘with’ column (8) or the annotation extension column (16), see GO Annotation File Format 2.0 Guide.

When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code, see evidence code documentation.
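The evidence-code restriction on the 'with' column can be expressed as a small check. This is only a sketch: the set of codes is copied from the paragraph above, and the function name and arguments are hypothetical, not part of any GO tool.

```python
# Evidence codes that, per the paragraph above, may carry a 'with' (column 8) entry.
WITH_ALLOWED = frozenset({"IPI", "IC", "IEA", "IGI", "IMP", "ISS"})


def with_column_allowed(evidence_code: str, with_value: str) -> bool:
    """Return True if a column 8 value is acceptable for this evidence code.

    An empty column 8 is always accepted here; this sketch only checks that a
    populated 'with' column is not combined with a code such as IDA.
    """
    if not with_value.strip():
        return True
    return evidence_code.strip().upper() in WITH_ALLOWED
```

For instance, an IPI annotation with UniProtKB:P68135 in column 8 passes, while the same column 8 value on an IDA annotation does not.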

Examples of using the 'with' column (8)

The annotation of Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.

  • 1) Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of Protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation makes the statement that C. elegans Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin.
  • 2) Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.

Examples of using the annotation extension column (16)

The annotation of Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.

  • 3) The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in PMID:19668196. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity adding has_input UniProtKB:O93236 in annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.
  • 4) The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in the paper demonstrate that the target is 7β-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7β-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16), in effect post-composing the GO term 7β-hydroxycholesterol-transporting ATPase activity.

The 'with' column (8) and the annotation extension column (16) should be used only for direct interactions and only when the binding relationship is not already included in the GO term and/or definition. See column16 documentation for relationship types to use when adding IDs in the annotation extension column (16).
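To make the difference between column 8 and column 16 concrete, examples 1 and 3 above can be written as simple records. This is a minimal sketch: the class, field names and helper are hypothetical, the relation(ID) form for column 16 is an assumption based on the GAF 2.0 annotation extension syntax, and the evidence code for example 3 is left empty because the text above does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BindingAnnotation:
    gene_product: str         # label or accession of the annotated gene product
    go_id: str                # column 5 (GO_ID)
    evidence: Optional[str]   # column 7; None where the text does not state it
    with_from: str = ""       # column 8: supports the evidence
    extension: str = ""       # column 16: refines the GO term


def extension(relation: str, target_id: str) -> str:
    """Format a column 16 entry as relation(ID) (assumed GAF 2.0 syntax)."""
    return f"{relation}({target_id})"


# Example 1: rabbit actin is the evidence for actin filament binding (column 8).
unc115 = BindingAnnotation(
    "Unc-115", "GO:0051015", "IPI", with_from="UniProtKB:P68135"
)

# Example 3: Dharma is the in vivo substrate of the ligase activity (column 16).
lnx2b = BindingAnnotation(
    "UniProtKB:A4VCF7", "GO:0004842", None,
    extension=extension("has_input", "UniProtKB:O93236"),
)
```

The first record says nothing about the in vivo partner of Unc-115, while the second narrows the meaning of the GO term itself.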

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. The GOC is developing 'has_part' relationships that will provide links to implied substrate binding. The existing GO will follow this new format, e.g. transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.

Quality control checks

1. No use of the 'NOT' qualifier with 'protein binding'; GO:0005515. This rule only applies to GO:0005515. Children of this term can be qualified with NOT, as further information on the type of binding is then supplied in the GO term, e.g. NOT + 'GO:0051529 NFAT4 protein binding' would be fine, as the negative binding statement only applies to the NFAT4 protein.

2. Annotations to 'protein binding'; GO:0005515, should only be supplied with an evidence code where the interactor can be identified in the 'with' field. This rule only applies to GO:0005515; it is less of a concern for child terms of protein binding, where the type of protein is identified in the GO term name.

3. Annotations to 'protein binding' should not use the ISS evidence code. This rule only applies to GO:0005515; it is less of a concern for child terms of protein binding, where the type of protein is identified in the GO term name.
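The three checks above could be automated along the following lines. This is a rough sketch rather than the GOC's actual QC code: the function name, its arguments and the message strings are hypothetical, and it assumes multiple qualifiers are separated by "|".

```python
PROTEIN_BINDING = "GO:0005515"


def qc_protein_binding(go_id, evidence, qualifier="", with_from=""):
    """Return a list of rule violations for one annotation (empty if none).

    All three rules apply only to GO:0005515 itself, not to its child terms.
    """
    if go_id != PROTEIN_BINDING:
        return []
    problems = []
    # Rule 1: no NOT qualifier with 'protein binding'.
    if "NOT" in qualifier.upper().split("|"):
        problems.append("NOT qualifier used with GO:0005515")
    # Rule 2: the interactor must be identified in the 'with' field.
    if not with_from.strip():
        problems.append("'with' field is empty for GO:0005515")
    # Rule 3: ISS must not be used with 'protein binding'.
    if evidence.strip().upper() == "ISS":
        problems.append("ISS evidence code used with GO:0005515")
    return problems
```

An IPI annotation to GO:0005515 with an interactor in the 'with' field returns an empty list, and an annotation to a child term such as GO:0051529 is never flagged by these rules.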

Links