Binding terms working group: Difference between revisions

From GO Wiki
 
(111 intermediate revisions by 9 users not shown)
Line 1: Line 1:
[[Category:Ontology]]
== Working group ==


* Ben Hitz
* David Hill
* Harold Drabkin
* Debbie Siegele
* Debby Siegele
* Emily Dimmer
* Jim Hu
* Midori Harris
* Mike Cherry
* Peter D'Eustachio
* Ruth Lovering


== Summary ==
== '''Objectives of the binding terms working group are to provide draft guidelines with examples on the following:''' ==
 
 
1. What binding activities should be included in GO


From the GO function guidelines (listed below) catalytic and transporter activities should not be related to binding terms.
2. The application of binding term usage in conjunction with column 16


3. The transfer of 'binding' term annotations via ISS/ISO


'''The proposal is to remove non-protein, non-RNA/DNA substrate binding terms from GO whenever possible and ensure statements within the GO term directing GO curators to annotate to binding terms are removed where appropriate.'''


Examples:
== '''Proposed Guidelines:''' ==


1. Obsolete GO:0043287: poly(3-hydroxyalkanoate) binding
=== July 28, 2009 ===
Binding terms guidelines aim to minimize redundancy and duplication of information of GO term usage.


2. The term: GO:0016887 ATPase activity should NOT include the comment: Consider also annotating to the molecular function term 'ATP binding, GO:0005524'.
Enzymes MUST bind ALL of the substrates (and products) involved in a catalyzed reaction - there is no action at a distance. Therefore, during the annotation of an enzyme, it is not necessary to associate a list of GO binding terms describing all of the substrates and products, if this binding is implied by the GO term describing the catalytic function of the enzyme.


Note enzymes MUST bind ALL of the substrates involved in a catalyzed reaction - there is no action at a distance.
However, GO terms are not protein specific, therefore use of a binding term with a specific substrate/product may provide additional information not provided by the catalytic function alone. For example Rehemtulla  et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as: GO:0004252 serine-type endopeptidase activity, GO:0051605 protein maturation by peptide bond cleavage but the addition of GO:0070678 preprotein binding along with the protein ID for von Willebrand factor/VWF/P04275 in column 16 would enable this additional information to be captured.


'''Aim to circulate proposal to all GO annotators by the end of Friday 15th May?'''
Curators should use their judgment to decide how specific to make the description for the bound substrate/product. Curators should recognize that GO annotation should capture information relevant to the in vivo situation, not artificial substrates. For example, PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1: GO:0032183 SUMO binding (with protein IDs for SUMO1, P63165 and SUMO2, P61956, included in column 16), GO:0070139 SUMO-specific endopeptidase activity.


== Pros ==
The GO is committed to ‘annotating to the experiment’.  Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding, don’t use the binding term if the experiment shows catalysis but not the specific binding activity. 


What these "binding" proposals all have in common is that they essentially want to track, for all enzymes the strict biochemical mechanism and all cofactors for each reaction, as well as all "relevant" substrate-product combinations. That is better left to some other database.
Annotation of binding reactions is confounded by the complexity of assays and kinetics of ‘binding’ studies, therefore a curator should use their judgment to decide whether the interaction is physiologically relevant.  


GO does not track biochemical reactions.  It doesn’t track the reactants nor the products.  
Proteins involved in transport should be annotated following the same guidelines described above for enzymes.  


Should GO track protein kinase substrates? Glycosylation sites? Ubiquitination substrates?
=== August 4, 2009 ===
GO needs to be consistent: why should GO partially track some reactants some of the time? That's not going to help anyone in the long run. In 99% of all cases, it will be better to cross index a database that is actually DESIGNED to store this sort of data.
'''Avoid Redundant Binding Relationships For Substrates/Products'''


This proposal suggests that we should remove from GO terms such as GO:0043287: poly(3-hydroxyalkanoate) binding and replace it with nothing - because this description of substrate binding is not the role of GO, and delete the majority of ATP binding annotations, ATP binding to only be associated with proteins which bind ATP as a co-factor.
The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.


== Cons ==
An enzyme MUST bind all of the substrates and products of the reaction it catalyzes.  Similarly, a transporter MUST bind the molecules it transports.  Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter.  Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrate/products, and curators should avoid making such redundant annotations.


Currently there are substantial numbers of binding terms associated with protein records by electronic means, for instance: 1,539,419 electronic annotations to just the ATP binding terms (versus 880 manual annotations), which include both 'substrate binding' and 'cofactor binding'. Furthermore, many of these terms are associated in a 'systematic manner', for example through protein domains: InterPro includes a number of domains which define a nucleotide binding site, for instance IPR011761 ATP-grasp fold.
There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.


Do we need to remove substrate ATP binding annotations? It is useful for proteins to be grouped by the energy source they use to carry out a catalysis. For example the main purpose of an ATP-dependent enzyme, for instance a peptidase, is not to break down ATP, but to break peptide bonds.  
The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity.  In cases where curators feel it is important to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed, an IC (inferred by curator) evidence code should be used. 
 
Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products.


It is going to involve a vast amount of work for the annotation groups to split up the nucleotide binding annotations into 'substrate binding' or 'cofactor binding' types. 
=== September 2, 2009 ===


In addition a large amount of information will be deleted from GO.
'''Avoid Redundant Binding Relationships For Substrates/Products'''


In order to preserve some of this information, but in a more appropriate ‘GO’ format could the GO terms provide an indication in the definition (or term's parentage) the specific ribonucleotide being used, e.g. making the term more specific 'GTP-dependent helicase activity', 'protein kinase activity' or expanding the ontology:
The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.
>‘ribonucleotide-dependent catalytic activity’
>> ‘ATP-dependent catalytic activity’
This would follow previous terms such as:
GO:0016723 'oxidoreductase activity, oxidizing metal ions, NAD or NADP as acceptor',  as well as many specific terms e.g. 'GTP-dependent polynucleotide kinase activity', 'thymidylate synthase (FAD) activity', 'DNA ligase (ATP) activity', 'N-methylhydantoinase (ATP-hydrolyzing) activity').


This would at least mean that if any users had become accustomed to using the ATP binding annotation set to find those gene products that metabolised ATP, they could in future still gather together relevant gene products by using such a grouping term (and as a side benefit it could be helpful for curation consistency, if we could search for proteins co-annotated to 'ATP binding' and 'catalytic activity (ATP-hydrolysing)' then we would have reason to investigate further the validity of the 'ATP binding' annotation).
An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrate/products, and curators should avoid making such redundant annotations.


== What would you like to see included or not included in the proposal for it to be acceptable? ==
There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.


* Ben Hitz
The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity. '''In cases where curators feel it is important to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed, an IC (inferred by curator) evidence code should be used.''' IS THIS WHAT WE WANT?
* David Hill
<blockquote>
* Debbie Siegele
Peter D: I think it is not. Above, we explicitly discourage annotation of "binding" in cases where the data support a "catalysis" or "transport" annotation. This last sentence ("In cases where curators feel ...") appears to allow exactly the opposite. I would delete that sentence.
* Emily Dimmer
</blockquote>
* Jim Hu
 
* Midori Harris - This decision is one that annotators have to make. Once the annotators reach a consensus, the ontology editors will make any necessary changes to the function ontology. If there is a demand to make binding terms obsolete, we'll probably need annotators' input to determine which terms to retain.
Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products and also use their judgment to decide whether the interaction is physiologically relevant.  
* Mike Cherry
* Peter D'Eustachio
* Ruth Lovering - I like the idea of having additional parent terms for the enzyme GO terms which enable a more specific description about the activity of the enzyme/transporter, eg: ‘ribonucleotide-dependent catalytic activity’ or 'ATP-hydrolyzing' but which do not include the word 'binding'.


== Other aspects of the discussion to consider ==
'''Examples:'''
GO terms are not protein specific, therefore use of a binding term with a specific substrate/product may provide additional information not provided by the catalytic function alone. For example Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as: GO:0004252 serine-type endopeptidase activity, GO:0051605 protein maturation by peptide bond cleavage but the addition of GO:0070678 preprotein binding along with the protein ID for von Willebrand factor/VWF/P04275 in column 16 would enable this additional information to be captured.


'''Use of experimentally verified binding to support catalytic activity annotation'''
<blockquote>
Peter D: Indeed the function term GO:0070678 "preprotein binding" exists, but perhaps it shouldn't. Its definition is, "Interacting selectively and non-covalently with a preprotein, the unprocessed form of a protein destined to undergo co- or post-translational processing," that is, binding as an explicit first step of catalysis ("destined to undergo ..."). If we accept the reasoning above, perhaps we should also recommend obsoletion of GO:0070678.


Ben to write comments in here.
More generally, isn't this proposed usage pushing GO in exactly the direction that Ben found unacceptable, of trying to be an exhaustive catalogue of concrete molecular interactions?
</blockquote>


Emily suggested:
Curator should use their judgment to decide how specific to make the description for the bound substrate/product. Curators should recognize that GO annotation should capture information relevant to the in vivo situation, not artificial substrates. For example, PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1: GO:0032183 SUMO binding (with protein IDs for SUMO1, P63165 and SUMO2, P61956, included in column 16), GO:0070139 SUMO-specific endopeptidase activity.
The paper, PMID: 10980193, partially characterizes a GTPase. It states that the protein is thought to be a GTPase, however this paper only measured its capacity to bind GTP, but not its GTPase activity directly. If this paper provided the only evidence for a possible GTPase activity, I would have considered annotating to the 'GTP binding' term, using the IDA evidence code, and referred to this annotation as providing supporting evidence for GTPase activity ('IC').
However, if we can no longer annotate GTP binding for proteins which use GTP as a substrate, and assuming that this is the only evidence the curator can find to support an annotation to GTPase activity, how do curators feel about the alternative annotation to:


GTPase activity (GO:0003924) evidence='IPI' PMID:10980193 with='CHEBI:15996'
</blockquote>
Ruth: Sorry I have been a bit slow understanding the use of column 16 and have just realised that addition of the protein ID for von Willebrand factor/VWF/P04275 in column 16 in the annotation: GO:0051605 protein maturation by peptide bond cleavage and/or GO:0004252 serine-type endopeptidase activity would enable this more specific information to be included. I agree that if column 16 was used in this way the 'preprotein binding' wouldn't be required.  


Peter: not everything that binds GTP has GTPase activity so it would be wrong to create this annotation.
By the same token SUMO binding wouldn't be required if the SUMO Protein IDs were included in column 16 with the GO:0070139 SUMO-specific endopeptidase activity annotation.
</blockquote>


From the proposal it will not be appropriate to annotate to binding activity, such as 'GTP binding', if this is all that is known about the function of the protein and there is no known catalytic activity predicted from this binding.
=== September 4, 2009 ===


'''Examples of GTP/GDP/ATP/ADP binding where this binding does not lead to GTP/ATP hydrolysis'''
'''Avoid Redundant Binding Relationships For Substrates/Products'''


Midori: for ATP specifically, are there any activities that use it as a substrate, but don't hydrolyze one of the phosphodiester bonds (i.e. either ATP -> ADP + Pi or ATP -> AMP + PPi)?
The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.


Jim: In the SOS response, activated RecA (i.e. RecA in the ATP-bound state) has several activities that are not dependent on ATP hydrolysis, including stimulating autocleavage of lambda repressor, LexA, and UmuD.  I believe that there are other examples where NTP binding is needed for an activity and hydrolysis is to flip the protein to the inactive state. So, even though G-proteins are GTPases, their signaling activities are not coupled to GTP hydrolysis.  Hydrolysis returns them to their OFF state.  
An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrate/products, and curators should avoid making such redundant annotations.


Peter: GO already provides for these. See, for example GO:0030695 GTPase regulator activity and its children for the small GTPase case and attempting to capture exactly the guanine nucleotide-regulated switching behavior of eukaryotic small GTPases like RAS, RAN, RHO, etc.
There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.


Peter: Not all the functions of RAS require hydrolysis of GTP. While the entire life-cycle of the normal protein may well depend on binding AND hydrolysis, this life cycle comprises multiple molecular functions that are involved in multiple biological processes and any one function / process can involve only binding (or only hydrolysis or, probably, only exchange of bound for free nucleotide).
The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity.  


'''Cross product annotations'''
Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products and also use their judgment to decide whether the interaction is physiologically relevant. Curators should recognize that GO annotations should capture information relevant to the in vivo situation, not artificial substrates.


Jim: should substrate binding be used in cross product annotations (see [[Annotation_Cross_Products#binding_example]])
=== Not covered by the above draft, to be discussed at GOC===


'''Data interpretation and other databases'''
* Should we distinguish substrate binding from effector binding?


Peter: GO can't respond well to queries asking for "all ATP-binding proteins" or "all alcohol-binding proteins" because it's a naïve question, not a logical flaw or incompleteness in GO.
* The transfer of 'binding' term annotations via ISS/ISO


This kind of query is a natural fit to a data aggregation tool, and not something that any one database (and in particular in this case not GO) can or should answer alone. GO's role here is to provide words and logical structures to allow reliable linking among the various sources of the actual data.
* Should previous annotations to binding terms be left as they are, or should evidence codes be updated to make them be consistent with the above proposal?


Mike:  Where will the line be drawn for what to include?  If we say when there is proof that a GP is a kinase it should also have ATP binding as a function, how far do you go?  Will all reactants and products for all catalysts be added?  I see much of this is actually inferred details.  If kinase then binds phosphate, but there may not have been a direct assay of the binding.  I know someone will say oh we infer things all the time.  Yes, but should we extend this practice.  It seems there is more than effort to do than add this type of inferred information.
* The use of IC (inferred by curator) to enable curators to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed.


Is there a single (or multiple) database which provides a source of this interaction information? 
* Can we guide curator judgement on the interpretation of the boundary between binding and catalysis or is there a legitimate hybrid boundary region?  
If so, is it being used as a dataset for microarray analysis tools? If not why not? 
If GO removes binding interaction data what impact will it make on the interpretation of experimental datasets?


== Other issues brought up in this discussion ==
*  Can / should the GO hierarchy be used to accommodate catalogues of specific molecules and their behaviors, if not by a core group of GO annotators then by collaborating groups? What if there were 40 distinct substrates identified, all physiologically relevant in some instance (or more likely, all tested in vitro and possibly physiologically relevant) is this full list going to be added to column 16?  As we accumulate more and more high-throughput data we are going to need a much better way of dealing with this. Can we develop a way to annotate the "process" relationships with the various "molecular functions".




This discussion also led to broader comments about the role of GO:  
'''Example 1:''' Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. The following GO terms could potentially be used to capture this information:
Kinase substrate information should be stored in protein kinase substrate databases.
GO:0004252 serine-type endopeptidase activity
Receptor binding should just be thrown out altogether as a term, except for actual LIGANDS.
GO:0051605 protein maturation by peptide bond cleavage
G-protein / GPCR interactions should be stored in a signalling pathway/protein-protein interaction databases.
GO:0070678 preprotein binding  
The use of column 16 and the protein ID for von Willebrand factor/VWF/P04275, but with which GO terms?
'''Example 2:''' PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1:
GO:0032183 SUMO binding
GO:0070139 SUMO-specific endopeptidase activity
How specific should GO annotations be? Should column 16 be used to clarify this with protein IDs for SUMO1, P63165 and SUMO2, P61956?


== From the Documentation for the Function Ontology ==
Line 146: Line 153:
Ontology Development Action Items
17. Document the fact that binding is not always a parent of enzyme. Binding is only a parent when stable binding occurs. Remove Binding as parent where appropriate.
== Conference call ==
[[Binding Terms Conference Call Information]]
[[Binding Terms minutes June 09]]
[[Old version of working group wiki]]
== 2010 discussion ==
During the last GOC meeting many of the recommended suggestions were agreed ([http://wiki.geneontology.org/index.php/GOC_Meeting_Minutes_September_2009#Binding_Discussion.28Ruth.29 binding discussion] and [http://wiki.geneontology.org/index.php/GOC_Meeting_Minutes_September_2009#Binding_Discussion_Summary_.28Ruth.29 binding summary]). We now need to write up these statements as GOC guidelines and also address some of the other binding issues which are still unresolved. It would be good to make progress on this before the next GOC meeting, so we probably need to make a start on this soon.
During the GOC meeting discussion Michael pointed out that we could make the proposal much shorter, which seems sensible, so how about the following proposed guidelines, assuming that these guidelines will be posted at [http://www.geneontology.org/GO.annotation.conventions.shtml the GOC Annotation Conventions web page], or will they be available through the wiki?:
'''Annotating gene products that interact with other molecules'''
Binding is often implied by the molecular function GO term describing the activity of an enzyme, for example an enzyme MUST bind all of the substrates and products of the reaction it catalyzes.  Therefore, curators should aim to avoid redundant 'binding' relationship GO terms for substrates/products, if these result in the duplication of GO term information. For example do not annotate to 'ATP binding' if the gene product is already annotated to 'ATPase activity'.
   
There will be some cases, however, where it is appropriate to annotate to a binding term. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to ATP binding using IDA.
Curators should try to capture specifics as much as feasible and use their judgement about when to associate an enzyme with a binding term for its substrates/products and also use their judgement to decide whether the interaction is physiologically relevant. Curators should capture information relevant to the in vivo situation, not artificial substrates.
Curators should not use IC to annotate to a binding term based on the annotation to an enzyme activity, if this activity already implies the binding activity, e.g. curators should not create an 'ATP binding' annotation using the IC evidence code 'with' the GO term 'ATPase activity'.
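The redundancy rule in the draft guidelines can be illustrated with a minimal Python sketch. It is not a GOC tool. The <code>IMPLIED_BINDING</code> table, the <code>Annotation</code> record and the <code>redundant_binding</code> function are hypothetical names introduced here. Only the ATPase activity / ATP binding pair (GO:0016887 / GO:0005524) is taken from this page, and any further pairs would have to be agreed by curators.

```python
from dataclasses import dataclass

# Hypothetical lookup: activity term -> binding terms it already implies.
# Only this one pair comes from the guidelines on this page.
IMPLIED_BINDING = {
    "GO:0016887": {"GO:0005524"},  # ATPase activity implies ATP binding
}


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # identifier of the annotated gene product
    go_id: str         # GO term ID
    evidence: str      # evidence code, e.g. IDA or IC


def redundant_binding(annotations):
    """Return binding annotations that are already implied by an activity
    annotation on the same gene product."""
    terms_by_product = {}
    for ann in annotations:
        terms_by_product.setdefault(ann.gene_product, set()).add(ann.go_id)

    flagged = []
    for ann in annotations:
        implied = set()
        for go_id in terms_by_product[ann.gene_product]:
            implied |= IMPLIED_BINDING.get(go_id, set())
        if ann.go_id in implied:
            flagged.append(ann)
    return flagged
```

A gene product annotated only to ATP binding (for example from an experiment with a non-hydrolyzable ATP analog) is not flagged, because no activity annotation implies the binding term.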
'''Comments from the Working group on the above draft guidelines''':
* Ben Hitz
* David Hill
* Debby Siegele - '''agree'''
* Emily Dimmer - '''agree'''
* Jim Hu
* Mike Cherry
* Peter D'Eustachio
* Ruth Lovering - '''agree'''
'''In addition to discussing the above guidelines the GOC clearly identified the following issues, please add comments to linked wiki pages:'''
1. There are some situations where a protein that acts as a transporter binds to a complex or a vesicle that it transports, and it may well not bind to the transported substance directly. Therefore, the transporter aspect of the guidelines above has been removed. Does anyone think that the 'binding' guidelines should suggest that transporters are in general not annotated to 'substrate binding' terms?
2. Should we capture drug information? Drugs are often artificial substrates. Also Rex raised RGD binding as an example of an artificial substrate. Discussion on [[drug information]] page.
3. The inclusion of the use of column 16 here seems appropriate, although not discussed in binding discussion at GOC. So in the last paragraph include something like: column 16 can be used to increase the detail of the GO annotation. For example the specific substrate for an enzyme with 'GO:0032451 demethylase activity' can be included in column 16. Discussion on [[binding and column 16]] page. And protein binding [http://gocwiki.geneontology.org/index.php/Annotation_consistency:_x_protein_binding_and_with Annotation consistency]
4. Should we consider having a term like 'ATP binding involved in kinase activity'? Discussion on [[new binding GO terms]] page.
5. What, if any, binding term annotations can be transferred via ISS/ISO? Binding terms should be transferred by ISS, but whether to transfer the thing that was bound is controversial. ‘protein binding’ annotations become vague if we drop the ‘with’ field contents upon transfer, while more specific terms like ‘kinase binding’ are less vague when transferred even if we drop the ‘with’ field contents of the experimental annotation. Discussion on [[ISS binding terms]] page.
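The concern in item 5 can be made concrete with a small Python sketch. It assumes that, on ISS transfer, the 'with' field is overwritten with the identifier of the experimentally characterised gene product, so the original binding partner is not carried over. The function name, the dictionary layout and all identifiers (<code>SRC_1</code>, <code>PARTNER_1</code>, <code>TARGET_1</code>) are hypothetical.

```python
def transfer_by_iss(source, target_id):
    """Copy an experimental annotation to a similar gene product as ISS.

    Assumption: the 'with' field of the new annotation names the source
    gene product, so the original binding partner is dropped.
    """
    return {
        "gene_product": target_id,
        "term": source["term"],
        "evidence": "ISS",
        "with": source["gene_product"],
    }


# Hypothetical experimental annotation: SRC_1 binds PARTNER_1.
source = {
    "gene_product": "SRC_1",
    "term": "protein binding",
    "evidence": "IPI",
    "with": "PARTNER_1",
}
transferred = transfer_by_iss(source, "TARGET_1")
```

After transfer the annotation says only that TARGET_1 has 'protein binding', with no record of what is bound. The same transfer of a more specific term such as 'kinase binding' would still say something about the partner through the term name alone.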
'''Finding an example of when to include a binding term, as well as an enzyme activity term.'''
It would be useful to have actual examples of when to use column 16 and when to use a 'binding' term.  The following example suggests possible use of column 16 as well as the use of a 'binding' term alongside an 'enzyme activity' term.
* Are any of these annotations redundant?
* What would be the most effective annotations to include?
Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as:
{| {{Prettytable}} class='sortable'
|-
! Protein name and ID
! GO term ID and name
! Ontology
! Evidence code
! With
! Column 16
|-
| PCSK6 P29122
| GO:0004252 serine-type endopeptidase activity
| function
| IDA
| -
| VWF/P04275
|-
| PCSK6 P29122
| GO:0051605 protein maturation by peptide bond cleavage
| process
| IDA
| -
| VWF/P04275
|-
| PCSK6 P29122
| GO:0070678 preprotein binding
| function
| IPI
| VWF/P04275
| -
|-
|}
* The addition of GO:0070678 preprotein binding seems to give more information than achieved by just the GO:0004252 serine-type endopeptidase activity annotation with the protein ID in column 16. 
* However, GO:0070678 preprotein binding does seem redundant when made alongside the GO:0051605 protein maturation by peptide bond cleavage with the protein ID in column 16.
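One way to look at the redundancy question is to hold the three rows of the table as records and check whether the binding partner is already recorded in column 16 of another annotation for the same protein. The sketch below is only illustrative: the <code>Row</code> record, the <code>BINDING_TERMS</code> set and the function name are assumptions made for this example, and a match only marks a candidate for discussion, not a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    protein: str    # annotated protein ID
    go_id: str      # GO term ID
    aspect: str     # ontology: function or process
    evidence: str   # evidence code
    with_id: str    # 'With' column, empty string if unused
    column_16: str  # column 16 entry, empty string if unused


# The three annotations from the table above.
ROWS = [
    Row("P29122", "GO:0004252", "function", "IDA", "", "P04275"),
    Row("P29122", "GO:0051605", "process", "IDA", "", "P04275"),
    Row("P29122", "GO:0070678", "function", "IPI", "P04275", ""),
]

# Assumption: the terms treated as 'binding' terms for this check.
BINDING_TERMS = {"GO:0070678"}


def partner_already_captured(rows):
    """Return binding rows whose 'With' partner also appears in column 16
    of a non-binding row for the same protein."""
    captured = {
        (r.protein, r.column_16)
        for r in rows
        if r.go_id not in BINDING_TERMS and r.column_16
    }
    return [
        r for r in rows
        if r.go_id in BINDING_TERMS and (r.protein, r.with_id) in captured
    ]
```

With the table as given, the preprotein binding row is returned, because VWF/P04275 is already present in column 16 of the other two annotations. Without those column 16 entries, nothing is returned and the binding annotation is the only record of the partner.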
'''Please could we discuss these (along with any others that people want to raise) and bring suggestions on these to the next GOC meeting.'''
===GO Consortium meeting discussion ===
[[Binding Guidelines]]
===Binding Issues ===
[[N-terminal and C-terminal binding]]
[[enzyme binding]]

Latest revision as of 09:45, 12 April 2019

Working group

  • Ben Hitz
  • Harold Drabkin
  • Debby Siegele
  • Emily Dimmer
  • Jim Hu
  • Mike Cherry
  • Peter D'Eustachio
  • Ruth Lovering

Objectives of the binding terms working group are to provide draft guidelines with examples on the following:

1. What binding activities should be included in GO

2. The application of binding term usage in conjunction with column 16

3. The transfer of 'binding' term annotations via ISS/ISO


Proposed Guidelines:

July 28, 2009

Binding terms guidelines aim to minimize redundancy and duplication of information of GO term usage.

Enzymes MUST bind ALL of the substrates (and products) involved in a catalyzed reaction - there is no action at a distance. Therefore, during the annotation of an enzyme, it is not necessary to associate a list of GO binding terms describing all of the substrates and products, if this binding is implied by the GO term describing the catalytic function of the enzyme.

However, GO terms are not protein specific, therefore use of a binding term with a specific substrate/product may provide additional information not provided by the catalytic function alone. For example Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as: GO:0004252 serine-type endopeptidase activity, GO:0051605 protein maturation by peptide bond cleavage but the addition of GO:0070678 preprotein binding along with the protein ID for von Willebrand factor/VWF/P04275 in column 16 would enable this additional information to be captured.

Curators should use their judgment to decide how specific to make the description for the bound substrate/product. Curators should recognize that GO annotation should capture information relevant to the in vivo situation, not artificial substrates. For example, PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1: GO:0032183 SUMO binding (with protein IDs for SUMO1, P63165 and SUMO2, P61956, included in column 16), GO:0070139 SUMO-specific endopeptidase activity.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding, don’t use the binding term if the experiment shows catalysis but not the specific binding activity.

Annotation of binding reactions is confounded by the complexity of assays and kinetics of ‘binding’ studies, therefore a curator should use their judgment to decide whether the interaction is physiologically relevant.

Proteins involved in transport should be annotated following the same guidelines described above for enzymes.

August 4, 2009

Avoid Redundant Binding Relationships For Substrates/Products

The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.

An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrates/products, and curators should avoid making such redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity. In cases where curators feel it is important to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed, an IC (inferred by curator) evidence code should be used.

Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products.

September 2, 2009

Avoid Redundant Binding Relationships For Substrates/Products

The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.

An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrates/products, and curators should avoid making such redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity. In cases where curators feel it is important to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed, an IC (inferred by curator) evidence code should be used. IS THIS WHAT WE WANT?

Peter D: I think it is not. Above, we explicitly discourage annotation of "binding" in cases where the data support a "catalysis" or "transport" annotation. This last sentence ("In cases where curators feel ...") appears to allow exactly the opposite. I would delete that sentence.

Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products and also use their judgment to decide whether the interaction is physiologically relevant.

Examples: GO terms are not protein specific, therefore use of a binding term with a specific substrate/product may provide additional information not provided by the catalytic function alone. For example Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as: GO:0004252 serine-type endopeptidase activity, GO:0051605 protein maturation by peptide bond cleavage but the addition of GO:0070678 preprotein binding along with the protein ID for von Willebrand factor/VWF/P04275 in column 16 would enable this additional information to be captured.

Peter D: Indeed the function term GO:0070678 "preprotein binding" exists, but perhaps it shouldn't. Its definition is, "Interacting selectively and non-covalently with a preprotein, the unprocessed form of a protein destined to undergo co- or post-translational processing," that is, binding as an explicit first step of catalysis ("destined to undergo ..."). If we accept the reasoning above, perhaps we should also recommend obsoletion of GO:0070678.

More generally, isn't this proposed usage pushing GO in exactly the direction that Ben found unacceptable, of trying to be an exhaustive catalogue of concrete molecular interactions?

Curators should use their judgment to decide how specific to make the description for the bound substrate/product. Curators should recognize that GO annotation should capture information relevant to the in vivo situation, not artificial substrates. For example, PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1: GO:0032183 SUMO binding (with protein IDs for SUMO1, P63165 and SUMO2, P61956, included in column 16), GO:0070139 SUMO-specific endopeptidase activity.

Ruth: Sorry I have been a bit slow understanding the use of column 16 and have just realised that addition of the protein ID for von Willebrand factor/VWF/P04275 in column 16 in the annotation: GO:0051605 protein maturation by peptide bond cleavage and/or GO:0004252 serine-type endopeptidase activity would enable this more specific information to be included. I agree that if column 16 was used in this way the 'preprotein binding' wouldn't be required.

By the same token SUMO binding wouldn't be required if the SUMO Protein IDs were included in column 16 with the GO:0070139 SUMO-specific endopeptidase activity annotation.

September 4, 2009

Avoid Redundant Binding Relationships For Substrates/Products

The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.

An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrates/products, and curators should avoid making such redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity.

Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products and also use their judgment to decide whether the interaction is physiologically relevant. Curators should recognize that GO annotations should capture information relevant to the in vivo situation, not artificial substrates.

Not covered by the above draft, to be discussed at GOC

  • Should we distinguish substrate binding from effector binding?
  • The transfer of 'binding' term annotations via ISS/ISO
  • Should previous annotations to binding terms be left as they are, or should evidence codes be updated to make them be consistent with the above proposal?
  • The use of IC (inferred by curator) to enable curators to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed.
  • Can we guide curator judgement on the interpretation of the boundary between binding and catalysis or is there a legitimate hybrid boundary region?
  • Can / should the GO hierarchy be used to accommodate catalogues of specific molecules and their behaviors, if not by a core group of GO annotators then by collaborating groups? What if there were 40 distinct substrates identified, all physiologically relevant in some instance (or more likely, all tested in vitro and possibly physiologically relevant)? Is this full list going to be added to column 16? As we accumulate more and more high-throughput data we are going to need a much better way of dealing with this. Can we develop a way to annotate the "process" relationships with the various "molecular functions"?


Example 1: Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. The following GO terms could potentially be used to capture this information:

  • GO:0004252 serine-type endopeptidase activity
  • GO:0051605 protein maturation by peptide bond cleavage
  • GO:0070678 preprotein binding

Column 16 could be used with the protein ID for von Willebrand factor/VWF/P04275, but with which GO terms?

Example 2: PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1:

  • GO:0032183 SUMO binding
  • GO:0070139 SUMO-specific endopeptidase activity

How specific should GO annotations be? Should column 16 be used to clarify this with protein IDs for SUMO1, P63165 and SUMO2, P61956?

From the Documentation for the Function Ontology

Binding guidelines

Avoid Binding Relationships

Catalytic activities should not be related to binding terms (see the September 2003 Bar Harbor GO meeting minutes); for example, ATPase activity should not be related to ATP binding. Similarly, there should not be a relationship between transporter terms and binding terms. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.

Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental single step activities that occur at the macromolecular level. That means that if we were to further subdivide these functions - for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge" - we would be saying that a reaction was actually a series of functions - i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology as we would be dealing with events on a molecular or atomic level.

Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?

Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.

From the minutes of Bar Harbor GO Consortium Meeting 2003

BarHarbor minutes

Section 5) Ontology Development Issues

d) Consistency of Parentage (catalysis and binding): It was agreed that enzyme activities should have only the catalysis parent. All binding parents to enzyme activities should be removed where appropriate.


Ontology Development Action Items

17. Document the fact that binding is not always a parent of enzyme. Binding is only a parent when stable binding occurs. Remove Binding as parent where appropriate.

Conference call

Binding Terms Conference Call Information

Binding Terms minutes June 09

Old version of working group wiki


2010 discussion

During the last GOC meeting many of the recommended suggestions were agreed (binding discussion and binding summary). We now need to write up these statements as GOC guidelines and also address some of the other binding issues which are still unresolved. It would be good to make progress on this before the next GOC meeting, so we probably need to make a start on this soon.

During the GOC meeting discussion Michael pointed out that we could make the proposal much shorter, which seems sensible. How about the following proposed guidelines? This assumes that the guidelines will be posted on the GOC Annotation Conventions web page; or will they be available through the wiki?

Annotating gene products that interact with other molecules

Binding is often implied by the molecular function GO term describing the activity of an enzyme; for example, an enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Therefore, curators should aim to avoid redundant 'binding' relationship GO terms for substrates/products, if these result in the duplication of GO term information. For example, do not annotate to 'ATP binding' if the gene product is already annotated to 'ATPase activity'.

There will be some cases, however, where it is appropriate to annotate to a binding term. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to ATP binding using IDA.

Curators should try to capture specifics as much as feasible and use their judgement about when to associate an enzyme with a binding term for its substrates/products. They should also use their judgement to decide whether the interaction is physiologically relevant. Curators should capture information relevant to the in vivo situation, not artificial substrates.

Curators should not use IC to annotate to a binding term based on the annotation to an enzyme activity, if this activity already implies the binding activity. For example, curators should not create an 'ATP binding' annotation using the IC evidence code with the GO term 'ATPase activity' in the 'with' field.
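The redundancy rule in the draft guidelines above can be sketched as a simple check. This is an illustration only, not a GO Consortium tool: the `IMPLIED_BINDING` lookup, the `Annotation` record, the function name and the gene product labels `GP1` and `GP2` are all hypothetical, and the lookup holds only the ATPase activity / ATP binding pair used as the example in these guidelines.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene_product: str
    go_id: str
    go_name: str
    evidence: str


# Hypothetical lookup: activity term -> binding terms it already implies.
IMPLIED_BINDING = {
    "GO:0016887": {"GO:0005524"},  # ATPase activity implies ATP binding
}


def redundant_binding(annotations):
    """Return binding annotations already implied by an activity
    annotation on the same gene product."""
    implied = set()
    for a in annotations:
        for binding_id in IMPLIED_BINDING.get(a.go_id, ()):
            implied.add((a.gene_product, binding_id))
    return [a for a in annotations if (a.gene_product, a.go_id) in implied]


annotations = [
    Annotation("GP1", "GO:0016887", "ATPase activity", "IDA"),
    Annotation("GP1", "GO:0005524", "ATP binding", "IC"),
    # GP2 was shown to bind a non-hydrolyzable ATP analog, with no
    # ATPase activity demonstrated, so its binding annotation stands.
    Annotation("GP2", "GO:0005524", "ATP binding", "IDA"),
]

flagged = redundant_binding(annotations)
for a in flagged:
    print(a.gene_product, a.go_id, a.go_name, a.evidence)
```

Only the GP1 'ATP binding' annotation is flagged; whether a flagged annotation should actually be removed remains a matter of curator judgement.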

Comments from the Working group on the above draft guidelines:

  • Ben Hitz
  • David Hill
  • Debby Siegele - agree
  • Emily Dimmer - agree
  • Jim Hu
  • Mike Cherry
  • Peter D'Eustachio
  • Ruth Lovering - agree


In addition to discussing the above guidelines the GOC clearly identified the following issues. Please add comments to the linked wiki pages:

1. There are some situations where a protein that acts as a transporter binds to a complex or a vesicle that it transports, and it may well not bind to the transported substance directly. Therefore, the transporter aspect of the guidelines above has been removed. Does anyone think that the 'binding' guidelines should suggest that transporters are in general not annotated to 'substrate binding' terms?

2. Should we capture drug information? Drugs are often artificial substrates. Also, Rex raised RGD binding as an example of an artificial substrate. Discussion on drug information page.

3. The inclusion of the use of column 16 here seems appropriate, although it was not discussed in the binding discussion at GOC. So in the last paragraph include something like: column 16 can be used to increase the detail of the GO annotation. For example, the specific substrate for an enzyme with 'GO:0032451 demethylase activity' can be included in column 16. Discussion on binding and column 16 page. And protein binding Annotation consistency.

4. Should we consider having a term like 'ATP binding involved in kinase activity'? Discussion on new binding GO terms page.

5. What, if any, binding term annotations can be transferred via ISS/ISO? Binding terms should be transferred by ISS, but whether to transfer the thing that was bound is controversial. ‘protein binding’ annotations become vague if we drop the ‘with’ field contents upon transfer, while more specific terms like ‘kinase binding’ are less vague when transferred even if we drop the ‘with’ field contents of the experimental annotation. Discussion on ISS binding terms page.
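The trade-off in point 5 can be illustrated with a small sketch. It assumes, for illustration only, that a transfer keeps the GO term, sets the evidence code to ISS, and replaces the 'with' contents by the ID of the experimentally annotated gene product. The record layout, the function name and the IDs `GP_A`, `GP_B` and `GP_A_LIKE` are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_term: str
    evidence: str
    with_field: str  # IPI: the bound partner; ISS: the similar gene product


def transfer_by_iss(source, target_gene_product):
    """Copy an experimental annotation to a similar gene product as ISS.

    The 'with' field now names the experimentally annotated gene product,
    so the identity of the bound partner is no longer recorded.
    """
    return replace(
        source,
        gene_product=target_gene_product,
        evidence="ISS",
        with_field=source.gene_product,
    )


experimental = Annotation("GP_A", "protein binding", "IPI", "GP_B")
transferred = transfer_by_iss(experimental, "GP_A_LIKE")
print(transferred)
```

After the transfer, nothing in the 'protein binding' annotation says what is bound. A more specific term such as 'kinase binding' would still carry that information in the term itself.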

Finding an example of when to include a binding term, as well as an enzyme activity term.

It would be useful to have actual examples of when to use column 16 and when to use a 'binding' term. The following example suggests possible use of column 16 as well as the use of a 'binding' term alongside an 'enzyme activity' term.

  • Are any of these annotations redundant?
  • What would be the most effective annotations to include?

Rehemtulla et al. PMID: 8218226 describes the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as:

Protein name and ID | GO term ID and name | Ontology | Evidence code | With | Column 16
PCSK6 P29122 | GO:0004252 serine-type endopeptidase activity | function | IDA | - | VWF/P04275
PCSK6 P29122 | GO:0051605 protein maturation by peptide bond cleavage | process | IDA | - | VWF/P04275
PCSK6 P29122 | GO:0070678 preprotein binding | function | IPI | VWF/P04275 | -

  • The addition of GO:0070678 preprotein binding seems to give more information than achieved by just the GO:0004252 serine-type endopeptidase activity annotation with the protein ID in column 16.
  • However, GO:0070678 preprotein binding does seem redundant when made alongside the GO:0051605 protein maturation by peptide bond cleavage with the protein ID in column 16.
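The redundancy question raised by the three example annotations can be sketched as a small check. This is a hedged illustration: the `Row` record and the function name are hypothetical, the rows are copied from the table above, and the check only compares partner IDs. It does not decide whether the binding term adds information beyond the activity term.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    protein_id: str
    go_id: str
    go_name: str
    ontology: str
    evidence: str
    with_id: Optional[str]
    column_16: Optional[str]


ROWS = [
    Row("P29122", "GO:0004252", "serine-type endopeptidase activity",
        "function", "IDA", None, "P04275"),
    Row("P29122", "GO:0051605", "protein maturation by peptide bond cleavage",
        "process", "IDA", None, "P04275"),
    Row("P29122", "GO:0070678", "preprotein binding",
        "function", "IPI", "P04275", None),
]


def partner_already_in_column_16(rows):
    """Return rows whose 'with' partner is already named in column 16
    of another row for the same protein."""
    captured = {(r.protein_id, r.column_16) for r in rows if r.column_16}
    return [r for r in rows if r.with_id and (r.protein_id, r.with_id) in captured]


for row in partner_already_in_column_16(ROWS):
    print(row.go_id, row.go_name)
```

With these rows, the check picks out only the preprotein binding annotation, since VWF/P04275 already appears in column 16 of the other two rows.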


Please could we discuss these (along with any others that people want to raise) and bring suggestions on these to the next GOC meeting.


GO Consortium meeting discussion

Binding Guidelines

Binding Issues

N-terminal and C-terminal binding

enzyme binding