Binding terms working group

From GO Wiki

Working group

  • Ben Hitz
  • David Hill
  • Debby Siegele
  • Emily Dimmer
  • Jim Hu
  • Mike Cherry
  • Peter D'Eustachio
  • Ruth Lovering


Objectives of the binding terms working group are to provide draft guidelines with examples on the following:

1. What binding activities should be included in GO

2. The application of binding term usage in conjunction with column 16

3. The transfer of 'binding' term annotations via ISS/ISO


Proposed Guidelines:

July 28, 2009

Binding terms guidelines aim to minimize redundancy and duplication of information of GO term usage.

Enzymes MUST bind ALL of the substrates (and products) involved in a catalyzed reaction - there is no action at a distance. Therefore, during the annotation of an enzyme, it is not necessary to associate a list of GO binding terms describing all of the substrates and products, if this binding is implied by the GO term describing the catalytic function of the enzyme.

However, GO terms are not protein specific, so use of a binding term with a specific substrate/product may provide additional information not provided by the catalytic function alone. For example, Rehemtulla et al. (PMID: 8218226) describe the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as GO:0004252 serine-type endopeptidase activity and GO:0051605 protein maturation by peptide bond cleavage, but the addition of GO:0070678 preprotein binding, along with the protein ID for von Willebrand factor/VWF/P04275 in column 16, would enable this additional information to be captured.

Curators should use their judgment to decide how specific to make the description for the bound substrate/product. Curators should recognize that GO annotation should capture information relevant to the in vivo situation, not artificial substrates. For example, PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1: GO:0032183 SUMO binding (with protein IDs for SUMO1, P63165 and SUMO2, P61956, included in column 16), GO:0070139 SUMO-specific endopeptidase activity.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding, don’t use the binding term if the experiment shows catalysis but not the specific binding activity.

Annotation of binding reactions is confounded by the complexity of assays and kinetics of ‘binding’ studies, therefore a curator should use their judgment to decide whether the interaction is physiologically relevant.

Proteins involved in transport should be annotated following the same guidelines described above for enzymes.

August 4, 2009

Avoid Redundant Binding Relationships For Substrates/Products

The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.

An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrate/products, and curators should avoid making such redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity. In cases where curators feel it is important to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed, an IC (inferred by curator) evidence code should be used.

Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products.

September 2, 2009

Avoid Redundant Binding Relationships For Substrates/Products

The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.

An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrate/products, and curators should avoid making such redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity. In cases where curators feel it is important to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed, an IC (inferred by curator) evidence code should be used. IS THIS WHAT WE WANT?

Peter D: I think it is not. Above, we explicitly discourage annotation of "binding" in cases where the data support a "catalysis" or "transport" annotation. This last sentence ("In cases where curators feel ...") appears to allow exactly the opposite. I would delete that sentence.

Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products and also use their judgment to decide whether the interaction is physiologically relevant.

Examples: GO terms are not protein specific, so use of a binding term with a specific substrate/product may provide additional information not provided by the catalytic function alone. For example, Rehemtulla et al. (PMID: 8218226) describe the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. This information can be annotated as GO:0004252 serine-type endopeptidase activity and GO:0051605 protein maturation by peptide bond cleavage, but the addition of GO:0070678 preprotein binding, along with the protein ID for von Willebrand factor/VWF/P04275 in column 16, would enable this additional information to be captured.

Peter D: Indeed the function term GO:0070678 "preprotein binding" exists, but perhaps it shouldn't. Its definition is, "Interacting selectively and non-covalently with a preprotein, the unprocessed form of a protein destined to undergo co- or post-translational processing," that is, binding as an explicit first step of catalysis ("destined to undergo ..."). If we accept the reasoning above, perhaps we should also recommend obsoletion of GO:0070678.

More generally, isn't this proposed usage pushing GO in exactly the direction that Ben found unacceptable, of trying to be an exhaustive catalogue of concrete molecular interactions?

Curators should use their judgment to decide how specific to make the description for the bound substrate/product. Curators should recognize that GO annotation should capture information relevant to the in vivo situation, not artificial substrates. For example, PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1: GO:0032183 SUMO binding (with protein IDs for SUMO1, P63165 and SUMO2, P61956, included in column 16), GO:0070139 SUMO-specific endopeptidase activity.

Ruth: Sorry I have been a bit slow understanding the use of column 16 and have just realised that addition of the protein ID for von Willebrand factor/VWF/P04275 in column 16 in the annotation: GO:0051605 protein maturation by peptide bond cleavage and/or GO:0004252 serine-type endopeptidase activity would enable this more specific information to be included. I agree that if column 16 was used in this way the 'preprotein binding' wouldn't be required.

By the same token SUMO binding wouldn't be required if the SUMO Protein IDs were included in column 16 with the GO:0070139 SUMO-specific endopeptidase activity annotation.
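The use of column 16 discussed above can be sketched as follows. This is a minimal illustration, not an agreed format: it assumes a 17-column, tab-separated annotation row in which column 16 holds the ID of the protein acted on. The helper names (make_row, to_line), the "IDA" evidence code and the bare "UniProtKB:P04275" syntax for column 16 are assumptions made for the example; the GO ID, PMID and protein accessions come from the PACE4 example.

```python
# Sketch only: column positions assume a 17-column, tab-separated layout.
# Values not stated in the wiki text (evidence code, column 16 syntax)
# are illustrative assumptions.
N_COLUMNS = 17

# Zero-based indexes for the columns used here.
COL_DB = 0           # column 1: database
COL_OBJECT_ID = 1    # column 2: annotated gene product
COL_GO_ID = 4        # column 5: GO term
COL_REFERENCE = 5    # column 6: reference
COL_EVIDENCE = 6     # column 7: evidence code
COL_EXTENSION = 15   # column 16: ID of the substrate/target


def make_row(object_id, go_id, reference, evidence, extension=""):
    """Build one annotation row; columns not needed here stay empty."""
    row = [""] * N_COLUMNS
    row[COL_DB] = "UniProtKB"
    row[COL_OBJECT_ID] = object_id
    row[COL_GO_ID] = go_id
    row[COL_REFERENCE] = reference
    row[COL_EVIDENCE] = evidence
    row[COL_EXTENSION] = extension
    return row


def to_line(row):
    """Serialize a row as a single tab-separated line."""
    return "\t".join(row)


# PACE4 (P29122) cleaving von Willebrand factor (P04275): the substrate is
# recorded in column 16 of the catalytic annotation, so no separate
# 'preprotein binding' annotation is made.
row = make_row(
    object_id="P29122",
    go_id="GO:0004252",            # serine-type endopeptidase activity
    reference="PMID:8218226",
    evidence="IDA",                # assumed for illustration
    extension="UniProtKB:P04275",  # assumed syntax for column 16
)
line = to_line(row)
```

The same pattern would apply to SENP1, with the SUMO1 and SUMO2 protein IDs placed in column 16 of the GO:0070139 annotation instead of adding a SUMO binding annotation.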

September 4, 2009

Avoid Redundant Binding Relationships For Substrates/Products

The purpose of the binding term guidelines is to minimize redundancy and duplication of GO term information.

An enzyme MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a transporter MUST bind the molecules it transports. Therefore, binding is implied by the molecular function GO term describing the activity of an enzyme or transporter. Consequently, it is redundant to annotate an enzyme or transporter with GO binding terms for each of its substrate/products, and curators should avoid making such redundant annotations.

There will be some cases, however, where it is appropriate to annotate a binding relationship. For example, published experiments may show that a gene product binds a non-hydrolyzable ATP analog, without demonstrating that it has ATPase activity. In such a case, it would be appropriate to annotate to GO:0005524 ATP binding using an IDA evidence code.

The GO is committed to ‘annotating to the experiment’. Therefore the curator should try to capture the specifics as much as feasible; use the binding term if the experiment shows binding directly, don’t use the binding term if the experiment shows catalysis, but not the specific binding activity.

Curators should use their judgment about when to associate an enzyme or transporter with a binding term for its substrates/products and also use their judgment to decide whether the interaction is physiologically relevant. Curators should recognize that GO annotations should capture information relevant to the in vivo situation, not artificial substrates.
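The redundancy rule in this draft can be sketched as a simple check. This is only an illustration under stated assumptions: the IMPLIED_BINDING lookup, the Annotation record and the binding_shown_directly flag are hypothetical and would have to be maintained by curators; no such table is defined by GO. The GO IDs are taken from the examples on this page, and "HYPOTHETICAL1" is a placeholder gene product.

```python
from dataclasses import dataclass

# Hypothetical lookup: activity term -> binding terms it already implies.
IMPLIED_BINDING = {
    # SUMO-specific endopeptidase activity implies SUMO binding
    "GO:0070139": {"GO:0032183"},
}


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_id: str
    # True when the cited experiment demonstrates binding itself.
    binding_shown_directly: bool = False


def redundant_binding(annotations):
    """Return binding annotations that are implied by an activity term on
    the same gene product and are not backed by a direct binding assay."""
    terms_by_product = {}
    for a in annotations:
        terms_by_product.setdefault(a.gene_product, set()).add(a.go_id)

    flagged = []
    for a in annotations:
        if a.binding_shown_directly:
            continue  # annotate to the experiment: direct binding is kept
        for term in terms_by_product[a.gene_product]:
            if a.go_id in IMPLIED_BINDING.get(term, set()):
                flagged.append(a)
                break
    return flagged


example = [
    Annotation("Q9P0U3", "GO:0070139"),  # SENP1, catalytic activity
    Annotation("Q9P0U3", "GO:0032183"),  # SENP1, SUMO binding (implied)
    # Placeholder gene product shown to bind a non-hydrolyzable ATP analog.
    Annotation("HYPOTHETICAL1", "GO:0005524", binding_shown_directly=True),
]
flagged = redundant_binding(example)
```

Here only the SUMO binding annotation on SENP1 is flagged; the ATP binding annotation is kept because binding was shown directly. Whether a flagged annotation should actually be removed remains a matter of curator judgment.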

Not covered by the above draft, to be discussed at GOC

  • Should we distinguish substrate binding from effector binding?
  • The transfer of 'binding' term annotations via ISS/ISO
  • Should previous annotations to binding terms be left as they are, or should evidence codes be updated to make them be consistent with the above proposal?
  • The use of IC (inferred by curator) to enable curators to annotate to a binding term where catalytic activity has been shown, but no binding assays were performed.
  • Can we guide curator judgement on the interpretation of the boundary between binding and catalysis or is there a legitimate hybrid boundary region?
  • Can or should the GO hierarchy be used to accommodate catalogues of specific molecules and their behaviors, if not by a core group of GO annotators then by collaborating groups? What if 40 distinct substrates were identified, all physiologically relevant in some instance (or, more likely, all tested in vitro and possibly physiologically relevant): is this full list going to be added to column 16? As we accumulate more and more high-throughput data, we are going to need a much better way of dealing with this. Can we develop a way to annotate the "process" relationships with the various "molecular functions"?


Example 1: Rehemtulla et al. (PMID: 8218226) describe the cleavage of pro-von Willebrand factor to mature von Willebrand factor by PACE4/PCSK6/P29122. The following GO terms could potentially be used to capture this information:

  • GO:0004252 serine-type endopeptidase activity
  • GO:0051605 protein maturation by peptide bond cleavage
  • GO:0070678 preprotein binding

Column 16 could carry the protein ID for von Willebrand factor/VWF/P04275, but with which GO terms?

Example 2: PMID: 17916063 describes the cleavage of synthetic peptides by SENP1/Q9P0U3. The peptide sequences were derived from several different SUMO sequences and therefore the following GO terms could be associated with SENP1:

  • GO:0032183 SUMO binding
  • GO:0070139 SUMO-specific endopeptidase activity

How specific should GO annotations be? Should column 16 be used to clarify this with protein IDs for SUMO1, P63165 and SUMO2, P61956?

From the Documentation for the Function Ontology

Binding guidelines

Avoid Binding Relationships

Catalytic activities should not be related to binding terms (see the September 2003 Bar Harbor GO meeting minutes); for example, ATPase activity should not be related to ATP binding. Similarly, there should not be a relationship between transporter terms and binding terms. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.

Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental single step activities that occur at the macromolecular level. That means that if we were to further subdivide these functions - for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge" - we would be saying that a reaction was actually a series of functions - i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology as we would be dealing with events on a molecular or atomic level.

Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?

Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.

From the minutes of Bar Harbor GO Consortium Meeting 2003

BarHarbor minutes

Section 5) Ontology Development Issues

d) Consistency of Parentage (catalysis and binding)

It was agreed that enzyme activities should have only the catalysis parent. All binding parents to enzyme activities should be removed where appropriate.


Ontology Development Action Items

17. Document the fact that binding is not always a parent of enzyme. Binding is only a parent when stable binding occurs. Remove Binding as parent where appropriate.

Conference call

Binding Terms Conference Call Information

Binding Terms minutes June 09

Old version of working group wiki


GOC meeting September 2009, Cambridge

binding discussion

binding summary