Binding Terms Conference Call Information
What problems are we trying to solve?
This issue was originally brought up at the GOC meeting in Oregon [binding minutes].
The meeting identified that:
- The documentation is confusing on the proper use of binding
- There were conflicting views about whether GO should include catalytic substrate annotations such as 'ATP binding', and about the problem of including both the substrate and the product of a catalytic reaction.
- Most people agreed that GO should capture non-transformative binding, e.g., binding of X resulting in an allosteric change to the thing doing the binding.
- Perhaps cross product annotations should be used to describe the majority of binding annotations (see Annotation_Cross_Products#binding_example)
- There was a concern about how limiting 'binding' annotations to non-catalytic interactions may affect queries for genes involved in 'ATP binding'; for example, researchers might reasonably expect such a query to return kinases.
- It was unclear whether 'binding' term annotations should be transferred via ISS/ISO.
ACTION ITEMS: Peter (lead), Ruth, Debbie, Jim form a working group to examine the issues raised in the discussion. Should GO capture catalytic binding? Mike, Ben, Emily, David also joined this working group.
This survey was written to address the issue: Should GO capture catalytic binding?
The way we capture catalytic binding may change the decision on whether or not we should capture it. However, this is a bit of a chicken-and-egg situation: deciding that catalytic binding should not be captured at all would make discussions about how to capture it irrelevant.
Several people made comments while completing the survey and these are available: Binding Terms Survey Comments
Comments from Debby
The current binding terms discussion started in response to the issue of consistency of annotation among curators in the use of binding terms. Some groups are using binding terms to annotate substrate binding for enzymes and transport proteins and others aren't. I think that the major source of confusion and inconsistency is that for most curators it makes sense to annotate that an enzyme or transporter binds its substrate. From my point of view as a curator, my recommendation would be to allow GO:0005488 "binding" to be used for substrate binding, but to discourage curators from using it for this purpose. This can be achieved by making the GO documentation on binding terms clearer, rewriting GO term definitions, adding appropriate usage suggestions to AmiGO, and removing usage suggestions that suggest annotating a catalytic or transport activity to a binding term.
I think that this approach would answer the concerns expressed about deleting information from GO. It would also make it easier to deal with situations such as the one Emily raised (re PMID:10980193) where binding of GTP has been experimentally determined, but GTPase activity, while likely, is still only predicted based on amino acid sequence similarity. There should probably also be guidelines that restrict the creation of new child binding terms, which is another way of educating curators about correct usage. Personally, I think it makes sense to capture the identity of whatever is being bound by using column 16 to hold the ChEBI or UniProt ID, but this may be a different, although related, issue.
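As an illustration of the column 16 suggestion above, here is a minimal sketch of how a bound entity might be read from a GAF 2.x annotation line, in which column 16 is the annotation extension. The row itself is invented: the protein accession, reference, taxon and ChEBI ID are placeholders, and the `has_input` relation and the `bound_entities` helper are assumptions made for this example, not an agreed convention.

```python
import re

# Hypothetical GAF 2.x row (17 tab-separated columns). Every value is a
# placeholder except GO:0005525 (GTP binding); the relation name
# "has_input" is an assumption made for this sketch.
GAF_LINE = "\t".join([
    "UniProtKB", "P00000", "GENE1", "", "GO:0005525", "PMID:0000000",
    "IDA", "", "F", "", "", "protein", "taxon:0000", "20100101",
    "EXAMPLE_DB", "has_input(CHEBI:00000)", "",
])

# Matches entries of the form relation(ID) inside column 16.
EXTENSION_RE = re.compile(r"(\w+)\(([^)]+)\)")


def bound_entities(gaf_line):
    """Return (relation, identifier) pairs found in column 16 of a GAF row."""
    columns = gaf_line.rstrip("\n").split("\t")
    if len(columns) < 16:
        return []  # older or truncated rows carry no annotation extension
    return EXTENSION_RE.findall(columns[15])


print(bound_entities(GAF_LINE))
```

With the identity of the bound molecule held in column 16, a single parent binding term could in principle serve many substrates without new child terms being created.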
Comments from Ben
I think there is a fundamental difference in how some (annotation) groups view GO. I (and curators at SGD share this opinion) feel that one should not use the GO to attempt to annotate any and all information about gene products that appear in any given literature reference. It is simply a ridiculous task for the GOC to attempt this, and I feel that this is very clearly illustrated in the "divide" regarding Binding terms.
I feel that while every little datum is certainly important, and deserves to be captured _somewhere_, adding a GO term annotation for it is often inappropriate, and generally harmful to the homogeneity of GO annotations. The GOC has always, for better or worse, allowed individual MODs and curation groups to decide what and how to curate, and while this flexibility is often warranted, based on the scope and depth of the literature (for a given organism or group of organisms) it also leads to large discrepancies in annotation practice and makes comparing cross-organism GO information "fun", to say the least.
That being said - I feel the following classes of information should NOT NEVER NO HOW be captured in GO. When they need to be captured, they should be incorporated into existing or even new database entities and cross-referenced via unique protein or gene product identifiers (e.g., UniProt ID).
- Stoichiometric information - how many monomers are in a biologically functional unit
- Including all related "self binding" and "homodimerization" terms
- Protein-protein interactions - i.e., "protein binding" with - these should be submitted or indexed in BioGRID and/or IntAct.
- physical constants
- sequence (this is of course stored and cross-referenced in godb and AmiGO)
- evolutionary data (ditto, re: ref-genome data)
- lists of substrates and cofactors, including allosteric interactions aka "nontransformative binding" - i.e., "X binding"
You should note that this list pretty much rules out MOST "binding" terms used for annotations. There are a few exceptions, which I call "terminal" binding, that are acceptable. These are types of molecular functions where, to all available knowledge, the *purpose* of the gene product is to sequester, hold, or sense some other small molecule. The canonical example is calmodulin, which I will grant is reasonable to annotate to MF: Calcium binding.
The last remaining grey area is the situation where an experiment demonstrates binding to "X" and possibly infers some catalytic activity from it, but does not demonstrate that activity. I agree that "partial" information is tricky to deal with. If we can denote it in a sensible way, i.e., "this gene product has the property 'ATP binding' (via IDA) but the purpose of that binding is currently Unknown", I would be in favor of using binding terms in this way.
I don't really feel strongly about "removing" the terms themselves; we have many terms, such as "Cell Part", which are not to be used for annotations. However, if curators cannot resist using a term "because it's there", then maybe we should remove them. I do feel that the "stoichiometric" terms should all be deleted.