Binding Terms Conference Call Information
What problems are we trying to solve?
This issue was originally brought up at the GOC meeting in Oregon [binding minutes].
The meeting identified that:
- The documentation is confusing on the proper use of binding
- There were conflicting views about whether or not GO should include catalytic substrate annotations such as 'ATP binding' and the problem of including both substrate and product from a catalytic reaction.
- Most people agreed that GO should capture non-transformative binding, e.g., binding of X resulting in an allosteric change to the thing doing the binding.
- Perhaps cross product annotations should be used to describe the majority of binding annotations (see Annotation_Cross_Products#binding_example).
- There was a concern about how limiting 'binding' annotations to non-catalytic interactions may affect queries for genes involved in 'ATP binding'. For example, researchers might reasonably expect such a query to return kinases.
- It was unclear whether there should be a transfer of 'binding' term annotations via ISS/ISO
ACTION ITEMS: Peter (lead), Ruth, Debby, Jim form a working group to examine the issues raised in the discussion. Should GO capture catalytic binding? Mike, Ben, Emily, David also joined this working group.
This survey was written to address the issue: Should GO capture catalytic binding?
The way we capture catalytic binding may change the decision on whether or not we should capture it. However, this is a bit of a chicken-and-egg situation: deciding that catalytic binding should not be captured at all would make discussions about how to capture it irrelevant.
For more information/comments about options to capture catalytic binding please see:
- working group wiki
- Jim's cross-product approach.
- Ben's comments on binding terms
- Binding Terms Survey Comments
Comments from Judy
Here are some ‘go-top’ type thoughts on this, but this also incorporates the GO@MGI specific thoughts as well. I look forward to a considerate discussion.
1. support everyone who has sought to clarify the binding discussion; update documentation,
2. recognize that the binding discussion is confounded by complexity of assays and kinetics of ‘binding’ studies,
3. commit to ‘annotating to the experiment’, meaning trying to capture the specifics as much as feasible; use the binding term if the experiment shows binding, don’t use it if the experiment shows catalysis but not specific binding by the gene product,
4. reaffirm our commitment to provide practical and useful data to enhance the work of biologists,
5. remember that ‘consistency’ is a principle, not a rule; there will be exceptions: here what is needed is a statement of standard, such as provided in ‘3’. The goal is to have ‘correct’ annotations, not that everyone should have the same annotation,
6. consider that if existing annotations (~8,000?) involving binding terms are not wrong, then we should strongly consider keeping them,
7. and, confirm that any global change will need approval by GO-top
Comments from Debby
The current binding terms discussion started in response to the issue of consistency of annotation among curators in the use of binding terms. Some groups are using binding terms to annotate substrate binding for enzymes and transport proteins and others aren't. I think that the major source of confusion and inconsistency is that for most curators it makes sense to annotate that an enzyme or transporter binds its substrate. From my point of view as a curator, my recommendation would be to allow GO:0005488 "binding" to be used for substrate binding, but discourage curators from using it for this purpose. This can be achieved by making the GO documentation on binding terms clearer, rewriting GO term definitions, adding appropriate usage suggestions to AmiGO, and removing usage suggestions that suggest annotating a catalytic or transport activity to a binding term.
I think that this approach would answer the concerns expressed about deleting information from GO. It would also make it easier to deal with situations such as the one Emily raised (re PMID:10980193) where binding of GTP has been experimentally determined, but GTPase activity, while likely, is still only predicted based on amino acid sequence similarity. There should probably also be guidelines that restrict the creation of new child binding terms, which is another way of educating curators about correct usage. Personally, I think it makes sense to capture the identity of whatever is being bound by using column 16 to hold the CHEBI or UniProt ID, but this may be a different, although related, issue.
Ruth asked me to write a few lines to say what will be the advantage of keeping "substrate binding terms" or deleting them. I think what I wrote in section (1) below does that. In addition, after reviewing the discussions we've had and the various points that have been made I tried to pull out the individual questions and organize them in a hierarchical fashion.
1) Should GO capture substrate binding information?
The biggest advantage of answering "Yes" is that it is consistent with current practice. AmiGO lists 45,149 genes annotated to GO:0005488 and many of these annotations are to substrate binding. I think this indicates we are capturing information that curators and users think is important.
One disadvantage of "No" is that all the binding annotations in AmiGO would have to be reviewed and incorrect annotations deleted. But the biggest disadvantage is that I think that annotators will continue to annotate substrate binding as they have in the past. It has been GO policy to NOT annotate substrate binding since 2003, but many curators continue to do it, and new GO terms for substrates continue to be created. In this particular case, I think the GO consortium policy needs to be changed to fit with curator practice.
2) If Yes, then how should substrate binding be captured?
- 2a) Continue to use GO:0005488 as it is currently defined ("The selective, often stoichiometric, interaction of a molecule with one or more specific sites on another molecule."), which includes all types of binding, including catalytic domain substrate binding.
- 2b) In addition to GO:0005488, which would still include any type of binding, create specific binding terms for substrates and effectors that could be used when there is evidence that the molecule being bound fits into one of these categories.
The advantage of 2a is that it is consistent with current practice and is simple. It doesn't require the creation of any new terms.
The advantage of 2b is that it adds information that is relevant to function. There are cases where the sole molecular function of a protein is to regulate the molecular function of a different protein in response to effector binding. In cases where the information in a paper doesn't indicate whether the molecule being bound is a substrate or an effector, it would still be OK to annotate to GO:0005488 or one of its child terms, and the information wouldn't be incorrect.
3) If we decide to allow GO:0005488 to continue being used for catalytic domain binding, curators will want to indicate what is bound. How should the identity of the binding molecule be indicated?
- 3a) With child terms?
- 3b) With a CHEBI or UniProt ID associated with the annotation?
The advantage of 3a is that it continues current practice. The disadvantages of 3a are the need to request new child terms and that curators will have to sort through an ever-growing number of child terms (which I personally dislike having to do).
The advantages of 3b are that it doesn't require requesting new child terms and that I personally find it easier to look up a CHEBI or UniProt ID than to sort through hundreds of child terms.
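To make the difference between 3a and 3b concrete, here is a minimal sketch of how the same experimental result might be recorded under each option. It is only an illustration, not a proposal for the annotation file format: the `Annotation` class, the `bound_entity` field (standing in for an ID held in column 16), the gene product identifier, and the small lookup table are all hypothetical. GO:0005524 "ATP binding" is used as the example child term and CHEBI:15422 as the ID for ATP.

```python
from dataclasses import dataclass
from typing import Optional

GENERIC_BINDING = "GO:0005488"  # the parent term "binding" discussed above


@dataclass(frozen=True)
class Annotation:
    gene_product: str                   # hypothetical identifier, for illustration only
    go_id: str
    evidence: str
    bound_entity: Optional[str] = None  # option 3b: CHEBI or UniProt ID of what is bound


# Hypothetical lookup needed under option 3a: the identity of the bound
# molecule is only implied by which child term was chosen.
CHILD_TERM_ENTITY = {
    "GO:0005524": "CHEBI:15422",  # ATP binding -> ATP
}


def identify_bound_entity(annotation: Annotation) -> Optional[str]:
    """Return the ID of the bound molecule, however it was recorded."""
    if annotation.bound_entity is not None:
        return annotation.bound_entity               # option 3b: stated explicitly
    return CHILD_TERM_ENTITY.get(annotation.go_id)   # option 3a: inferred from the term


# Option 3a: a dedicated child term carries the identity of the ligand.
option_3a = Annotation("EXAMPLE:0001", "GO:0005524", "IDA")

# Option 3b: the generic term plus an explicit ID for the ligand.
option_3b = Annotation("EXAMPLE:0001", GENERIC_BINDING, "IDA",
                       bound_entity="CHEBI:15422")
```

Under these assumptions both records resolve to the same bound molecule. The difference is where the work falls: 3a needs one ontology term per ligand, while 3b needs only the generic term and an ID lookup in an external resource.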
Comments from Ben
Plus additional comments
I think there is a fundamental difference in how some (annotation) groups view GO. I (and curators at SGD share this opinion) feel that one should not use the GO to attempt to annotate any and all information about gene products that appear in any given literature reference. It is simply a ridiculous task for the GOC to attempt this, and I feel that this is very clearly illustrated in the "divide" regarding Binding terms.
I feel that while every little datum is certainly important, and deserves to be captured _somewhere_, adding a GO term annotation for it is often inappropriate, and generally harmful to the homogeneity of GO annotations. The GOC has always, for better or worse, allowed individual MODs and curation groups to decide what and how to curate, and while this flexibility is often warranted, based on the scope and depth of the literature (for a given organism or group of organisms) it also leads to large discrepancies in annotation practice and makes comparing cross-organism GO information "fun", to say the least.
That being said - I feel that lists of substrates and cofactors, including allosteric interactions aka "nontransformative binding" - i.e., "X binding" - should NOT be captured in GO.
The last grey areas remaining are situations where an experiment demonstrates binding to "X" and possibly infers some catalytic activity thereof, but does not demonstrate it. I agree that "partial" information is tricky to deal with. If we can denote it in a sensible way, i.e., "this gene product has the property 'ATP binding' (via IDA) but the purpose of that binding is currently Unknown", I would be in favor of using binding terms in this way.
The advantage of removing 'substrate binding terms' would be to:
- enable GO curators to concentrate on annotations which are unique to GO rather than those which are covered by other databases.
- reduce the inconsistencies that exist between different databases and different annotation procedures.
Comment from Peter
... so annotations like these should not be allowed? Both the computational ones and the manual one?
I don't really think they should be allowed, although they are not terrible. The manual one is like a "this is all we know" from an x-ray structure of an unknown protein. A more useful annotation (not using allowed terms) is "We still don't know, but it binds pyridoxal phosphate". This is an example of curators trying to be helpful in giving information on a single gene, but perhaps not thinking about the big picture.
The automated ones I could take or leave. I don't think they contribute anything to our understanding of biology (although in the sense of "biochemistry" they do). I do think the information can be valid and useful, but it's available from InterPro.
The bottom line is: are we just trying to describe "all interesting properties of gene products", or are we trying to present a coherent understanding of biology?
Comments from Jim
I'm vacillating on the central question... see the article history if you want to see me flip-flop!
- Focusing on the question at hand: Should GO capture catalytic binding? I am sympathetic to Ben's reasons for why we should not. But my position before this edit was that because curators will do it anyway as long as there are "x binding" terms in the ontology, we should maintain the recommendation against capturing substrate binding but not expend much energy enforcing that policy. Debby points out to me that this is a cop-out similar to "Don't ask, don't tell", so I'm rethinking my position.
- My primary interest remains proliferation, not substrates. Chris disagrees with my concerns about "x binding/transport/carrier/whatever" terms, but he has not convinced me that we should precompose whenever anyone wants a GO term. In my mind, the binding discussion is analogous to the statement in Annotation_Cross_Products#Regulation_of_expression_and_specific_gene_products:
The GO will never pre-coordinate terms such as:
- regulation of oskar mRNA translation
- regulation of oskar mRNA transcription
- vs. these 6 terms. But that is a discussion for another day.