Binding Terms Survey Comments


Please note that these comments are added manually. If you make further changes to your comments in the survey, please email r.lovering@ucl.ac.uk with some of the text of your comment before the change and the text after it, so that the updated comment can be added below.

Question 1

The GO consortium should remove all annotations which describe catalytic domain substrate binding, associated with catalytic activities or transport proteins, and no longer annotate these interactions.

Comments:

Sympathetic to the proposal but still not completely decided. On the wiki page the question was asked: "Are there cases where a user would want to search for gene products that bind a particular substrate but have different molecular functions?" One answer springs to mind: the ubiquitin-conjugating E2 enzymes. Many of these proteins have lost their catalytic function yet nevertheless bind ubiquitin. For instance, hMMS2/UBE2V2 (catalytically inactive) forms a heterodimer with UBC13 (catalytically active). hMMS2/UBE2V2 binds the donor ubiquitin and orients ubiquitin Lys-63 toward the acceptor site in UBC13. In this case we would have to annotate one member of the heterodimer as ubiquitin binding and the other, UBC13, as not ubiquitin binding, since that subunit is catalytically active. While I think you can justify this, many users would probably find it confusing. Another example is the Argonaute proteins, EIF2C1/AGO1, EIF2C2/AGO2, EIF2C3/AGO3 and EIF2C4/AGO4. All four of these proteins bind an siRNA or miRNA, which serves as a guide to complementary mRNAs. In the case of AGO1, AGO3 and AGO4, the bound complementary mRNA is subject to translational repression. Therefore, for these three proteins you would annotate to mRNA binding, as the mRNA is not really a substrate and they lack any kind of catalytic activity. AGO2, on the other hand, does have a catalytic activity but also acts like AGO1/3/4. For AGO2 the mRNA is either subject to translational repression (when the miRNA exhibits imperfect complementarity) or subject to endonucleolytic cleavage by AGO2 (generally when the complementarity is perfect; see GO:0070551). So AGO2 could be annotated to mRNA binding, but only for papers that demonstrated translational repression, not cleavage. Again, I think users would probably find this confusing.
I agree that if there is evidence supporting a catalytic activity, then related binding annotations are not useful. However, there are many cases where, for example, a suspected GTPase has been demonstrated to bind GTP but has not been shown to catalyze its hydrolysis, in which case it becomes GTP binding or nothing. So I would say my vote is a qualified yes: it's a good idea, but what would we do in cases like this?
I like Jim's suggestion of retaining high-level binding terms. Realistically, we are not going to have time to re-evaluate every binding annotation, so by moving things to a generic binding term at least we don't lose the functional information, and the retrofit is feasible. However, given that we are not set up for column 16 due to database constraints, and that our curators are likely to resist the additional burden of having to figure out the appropriate ChEBI IDs, I suspect there will be a loss of information about what the interacting molecules are for new annotations.
This entire survey is too binary. In principle, I agree we need to limit the scope of the "binding" terms, but keep in mind that some "binding" functions are biologically meaningful functions in their own right.
If I have evidence for GTPase activity, I would annotate the gene product to just that term; dual annotation doesn't make sense. But if I have evidence, say from a domain comparison, that the gene product has an ATP-binding domain, then I would annotate it to just ATP binding with ISS.
Annotate at the level of what function is shown in the experiment. If the experiment shows GTPase activity, then annotate to GTPase activity. If it shows GTP binding, then annotate to GTP binding.
Gene products should be annotated to binding terms only if there was an experiment that demonstrated binding. There should not be co-annotation to substrate binding if an experiment demonstrates catalytic activity; instead, the catalytic annotation should be made.
I disagree, but one should only annotate to binding if binding is directly demonstrated.
I disagree with the proposal. If a paper has an experiment for a kinase that demonstrates ATP binding, then we should annotate it. There must be an experiment, since the annotation would be an IDA. What if an unknown protein is shown to bind DNA, and then five years later it is shown to have helicase activity? It would be counterproductive to then have to remove the first annotation.
I agree that implicit annotations from an enzymatic function to a substrate binding function should not be made, however if there was an explicit experiment that showed the gene product bound to some molecule, then I think we should make the annotation.

Question 2

Change the definition of 'x binding' terms to explicitly exclude catalytic domain substrate binding.

Comments:

If the binding information is already implicit from the catalytic activity, the binding annotations would be redundant, and therefore not worth the 1s and 0s they're encoded as.
Allow annotation to binding terms related to catalytic domain if the catalytic activity has not yet been experimentally determined for that protein.
Describing only non-substrate binding (more stable) interactions appears to me to be too limited, as information on catalytic domain substrate binding could be valuable for many exotic enzymes.
This seems to me to be more in line with the intentions of GO, and I am not sure that losing these types of annotation is really a bad thing in most cases. However, there are perhaps specific cases where this could be very confusing for users, as described in the comments on point 1. On the other hand, if the only information available for a protein is that it binds zinc, or a lipid, etc., then that is not much information at all, and the significance of a paper that publishes only these kinds of results is perhaps questionable. If you have two papers, and paper 1 shows cholesterol binding while paper 2 demonstrates a role for the protein as a cholesterol transport protein, I would annotate paper 2 and leave paper 1 aside. See also point 3 below.
On the condition that there were terms for non-substrates (i.e., cofactors, allosteric regulators, etc.).
See above. Same concerns if this is all that has been experimentally determined.
I don't think this will work. While we all know that we should read definitions before annotating, if you leave the terms in without making this change explicit in the name, curators will keep using them incorrectly. Also, what do you do if it isn't clear whether it is a substrate or not? If you don't annotate, you lose the 'binding function' information; if you do annotate, it may turn out later to be wrong. GO annotation is only one aspect of the data that is captured from a paper, and curators simply don't have time to revise old data in the light of new information during routine curation. Once an annotation is made, it is likely to stay there for a long time.
It is often not possible to make the distinction, e.g. because there is insufficient information. It would be paradoxical to annotate "GTP binding" only when GTP has a regulatory role. It would also be very difficult to implement.
Given your definition, it doesn't seem like the con "curators would not be able to annotate proteins with limited information" is a valid point, as it would only apply to proteins that contain catalytic domains (which wouldn't fall into the "all we know is that it binds protein" category).
I don't think I understand the con part!
I think that if curators annotate a catalytic substrate to binding that is OK, but I think the practice should be discouraged. I want to slow down the proliferation of binding terms for every possible substrate in the biological realm.
What if 'catalytic domain substrate' binding is what the assay demonstrates? We'd want to capture that.
For all the questions involving 'x binding' redefinition, I am opposed to precomposed terms of the type "x binding".
Do not change the definition, change annotation practice.
When I read a paper and they show that it binds GTP, I would like to be able to curate that fact, especially when I have no other molecular function information to give. The author is unlikely to state unequivocally that it binds GTP but definitely doesn't hydrolyze it. And if I have to read other papers with separate experiments showing that GTP is "NOT" hydrolyzed, can I really use the original reference as my sole source for that annotation? It seems that I would have to draw an inference from multiple experiments to know that the protein does A (binds GTP) but definitely doesn't do B (use GTP as a substrate). I like to limit my annotations to what is directly shown by a particular experiment when possible.
Again, binding should only be annotated if it is directly demonstrated.
It would be an ENORMOUS amount of work to go back through annotations to decide whether the term was used 'correctly' or not.
I am not sure removing all of the "incorrect" annotations is necessary. I view it as "redundant information", which should be eliminated because it doesn't add information.
Annotation of such binding should only be made when the experiment specifically shows such binding, when the point of the experiment is to prove such binding, not inferred by the annotator because of the presence or requirement of a cofactor or reagent in the reaction mix.
In many older biochemistry papers, binding was used as an indicator that the protein might have catalytic activity on the binding partner. For example, one step in the purification of a helicase might be a DNA column, but other proteins that bind to the column might not have helicase activity. It would still be useful to capture the DNA binding activity of these proteins even if they are subsequently shown to have catalytic activity on DNA. Also, it's useful to have a sense of history in GO, i.e. what was known when, so that we can see how partial information (DNA binding) might develop into a clearer picture (topoisomerase).
I agree with all of the cons; I reject all of the pros.
I don't see why this applies: "In future curators would not be able to annotate proteins for which the only information presented is that they bind 'x' (with no indication of the context of this binding - is it a substrate/cofactor/something else?). " If they bind 'x' then they could be annotated as such. In the future if 'x' is discovered to be a substrate, the previous annotation would have to be removed.
I don't think it's possible to annotate consistently if one annotation (x binding) depends on another annotation (here, catalytic activity).
I don't think we should try to capture every possible bit of information, or even everything that we think would give the user a "complete picture". At the same time, if the experiment shows that the gene product binds X, then the 'X binding' annotation is appropriate.

Question 3

Change the definition of the 'x binding' terms to explicitly exclude catalytic domain substrate binding AND create grouping terms for the catalytic activity molecular function terms to indicate the type of substrate being chemically changed (e.g. new GO term: 'catalytic activity; ATP hydrolysing').

Comments:

How is this different from ATPase activity?
I'm sorry, I'm a bit confused by this. Is this indicating that ATP is being used to drive another reaction? Am I supposed to annotate to this term only if they prove that ATP has been hydrolyzed? If so, it seems I still would not be able to capture a lot of "incomplete" information when this has not been tested.
You say "in addition to the benefits associated with proposal 1..." but there were no benefits listed for proposal 1.
I agree with the cons.
I agree with this suggestion, assuming it is possible.
I am in general in favor of trying to keep track of ATP utilizing enzymes, but am unsure of the best way to do it, nor am I sure that GO is the best place for this.
The argument proposed by the "cons" seems to me to be applicable to the actual functions (such as enzymatic activities) themselves, irrespective of whether or not you specify the cofactor. When defining enzymatic activities you obviously have to specify (at least in general terms) what the substrate is; this in spite of the clear caveat that all possible substrates have not been exhaustively tested. The same logic could be applied to cofactors: an enzyme could employ either ATP, GTP, both, or indeed other NTPs. Therefore, the use of a hierarchical classification of enzymatic activities or molecular functions should naturally allow the annotator to apply his or her own judgement as to what the appropriate level of specificity should be for the definition of both substrate and cofactor.
I think this would make the GO terms expand too much... what if the assay was performed with ATP-gamma-S? Would there be a GO term for every substrate tested? This would get out of hand, would it not?
Not sure how this solves the problem mentioned in the comments on question 1. That is, one could not annotate to 'catalytic activity; ATP hydrolyzing' if the protein binds ATP but has not been demonstrated to hydrolyze it, so how will this help?
Difficult to implement.
Annotate what is shown experimentally only. I agree with the 'con' statement.
If users are so desperate for this information, they should search the definitions for 'ATP'.

Question 4

Annotate to 'x binding' terms only when a gene product is found either to bind 'x' and not alter it (e.g. as a cofactor) AND when the only information available for a gene product is that they bind 'x'.

Comments:

Annotate based on the experiment, not what happens to the bound substance.
Do we have to invalidate the old curation once we do find that it hydrolyzes ATP? This was mentioned in the working group document. In a sense, if someone did a search for ATP-binding, what they would really be getting are the "ATP-binding, not further tested or curated + ATP-non-hydrolyzing" group and that doesn't seem ideal.
Annotate to 'x binding' terms only when a gene product is found to bind 'x'. Period. Inclusion of 'either' with 'AND' in the question is confusing. I think 'OR' is intended.
As before, annotation of such binding should only be made when the experiment specifically shows such binding, when the point of the experiment is to prove such binding, not inferred by the annotator because of the presence or requirement of a cofactor or reagent in the reaction mix.
This makes sense: define binding as not including known catalytic activity, and define catalytic activity as including binding. To clean up the old annotations, just search for papers that annotate the same protein to both terms.
If you are going down the road of removing substrate binding, then I wonder if it might be better to remove cofactors too. Cofactor binding can't be thought of as a molecular function, and in many cases the cofactor requirements are investigated with perhaps less rigour than the substrate specificity. If you did this, then you would be reliant on an external classification that stores such information, such as the EC classification.
Again, if we exclude substrate binding, GO terms need to be available for the cofactor or allosteric regulator, but it is not necessary (in my opinion) to further define the exact chemical. Information about the specific type of cofactor or regulator could be put into another column (i.e., isn't column 16 for something like this?).
This seems like it would solve my concerns. If a GTPase has been shown to hydrolyze GTP, then there is no need to annotate it to GTP binding; it would, however, be annotated to GTP binding if all that is known is that it binds GTP. We would just need to educate our users that GTP binding is implicit in annotation to GTPase activity. As long as the tree is accurate, all gene products that bind GTP can be identified.
Difficult to implement: often we do not have enough information now, but a few months later everything is different. This would be impossible to implement, and confusing for users.
I think that the only time you should annotate "x binding" terms is when the only function is "x binding" or when no other information is available. I don't think it is worthwhile to make an exception for non-catalytic binding.
Too restrictive.
This kind of inconsistent usage is just asking for trouble. Future annotators would have to go back and check whether this annotation was genuinely a binding annotation or if it was just a placeholder. Nightmare!
Keeping information on the proteins whose only feature known is that they bind 'x' is inappropriate as this will generate two kinds of terms (placeholder group and more detailed functional group) with a confusing frontier. It would then be difficult for the curator to decide when to update the term.
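One comment above suggests cleaning up old annotations by searching for proteins annotated to both a binding term and the catalytic term that already implies it. A minimal Python sketch of that check follows, assuming a simplified list of (gene, term, reference) tuples and a hand-made, hypothetical mapping from each catalytic term to the binding term it implies; neither is an existing GO file format or tool, and the gene names and references are placeholders.

```python
from collections import defaultdict

# Hypothetical mapping: catalytic activity term -> substrate binding term it implies.
# Term names are illustrative; a real version would use GO IDs from the ontology.
IMPLIED_BINDING = {
    "GTPase activity": "GTP binding",
    "ATPase activity": "ATP binding",
}


def redundant_binding_annotations(annotations):
    """Return (gene, binding_term) pairs where the gene is also annotated to a
    catalytic term that already implies that binding.

    `annotations` is an iterable of (gene, term, reference) tuples, a simplified
    stand-in for rows of an annotation file.
    """
    terms_by_gene = defaultdict(set)
    for gene, term, _ref in annotations:
        terms_by_gene[gene].add(term)

    flagged = []
    for gene in sorted(terms_by_gene):
        terms = terms_by_gene[gene]
        for activity, binding in IMPLIED_BINDING.items():
            # Flag only co-annotation; a lone binding annotation is left alone.
            if activity in terms and binding in terms:
                flagged.append((gene, binding))
    return flagged


example = [
    ("geneA", "GTPase activity", "paper_1"),
    ("geneA", "GTP binding", "paper_2"),
    ("geneB", "GTP binding", "paper_3"),
]
print(redundant_binding_annotations(example))  # [('geneA', 'GTP binding')]
```

The flagged pairs are only candidates for curator review rather than automatic deletion, since several comments argue that an experimentally shown binding annotation should be kept even after a catalytic activity is found.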

Question 5

Create two 'x' binding terms: those describing substrate binding interactions and those describing cofactor binding interactions.

Comments:

I am very naive and I apologize for that. Let's say a transporter binds and hydrolyzes ATP to pump a proton. Although, technically, ATP is certainly a substrate of that transporter, is that what most users would think of as a "substrate" of a transporter? In principle, I like this distinction, but again, I worry that most authors, when showing that protein X binds Y will not tell us whether Y is acting as cofactor or a substrate. In some cases, it might be easy to infer, but, not in all cases. And how would this affect IEA annotations? Do these state whether the bound molecule acts as a cofactor or a substrate?
I don't think this is necessary.
Presumably there would be a parent term, e.g. ATP binding, in order for this to work. This would be my second choice if something HAS to change.
Who needs to be told that GTPase binds GTP?
Just annotate to what the experiment shows.
I only agree with this solution if there is a common 'binding' parent, e.g.:
ATP binding
---ATP binding, ATP as substrate
---ATP binding, ATP as cofactor
with the parent used for when you don't know what sort of binding is taking place. I don't see expansion of terms as a problem.
I think this is too complex. Agree with the "cons" that GO is probably not the best place for such information.
Quite like this idea, but not the idea of retrofitting our annotations; OK if it can all migrate to a common parent for now.
The same molecule can sometimes be a co-substrate, and at other times a cofactor that is not changed during the reaction. Plus, there are all those enzymes where the mechanism is not known. This distinction would be difficult to implement.
Again, what I would like to avoid is having a specific GO term for every molecule that a gene product binds as a substrate or cofactor. I think there should be one new term for regulated by a small molecule allosteric effector and then put the CHEBI code for the molecule in column 16.
Can the specific substrate/cofactor not be captured in column 16 with a more generic term (any of the direct children of 'binding') as the GO id?
What do we do when it is not clear if it is a substrate or a cofactor...
In fact, this could be the solution for separating the two kinds of binding interaction. Of course, it will be a huge task to retrofit the information.

Question 6

Create a relationship in the ontology such that if an annotation is made to a catalyst term then we also know that the gene product is annotated to a binding term for the substrates e.g. add a new relationship: 'GTP binding' involved in 'GTPase activity'.

Comments:

Sorry, I'm confused.
I prefer not to make implied annotations.
Better to do it by definitions. See comments to #4 above.
This is complicated, and I am not sure I understand the argument of the cons here. ATP-binding would have a mixed bag of associations - presumably many activities would "point" to ATP-binding - but why is that a problem as it reflects biological reality?
I made this choice on the view that we should either go all the way and try to solve the 'binding' issue as best we can, or else leave things the way they are.
The usage notes in AmiGO, GONUTS or OBO-Edit could take care of this and make it clearer to any annotator that this is implicit.
Seems like this should be implicit in the catalysis term 'GTPase activity'. As such, as long as the tree is structured correctly, seems obvious that a GTPase has to bind GTP to hydrolyze it.
In the case of implied annotation, a new evidence code should be used to indicate the relationship/dependency (something similar to IC (inferred by curator), which could be called IR, 'inferred by relationship'). By contrast, an explicit annotation would have a 'better' evidence code.
Yes, this sounds like a good idea, except: how to deal with big complexes that have catalytic, regulatory and structural components, e.g. F0/F1-ATPase. One should not add the term "ATP-binding" to all subunits, but only to those that really bind ATP. This is what is done in UniProt, with regards to the Keyword "ATP-binding".
What about GTP binding that is NOT involved in GTPase activity? This would create a true path violation.
What if something bound ATP in its active site AND somewhere else on the molecule? How would you know the difference?
I agree, but in this case I would add 'ATP binding' only to the subunit(s) which bind ATP. E.g. ATP4A_HUMAN (P20648) has the keyword 'ATP-binding' while ATP4B_HUMAN (P51164) doesn't. This is why I wouldn't add this kind of relationship in this case, but only for cases when it is 'always true'.
I do not think that implied annotations should be made unless annotators are willing to use the evidence code IC. Annotations should be based on what was actually shown, not on what the author hoped to show or what annotators think the user expects to see.
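The Question 6 proposal, together with the suggestion of a separate evidence code for implied annotations, can be sketched as a small propagation step. This is a minimal Python illustration under stated assumptions: the edge list, the 'involved_in' relation name as written here, and the 'IR' code are all hypothetical (IR is only the name proposed in one comment, not an existing GO evidence code), and gene names are placeholders.

```python
# Hypothetical ontology edges (child, relation, parent), mirroring the proposed
# relationship 'GTP binding' involved in 'GTPase activity'.
EDGES = [
    ("GTP binding", "involved_in", "GTPase activity"),
]


def implied_annotations(direct):
    """Given direct (gene, term, evidence) annotations, return extra binding
    annotations implied by 'involved_in' edges.

    Implied rows carry the hypothetical 'IR' (inferred by relationship) code so
    they stay distinguishable from binding that was directly shown.
    """
    existing = {(gene, term) for gene, term, _ev in direct}
    implied = []
    for gene, term, _ev in direct:
        for child, rel, parent in EDGES:
            # Add the binding term only if the gene is not already annotated to it.
            if rel == "involved_in" and parent == term and (gene, child) not in existing:
                implied.append((gene, child, "IR"))
                existing.add((gene, child))
    return implied


direct = [("geneA", "GTPase activity", "IDA"), ("geneB", "GTP binding", "IDA")]
print(implied_annotations(direct))  # [('geneA', 'GTP binding', 'IR')]
```

As the comments on multi-subunit complexes such as F0/F1-ATPase point out, an edge like this is only safe where the relationship is always true; otherwise the propagated binding annotation would be attached to subunits that do not bind the molecule.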

Question 7

Do nothing. Allow curators to use the existing terms that describe 'x binding' and accept that the resulting annotations will not indicate to users whether the gene product binds the molecule as a substrate or cofactor.

Comments:

I am working on extended thoughts, which will be at http://wiki.geneontology.org/index.php/User:JimHu/Binding_terms
Please be aware that I am answering this survey based on gut reactions with very few examples at hand and a very limited knowledge of substrate binding matters.
It is not always straightforward to decide whether the molecule is a substrate or a cofactor, and the author may not give all this information; we would lose a lot of information if we did not annotate these at all. Has the user community been asked about this? I can't see that they would be too perturbed to see that an ATPase had been annotated to ATP binding.
If GTP binding is documented, it should be annotated. That might lead someone to test whether GTP is a substrate or cofactor. And anyone interested in "in vitro kinetics" or testing or assays would certainly like that information, even if it is only IEA.
Annotators should annotate the specific data in a paper and resist the temptation to annotate binding events which are not specifically shown in the paper.
Not sure; I could be persuaded to accept this solution.
Reading through this it seems like there's a danger the solution might be worse than the problem!
One of the arguments advanced is that GO is not the place for substrate/cofactor info. I kind of agree with that though I am still new to GO. Therefore it seems to me that the answer to the part of the question that states "..accept that the resulting annotations will not indicate to users whether the gene product binds the molecule as a substrate or cofactor.." should be "yes" whatever the outcome. GO should not try to indicate whether a molecule is a substrate or a cofactor.
Though I chose other options above, I am fine with GO being the way it is, but in support of clearly stating the status quo in the documentation. Sometimes I think that if we just represented what the authors show in the paper used in any single annotation, we would be fine. Personally I find that some of the problems arise from curators being ambitious and wanting to provide 'complete information' though this is done with good intentions.
Clearly, a decision must be made. This has been a stumbling block many times and I think a single choice must be made!
I'm torn between 'yes, but modify the GO documentation' and Jim's suggestion. If we 'do nothing', we should at least tidy up the DAG and fill in or remove some terms. Similar to the random selection of protein complex terms, there is a fairly random selection of binding terms based on what people could be bothered to ask for. NTRs (new term requests) are handled efficiently, so things do get done, but curators much prefer to get by with what is there rather than stop to make a new request, and this must bias curation significantly. Again, we must remember that GO is not the only information captured by many MOD curators.
The most important point: try and imagine the expectations of GO users and their needs. Try and imagine how they use GO annotation, and with what goals. The "Binding" terms could certainly be improved, by thinking about how they are used, and by adding additional hierarchies. It is important to make GO annotation as user-friendly as possible. It is equally important to consider the practical difficulties associated with annotation, e.g. rapid changes in the amount of available information that would change the status of a GTP-binding protein to GTPase.
I think the issue can be resolved by: a) modifying the documentation and help tips in AmiGO to make it clear that binding of a substrate is included in the definition for enzymes and transporters; b) creating a GO term for allosteric effector, to be used for all effectors, and putting the CHEBI ID or SwissProt ID in column 16; c) having a policy for whoever is in charge of GO terms that limits the creation of new binding terms requested for substrate binding; and d) removing GO binding terms that have no genes annotated to them.
Modifying the GO documentation for clarity is always good.
Inertia is very dangerous, especially when it gives rise to incorrect, misleading or ambiguous data. Let's sort these problems out now, while we still have a chance!
If consensus can't be reached on term changes, at least change the documentation to allow currently used binding terms for all purposes.
