Old version of working group wiki

From GO Wiki
Revision as of 12:19, 26 June 2009 by Ruth.lovering (talk | contribs)

Back to binding terms working group discussion


Working group

  • Ben Hitz
  • David Hill
  • Debby Siegele
  • Emily Dimmer
  • Jim Hu
  • Mike Cherry
  • Peter D'Eustachio
  • Ruth Lovering

Summary

According to the GO function guidelines (listed below), catalytic and transporter activities should not be related to binding terms.


The proposal is to remove substrate binding terms from GO whenever possible and ensure statements within the GO term directing GO curators to annotate to binding terms are removed where appropriate.

Examples:

1. Obsolete GO:0043287: poly(3-hydroxyalkanoate) binding

2. The term: GO:0016887 ATPase activity should NOT include the comment: Consider also annotating to the molecular function term 'ATP binding, GO:0005524'.

Note that enzymes MUST bind ALL of the substrates involved in a catalyzed reaction; there is no action at a distance.

Proposed question to circulate to the GOC and GO annotators on Monday 18th May

Should GO remove all annotations which describe substrate binding associated with catalytic activities or transport proteins?

eg:

  • a protein with GTPase activity will NOT be annotated to GTP binding
  • a protein with lipid transporter activity will NOT be annotated to cholesterol binding
  • the protein 'Nerve growth factor receptor associated protein 1' will NOT be annotated to 'death receptor binding'
  • a protein with helicase activity will NOT be annotated to 'DNA binding'

Responses from working group:

  • Ben Hitz - answer to question: Yes. More at Ben's comments
  • David Hill
  • Debby Siegele - Yes. Emily, I don't see how removing substrate binding terms would affect users, since they could still search for the enzyme activity. Are there cases where a user would want to search for gene products that bind a particular substrate but have different molecular functions?
  • Emily Dimmer - Although I'm sympathetic to the proposal, I have concerns about the loss of information this change might cause. We have a large number of annotations associated with these terms, and therefore there is the possibility that our actions might have a negative impact on users who work with this information. Any measure to improve the accuracy of the ontology/annotation set without great loss of existing annotation/data would be preferred.
  • Jim Hu - Yes to circulate question. Answer to question: No as written. However, I do not want the status quo. My original interest was in streamlining the ontology by using post-composition, not in removing the annotations. Much more at Jim's comments (User:JimHu/Binding terms)
  • Mike Cherry
  • Peter D'Eustachio - Yes to circulate question. Answer to question: Yes
  • Ruth Lovering - Yes to circulate question. Answer to question: No
  • Pascale Gaudet: the question makes the assumption that the annotator knows that a protein is a lipid transporter (for example), before deciding whether to make the annotation to cholesterol binding. In many cases, all you know is that something binds cholesterol and, by sequence analysis, has a transmembrane domain. The lipid transporter annotation would be an IC without an annotation to refer to. The second problem here is the common situation where different bits of information come from different papers: in a first paper a gene product is shown to bind cholesterol, and in the next paper to actually transport it. Do we then remove the first annotation? It also becomes inconsistent as to what IS annotated to 'binding'.

Pros

What these "binding" proposals all have in common is that they essentially want to track, for all enzymes, the strict biochemical mechanism and all cofactors for each reaction, as well as all "relevant" substrate-product combinations. That is better left to some other database.

GO does not track biochemical reactions. It doesn’t track the reactants nor the products.

Should GO track protein kinase substrates? Glycosylation sites? Ubiquitination substrates? GO needs to be consistent: why should GO partially track some reactants some of the time? That's not going to help anyone in the long run. In 99% of all cases, it will be better to cross-index a database that is actually DESIGNED to store this sort of data.

This proposal suggests that we should remove terms such as GO:0043287: poly(3-hydroxyalkanoate) binding from GO and replace them with nothing, because describing substrate binding is not the role of GO. It also suggests deleting the majority of ATP binding annotations, so that ATP binding is only associated with proteins which bind ATP as a co-factor.

Cons

Currently there are substantial numbers of binding terms associated with protein records by electronic means; for instance, there are 1,539,419 electronic annotations to just the ATP binding terms (versus 880 manual annotations), which include both 'substrate binding' and 'cofactor binding'. Furthermore, many of these terms are associated in a 'systematic manner', for example through protein domains: InterPro includes a number of domains which define a nucleotide binding site, such as IPR011761 ATP-grasp fold.

Could we store the information on substrate ATP binding annotations in another way, or alternatively still help users capture this information? Might it be useful to have a high-level grouping term so that proteins can be identified by the energy source they use to carry out a catalysis (e.g. 'catalysis; ATP-hydrolysing')? Could such ribonucleotide terms be considered a bit differently from other binding terms, as one could say that the main purpose of an ATP-dependent enzyme, for instance a peptidase, is not to break down ATP, but to break peptide bonds?

It is going to involve a vast amount of work for the annotation groups to split up the nucleotide binding annotations into 'substrate binding' or 'cofactor binding' types.

In addition a large amount of information will be deleted from GO.

In order to preserve some of this information, but in a more appropriate ‘GO’ format, could the GO terms indicate in the definition (or the term's parentage) the specific ribonucleotide being used? This could be done by making the term more specific, e.g. 'GTP-dependent helicase activity', 'protein kinase activity', or by expanding the ontology: >‘ribonucleotide-dependent catalytic activity’ >> ‘ATP-dependent catalytic activity’. This would follow previous terms such as GO:0016723 'oxidoreductase activity, oxidizing metal ions, NAD or NADP as acceptor', as well as many specific terms, e.g. 'GTP-dependent polynucleotide kinase activity', 'thymidylate synthase (FAD) activity', 'DNA ligase (ATP) activity', 'N-methylhydantoinase (ATP-hydrolyzing) activity'.

This would at least mean that if any users had become accustomed to using the ATP binding annotation set to find those gene products that metabolised ATP, they could in future still gather together relevant gene products by using such a grouping term (and as a side benefit it could be helpful for curation consistency, if we could search for proteins co-annotated to 'ATP binding' and 'catalytic activity (ATP-hydrolysing)' then we would have reason to investigate further the validity of the 'ATP binding' annotation).
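The co-annotation check described above could be sketched roughly as follows. This is only an illustration in Python, assuming annotations are available as simple (gene product, GO term) pairs. The gene product names are made up, and the identifier used for the grouping term is a hypothetical placeholder, since 'catalytic activity (ATP-hydrolysing)' is a proposed term and not an existing GO term. GO:0005524 is the existing 'ATP binding' term.

```python
from collections import defaultdict

ATP_BINDING = "GO:0005524"  # ATP binding (existing GO term)
# Hypothetical placeholder for the proposed grouping term; NOT a real GO ID.
ATP_HYDROLYSING = "PROPOSED:catalytic activity (ATP-hydrolysing)"


def co_annotated(annotations, term_a, term_b):
    """Return the sorted gene products annotated to both term_a and term_b."""
    terms_by_product = defaultdict(set)
    for product, term in annotations:
        terms_by_product[product].add(term)
    return sorted(
        product
        for product, terms in terms_by_product.items()
        if term_a in terms and term_b in terms
    )


# Made-up gene products, for illustration only.
example_annotations = [
    ("geneA", ATP_BINDING),
    ("geneA", ATP_HYDROLYSING),
    ("geneB", ATP_BINDING),
    ("geneC", ATP_HYDROLYSING),
]

# Gene products whose 'ATP binding' annotation might merit a second look.
to_review = co_annotated(example_annotations, ATP_BINDING, ATP_HYDROLYSING)
print(to_review)
```

In this toy data only geneA carries both terms, so it is the only candidate flagged for review; geneB (binding only) and geneC (catalysis only) are left alone.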

The reason I have suggested limiting the removal of substrate binding terms to all molecules except protein is that I am not comfortable with the idea that the only proteins annotated to 'receptor binding' will be their ligands, because the signal transduced after ligand binding is a series of catalytic reactions and these substrates will not be included. If the intention is that these substrates will be included in column 16 then I would be happy to accept this. However, in previous emails Ben has stressed that long substrate lists are not his idea of what column 16 is for (although my impression is that many annotators do want to use column 16 for this purpose).

As an annotator I would like to be able to add the GO term 'DNA binding' to a protein whose only known function is that it binds DNA. However, unless I can show that this binding is not associated with a catalytic activity I will not be able to include this annotation because to do so would produce misleading annotations, for example potentially a novel helicase would be annotated as binding DNA, whereas no other helicases would be annotated to DNA binding. Consequently the removal of catalytic substrate binding terms will limit the amount of data that can be captured by GO. Not that I think we will run out of data to use for annotation, but for proteins which have almost no data this may make a big impact. For example if a protein is shown to bind DNA and has 60% homology to a known helicase you may be more tempted to use ISS to transfer helicase activity to the protein, but you may not wish to do this if there is no evidence of DNA binding activity.

What would you like to see included or not included in the proposal for it to be acceptable?

  • Ben Hitz comments
  • David Hill
  • Debby Siegele
  • Emily Dimmer
  • Jim Hu comments
  • Mike Cherry
  • Peter D'Eustachio
  • Ruth Lovering - I would be happy with the proposal if proteins were not considered as substrates so that protein-protein interactions could continue to be annotated in GO (see comments above in Cons section).
  • Midori Harris (not really a working group member; speaking from an ontology development perspective) - This decision is one that annotators have to make. Once the annotators reach a consensus, the ontology editors will make any necessary changes to the function ontology. If there is a demand to make binding terms obsolete, we'll probably need annotators' input to determine which terms to retain.

Other aspects of the discussion to consider

Use of experimentally verified binding to support catalytic activity annotation

Ben to write comments in here.

Emily suggested: The paper, PMID: 10980193, partially characterizes a GTPase. It states that the protein is thought to be a GTPase; however, this paper only measured its capacity to bind GTP, not its GTPase activity directly. If this paper provided the only evidence for a possible GTPase activity, I would have considered annotating to the 'GTP binding' term, using the IDA evidence code, and referred to this annotation as providing supporting evidence for GTPase activity ('IC'). However, if we can no longer annotate GTP binding to proteins which use GTP as a substrate, and assuming that this is the only evidence the curator can find to support an annotation to GTPase activity, how do curators feel about the alternative annotation to:

GTPase activity (GO:0003924) evidence='IPI' PMID:10980193 with='CHEBI:15422'

Peter: not everything that binds GTP has GTPase activity so it would be wrong to create this annotation.

From the proposal it will not be appropriate to annotate to binding activity, such as 'GTP binding', if this is all that is known about the function of the protein and there is no known catalytic activity predicted from this binding.

Debby: The paper Emily referred to characterizes ARL4 which is predicted to be a GTPase based on amino acid sequence similarity to other known GTPases. Based on this paper, I would annotate ARL4 to GTPase activity (GO:0003924) evidence='ISS' PMID:10980193. This seems like a case where it would be appropriate to annotate to a GTP binding term evidence="IDA" PMID:10980193.

Examples of GTP/GDP/ATP/ADP binding where this binding does not lead to GTP/ATP hydrolysis

Midori: for ATP specifically, are there any activities that use it as a substrate, but don't hydrolyze one of the phosphodiester bonds (i.e. either ATP -> ADP + Pi or ATP -> AMP + PPi)?

Jim: In the SOS response, activated RecA (i.e. RecA in the ATP-bound state) has several activities that are not dependent on ATP hydrolysis, including stimulating autocleavage of lambda repressor, LexA, and UmuD. I believe that there are other examples where NTP binding is needed for an activity and hydrolysis is to flip the protein to the inactive state. So, even though G-proteins are GTPases, their signaling activities are not coupled to GTP hydrolysis. Hydrolysis returns them to their OFF state.

Peter: GO already provides for these. See, for example, GO:0030695 GTPase regulator activity and its children for the small GTPase case, which attempt to capture exactly the guanine nucleotide-regulated switching behavior of eukaryotic small GTPases like RAS, RAN, RHO, etc.

Peter: Not all the functions of RAS require hydrolysis of GTP. While the entire life-cycle of the normal protein may well depend on binding AND hydrolysis, this life cycle comprises multiple molecular functions that are involved in multiple biological processes and any one function / process can involve only binding (or only hydrolysis or, probably, only exchange of bound for free nucleotide).

Cross product annotations

Jim: should substrate binding be used in cross product annotations (see Annotation_Cross_Products#binding_example)

Data interpretation and other databases

Peter: GO can't respond well to queries asking for "all ATP-binding proteins" or "all alcohol-binding proteins" because the question is naïve, not because of a logical flaw or incompleteness in GO.

This kind of query is a natural fit to a data aggregation tool, and not something that any one database (and in particular in this case not GO) can or should answer alone. GO's role here is to provide words and logical structures to allow reliable linking among the various sources of the actual data.

Mike: Where will the line be drawn for what to include? If we say that when there is proof that a GP is a kinase it should also have ATP binding as a function, how far do you go? Will all reactants and products for all catalysts be added? I see much of this as actually inferred details: if it is a kinase then it binds phosphate, but there may not have been a direct assay of the binding. I know someone will say, oh, we infer things all the time. Yes, but should we extend this practice? It seems there is more to do than add this type of inferred information.

Is there a single (or multiple) database which provides a source of this interaction information? If so, is it being used as a dataset for microarray analysis tools? If not why not? If GO removes binding interaction data what impact will it make on the interpretation of experimental datasets?

Other issues brought up in this discussion

This discussion also led to broader comments about the role of GO: Kinase substrate information should be stored in protein kinase substrate databases. Receptor binding should just be thrown out altogether as a term, except for actual LIGANDS. G-protein / GPCR interactions should be stored in signalling pathway/protein-protein interaction databases.

From the Documentation for the Function Ontology

Binding guidelines

Avoid Binding Relationships

Catalytic activities should not be related to binding terms (see the September 2003 Bar Harbor GO meeting minutes); for example, ATPase activity should not be related to ATP binding. Similarly, there should not be a relationship between transporter terms and binding terms. Binding terms should only be used in cases where a stable binding interaction occurs. There are several reasons for this.

Firstly, transporter, catalysis and binding activities are all in the function ontology, which is used to describe elemental single step activities that occur at the macromolecular level. That means that if we were to further subdivide these functions - for example, splitting the catalysis of a reaction into steps such as "substrate binding", "formation of unstable intermediate" or "attraction of electrons to positive charge" - we would be saying that a reaction was actually a series of functions - i.e. a process. Additionally, we would be going beyond the scope of the molecular function ontology as we would be dealing with events on a molecular or atomic level.

Another reason is the sheer practicality of sorting through the 4000+ catalytic reactions we have in GO and deciding which of the substrates and products should be given 'binding' terms. Should we say that only substrates are bound by an enzyme? How about reversible reactions or cases where the reaction mechanism is unknown?

Finally, the GO binding terms are supposed to represent stable binding interactions, as opposed to the transient binding that occurs prior to catalysis. Hence there should not be a connection between stable binding and catalysis.

From the minutes of Bar Harbor GO Consortium Meeting 2003

BarHarbor minutes

Section 5) Ontology Development Issues

d) Consistency of Parentage (catalysis and binding): It was agreed that enzyme activities should have only the catalysis parent. All binding parents to enzyme activities should be removed where appropriate.


Ontology Development Action Items: 17. Document the fact that binding is not always a parent of enzyme activities. Binding is only a parent when stable binding occurs. Remove Binding as parent where appropriate.
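
As a rough illustration of how terms affected by this action item might be found, the sketch below walks a parent map and lists terms that descend from both 'catalytic activity' (GO:0003824) and 'binding' (GO:0005488). It is written in Python under stated assumptions: the ontology is represented as a plain dictionary of term to parents, and the toy parentage shown is simplified and hypothetical, not the actual structure of the function ontology.

```python
CATALYTIC_ACTIVITY = "GO:0003824"  # catalytic activity
BINDING = "GO:0005488"             # binding


def ancestors(term, parents):
    """Collect every ancestor of term by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def terms_with_both_parents(parents):
    """Return sorted terms that have both catalysis and binding ancestors."""
    return sorted(
        term
        for term in parents
        if {CATALYTIC_ACTIVITY, BINDING} <= ancestors(term, parents)
    )


# Toy, simplified parentage for illustration only (an assumption, not real GO data).
toy_parents = {
    "GO:0016887": ["GO:0003824", "GO:0005524"],  # ATPase activity with a binding parent
    "GO:0005524": ["GO:0005488"],                # ATP binding
    "GO:0003924": ["GO:0003824"],                # GTPase activity, catalysis parent only
}

candidates = terms_with_both_parents(toy_parents)
print(candidates)
```

Each term reported this way would still need a curator's judgement on whether stable binding occurs before the binding parent is removed.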


Conference call

Binding Terms Conference Call Information

Binding Terms minutes June 09