User:Ben/Binding terms

From GO Wiki

I think there is a fundamental difference in how some (annotation) groups view GO. I, along with the curators at SGD who share this opinion, feel that one should not use GO to attempt to annotate any and all information about gene products that appears in any given literature reference. It is simply a ridiculous task for the GOC to attempt this, and I feel that this is very clearly illustrated in the "divide" regarding Binding terms.

I feel that while every little datum is certainly important and deserves to be captured _somewhere_, adding a GO term annotation for it is often inappropriate and generally harmful to the homogeneity of GO annotations. The GOC has always, for better or worse, allowed individual MODs and curation groups to decide what and how to curate. This flexibility is often warranted, given the scope and depth of the literature for a given organism or group of organisms, but it also leads to large discrepancies in annotation practice and makes comparing cross-organism GO information "fun", to say the least.

That being said, I feel the following classes of information should NOT NEVER NO HOW be captured in GO. When they need to be captured, they should be incorporated into existing or even new database entities and cross-referenced via unique protein or gene product identifiers (e.g., UniProt ID).

  • Stoichiometric information - how many monomers are in a biologically functional unit, including all related "self binding" and "homodimerization" terms
  • Protein-protein interactions - i.e., "protein binding" with - these should be submitted to or indexed in BioGRID and/or IntAct
  • Physical constants
  • Sequence (this is of course stored and cross-referenced in godb and AmiGO)
  • Evolutionary data (ditto, re: ref-genome data)
  • Lists of substrates and cofactors, including allosteric interactions aka "nontransformative binding" - i.e., "X binding"

You should note that this list pretty much rules out MOST "binding" terms used for annotations. There are a few acceptable exceptions, which I call "terminal" binding. These are molecular functions where, to all available knowledge, the *purpose* of the gene product is to sequester, hold, or sense some other small molecule. The canonical example is calmodulin, which I will grant is reasonable to annotate to MF: Calcium binding.

The last remaining grey area is the situation where an experiment demonstrates binding to "X" and possibly infers some catalytic activity from it, but does not demonstrate that activity. I agree that "partial" information is tricky to deal with. If we can denote it in a sensible way, i.e., "this gene product has the property 'ATP binding' (via IDA) but the purpose of that binding is currently Unknown", I would be in favor of using binding terms in this way.
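
One way such a "partial" record could be denoted is sketched below. This is only an illustration: the class name, field names, and the placeholder identifier are hypothetical and are not part of any existing GO annotation format or tool. The only idea taken from the text above is that the binding term and its evidence code are recorded while the purpose of the binding is explicitly left Unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BindingAnnotation:
    """Hypothetical record for a binding observation whose purpose may be unknown."""

    gene_product_id: str            # placeholder identifier, not a real accession
    term: str                       # e.g. "ATP binding"
    evidence_code: str              # e.g. "IDA"
    purpose: Optional[str] = None   # None means the purpose is currently unknown

    def describe(self) -> str:
        # Spell out "Unknown" explicitly instead of silently omitting the purpose.
        purpose = self.purpose if self.purpose is not None else "Unknown"
        return (
            f'{self.gene_product_id}: "{self.term}" '
            f"(via {self.evidence_code}), purpose: {purpose}"
        )


# Binding has been demonstrated, but what it is for has not.
example = BindingAnnotation("EXAMPLE:0001", "ATP binding", "IDA")
print(example.describe())
```

Keeping the purpose as a separate, explicitly empty field would let such records be told apart from "terminal" binding annotations and updated later if the catalytic activity is demonstrated.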

I don't really feel strongly about "removing" the terms themselves; we have many terms, such as "Cell Part", which are not to be used for annotations. However, if curators cannot resist using a term "because it's there", then maybe we should remove them. I do feel that the "stoichiometric" terms should all be deleted.