Multiple term annotations (Archived)

From GO Wiki
Revision as of 10:48, 28 March 2019 by Pascale

This is a proposal for a quick fix to the current problem of being unable to create annotations that combine more than one term.


The Problem

In many cases, a source gives several linked pieces of information about a gene product (e.g. it is involved in transcription in the nucleus; it performs catalytic activities X and Y in process Z), but the link is lost in annotation because each term is recorded in a separate annotation. The information could potentially be reconstructed from the annotation metadata (the reference and the date of annotation), but there is no way to be sure that the annotations actually belong together without going back to the original source.


The Proposal

Give each annotation a unique identifier, and then have annotators record the IDs of annotations that belong together. These groupings can be captured in a file and submitted to the GO Consortium. Annotating groups would need to decide how to integrate this into their annotation interface.


Examples

This is an example taken from the GO annotation conventions documentation.

The protein cardiotoxin from the southern Indonesian spitting cobra Naja sputatrix kills mammalian cells by cytolysis when it enters the host cell cytoplasm.

Annotation of cardiotoxin precursor, from N. sputatrix, using the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430:

Annotation ID: A123456
GP: cardiotoxin precursor
GO term: cytolysis of cells of another organism ; GO:0051715

Annotation ID: A234567
GP: cardiotoxin precursor
GO term: host cell cytoplasm ; GO:0030430

In another file, consisting of the IDs of annotations that should be considered together:

A123456  A234567
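
As a rough illustration of how a tool might read such a file, here is a minimal Python sketch. It assumes one group per line with whitespace-separated annotation IDs, as in the line above; the proposal does not fix a file format beyond that. The `Annotation` class and the `parse_groups` and `resolve` helpers are hypothetical names introduced for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical record holding only the fields shown in the example above.
    annotation_id: str
    gene_product: str
    go_term: str
    go_id: str


# The two annotations from the cardiotoxin example, keyed by annotation ID.
ANNOTATIONS = {
    a.annotation_id: a
    for a in [
        Annotation("A123456", "cardiotoxin precursor",
                   "cytolysis of cells of another organism", "GO:0051715"),
        Annotation("A234567", "cardiotoxin precursor",
                   "host cell cytoplasm", "GO:0030430"),
    ]
}


def parse_groups(text):
    """Assumed format: each non-empty line is one group of whitespace-separated IDs."""
    return [line.split() for line in text.splitlines() if line.strip()]


def resolve(group, annotations):
    """Look up the full annotation for every ID in a group."""
    return [annotations[annotation_id] for annotation_id in group]


groups = parse_groups("A123456  A234567\n")
combined = resolve(groups[0], ANNOTATIONS)
for annotation in combined:
    print(annotation.gene_product, "|", annotation.go_term, ";", annotation.go_id)
```

Keeping the groups in a separate file means the existing annotations stay untouched; a tool that does not know about groups can simply ignore the second file.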


Multiple term annotations would not be limited to two terms; any number of annotations could be grouped. For example:

Cytochrome c performs an oxidoreductase reaction (GO:0016491) in the process of oxidative phosphorylation (GO:0006119) in the mitochondrial matrix (GO:0005759).

Annotation ID: A00001
GP: cytochrome c
GO term: oxidoreductase activity ; GO:0016491

Annotation ID: A00002
GP: cytochrome c
GO term: oxidative phosphorylation ; GO:0006119

Annotation ID: A00003
GP: cytochrome c
GO term: mitochondrial matrix ; GO:0005759

Multiple term annotations file:

A00001  A00002  A00003
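
Because a group is just a list of IDs, a submitted file could be checked mechanically before it is accepted. The sketch below is one possible check, not part of the proposal: the `check_group` function is hypothetical, and the rule that all members of a group must share one gene product is an assumption drawn from the examples on this page.

```python
def check_group(group, gene_product_by_id):
    """Return a list of problems found in one group of annotation IDs.

    gene_product_by_id maps annotation ID -> gene product name.
    An empty list means the group passed every check.
    """
    problems = []
    if len(group) < 2:
        problems.append("group needs at least two annotation IDs")
    if len(set(group)) != len(group):
        problems.append("duplicate annotation ID in group")
    unknown = [i for i in group if i not in gene_product_by_id]
    if unknown:
        problems.append("unknown annotation IDs: " + ", ".join(unknown))
    # Assumption: grouped annotations describe a single gene product.
    products = {gene_product_by_id[i] for i in group if i in gene_product_by_id}
    if len(products) > 1:
        problems.append("group mixes gene products: " + ", ".join(sorted(products)))
    return problems


gene_products = {
    "A00001": "cytochrome c",
    "A00002": "cytochrome c",
    "A00003": "cytochrome c",
    "A123456": "cardiotoxin precursor",
}

print(check_group(["A00001", "A00002", "A00003"], gene_products))
print(check_group(["A00001", "A123456"], gene_products))
```

The first call reports no problems; the second flags that the group combines annotations for two different gene products.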


It would also be possible to create multiple term annotations using terms from the same ontology, e.g. to capture the activity of a multifunctional enzyme that catalyses several reactions in the same pathway.


Other benefits of this proposal

Giving annotations an ID would have other benefits:

  • easier to track down and remove incorrect annotations
  • sets of annotations could be identified by IDs (rather than by the full annotation), so they could be easily transferred within or between tools
  • lots of other benefits as yet unimagined
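
To illustrate the first point, here is a small sketch of removing an incorrect annotation by its ID. The `remove_annotation` function is hypothetical, and returning the affected groups for an annotator to re-check (rather than silently editing them) is a design assumption, not something the proposal specifies.

```python
def remove_annotation(bad_id, annotations, groups):
    """Drop one annotation by ID and report the groups that referenced it.

    annotations: dict mapping annotation ID -> annotation data
    groups: list of lists of annotation IDs
    Returns (remaining annotations, untouched groups, groups needing review).
    """
    remaining = {k: v for k, v in annotations.items() if k != bad_id}
    untouched, needs_review = [], []
    for group in groups:
        if bad_id in group:
            needs_review.append(group)
        else:
            untouched.append(group)
    return remaining, untouched, needs_review


annotations = {
    "A123456": "cytolysis of cells of another organism ; GO:0051715",
    "A234567": "host cell cytoplasm ; GO:0030430",
}
groups = [["A123456", "A234567"]]

remaining, untouched, needs_review = remove_annotation("A234567", annotations, groups)
print(sorted(remaining))
print(needs_review)
```

The inputs are left unmodified, so the caller can decide whether to discard a flagged group or keep it with the remaining members.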


Notes

This is acknowledged to be a low-tech, quick and dirty fix for the problem of capturing complex annotation data. More nuanced ways of capturing complex annotation data can be developed using this as a base. The main aim of the proposal is to capture the information now so that the papers don't have to be reannotated in the future.