Multiple term annotations (Archived)
This is a proposal for a quick fix to the current problem of being unable to create annotations that combine more than one term.
The Problem
In many cases, one gets several linked pieces of information about a gene product (e.g. it is involved in transcription in the nucleus; it performs catalytic activities X and Y in process Z), but this information is lost during annotation because each combination of terms is split into separate single-term annotations. The information could potentially be reconstructed from the annotation metadata (the reference and the date of annotation), but there is no way to be sure that the annotations actually belong together without going back to the original source.
The Proposal
Give each annotation a unique identifier, and then have annotators record the IDs of annotations that belong together. These groupings can be captured in a separate file and submitted to the GO Consortium. Annotating groups would need to decide how to integrate this into their annotation interfaces.
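One way an annotation tool might mint the identifiers and record the groupings is sketched below in Python. The `AnnotationRegistry` class, the `A` prefix and the six-digit zero padding are all hypothetical: the proposal does not fix an ID format (its examples use both `A123456` and `A00001`). The output, one group per line with IDs separated by spaces, follows the layout of the grouping files in the examples.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Annotation:
    annotation_id: str  # the unique identifier this proposal adds
    gene_product: str
    go_id: str
    go_term: str


class AnnotationRegistry:
    """Hypothetical helper: mints annotation IDs and records which annotations belong together."""

    def __init__(self, prefix: str = "A", width: int = 6) -> None:
        self._counter = count(1)  # 1, 2, 3, ... never reused, so IDs stay unique
        self._prefix = prefix
        self._width = width
        self.annotations: dict[str, Annotation] = {}
        self.groups: list[tuple[str, ...]] = []

    def annotate(self, gene_product: str, go_id: str, go_term: str) -> Annotation:
        """Create a single-term annotation under a freshly minted ID."""
        new_id = f"{self._prefix}{next(self._counter):0{self._width}d}"
        ann = Annotation(new_id, gene_product, go_id, go_term)
        self.annotations[new_id] = ann
        return ann

    def link(self, *anns: Annotation) -> None:
        """Record that these annotations form one multiple term annotation."""
        if len(anns) < 2:
            raise ValueError("a multiple term annotation needs at least two annotations")
        self.groups.append(tuple(a.annotation_id for a in anns))

    def grouping_file(self) -> str:
        """Render the groups as text: one group per line, IDs separated by spaces."""
        return "".join(" ".join(group) + "\n" for group in self.groups)


reg = AnnotationRegistry()
toxin_process = reg.annotate("cardiotoxin precursor", "GO:0051715",
                             "cytolysis of cells of another organism")
toxin_location = reg.annotate("cardiotoxin precursor", "GO:0030430", "host cell cytoplasm")
reg.link(toxin_process, toxin_location)
print(reg.grouping_file(), end="")  # prints: A000001 A000002
```

Because the single-term annotations are stored unchanged, existing annotation files stay valid; the groupings are purely additive.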
Examples
This is an example taken from the GO annotation conventions documentation.
The protein cardiotoxin from the southern Indonesian spitting cobra Naja sputatrix kills mammalian cells by cytolysis when it enters the host cell cytoplasm.
The cardiotoxin precursor from N. sputatrix is annotated to the GO terms cytolysis of cells of another organism (GO:0051715) and host cell cytoplasm (GO:0030430), recorded as two annotations, each with its own ID:
Annotation ID: A123456 GP: cardiotoxin precursor GO term: cytolysis of cells of another organism ; GO:0051715
Annotation ID: A234567 GP: cardiotoxin precursor GO term: host cell cytoplasm ; GO:0030430
A separate file lists the IDs of annotations that should be considered together:
A123456 A234567
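A consumer of the grouping file needs only the annotations keyed by ID to reassemble the combined statement. The Python sketch below reads the cardiotoxin example back; the function name `read_groups` and the choice to reject unknown IDs with an error (rather than skip them) are assumptions, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    annotation_id: str
    gene_product: str
    go_id: str
    go_term: str


# Annotations from the cardiotoxin example, keyed by their unique IDs.
annotations = {
    "A123456": Annotation("A123456", "cardiotoxin precursor", "GO:0051715",
                          "cytolysis of cells of another organism"),
    "A234567": Annotation("A234567", "cardiotoxin precursor", "GO:0030430",
                          "host cell cytoplasm"),
}

grouping_file = "A123456 A234567\n"


def read_groups(text: str, known: dict[str, Annotation]) -> list[list[Annotation]]:
    """Turn each non-blank line of the grouping file into the annotations it links."""
    groups = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        ids = line.split()
        if not ids:
            continue  # ignore blank lines
        missing = [i for i in ids if i not in known]
        if missing:
            raise ValueError(f"line {line_no}: unknown annotation IDs {missing}")
        groups.append([known[i] for i in ids])
    return groups


for group in read_groups(grouping_file, annotations):
    gene_product = group[0].gene_product
    terms = "; ".join(f"{a.go_term} ({a.go_id})" for a in group)
    print(f"{gene_product}: {terms}")
# prints:
# cardiotoxin precursor: cytolysis of cells of another organism (GO:0051715); host cell cytoplasm (GO:0030430)
```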
It would be possible to create multiple term annotations with several terms, not just two. For example:
cytochrome c performs an oxidoreductase reaction (GO:0016491) in the process of oxidative phosphorylation (GO:0006119) in the mitochondrial matrix (GO:0005759)
Annotation ID: A00001 GP: cytochrome c GO term: oxidoreductase activity ; GO:0016491
Annotation ID: A00002 GP: cytochrome c GO term: oxidative phosphorylation ; GO:0006119
Annotation ID: A00003 GP: cytochrome c GO term: mitochondrial matrix ; GO:0005759
Multiple term annotations file:
A00001 A00002 A00003
It would also be possible to create multiple term annotations using terms from the same ontology, e.g. to capture the activity of a multifunctional enzyme that catalyses several reactions in the same pathway.
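Since a group is just a list of IDs, nothing limits its size or requires its terms to come from different ontologies. The Python sketch below summarises a group by ontology aspect and handles both the three-term cytochrome c example and a group with two molecular function terms. The `aspect` field, the `summarise` function and the check that all annotations in a group share one gene product are assumptions; the second group reuses the placeholder wording "catalytic activities X and Y in process Z" from The Problem, with a hypothetical gene product and IDs.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical aspect codes: F = molecular function, P = biological process, C = cellular component.
ASPECT_NAMES = {"F": "molecular function", "P": "biological process", "C": "cellular component"}


@dataclass(frozen=True)
class Annotation:
    annotation_id: str
    gene_product: str
    go_term: str
    aspect: str  # one of the keys of ASPECT_NAMES (hypothetical field)


def summarise(group: list[Annotation]) -> dict[str, list[str]]:
    """Collect the terms of one multiple term annotation by ontology aspect.

    Any number of annotations is accepted, and several may share an aspect.
    """
    products = {a.gene_product for a in group}
    if len(products) != 1:  # assumed rule: one group describes one gene product
        raise ValueError(f"group mixes gene products: {sorted(products)}")
    by_aspect: dict[str, list[str]] = defaultdict(list)
    for a in group:
        by_aspect[ASPECT_NAMES[a.aspect]].append(a.go_term)
    return dict(by_aspect)


# Three-term group from the cytochrome c example (A00001 A00002 A00003).
cyt_c = [
    Annotation("A00001", "cytochrome c", "oxidoreductase activity", "F"),
    Annotation("A00002", "cytochrome c", "oxidative phosphorylation", "P"),
    Annotation("A00003", "cytochrome c", "mitochondrial matrix", "C"),
]

# Same-ontology group: two function terms in one process (hypothetical enzyme and IDs).
enzyme = [
    Annotation("A90001", "enzyme E", "catalytic activity X", "F"),
    Annotation("A90002", "enzyme E", "catalytic activity Y", "F"),
    Annotation("A90003", "enzyme E", "process Z", "P"),
]

print(summarise(cyt_c))
print(summarise(enzyme))
```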
Other benefits of this proposal
Giving annotations an ID would have other benefits:
- easier to track down and remove incorrect annotations
- sets of annotations could be referred to by their IDs rather than by the full annotations, so they could easily be transferred within or between tools
- lots of other benefits as yet unimagined
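With IDs in place, retracting an incorrect annotation becomes a lookup rather than a search by reference and date. The Python sketch below assumes a hypothetical in-memory store (`AnnotationSet`) and one policy the proposal leaves open: a group left with fewer than two annotations is dissolved, because it no longer links anything.

```python
from dataclasses import dataclass


@dataclass
class AnnotationSet:
    """Hypothetical store: annotation ID -> short description, plus groups of linked IDs."""
    annotations: dict[str, str]
    groups: list[list[str]]

    def remove(self, annotation_id: str) -> list[list[str]]:
        """Delete one annotation by ID and drop it from every group that contains it.

        Returns the groups that fell below two members and were dissolved.
        """
        if annotation_id not in self.annotations:
            raise KeyError(annotation_id)
        del self.annotations[annotation_id]
        kept, dissolved = [], []
        for group in self.groups:
            if annotation_id not in group:
                kept.append(group)  # untouched groups are kept as they are
                continue
            remaining = [i for i in group if i != annotation_id]
            if len(remaining) >= 2:
                kept.append(remaining)
            else:
                dissolved.append(remaining)
        self.groups = kept
        return dissolved


store = AnnotationSet(
    annotations={
        "A123456": "cardiotoxin precursor: cytolysis of cells of another organism",
        "A234567": "cardiotoxin precursor: host cell cytoplasm",
        "A00001": "cytochrome c: oxidoreductase activity",
        "A00002": "cytochrome c: oxidative phosphorylation",
        "A00003": "cytochrome c: mitochondrial matrix",
    },
    groups=[["A123456", "A234567"], ["A00001", "A00002", "A00003"]],
)

print(store.remove("A234567"))  # the cardiotoxin pair is dissolved
print(store.remove("A00003"))   # the cytochrome c group shrinks to two members
print(store.groups)
```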
Notes
This is acknowledged to be a low-tech, quick and dirty fix for the problem of capturing complex annotation data. More nuanced ways of capturing complex annotation data can be developed using this as a base. The main aim of the proposal is to capture the information now so that papers do not have to be re-annotated in the future.