Elements of an annotation: Difference between revisions

From GO Wiki
=No single established database?=
Some model species research communities do not have an established database group with funding and time to commit to long-term maintenance of their datasets. Such groups can contribute annotations to the central repository via the UniProtKB GO Annotation (UniProtKB-GOA) multispecies annotation group. This is also a possible route for those groups just starting out in annotation who may wish to take up the responsibility for long-term maintenance of their datasets at a later date.
=Annotating gene products that interact with other organisms=
The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example, in obligate parasitic species (including viruses), almost all their gene products will be interacting with their host organism. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm. For annotating gene products involved in these multi-organism interactions, there are special terms in the biological process ontology, under multi-organism process ; GO:0051704, and in the cellular component ontology, under other organism ; GO:0044215. More specific information can be found in the biological process documentation on multi-organism processes and in the cellular component guidelines on host cell. The species in the interaction are recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format guide. An additional taxon ID should not be added in cases where the annotation is based on sequence or structural similarity.
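The dual-taxon rule described above can be sketched in a few lines of Python. This is only an illustration, not part of any GO tool: the function names `taxon_column` and `parse_taxon_column` are hypothetical, and the `taxon:ID|taxon:ID` layout is taken from the examples on this page.

```python
from typing import List, Optional


def taxon_column(encoding_taxon: int, other_taxon: Optional[int] = None) -> str:
    """Build a Taxon column value (hypothetical helper).

    The first ID is the species encoding the gene product; the optional
    second ID is the other species in the interaction. For interactions
    within one species, pass the same ID twice.
    """
    ids = [encoding_taxon]
    if other_taxon is not None:
        ids.append(other_taxon)
    return "|".join("taxon:{}".format(t) for t in ids)


def parse_taxon_column(value: str) -> List[int]:
    """Split a Taxon column value back into one or two integer taxon IDs."""
    ids = []
    for part in value.split("|"):
        if not part.startswith("taxon:"):
            raise ValueError("unexpected taxon entry: {!r}".format(part))
        ids.append(int(part[len("taxon:"):]))
    if len(ids) not in (1, 2):
        raise ValueError("expected one or two taxon IDs, got {}".format(len(ids)))
    return ids


# Symbiont gene product from S. meliloti (382) acting on M. truncatula (3880)
print(taxon_column(382, 3880))
```

Keeping the encoding species first means the direction of the interaction can be read directly from the column, which is why the order of the two IDs matters.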
== Nomenclature Conventions==
* The terms 'symbiont' and 'host' may carry connotations of the nature of the interaction between two organisms, but in the Gene Ontology they are used solely to differentiate between organisms on the basis of their size. The word 'symbiont' refers to the smaller organism in a symbiotic interaction; the larger organism is called the host. If the two organisms are the same size, the term name will contain 'other organism'. Note that parasites and pathogens are also referred to as 'symbionts', as symbiosis encompasses parasitism, commensalism and mutualism.
==Requesting new terms in the multi-organism process node==
* Like the rest of GO, the multi-organism process node is not complete, and you will probably have to request some new terms when annotating your gene products. These should be submitted via the GO curator requests tracker in the usual way. Here are a few points to bear in mind when requesting new terms, and annotating using this node:
* A term name should make the direction of the interaction clear. An example of this is given below; induction of nodule morphogenesis in host would be used to annotate the symbiont gene product, while induction of nodule morphogenesis by symbiont is used to annotate the host genes. Both processes would be children of a common term nodulation.
* If your gene product affects a 'normal' host process, you should always request a new term in the MOP node, rather than just annotating directly to the term in the 'normal' ontology. So for example, if your bacterial gene product regulates the ethylene-mediated signaling pathway in plants, rather than using dual taxon to annotate to regulation of ethylene mediated signaling pathway ; GO:0010104, you should instead request a new term regulation of ethylene mediated signaling pathway in host.
* Where an organism subverts a 'normal' biological process, e.g. the transcription of viral DNA by host transcription machinery, host proteins should not be annotated to a 'symbiont' term like transcription of symbiont DNA. This is because it would be considered a pathological process, i.e. not 'normal' for the host.
* Example: Performing a process with another organism
** Nod factor export proteins transfer nod factors out of the purple bacterium Sinorhizobium meliloti into the surrounding soil. Here they are detected by LysM nod factor receptor kinases in Medicago truncatula roots and initiate the process of nodulation. Annotation of Nod factor export ATP-binding protein I from S. meliloti: suggest a new term, induction of nodule morphogenesis in host.
    nodulation ; GO:0009877 [p] induction of nodule morphogenesis in host ; GO:00new01
    Sinorhizobium meliloti taxonomy ID: 382 Medicago truncatula taxonomy ID: 3880
    protein name: Nod factor export ATP-binding protein I GO term: induction of nodule morphogenesis in host ; GO:00new01 taxon column: taxon:382|taxon:3880
** Annotation of LysM receptor kinase LYK3 precursor from M. truncatula: suggest a new term, induction of nodule morphogenesis by symbiont.
    nodulation ; GO:0009877 [p] induction of nodule morphogenesis by symbiont ; GO:00new02
    Medicago truncatula taxonomy ID: 3880 Sinorhizobium meliloti taxonomy ID: 382
    protein name: LysM receptor kinase LYK3 precursor GO term: induction of nodule morphogenesis by symbiont ; GO:00new02 taxon column: taxon:3880|taxon:382
* Example: Performing a process in more than one species
** The protein cardiotoxin from the southern Indonesian spitting cobra Naja sputatrix kills mammalian cells by cytolysis when it enters the host cell cytoplasm. Annotation of cardiotoxin precursor from N. sputatrix: use the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430.
    Naja sputatrix taxonomy ID: 33626 Mammalia taxonomy ID: 40674
    protein name: cardiotoxin precursor GO term: cytolysis of cells of another organism ; GO:0051715 taxon column: taxon:33626|taxon:40674
    protein name: cardiotoxin precursor GO term: host cell cytoplasm ; GO:0030430 taxon column: taxon:33626|taxon:40674
* Example: Regulating a process in another organism
** Mosquito saliva contains D7 proteins, which bind biogenic amines in order to suppress hemostasis in humans. Annotation of D7 protein long form from A. gambiae: suggest a new term, negative regulation of hemostasis in host.
    evasion of host defense response ; GO:0030682 [i] negative regulation of hemostasis in host ; GO:00new03
    Anopheles gambiae taxonomy ID: 7165 Homo sapiens taxonomy ID: 9606
    protein name: D7 protein long form GO term: negative regulation of hemostasis in host ; GO:00new03 taxon column: taxon:7165|taxon:9606





Revision as of 12:41, 7 March 2019

  From http://geneontology.org/page/go-annotation-conventions
  TO BE REVIEWED


Elements of an annotation

Annotation Subject

  • Annotation subjects consist of valid database identifiers, such as WB:WBGene00003721 or SGD:S000001048.
  • The list of valid database prefixes can be found on the GO website.
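
As a minimal sketch of what checking an annotation subject might look like, the snippet below splits an identifier into its database prefix and local ID. The function name `parse_subject` is hypothetical, and `KNOWN_PREFIXES` is only an illustrative subset built from the two examples above; the authoritative list is the one on the GO website.

```python
from typing import Iterable, Tuple

# Illustrative subset only (assumption); the real list lives on the GO website.
KNOWN_PREFIXES = frozenset({"WB", "SGD"})


def parse_subject(identifier: str,
                  known_prefixes: Iterable[str] = KNOWN_PREFIXES) -> Tuple[str, str]:
    """Split 'PREFIX:LOCAL_ID' and check the prefix against a known set."""
    # partition splits on the first colon only, so local IDs may contain colons
    prefix, sep, local_id = identifier.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError("not of the form PREFIX:ID: {!r}".format(identifier))
    if prefix not in set(known_prefixes):
        raise ValueError("unknown database prefix: {!r}".format(prefix))
    return prefix, local_id


print(parse_subject("WB:WBGene00003721"))
```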

Relations

Negation

  • NOT is used to make an annotation statement that the gene product is not associated with the GO term.
  • When combined with an explicit annotation relation, e.g. enables, the NOT qualifier indicates that the gene product does not have that relationship to the GO term.
  • NOT may be used with terms from any of the three ontologies.

In practice, the NOT qualifier is used in two ways:

  1. When a GO term might otherwise be expected to apply to a gene product, but an experiment, sequence analysis, etc. demonstrates otherwise.
  2. When there are conflicting experimental findings in the literature and curators would like to capture all relevant data accurately.

Use of the NOT qualifier is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). For example, if a protein has sequence similarity to an enzyme (whose activity is represented as Molecular Function GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as NOT GO:nnnnnnn.

In phylogenetic-based annotation, i.e. PAINT, the NOT qualifier is used in conjunction with the IKR (Inferred from Key Residue) evidence code. Here, NOT is used to annotate a gene product when, although homologous to a particular protein family, it has lost essential residues and is very unlikely to be able to carry out an associated function, participate in the expected associated process, or be found in a certain location.

The NOT qualifier is not used to annotate negative or inconclusive experimental results.
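
The way NOT combines with an explicit relation can be illustrated with a short sketch. It assumes the qualifier is stored as a pipe-separated string such as `NOT|enables`; that layout and the function name `parse_qualifier` are assumptions made for illustration, not something this page specifies.

```python
from typing import Optional, Tuple


def parse_qualifier(qualifier: str) -> Tuple[bool, Optional[str]]:
    """Return (negated, relation) from an assumed pipe-separated qualifier string."""
    parts = [p for p in qualifier.split("|") if p]
    negated = "NOT" in parts
    # Anything that is not the NOT flag is treated as the annotation relation
    relations = [p for p in parts if p != "NOT"]
    relation = relations[0] if relations else None
    return negated, relation


negated, relation = parse_qualifier("NOT|enables")
# Reads as: the gene product does NOT have the 'enables' relationship to the term
print(negated, relation)
```

Separating the negation flag from the relation keeps the two pieces of information independent, so a consumer that ignores NOT by mistake is easier to spot.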

GO term

A gene product can be annotated to zero or more terms from each ontology.

Evidence

Reference

Assigned_by

Every annotation is marked with the name of the group that made the annotation. The group that made the annotation may be different from the database that manages the identifiers and/or the annotation file.

Avoiding redundancy

Where two or more databases submit data on the same species, we encourage a model in which one database group collects all annotation data for that species, removes the redundant (duplicate) annotations, and then submits the total dataset to the central repository. This ensures that no redundant annotations appear in the master dataset. Please see the list of species and relevant database groups for more details. We understand that annotating groups will also wish to make their full dataset available to the public. For this purpose, the GO Consortium makes all of the individual datasets available from the GO website, via the GO web CVS interface, or from the directory go/gene-associations/ in the GO CVS repository. All of the individual datasets are also listed in the annotation downloads table, and each group is clearly credited for the work it has done. The non-redundant set is used only as the master copy that appears in AmiGO and similar tools.
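
The redundancy-removal step can be sketched as follows. This is a simplified illustration, not the procedure any database group actually runs: it assumes two annotations are duplicates when they share subject, GO term, evidence and reference, and the dictionary keys and sample values (including the `GO:nnnnnnn` and `REF:placeholder` placeholders) are made up.

```python
from typing import Dict, List


def remove_redundant(annotations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first copy of each annotation, preserving input order.

    Assumption: duplicates share subject, GO term, evidence and reference.
    """
    seen = set()
    unique = []
    for ann in annotations:
        key = (ann["subject"], ann["go_term"], ann["evidence"], ann["reference"])
        if key not in seen:
            seen.add(key)
            unique.append(ann)
    return unique


# Placeholder records from two hypothetical groups annotating the same species
group_a = {"subject": "SGD:S000001048", "go_term": "GO:nnnnnnn",
           "evidence": "IDA", "reference": "REF:placeholder",
           "assigned_by": "GroupA"}
group_b = dict(group_a, assigned_by="GroupB")

merged = remove_redundant([group_a, group_b])
print(len(merged))
```

Because `assigned_by` is not part of the key, the first submitting group keeps the credit in the merged set; the individual datasets remain the place where each group's full contribution is visible.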


No single established database?

Some model species research communities do not have an established database group with funding and time to commit to long-term maintenance of their datasets. Such groups can contribute annotations to the central repository via the UniProtKB GO Annotation (UniProtKB-GOA) multispecies annotation group. This is also a possible route for those groups just starting out in annotation who may wish to take up the responsibility for long-term maintenance of their datasets at a later date.


Old wiki pages to review