Elements of an annotation: Difference between revisions

From GO Wiki
(35 intermediate revisions by 3 users not shown)
Line 1: Line 1:
  From http://geneontology.org/page/go-annotation-conventions
This page describes the different annotation fields.
  TO BE REVIEWED
 
 


=Elements of an annotation=
==Annotation Subject==
* Annotation subjects consist of valid database identifiers, such as WB:WBGene00003721 or SGD:S000001048.
* Annotation subjects consist of valid database identifiers, such as WB:WBGene00003721, SGD:S000001048, or UniProtKB:P99999.
* Annotation subjects may be genes or gene products (e.g. proteins, including specific isoforms; ncRNAs; and protein complexes).
* The list of valid database prefixes can be found on the [http://amigo.geneontology.org/xrefs GO website].


==Relations==
* Annotation Subjects and GO terms are linked by a '''Relation''' from the [https://github.com/oborel/obo-relations Relations Ontology] (note that this information is unavailable in [https://geneontology.github.io/docs/go-annotation-file-gaf-format-21/ GAF files], but explicit in [https://geneontology.github.io/docs/gene-product-association-data-gpad-format/ GPAD files]). Guidelines for usage of Relations can be found on the [[Annotation_Relations]] page.
* Annotation Subjects and GO terms are linked by a '''Relation''' from the [https://github.com/oborel/obo-relations Relations Ontology].
* The specific relations applicable to each aspect of GO are listed in the [[Annotation_Relations]] page.
* The relations applicable to each aspect of GO as well as usage guidelines can be found in the [[Annotation_Relations]] page.


==Negation==


*NOT is used to make an annotation statement that the gene product is not associated with the GO term.
* The NOT statement indicates that the gene product does not enable a Molecular Function, is not part of a Biological Process or is not located in a specific Cellular Component.
*When combined with an explicit annotation relation, e.g. enables, the NOT qualifier indicates that the gene product does not have that relationship to the GO term.
* NOT statements are only used when a user might expect that the gene product would have a specific biological property (MF, BP or CC).  
*NOT may be used with terms from any of the three ontologies.
* Contrary to positive annotations, NOT statements propagate ''down'' the ontology, such that the annotation <code>gene product NOT enables protein kinase activity</code> means that the gene product does not enable protein serine/threonine kinase activity or protein tyrosine kinase activity either.
* Both positive and NOT statements can be used in cases where there are conflicting experimental findings in the literature.
* If an isoform has a different function from the main isoform represented by the gene-centric entity, a NOT annotation can be captured together with the isoform identifier.
* NOT annotations can be supported by experiments that show the lack of an activity (or of involvement in a BP or presence in a CC), or by sequence data showing that the gene product has lost essential residues and is unlikely to be able to carry out a function, participate in a process, or be found in a certain location. In the latter case, the evidence code is [http://wiki.geneontology.org/index.php/Inferred_from_Key_Residues_(IKR) IKR (Inferred from Key Residue)].


In practice, the NOT qualifier is used in two ways:
'''When NOT to use negation'''
* The NOT qualifier should not be used to capture every experimental result.
** For example, in a subcellular localization experiment, locations where the gene product is not found should not be captured unless this is explicitly needed. If a protein is found in the nucleus but not in the mitochondrion, a <code>NOT located in mitochondrion</code> annotation would usually be inappropriate.
* The NOT qualifier should not be used to annotate negative or inconclusive experimental results.
** For example, if a mutant still develops a specific anatomical structure, this does not imply that the gene product does not contribute to the process; the experiment may simply not support a positive conclusion.
* The NOT qualifier should not be used to describe experimental conditions or specific contexts in which the gene product is not active (i.e., it should not be used in combination with an extension).


#When a GO term might otherwise be expected to apply to a gene product, but an experiment, sequence analysis, etc. demonstrates otherwise.
'''Examples'''
#When there are conflicting experimental findings in the literature and curators would like to accurately capture all relevant data.
* '''MNN4 (CGD:CAL0000174110) NOT biological process involved in interspecies interaction between organisms''' from PMID:15271989, based on the result that loss of cell wall mannosylphosphate in Candida albicans does not influence macrophage recognition. This NOT annotation means that MNN4 is never involved in any biological process involved in interspecies interaction between organisms, which the data in the paper do not allow one to conclude.
 
Use of the NOT qualifier is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). For example, if a protein has sequence similarity to an enzyme (whose activity is represented as Molecular Function GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as NOT GO:nnnnnnn.
 
In phylogenetic-based annotation, i.e. PAINT, the NOT qualifier is used in conjunction with the [http://wiki.geneontology.org/index.php/Inferred_from_Key_Residues_(IKR) IKR (Inferred from Key Residue)] evidence code.  Here, NOT is used to annotate a gene product when, although homologous to a particular protein family, it has lost essential residues and is very unlikely to be able to carry out an associated function, participate in the expected associated process, or be found in a certain location.
 
'''The NOT qualifier is not used to annotate negative or inconclusive experimental results.'''


==GO term==
A gene product can be annotated to zero or more terms from each ontology.
* A gene product can be annotated to zero or more terms from each ontology.
* Guidelines for annotating the different aspects of GO can be found in the [[Annotation#Ontology-Specific_Guidelines]] section.  
* Guidelines for certain specific topics are in the [[Annotation#Topic-Specific_Guidelines]] section.
== Annotation Extensions ==
* Annotation extensions may be added to GO annotations to provide additional contextual information for the assertion.
* Annotation extensions are structured text that use a relation from the Relations Ontology and an appropriate biological concept or entity to modify the GO annotation, e.g. nucleus 'part of' epithelial cell.
* Detailed documentation on curation using annotation extensions can be found here: [[Annotation_Extension]]


==Evidence==
Line 45: Line 49:
== Assigned_by ==
Every annotation is marked with the name of the group that made the annotation.
The group that made the annotation may be different from the database that manages the identifiers and/or the annotation file.
 
= Avoiding redundancy=
Where two or more databases are submitting data on the same species we encourage the model whereby one database group collects all annotation data for that species, removes the redundant (duplicate) annotations, and then submits the total dataset to the central repository. This ensures that no redundant annotations will appear in the master dataset. Please see the list of species and relevant database groups for more details. We understand that annotating groups will also wish to make their full dataset available to the public. For this purpose, the GO Consortium makes all of the individual datasets available from the GO website, via the GO web CVS interface, or from the directory go/gene-associations/ in the GO CVS repository. All of the individual datasets are also listed in the annotation downloads table, and all individual groups will clearly be given credit for the work that they have done. The non-redundant set is only used as the master copy that appears in AmiGO and similar tools.
 
 
=No single established database?=
Some model species research communities do not have an established database group with funding and time to commit to long-term maintenance of their datasets. Such groups can contribute annotations to the central repository via the UniProtKB GO Annotation (UniProtKB-GOA) multispecies annotation group. This is also a possible route for those groups just starting out in annotation who may wish to take up the responsibility for long-term maintenance of their datasets at a later date.
 
 
=Annotating gene products that interact with other organisms=
The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example, in obligate parasitic species (including viruses), almost all their gene products will be interacting with their host organism. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm. For annotating gene products involved in these multi-organism interactions, there are special terms in the biological process ontology, under multi-organism process ; GO:0051704, and in the cellular component ontology, under other organism ; GO:0044215. More specific information can be found in the biological process documentation on multi-organism processes and in the cellular component guidelines on host cell. The species in the interaction are recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format guide. An additional taxon ID should not be added in cases where the annotation is based on sequence or structural similarity.
 
== Nomenclature Conventions==
* The terms 'symbiont' and 'host' may carry connotations of the nature of the interaction between two organisms, but in the Gene Ontology they are used solely to differentiate between organisms on the basis of their size. The word symbiont is used to refer to the smaller organism in a symbiotic interaction; the larger organism is called the host. If the two organisms are the same size, the term will contain other organism. Note that parasites and pathogens are also referred to as 'symbionts', as symbiosis encompasses parasitism, commensalism and mutualism.
==Requesting new terms in the multi-organism process node==
* Like the rest of GO, the multi-organism process node is not complete, and you will probably have to request some new terms when annotating your gene products. These should be submitted via the GO curator requests tracker in the usual way. Here are a few points to bear in mind when requesting new terms, and annotating using this node:
* A term name should make the direction of the interaction clear. An example of this is given below; induction of nodule morphogenesis in host would be used to annotate the symbiont gene product, while induction of nodule morphogenesis by symbiont is used to annotate the host genes. Both processes would be children of a common term nodulation.
* If your gene product affects a 'normal' host process, you should always request a new term in the MOP node, rather than just annotating directly to the term in the 'normal' ontology. So for example, if your bacterial gene product regulates the ethylene-mediated signaling pathway in plants, rather than using dual taxon to annotate to regulation of ethylene mediated signaling pathway ; GO:0010104, you should instead request a new term regulation of ethylene mediated signaling pathway in host.
* Where an organism subverts a 'normal' biological process, e.g. the transcription of viral DNA by host transcription machinery, host proteins should not be annotated to a 'symbiont' term like transcription of symbiont DNA. This is because it would be considered a pathological process, i.e. not 'normal' for the host.
* Example: Performing a process with another organism
** Nod factor export proteins transfer nod factors out of the purple bacterium Sinorhizobium meliloti into the surrounding soil. Here they are detected by LysM nod factor receptor kinases in Medicago truncatula roots and initiate the process of nodulation. Annotation of Nod factor export ATP-binding protein I from S. meliloti suggests a new term, induction of nodule morphogenesis in host:
    nodulation ; GO:0009877 [p] induction of nodule morphogenesis in host ; GO:00new01
    Sinorhizobium meliloti taxonomy ID: 382 Medicago truncatula taxonomy ID: 3880
    protein name: Nod factor export ATP-binding protein I GO term: induction of nodule morphogenesis in host ; GO:00new01 taxon column: taxon:382|taxon:3880
 
Annotation of LysM receptor kinase LYK3 precursor from M. truncatula suggests a new term, induction of nodule morphogenesis by symbiont:
    nodulation ; GO:0009877 [p] induction of nodule morphogenesis by symbiont ; GO:00new02
    Medicago truncatula taxonomy ID: 3880 Sinorhizobium meliloti taxonomy ID: 382
    protein name: LysM receptor kinase LYK3 precursor GO term: induction of nodule morphogenesis by symbiont ; GO:00new02 taxon column: taxon:3880|taxon:382
 
* Example: Performing a process in more than one species
** The protein cardiotoxin from the southern Indonesian spitting cobra Naja sputatrix kills mammalian cells by cytolysis when it enters the host cell cytoplasm. Annotation of cardiotoxin precursor from N. sputatrix uses the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430:
    Naja sputatrix taxonomy ID: 33626 Mammalia taxonomy ID: 40674
    protein name: cardiotoxin precursor GO term: cytolysis of cells of another organism ; GO:0051715 taxon column: taxon:33626|taxon:40674
    protein name: cardiotoxin precursor GO term: host cell cytoplasm ; GO:0030430 taxon column: taxon:33626|taxon:40674
 
* Example: Regulating a process in another organism
** Mosquito saliva contains D7 proteins, which bind biogenic amines in order to suppress hemostasis in humans. Annotation of D7 protein long form from A. gambiae suggests a new term, negative regulation of hemostasis in host:
    evasion of host defense response ; GO:0030682 [i] negative regulation of hemostasis in host ; GO:00new03
    Anopheles gambiae taxonomy ID: 7165 Homo sapiens taxonomy ID: 9606
    protein name: D7 protein long form GO term: negative regulation of hemostasis in host ; GO:00new03 taxon column: taxon:7165|taxon:9606
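
The dual-taxon convention used in the examples above can be checked mechanically. The following is a minimal sketch, assuming the taxon column holds one or two <code>taxon:ID</code> entries separated by <code>|</code>, as in the examples on this page; the function name <code>parse_taxon_column</code> is hypothetical and not part of any GO tool.

```python
from typing import Optional, Tuple


def parse_taxon_column(value: str) -> Tuple[int, Optional[int]]:
    """Return (encoding_taxon, other_taxon) from a taxon column value.

    Hypothetical helper: assumes entries look like 'taxon:382' and that a
    second entry, if present, is separated from the first by '|'.
    """
    parts = value.split("|")
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"expected one or two taxon entries: {value!r}")
    ids = []
    for part in parts:
        prefix, sep, number = part.partition(":")
        if prefix != "taxon" or not sep or not number.isdigit():
            raise ValueError(f"malformed taxon entry: {part!r}")
        ids.append(int(number))
    # First ID: species encoding the gene product.
    # Second ID (optional): the other species in the interaction.
    other = ids[1] if len(ids) == 2 else None
    return ids[0], other


encoding, other = parse_taxon_column("taxon:382|taxon:3880")
print(encoding, other)
```

For an interaction between organisms of the same species, both returned IDs would be equal; for an ordinary single-organism annotation the second value is <code>None</code>.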
 


== Date ==
The date the annotation was made or last edited, in YYYYMMDD format.


==Old wiki pages to review ==
== Review Status ==
*[[Beginning_Annotation_SOP]]
* [[Top level tree]] - This tree leads to all others. - Jennifer
* [[Electronic - Harold and Evelyn]]
* [[ISS - Pascale]] and Michelle
* [[Manual - Jennifer]]
* [[user categories]]
* [[Beginning Annotation SOP]]


Last reviewed: 2022-04-19




[[Category: Annotation]]

Latest revision as of 08:17, 28 April 2022

This page describes the different annotation fields.

Elements of an annotation

Annotation Subject

  • Annotation subjects consist of valid database identifiers, such as WB:WBGene00003721, SGD:S000001048, or UniProtKB:P99999.
  • Annotation subjects may be genes or gene products (e.g. proteins, including specific isoforms; ncRNAs; and protein complexes).
  • The list of valid database prefixes can be found on the GO website.
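
As an illustration of the identifier form described above, the sketch below splits a subject identifier into its database prefix and local ID. It assumes identifiers follow a PREFIX:LOCAL_ID pattern, as the three examples do. The prefix set is a hypothetical stand-in containing only the prefixes named on this page; the authoritative list is the one on the GO website.

```python
from typing import Tuple

# Hypothetical subset for illustration only; the real list of valid
# database prefixes is maintained on the GO website.
KNOWN_PREFIXES = {"WB", "SGD", "UniProtKB"}


def split_subject(identifier: str) -> Tuple[str, str]:
    """Split 'PREFIX:LOCAL_ID' into (prefix, local_id)."""
    prefix, sep, local_id = identifier.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a PREFIX:ID identifier: {identifier!r}")
    if prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unknown database prefix: {prefix!r}")
    return prefix, local_id


for subject in ("WB:WBGene00003721", "SGD:S000001048", "UniProtKB:P99999"):
    print(split_subject(subject))
```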

Relations

  • Annotation Subjects and GO terms are linked by a Relation from the Relations Ontology.
  • The relations applicable to each aspect of GO as well as usage guidelines can be found in the Annotation_Relations page.

Negation

  • The NOT statement indicates that the gene product does not enable a Molecular Function, is not part of a Biological Process or is not located in a specific Cellular Component.
  • NOT statements are only used when a user might expect that the gene product would have a specific biological property (MF, BP or CC).
  • Contrary to positive annotations, NOT statements propagate down the ontology, such that the annotation gene product NOT enables protein kinase activity means that the gene product does not enable protein serine/threonine kinase activity or protein tyrosine kinase activity either.
  • Both positive and NOT statements can be used in cases where there are conflicting experimental findings in the literature.
  • If an isoform has a different function from the main isoform represented by the gene-centric entity, a NOT annotation can be captured together with the isoform identifier.
  • NOT annotations can be supported by experiments that show the lack of an activity (or of involvement in a BP or presence in a CC), or by sequence data showing that the gene product has lost essential residues and is unlikely to be able to carry out a function, participate in a process, or be found in a certain location. In the latter case, the evidence code is IKR (Inferred from Key Residue).
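
The direction of propagation can be sketched in a few lines. The example below is only an illustration: it uses a toy hierarchy built from the kinase terms mentioned above, with "kinase activity" added as an assumed parent, rather than the real ontology, and the function names are hypothetical. Positive annotations imply the ancestors of the annotated term, whereas NOT annotations imply its descendants.

```python
# Toy child -> parents map for illustration; not loaded from the real GO.
PARENTS = {
    "protein serine/threonine kinase activity": ["protein kinase activity"],
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": [],
}


def ancestors(term):
    """All terms reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def descendants(term):
    """All terms that have `term` among their ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}


def implied_terms(term, negated):
    """Terms covered by an annotation: up for positive, down for NOT."""
    related = descendants(term) if negated else ancestors(term)
    return {term} | related


print(sorted(implied_terms("protein kinase activity", negated=True)))
print(sorted(implied_terms("protein kinase activity", negated=False)))
```

With this toy data, the NOT annotation covers protein kinase activity and both of its child terms, while the positive annotation covers protein kinase activity and its parent.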

When NOT to use negation

  • The NOT qualifier should not be used to capture every experimental result.
    • For example, in a subcellular localization experiment, locations where the gene product is not found should not be captured unless this is explicitly needed. If a protein is found in the nucleus but not in the mitochondrion, a NOT located in mitochondrion annotation would usually be inappropriate.
  • The NOT qualifier should not be used to annotate negative or inconclusive experimental results.
    • For example, if a mutant still develops a specific anatomical structure, this does not imply that the gene product does not contribute to the process; the experiment may simply not support a positive conclusion.
  • The NOT qualifier should not be used to describe experimental conditions or specific contexts in which the gene product is not active (i.e., it should not be used in combination with an extension).

Examples

  • MNN4 (CGD:CAL0000174110) NOT biological process involved in interspecies interaction between organisms from PMID:15271989, based on the result that loss of cell wall mannosylphosphate in Candida albicans does not influence macrophage recognition. This NOT annotation means that MNN4 is never involved in any biological process involved in interspecies interaction between organisms, which the data in the paper do not allow one to conclude.

GO term

  • A gene product can be annotated to zero or more terms from each ontology.
  • Guidelines for annotating the different aspects of GO can be found in the Annotation#Ontology-Specific_Guidelines section.
  • Guidelines for certain specific topics are in the Annotation#Topic-Specific_Guidelines section.

Annotation Extensions

  • Annotation extensions may be added to GO annotations to provide additional contextual information for the assertion.
  • Annotation extensions are structured text that use a relation from the Relations Ontology and an appropriate biological concept or entity to modify the GO annotation, e.g. nucleus 'part of' epithelial cell.
  • Detailed documentation on curation using annotation extensions can be found here: Annotation_Extension

Evidence

Reference

Assigned_by

Every annotation is marked with the name of the group that made the annotation. The group that made the annotation may be different from the database that manages the identifiers and/or the annotation file.

Date

The date the annotation was made or last edited, in YYYYMMDD format.
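
A date in this format can be validated with the Python standard library. The sketch below is one possible check, not an official GO validator; the function name is hypothetical. The explicit length and digit test is there because <code>strptime</code> alone would also accept unpadded values such as 2022419.

```python
from datetime import date, datetime


def parse_annotation_date(value: str) -> date:
    """Parse a YYYYMMDD string, rejecting anything that is not 8 ASCII digits."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"expected YYYYMMDD, got {value!r}")
    # strptime also rejects impossible dates such as 20220230.
    return datetime.strptime(value, "%Y%m%d").date()


print(parse_annotation_date("20220419"))
```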

Review Status

Last reviewed: 2022-04-19