Elements of an annotation: Difference between revisions

Revision as of 11:07, 7 March 2019

  From http://geneontology.org/page/go-annotation-conventions
  TO BE REVIEWED


Elements of an annotation

Annotation Subject

  • Annotation subjects consist of valid database identifiers, such as WB:WBGene00003721 or SGD:S000001048.
  • The list of valid database prefixes can be found on the GO website.

Relations and qualifiers

References and Evidence

  • Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis. The annotation must indicate what kind of evidence is found in the cited source to support the association between the gene product and the GO term. A simple controlled vocabulary of evidence codes is used to capture this; please see the GO evidence code documentation for more information on the meaning and use of the evidence codes.


Using the Qualifier

The Qualifier is used for text that modifies the association of a gene or gene product with a GO term. Allowable values are NOT, contributes_to, and colocalizes_with.

NOT

  • NOT is used to make an annotation statement that the gene product is not associated with the GO term.
  • When combined with an explicit annotation relation, e.g. enables, the NOT qualifier indicates that the gene product does not have that relationship to the GO term.
  • NOT may be used with terms from any of the three ontologies.

In practice, the NOT qualifier is used in two ways:

  1. When a GO term might otherwise be expected to apply to a gene product, but an experiment, sequence analysis, etc. demonstrates otherwise.
  2. When there are conflicting experimental findings in the literature and curators would like to accurately capture all relevant data.

Use of the NOT qualifier is particularly important in cases where associating a GO term with a gene product should be avoided (but might otherwise be made, especially by an automated method). For example, if a protein has sequence similarity to an enzyme (whose activity is represented as Molecular Function GO:nnnnnnn), but has been shown experimentally not to have the enzymatic activity, it can be annotated as NOT GO:nnnnnnn.

In phylogenetic-based annotation, i.e. PAINT, the NOT qualifier is used in conjunction with the IKR (Inferred from Key Residue) evidence code. Here, NOT is used to annotate a gene product when, although homologous to a particular protein family, it has lost essential residues and is very unlikely to be able to carry out an associated function, participate in the expected associated process, or be found in a certain location.

The NOT qualifier is not used to annotate negative or inconclusive experimental results.

colocalizes_with

  • colocalizes_with may be used only with cellular component terms.
  • Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the colocalizes_with qualifier. This qualifier may also be used in cases where the resolution of an assay is not accurate enough to say that the gene product is a bona fide component member. Example (from Schizosaccharomyces pombe): Clp1p relocalizes from the nucleolus to the spindle and site of cell division; i.e. it is associated transiently with the spindle pole body and the contractile ring (evidence from GFP fusion). Clp1p is annotated to spindle pole body ; GO:0005816 and contractile ring ; GO:0005826, using the colocalizes_with qualifier in both cases.

contributes_to

  • contributes_to may be used only with molecular function terms.
  • As noted above, an individual gene product that is part of a complex can be annotated to terms that describe the function of the complex. Many such function annotations should use the qualifier contributes_to: Annotating individual gene products according to attributes of a complex is especially useful for molecular function annotations in cases where a complex has an activity, but not all of the individual subunits do. (For example, there may be a known catalytic subunit and one or more additional subunits, or the activity may only be present when the complex is assembled.) Molecular function annotations of complex subunits that are not known to possess the activity of the complex must include the entry contributes_to in the Qualifier column. The contributes_to qualifier should not be used in biological process annotations. All gene products annotated using contributes_to must also be annotated to a cellular component term representing the complex that possesses the activity. Annotations using contributes_to will often use the evidence code IC, but other codes may be used as well. Note that contributes_to is not needed to annotate a catalytic subunit. Furthermore, contributes_to may be used for any non-catalytic subunit, whether the subunit is essential for the activity of the complex or not.
  • Examples
    • Subunits of nuclear RNA polymerases: none of the individual subunits has RNA polymerase activity, yet all of these subunits are annotated to DNA-dependent RNA polymerase activity (with the contributes_to qualifier), to capture the activity of the complex.
    • ATP citrate lyase (ACL) in Arabidopsis: it is a heterooctamer, composed of two types of subunits, ACLA and ACLB, in an A(4)B(4) stoichiometry. Neither of the subunits expressed alone gives ACL activity, but co-expression results in ACL activity. Both subunits can be annotated to ATP citrate lyase activity.
    • eIF2: has three subunits (alpha, beta, gamma); one binds GTP; one binds RNA; the whole complex binds the ribosome (all three subunits are required for ribosome binding). So one subunit is annotated to GTP binding and one to RNA binding without qualifiers, and all three are annotated to ribosome binding, with the contributes_to qualifier. And all three are annotated to the component term for eIF2 complex.
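
The qualifier rules above can be sketched as a small check. This is only an illustration: the function name, the one-letter aspect codes (F, P, C) and the pipe separator between multiple qualifiers are assumptions made for the example, not part of any GO tool.

```python
# Hypothetical helper illustrating the qualifier rules described above.
# Aspect codes are assumed to be F (molecular function), P (biological
# process) and C (cellular component).
ALLOWED_ASPECTS = {
    "NOT": {"F", "P", "C"},       # may be used with any of the three ontologies
    "contributes_to": {"F"},      # molecular function terms only
    "colocalizes_with": {"C"},    # cellular component terms only
}


def check_qualifiers(qualifier_field, aspect):
    """Return a list of problems for a (possibly empty) qualifier field.

    Multiple qualifiers are assumed to be separated by '|'.
    """
    problems = []
    if not qualifier_field:
        return problems
    for qualifier in qualifier_field.split("|"):
        allowed = ALLOWED_ASPECTS.get(qualifier)
        if allowed is None:
            problems.append(f"unknown qualifier: {qualifier}")
        elif aspect not in allowed:
            problems.append(f"{qualifier} cannot be used with aspect {aspect}")
    return problems


if __name__ == "__main__":
    print(check_qualifiers("contributes_to", "F"))   # no problems expected
    print(check_qualifiers("contributes_to", "P"))   # flagged: wrong ontology
```

A check like this only covers which ontology a qualifier may be used with. It cannot verify the accompanying requirement that a gene product annotated with contributes_to is also annotated to the cellular component term for the complex.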

Long-term maintenance of annotation datasets

Annotation is carried out by curators in a range of bioinformatics database resource groups, such as Mouse Genome Informatics, Saccharomyces Genome Database and FlyBase. These groups then contribute their data to the central GO repository for storage and redistribution. After submission, the annotating groups may retain responsibility for updating the annotation data to take account of changes in annotation practices and in the structure of the ontologies. This is an ongoing responsibility. For groups who prefer not to maintain their annotation dataset in the long term, it is possible to submit data to the GO repository via another database group, which will undertake to maintain the data long-term.

Avoiding redundancy

Where two or more databases are submitting data on the same species we encourage the model whereby one database group collects all annotation data for that species, removes the redundant (duplicate) annotations, and then submits the total dataset to the central repository. This ensures that no redundant annotations will appear in the master dataset. Please see the list of species and relevant database groups for more details. We understand that annotating groups will also wish to make their full dataset available to the public. For this purpose, the GO Consortium makes all of the individual datasets available from the GO website, via the GO web CVS interface, or from the directory go/gene-associations/ in the GO CVS repository. All of the individual datasets are also listed in the annotation downloads table, and all individual groups will clearly be given credit for the work that they have done. The non-redundant set is only used as the master copy that appears in AmiGO and similar tools.
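
As a rough sketch of what removing redundant annotations might look like when one group merges data for a species: the choice of fields that make two annotations "the same" is an assumption for this example, as are the dictionary keys and the sample records (the GO ID and reference are placeholders).

```python
# Hypothetical sketch: merge annotation records and drop duplicates.
# Which fields define a duplicate is an assumption made for illustration.
KEY_FIELDS = ("db_object_id", "qualifier", "go_id", "reference", "evidence")


def remove_redundant(annotations):
    """Keep the first occurrence of each annotation, preserving input order."""
    seen = set()
    unique = []
    for annotation in annotations:
        key = tuple(annotation[field] for field in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(annotation)
    return unique


if __name__ == "__main__":
    # Placeholder records: the GO ID and reference are not real data.
    records = [
        {"db_object_id": "SGD:S000001048", "qualifier": "", "go_id": "GO:nnnnnnn",
         "reference": "PMID:placeholder", "evidence": "IDA", "assigned_by": "GroupA"},
        {"db_object_id": "SGD:S000001048", "qualifier": "", "go_id": "GO:nnnnnnn",
         "reference": "PMID:placeholder", "evidence": "IDA", "assigned_by": "GroupB"},
    ]
    print(len(remove_redundant(records)))
```

Because the first occurrence wins, the order in which the contributing datasets are concatenated determines which group's copy is retained in the merged set.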

Credit for annotation work

Every annotation is marked with the name of the database that made the annotation as well as the name of the database that maintains and submits the annotation. This information is in two separate columns of the gene association file. This ensures that the database making the annotation, and the database maintaining the annotation, will both receive full credit for their work.

No single established database?

Some model species research communities do not have an established database group with funding and time to commit to long-term maintenance of their datasets. Such groups can contribute annotations to the central repository via the UniProtKB GO Annotation (UniProtKB-GOA) multispecies annotation group. This is also a possible route for those groups just starting out in annotation who may wish to take up the responsibility for long-term maintenance of their datasets at a later date.


Annotating gene products that interact with other organisms

The majority of gene products act within the organism that encoded them. However, sometimes gene products encoded by one organism can act on or in other organisms. For example, in obligate parasitic species (including viruses), almost all their gene products will be interacting with their host organism. Interactions may also be between organisms of the same species: for example, the proteins used by bacteria to adhere to one another to form a biofilm. For annotating gene products involved in these multi-organism interactions, there are special terms in the biological process ontology, under multi-organism process ; GO:0051704, and in the cellular component ontology, under other organism ; GO:0044215. More specific information can be found in the biological process documentation on multi-organism processes and in the cellular component guidelines on host cell. The species in the interaction are recorded in an annotation by using terms from this node and entering two taxon IDs in the Taxon column. The first taxon ID should be that of the species encoding the gene product, and the second should be the taxon of the other species in the interaction. Where the interaction is between organisms of the same species, both taxon IDs should be the same. The taxon column of the annotation file is described in more detail in the annotation file format guide. An additional taxon ID should not be added in cases where the annotation is based on sequence or structural similarity.
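
The dual-taxon convention can be illustrated with a short parser. This is a minimal sketch assuming the Taxon column holds one or two `taxon:<number>` entries separated by `|`, as in the examples in this document; the function name is hypothetical.

```python
def parse_taxon_column(value):
    """Split a Taxon column value into (encoding_taxon, other_taxon).

    The first ID is the species encoding the gene product; the optional
    second ID is the other species in the interaction. other_taxon is
    None when only one ID is present.
    """
    parts = value.split("|")
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"expected one or two taxon IDs, got {len(parts)}")
    ids = []
    for part in parts:
        prefix, _, number = part.partition(":")
        if prefix != "taxon" or not number.isdigit():
            raise ValueError(f"malformed taxon entry: {part!r}")
        ids.append(int(number))
    other = ids[1] if len(ids) == 2 else None
    return ids[0], other


if __name__ == "__main__":
    # S. meliloti gene product acting on M. truncatula (IDs from this document)
    print(parse_taxon_column("taxon:382|taxon:3880"))
```

For an interaction between organisms of the same species, both entries carry the same ID, so the parser returns the same number twice rather than None.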

Nomenclature Conventions

  • The terms 'symbiont' and 'host' may carry connotations of the nature of the interaction between two organisms, but in the Gene Ontology, they are used solely to differentiate between organisms on the basis of their size. The word symbiont is used to refer to the smaller organism in a symbiotic interaction; the larger organism is called the host. If the two organisms are the same size, the term name will contain 'other organism'. Note that parasites and pathogens are also referred to as 'symbionts', as symbiosis encompasses parasitism, commensalism and mutualism.

Requesting new terms in the multi-organism process node

  • Like the rest of GO, the multi-organism process node is not complete, and you will probably have to request some new terms when annotating your gene products. These should be submitted via the GO curator requests tracker in the usual way. Here are a few points to bear in mind when requesting new terms, and annotating using this node:
  • A term name should make the direction of the interaction clear. An example of this is given below; induction of nodule morphogenesis in host would be used to annotate the symbiont gene product, while induction of nodule morphogenesis by symbiont is used to annotate the host genes. Both processes would be children of a common term nodulation.
  • If your gene product affects a 'normal' host process, you should always request a new term in the MOP node, rather than just annotating directly to the term in the 'normal' ontology. So for example, if your bacterial gene product regulates the ethylene-mediated signaling pathway in plants, rather than using dual taxon to annotate to regulation of ethylene mediated signaling pathway ; GO:0010104, you should instead request a new term regulation of ethylene mediated signaling pathway in host.
  • Where an organism subverts a 'normal' biological process, e.g. the transcription of viral DNA by host transcription machinery, host proteins should not be annotated to a 'symbiont' term like transcription of symbiont DNA. This is because it would be considered a pathological process, i.e. not 'normal' for the host.
  • Example: Performing a process with another organism
    • Nod factor export proteins transfer nod factors out of the purple bacterium Sinorhizobium meliloti into the surrounding soil. Here they are detected by LysM nod factor receptor kinases in Medicago truncatula roots and initiate the process of nodulation.
    • Annotation of Nod factor export ATP-binding protein I from S. meliloti suggests a new term, induction of nodule morphogenesis in host:
       nodulation ; GO:0009877
         [p] induction of nodule morphogenesis in host ; GO:00new01
       Sinorhizobium meliloti taxonomy ID: 382
       Medicago truncatula taxonomy ID: 3880
       protein name: Nod factor export ATP-binding protein I
       GO term: induction of nodule morphogenesis in host ; GO:00new01
       taxon column: taxon:382|taxon:3880
    • Annotation of LysM receptor kinase LYK3 precursor from M. truncatula suggests a new term, induction of nodule morphogenesis by symbiont:
       nodulation ; GO:0009877
         [p] induction of nodule morphogenesis by symbiont ; GO:00new02
       Medicago truncatula taxonomy ID: 3880
       Sinorhizobium meliloti taxonomy ID: 382
       protein name: LysM receptor kinase LYK3 precursor
       GO term: induction of nodule morphogenesis by symbiont ; GO:00new02
       taxon column: taxon:3880|taxon:382
  • Example: Performing a process in more than one species
    • The protein cardiotoxin from the southern Indonesian spitting cobra Naja sputatrix kills mammalian cells by cytolysis when it enters the host cell cytoplasm. Annotation of cardiotoxin precursor from N. sputatrix uses the GO terms cytolysis of cells of another organism ; GO:0051715 and host cell cytoplasm ; GO:0030430.
       Naja sputatrix taxonomy ID: 33626
       Mammalia taxonomy ID: 40674
       protein name: cardiotoxin precursor
       GO term: cytolysis of cells of another organism ; GO:0051715
       taxon column: taxon:33626|taxon:40674
       protein name: cardiotoxin precursor
       GO term: host cell cytoplasm ; GO:0030430
       taxon column: taxon:33626|taxon:40674
  • Example: Regulating a process in another organism

    • Mosquito saliva contains D7 proteins, which bind biogenic amines in order to suppress hemostasis in humans. Annotation of D7 protein long form from A. gambiae suggests a new term, negative regulation of hemostasis in host:
       evasion of host defense response ; GO:0030682
         [i] negative regulation of hemostasis in host ; GO:00new03
       Anopheles gambiae taxonomy ID: 7165
       Homo sapiens taxonomy ID: 9606
       protein name: D7 protein long form
       GO term: negative regulation of hemostasis in host ; GO:00new03
       taxon column: taxon:7165|taxon:9606

Downstream Process guidelines

  • Where there is limited knowledge regarding the processes that a gene product is directly involved in, curators may often have annotated to terms that describe the processes that are downstream of the direct activity of the gene product. Where more knowledge regarding a gene product's functional activity exists, curators need to make a judgement as to how to represent its direct activities and whether to continue to include downstream processes in the annotation set. Curators are encouraged to request more specific terms to describe how the gene product is involved in a downstream process and also evaluate the annotation set as more functional information becomes available. More detailed curator guidance is provided below.

Requesting more specific terms for downstream processes

  • Where a specific, descriptive GO term does not exist (for instance to describe the involvement of a process in another process), curators are encouraged to request these terms to provide more specificity to their annotation. For example, the "intent" of the growth factor BMP2 to change the "state" of the cell is instrumental in cardiac cell differentiation. Requesting the new GO term BMP signaling involved in cardiac cell differentiation would therefore make it possible to qualify how the gene product is involved in the downstream process of cardiac cell differentiation, rather than annotating to the separate terms BMP signaling and cardiac cell differentiation.

Annotating downstream processes for gene products involved in core or specific processes

  • Curators should annotate to the experimental evidence in the paper. However, curator judgement should be used, taking into account what the curator knows about:
  • The background of the gene product; is it widely known to have a central role causing it to affect multiple processes, or does it have few specific targets?
  • the quality of the experimental assays performed in the paper; are they fully explained and the evidence supplied convincing? (See separate guidelines for annotation of high-throughput experiments.)
  • Example 1. Gene product involved in core process.
    • Yeast RNA polymerase II subunit RPB2: RNA polymerase II subunit RPB2 has a core function of RNA polymerase activity, which has downstream effects on a large number of processes. However, curators should only annotate to the gene product's transcription activity, rather than the multiple downstream processes altered as a consequence of its activity.
    • Yeast spliceosome: In S. cerevisiae, the mutation of several genes that are components of the spliceosome results in translation defects. However, later work supplied evidence for the genes' involvement in mRNA splicing, not translation. Downstream effects on translation are to be expected as many ribosomal transcripts are spliced in yeast. The curation decision was to remove annotations to the term translation for spliceosome component genes once data was available to describe the direct activity the genes contributed towards.
  • Example 2. Gene product involved in core and specific process(es).
    • S. pombe gene Sre1: The S. pombe gene Sre1 is a transcriptional regulator of genes that are involved in heme and phospholipid biosynthesis. From reading PMID:16537923 the curator decided this information should be captured in the annotation. Therefore annotations were made to:
       RNA polymerase II core promoter proximal region sequence-specific DNA binding
       regulation of transcription, DNA-dependent or regulation of transcription from RNA polymerase II promoter
       positive regulation of heme biosynthesis
       positive regulation of phospholipid biosynthesis

In addition, in accordance with these guidelines for annotating downstream processes, we would recommend that new terms are requested for:

       regulation of transcription involved in heme biosynthesis
       regulation of transcription involved in phospholipid biosynthesis

Annotating downstream processes to gene products in a ligand-receptor signaling pathway

  • Curators should annotate ligand-receptor signaling pathways as shown in the following diagrams. For a signaling pathway, the ligand is considered part of the pathway. Therefore a factor which limits or increases the availability of a ligand to a receptor should be annotated as regulating the ligand/receptor pathway. N.b. Ongoing work to clarify the start/end of a signaling pathway in the definition of GO terms will allow us to refine these guidelines.

General ligand-receptor pathway

  

Stimulus

  • regulation of signaling pathway

Ligand

  • signaling pathway
  • regulation of other cellular processes

Receptor

  • signaling pathway
  • regulation of other cellular processes

Signaling molecules

  • signaling pathway
  • regulation of other process(es)
  • regulation of gene-specific transcription
  • regulation of translation
  • (regulation of) transcription in response to stimulus ligand
  • (regulation of) transcription involved in other process(es)
  • (regulation of ) other cellular process(es)

Transcription factors*

  • signaling pathway
  • regulation of transcription involved in other process(es)

Target

  • cellular response to stimulus
  • other process(es)
  • regulation of other processes
  • We would not consider annotating the core transcription machinery to the downstream (other) processes that the target is involved in unless the transcription factor is gene-specific, in which case we would annotate to regulation of transcription involved in other process(es)

Regulation of glucose transport

Insulin (ligand)

  • insulin receptor signaling pathway
  • regulation of glucose transport/homeostasis

Insulin receptor (receptor)

  • Insulin receptor signaling pathway
  • Regulation of glucose transport/homeostasis

IRS1, PI3K, PDK1, PKC (signaling molecules)

  • Insulin receptor signaling pathway
  • Regulation of glucose transport/homeostasis
  • Protein localization at cell surface (NTR: involved in response to insulin)

GLUT4 (target)

  • Cellular response to insulin
  • Glucose transport/homeostasis

General note on current status of revision of annotation sets

  • If a gene product has limited experimental literature, such as a newly characterised protein, it is understandable that curators need to annotate to more general 'downstream' process terms that may, in fact, represent a phenotype. As more functional information is published about a gene product, curators may decide to revise these annotations to downstream processes. Currently, however, different curation groups take different actions, based on considerations of user requirements and curation capacity:
  • Annotations to indirect/downstream processes may be removed, or updated to 'regulation' terms. This 'deleted' information is usually stored in the annotating group's phenotype database.
  • Annotations to indirect/downstream processes may be kept because they are supported by good evidence, or because the group wants to keep them as a history of annotation or to give a complete overview of knowledge about the gene product.
  • Annotations may be left unrevised because the curation group does not have the resources to revise annotation sets or does not have an alternative place to store the data.
  • Curation groups need to be aware that keeping annotations to downstream processes will be a source of such data to other groups who may have a different annotation philosophy.

Binding guidelines

Using terms that imply binding of substrates

  • As many terms in the Molecular Function ontology implicitly or explicitly imply the binding of a chemical or protein, it is unnecessary to co-annotate a gene product to a term from the binding node of GO to describe the binding of substrates or products that are already adequately captured in the definition of the Molecular Function term. For instance, a protein with enzymatic activity MUST bind all of the substrates and products of the reaction it catalyzes. Similarly, a protein with transporter activity MUST bind the molecules it transports. The curator should try to capture the specifics as much as feasible and avoid redundant annotations. Annotate to a binding term whenever an experiment shows binding, but not catalysis/transport. Curators should use their judgment to decide whether the interaction is physiologically relevant and capture information relevant to the in vivo situation.

Protein binding annotations in the Gene Ontology

  • The Molecular Function (MF) ontology can be used to capture macromolecular interactions, such as protein-protein, protein-nucleic acid, protein-lipid interactions, etc. While GO annotations are not considered to be a repository of all protein-protein interactions, many gene products are annotated to 'protein binding' (GO:0005515) or one of its child terms. In making these annotations, contributing groups may follow slightly different practices with respect to the types of experimental evidence used to support these inferences, e.g. some groups may use co-immunoprecipitation as supporting evidence for a protein binding annotation between two gene products, others not. However, all groups generally adhere to the principle that, when annotated, protein binding interactions inform what is believed to be the normal biological role of a gene product, i.e. the protein-protein interactions support an author's hypothesis about how the gene product is thought to execute its molecular function in the context of a normal biological process. Protein-protein interactions for which there is not yet sufficient biological context are discouraged as sources of GO MF annotations.

Choosing more descriptive terms than 'protein binding'

  • Child terms that describe a particular class of protein binding (e.g. GO:0030971:receptor tyrosine kinase binding) should be used in preference to the parent term GO:0005515 protein binding. The IPI evidence code should be used where possible for annotation of all protein-protein interactions and the precise identity of the interacting protein should be captured in the 'with' column (8). At present a variety of identifiers can be used in the 'with' column (8) or the annotation extension column (16), see GO Annotation File Format 2.0 Guide.

Identifying binding partners using columns 8 and 16

  • When a gene product is being annotated to a binding activity term, the 'with' column (8) and/or the annotation extension column (16) can be used to capture additional information about the identity of the binding partner of the gene product being annotated. To understand when to use column 8, column 16, or both, it is important to remember that entries in column 8 support the evidence used to infer the function, while entries in column 16 modify the GO term used in the GO_ID column (5). The curator also needs to remember that the 'with' column (8) can be used with only a subset of evidence codes: IPI, IC, IEA, IGI, IMP or ISS; column 8 cannot be used with an IDA evidence code, see evidence code documentation.
  • Examples of using the 'with' column (8)
    • The annotation of Protein A to a GO binding term with evidence code IPI and Protein B in the 'with' column (8) makes the statement that Protein A has the binding activity defined by the GO term and this function was inferred from interaction with Protein B; binding to Protein B isn't necessarily the in vivo function of Protein A.
    • Column 8 can be used to make annotations based on experiments where the evidence for the function of Protein A binding Protein B in species X is based on binding of protein B from species Y. For example, the C. elegans Unc-115 protein was shown to bind to actin filaments made with actin purified from rabbit skeletal muscle. This would be annotated as GO:0051015:actin filament binding using an IPI evidence code and putting an accession for rabbit skeletal muscle actin, UniProtKB:P68135, in the 'with' column (8). This annotation makes the statement that C. elegans Unc-115 has the molecular function of actin filament binding inferred from experiments using rabbit actin.
    • Column 8 can be used to indicate that the evidence for binding a small molecule is based on an experiment using an analog. The annotation Protein A GO:0005524:ATP binding IPI column 8 ATP-gamma-S captures the information that ATP binding activity was inferred from binding of a non-hydrolyzable ATP analog.
  • Examples of using the annotation extension column (16)
    • The annotation of Protein A to a GO function term with Protein B and a has_participant relationship in the annotation extension column (16) makes the statement that an in vivo target of Protein A is Protein B. This is equivalent to the post-compositional creation of a new child term.
    • The zebrafish Lnx2b protein (UniProtKB:A4VCF7) was shown to ubiquitinate zebrafish Dharma (UniProtKB:O93236) in PMID:19668196. Therefore Lnx2b can be annotated to GO:0004842:ubiquitin-protein ligase activity, adding has_input UniProtKB:O93236 in the annotation extension column (16). This annotation makes the statement that Dharma is a substrate of the ubiquitin-protein ligase activity of Lnx2b.
    • The human ABCG1 protein has been annotated to GO:0034041 sterol-transporting ATPase activity with an IDA evidence code. The experiments in PMID:17408620, demonstrate that the target is 7-hydroxycholesterol; this information can be added to the annotation by including the ChEBI ID for 7-hydroxycholesterol, CHEBI:42989, in the annotation extension column (16): post-composing the GO term 7-hydroxycholesterol-transporting ATPase activity.
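
The distinction between columns 8 and 16 can be sketched in code. The example below is illustrative only: the function names are hypothetical, the set of evidence codes is the one listed in this section, and the `relation(identifier)` spelling of a column 16 entry is an assumption about how the extension is written in the annotation file.

```python
# Evidence codes that this guideline lists as compatible with column 8.
WITH_ALLOWED = {"IPI", "IC", "IEA", "IGI", "IMP", "ISS"}


def with_column_problem(evidence_code, with_value):
    """Return a message if column 8 is filled for a code that forbids it."""
    if with_value and evidence_code not in WITH_ALLOWED:
        return f"column 8 must be empty for evidence code {evidence_code}"
    return None


def format_extension(relation, identifier):
    """Build a column 16 entry, assuming the form relation(identifier)."""
    return f"{relation}({identifier})"


if __name__ == "__main__":
    # Rabbit actin accession in column 8 with IPI: accepted by this check.
    print(with_column_problem("IPI", "UniProtKB:P68135"))
    # The same accession with IDA is flagged.
    print(with_column_problem("IDA", "UniProtKB:P68135"))
    # Lnx2b ubiquitinates Dharma: the substrate goes in column 16.
    print(format_extension("has_input", "UniProtKB:O93236"))
```

Keeping the two helpers separate mirrors the guideline: column 8 qualifies the evidence, while column 16 refines the GO term itself.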

Ontology development for protein binding

Future ontology development efforts should be relied upon to improve the searching capability of any user who is specifically interested in gene products carrying out a certain type of substrate/product binding. Ongoing ontology development by the GOC of 'has_part' relationships will provide links that imply substrate binding. The existing GO will follow this new format, e.g. Transcription factor activity will have a 'has_part' relationship to DNA binding rather than an 'is_a' relationship. Curators should request new 'has_part' relationships (and terms) if these do not exist.

'Response to' guidelines

The definition of the top-level 'response to' terms has been updated to indicate where the response begins and ends: Any process that results in a change in state or activity of a cell or organism as the result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism. This change was made and released in ontology version 1.1960.

  • Examples:
    • GO:0050896 response to stimulus: Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.
    • GO:0051716 cellular response to stimulus: Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity of the cell.

Advisory quality control check: High-level 'response to' terms should not be used directly for annotation unless additional information is supplied in column 16. Be careful to use IEP when the experiment observes expression level. Example: in PMID:8888624, the annotation for A. thaliana BIP1 should use IEP rather than IDA.
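The advisory check on high-level 'response to' terms could be automated along the following lines. This is only a sketch: it covers just the two terms listed in the examples, and the function name and warning text are hypothetical.

```python
# High-level 'response to' terms named in the examples above
HIGH_LEVEL_RESPONSE_TERMS = {
    "GO:0050896",  # response to stimulus
    "GO:0051716",  # cellular response to stimulus
}


def check_response_annotation(go_id, annotation_extension):
    """Return an advisory warning string, or None if the check passes.

    go_id is the annotated GO term; annotation_extension is the column 16 text.
    """
    if go_id in HIGH_LEVEL_RESPONSE_TERMS and not annotation_extension.strip():
        return (f"{go_id} is a high-level 'response to' term; "
                "use a more specific term or add information in column 16")
    return None
```

Because the check is advisory, the sketch returns a message for a curator to review and does not reject the annotation.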

Use of Regulation Terms

Background

The GO Consortium recognized quite early on in the development of the Biological Process ontology that there were gene products that participated directly in a process and gene products that regulated a process, positively and/or negatively. But how do curators know which of these terms they should annotate to, and is it possible, for a given process, to annotate the same gene product to both a parent term and one of its associated regulation terms? To begin to address these questions, here are some guidelines for annotating, or not, to regulation terms:

Guideline 1: Use existing biological knowledge to define the process.

In order to determine whether a gene product participates in a process or regulates that process (or both), curators need to consider the nature of the process. Processes can be considered as ordered assemblies of molecular functions, and every process has a beginning, middle, and end. Use existing biological knowledge and the paper being curated as guides. Is there a defined pathway, i.e. distinct molecular functions, and have the gene products that perform those functions been identified? Does the gene product being annotated perform one of those functions, or a function outside of the process that might start, stop, or change the rate at which the process proceeds?

In reality, the beginning, middle, and end of some processes will be easier to define than others. For example, signaling pathways, such as MAPK signaling, will be easier to define than broader, organismal-level processes such as embryonic development. Curators should use their judgement, based on the published literature, to guide their annotation.

Example: Atg1

Saccharomyces cerevisiae Atg1 encodes a protein kinase that is involved in autophagy: "The process by which cells digest parts of their own cytoplasm; allows for both recycling of macromolecular constituents under conditions of cellular stress and remodeling the intracellular structure for cell differentiation." Atg1 activity is critical for the induction of autophagy, specifically for formation of autophagic vacuoles. Should Atg1 be annotated to autophagic vacuole formation or regulation of autophagic vacuole formation? Authors have used language that could lead curators to make annotations to either term. In this case, annotators need to consider the sum of what is known about the autophagic pathway and Atg1's role in that pathway. Using that knowledge, SGD has annotated Atg1 to the parent process term, autophagic vacuole formation, because once Atg1 is active, the 'go' or 'no go' decision for autophagy has already been made. Genes further upstream appear to be the ones that actually regulate the autophagic pathway. http://wiki.geneontology.org/index.php/2010_GO_camp_Use_of_Regulation_is...

Guideline 2: If you aren't sure, consider annotating to the parent process term.

If the gene product performs one of the functions, annotate directly to the process. If the gene product regulates the process, it should be annotated to regulation of that process. If you aren't sure what term to use, annotate to the parent process term. As more information about the process becomes available, you may be able to refine your annotations (see Guideline #4 below).
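The choice described in Guideline 2 can be written out as a small decision function. The sketch below is illustrative only: the Role enumeration and choose_term are hypothetical names, and regulation terms are assumed to be named "regulation of" followed by the process term.

```python
from enum import Enum


class Role(Enum):
    PERFORMS_FUNCTION = "performs one of the functions in the process"
    REGULATES = "regulates the process"
    UNSURE = "role not clear from the evidence"


def choose_term(process_term, role):
    """Pick the term to annotate to, following Guideline 2."""
    if role is Role.REGULATES:
        return f"regulation of {process_term}"
    # Direct participants and unclear cases both go to the parent process term
    return process_term
```

For the Atg1 example, choose_term("autophagic vacuole formation", Role.PERFORMS_FUNCTION) gives the parent process term, which matches the SGD annotation described under Guideline 1.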

Guideline 3: Improve the ontology by defining, wherever possible, the beginning, middle, and end of a process.

  • Wherever possible, include the beginning, middle, and end of a process in the corresponding term definition. This will help annotators choose the appropriate term for their annotations.

Guideline 4: Revisit annotations when new knowledge becomes available.

GO annotations should reflect the present state of biological knowledge. Therefore, as the understanding of a biological process improves, it may be necessary to revisit and refine existing annotations.

Guideline 5: Annotations based on mutant phenotypes should take mechanism into account.

Mutant phenotypes are often used to make annotations to regulation terms because they fit the criteria of the term definition, i.e. authors report a change in the frequency, rate, or extent of a process. However, in using IMP to correctly make regulation annotations it is important to consider various factors, including: 1) the assay type, 2) the nature of the alleles (null vs reduction of function), and 3) the molecular identity of the gene product. Again, if it isn't clear that a gene product is involved in regulation, it is better to annotate to the parent process term.

Example: muscle contraction and C. elegans mutants

In C. elegans, a number of genes can mutate to paralysis or slowed locomotion due to defects in muscle contraction. This includes genes that encode everything from myosin heavy chain to calcium channels to transcription factors. Depending upon the nature of the allele, sometimes the mutant phenotypes for the same gene can lead to both process and regulation terms. In this case, consideration of the process, the nature of the allele (complete or partial loss of function), and the molecular identity of the gene product can guide curators in making the appropriate annotation. http://wiki.geneontology.org/images/4/47/Regulation_example.pdf

Guideline 6: Some gene products may be annotated to both a process and regulation of that process.

Positive and negative feedback loops are an essential part of many signaling pathways. If one member of a pathway regulates the activity of a different member of the pathway, it could be annotated to both the process and regulation of that process. When annotating gene products involved in a signaling pathway, however, curators should not annotate gene products that directly activate the next gene product in the pathway to regulation of that pathway. For example, MAPKK would not be annotated to positive regulation of MAPKKK cascade just because it phosphorylates and activates MAPK. However, gene products that, for example, feed back onto earlier steps in the pathway may be annotated to both the parent process term and a regulation term.

Example: ERK1/2

ERK1/2 activation requires activity of FRS2alpha which, in turn, is negatively regulated by activated ERK1/2. Could ERK1/2 be annotated to both MAPKKK cascade and negative regulation of MAPKKK cascade? See: Phosphoprotein Enriched in Astrocytes 15 kDa (PEA-15) Reprograms Growth Factor Signaling by Inhibiting Threonine Phosphorylation of Fibroblast Receptor Substrate 2α.

Cases where the presence or absence of one of the members of a pathway is limiting should not be annotated to regulation, e.g. if the amount of a receptor on the surface of a cell regulates the process, the receptor should not be annotated to the regulation term.
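The distinction drawn in Guideline 6 can be sketched in code under a deliberately simplified assumption: the pathway is modeled as a list ordered from upstream to downstream, and a member is treated as a regulator only when it acts on a member upstream of itself. The function name and the list model are hypothetical, and real cases still need curator judgement.

```python
def pathway_annotations(pathway, member, target, effect, process_term):
    """Suggest terms for a pathway member that acts on another member.

    pathway: member names ordered from upstream to downstream (assumed model)
    effect: "positive" or "negative"
    """
    terms = [process_term]
    # Acting on a downstream member (e.g. activating the next step) is part of
    # the process itself; only feedback onto an earlier step adds a regulation term.
    if pathway.index(target) < pathway.index(member):
        terms.append(f"{effect} regulation of {process_term}")
    return terms
```

With this model, MAPKK activating MAPK in ["MAPKKK", "MAPKK", "MAPK"] yields only MAPKKK cascade, whereas ERK1/2 acting negatively on FRS2alpha in ["FRS2alpha", "ERK1/2"] yields both MAPKKK cascade and negative regulation of MAPKKK cascade.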


Old wiki pages to review