Guidelines for literature-based curation

Literature based/Manual GO annotations

Annotation is the process of assigning GO terms to gene products. Annotations can be made from the published literature, made manually by comparing sequences, or inferred using automated methods. Literature-based annotations are derived from published papers. Making them is very time-consuming, but it produces very high-quality, species-specific annotation and brings the information about the gene product into a format in which it can be used in high-throughput experiments. This is an extremely worthwhile process in the long term. It may be best carried out by people who know the function of the gene product, and the associated biology, in great detail, for example experimental scientists who are familiar with the published literature. If you are doing this, you may also like to write and suggest modifications to the ontology structure.

Steps involved in Literature Based curation

Key aspects of literature based annotations

The most difficult task in literature-based annotation is figuring out the correct GO term and evidence code to use. The GOC has identified some areas of the ontology that are trickier than others and has come up with guidelines to help curators map the data and results presented in a paper to the right GO term. They are listed below.

Guidelines for Annotating to Downstream Processes

Guidelines for annotating using 'Binding' terms

Guidelines for annotating using 'response to xx' terms

  • Proposed changes to the definition:
    • Changing the state or activity of a cell or organism in reaction to a stimulus.
    • The process of changing the state or activity of a cell or organism in reaction to/in response to a stimulus.
    • Any process that results in a change in state or activity of a cell or organism in reaction to a stimulus/as the result of a stimulus (this was the one proposed at the camp).
    • Reacting to a stimulus to change the state or activity of a cell or organism.

Guidelines for annotating 'Response to' terms (Pascale 22 March 2011)

The generic definition of 'Response to' terms is: 'A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of XXX.'

There are two general ways a gene product can be involved in the 'response to' a stimulus:

(a) a gene product's expression level may be found to change in response to a stimulus

(b) a gene product may be found to *mediate* change in response to a stimulus, or mediate a more specific part of the response

For (a), the evidence almost always comes from 'IEP'-type experiments.

For (b), the evidence almost always comes from non-'IEP' types of experiments.

Annotation to 'response to' when a gene product has been upregulated by a stimulus means that the researcher expects that this change in expression is important for the response. For example, in response to heat shock, chaperones are upregulated that then mediate refolding of proteins damaged by the heat shock stress. An annotation to 'heat shock response' when a protein is upregulated by heat shock (=IEP) means that we infer that this change in expression has some biological role in the response. In this case, a more precise annotation would be to a term describing 'protein folding in response to heat shock'.

The reorganization of the 'Signaling' nodes has resulted in many 'response to' terms becoming related to signaling pathways and makes that type of more precise annotation easier.

When an annotator reads a paper describing an experiment where a 'response to' term seems appropriate, s/he should:

1. Try to annotate to the most granular term possible.
2. Use IEP if the 'response to' annotation is based on data where protein or RNA levels are compared.
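The two cases (a) and (b) described above can be written down as a small decision helper. This is only a sketch: the function name and the `basis` labels are hypothetical, chosen for illustration, and since the guideline says 'almost always', curator judgement still applies.

```python
def suggest_evidence(basis):
    """Suggest an evidence-code direction for a 'response to' annotation.

    `basis` is a hypothetical label for what the experiment measured:
      "expression_change"  - protein or RNA levels were compared (case a)
      "mediates_response"  - the gene product mediates the response (case b)
    """
    if basis == "expression_change":
        return "IEP"
    if basis == "mediates_response":
        # The guideline only says this is almost always *not* IEP; it does
        # not name a specific code, so none is suggested here.
        return "not IEP"
    raise ValueError(f"unknown basis: {basis!r}")


if __name__ == "__main__":
    print(suggest_evidence("expression_change"))
    print(suggest_evidence("mediates_response"))
```

The helper deliberately refuses unknown labels instead of guessing, because the choice of evidence code depends on what the paper actually shows.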

The general definition of the "response to" terms is "A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus," where a stimulus is any conceivable stimulus a living organism might face, for example, "a stimulus from a yeast species," "a stimulus by molecules of oomycetes origin," "a misfolded protein stimulus," "an inactivity stimulus," "a stimulus indicating the organism is under stress," or "a nutrient stimulus."

This definition is extremely broad. It has no temporal limitation, and it allows basically any measurable change following a stimulus to count as part of the response to that stimulus. It does not require proof of cause and effect, and it does not require direct responses to be distinguished from indirect ones. Some annotators may thus read these "response to X" definitions as permitting annotation of any gene whose expression level changes in the presence of X to "response to X." This is despite the fact that many stimuli produce global changes in mRNA or protein levels that are only indirectly connected to the stimulus through cascading cause and effect relationships.

In contrast, more specific terms such as "response to dietary excess", defined as "The physiological process by which dietary excess is sensed by the central nervous system and results in a reduction in food intake and increased energy expenditure", are much more concerned with the mechanism of response than with the products of response. With that in mind, the present guidelines recommend that high level 'Response to' terms *should not* be used *directly* for annotation. This includes the following terms:

• GO:0050896 : response to stimulus

• GO:0051716 : cellular response to stimulus

• GO:0009628 : response to abiotic stimulus

• GO:0009607 : response to biotic stimulus

• GO:0042221 : response to chemical stimulus

• GO:0009719 : response to endogenous stimulus

• GO:0009605 : response to external stimulus

• GO:0006950 : response to stress

• GO:0048585 : negative regulation of response to stimulus

• GO:0048584 : positive regulation of response to stimulus

• GO:0048583 : regulation of response to stimulus
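As an illustration, a curation check for the terms listed above might look like the following sketch. The GO IDs and names are copied from the list; the function name, the `(gene_product, go_id)` record shape, and the example records are assumptions made for this example and do not correspond to any particular annotation file format or tool.

```python
# GO IDs and names copied from the list of high-level 'response to' terms.
HIGH_LEVEL_RESPONSE_TERMS = {
    "GO:0050896": "response to stimulus",
    "GO:0051716": "cellular response to stimulus",
    "GO:0009628": "response to abiotic stimulus",
    "GO:0009607": "response to biotic stimulus",
    "GO:0042221": "response to chemical stimulus",
    "GO:0009719": "response to endogenous stimulus",
    "GO:0009605": "response to external stimulus",
    "GO:0006950": "response to stress",
    "GO:0048585": "negative regulation of response to stimulus",
    "GO:0048584": "positive regulation of response to stimulus",
    "GO:0048583": "regulation of response to stimulus",
}


def flag_direct_annotations(annotations):
    """Return the annotations made directly to a high-level term.

    `annotations` is an iterable of (gene_product, go_id) pairs; this
    shape is a hypothetical stand-in for a real annotation file.
    """
    return [
        (gene, go_id)
        for gene, go_id in annotations
        if go_id in HIGH_LEVEL_RESPONSE_TERMS
    ]


if __name__ == "__main__":
    # Hypothetical records; "GO:0000000" is a placeholder for any more
    # specific term that is not in the list.
    records = [("geneA", "GO:0006950"), ("geneB", "GO:0000000")]
    for gene, go_id in flag_direct_annotations(records):
        print(f"{gene}: {go_id} ({HIGH_LEVEL_RESPONSE_TERMS[go_id]}) "
              "is too general; look for a more specific child term")
```

The check only compares IDs against the list, so a flagged annotation still needs a curator to pick the more granular term.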

Details of the discussion leading to these guidelines are here:

Guidelines for annotating using 'regulation of xx' terms

Manual Sequence based annotations

Molecular Function Annotations

Annotating to enzymatic activities described for non-physiological substrates

1. There is a precedent (an unwritten rule) for making the broader annotation when the substrate used in the assay is known not to be the physiological substrate. For many enzyme assays, curator judgement is required to determine where to place the annotation. A common example: a demonstration of histone kinase activity in vitro is often used as an assay for serine/threonine protein kinase activity, but you would be unlikely to annotate to the child term histone kinase activity without some other contextual information. (Val) Edimmer 13:32, 25 January 2011 (UTC)