Response to x annotation guidelines

From GO Wiki

The definition of the top-level 'response to' terms has been updated to indicate where the response begins and ends: Any process that results in a change in state or activity of a cell or organism as the result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism. This change was made and released in ontology version 1.1960.

  • Examples:
    • response to stimulus ; GO:0050896 Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus and ends with a change in state or activity of the cell or organism.
    • cellular response to stimulus ; GO:0051716 Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus. The process begins with detection of the stimulus by a cell and ends with a change in state or activity of the cell.

Advisory quality control check: High-level 'response to' terms should not be used directly for annotation unless additional information is supplied in column 16. Be careful to use IEP when the experiment observes expression level. Example: PMID:8888624 and the annotation for A. thaliana BIP1, which should use IEP rather than IDA.
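As a rough illustration of the advisory check, the sketch below flags tab-delimited GAF lines that annotate directly to a high-level 'response to' term while leaving column 16 (the annotation extension) empty. The set of "high-level" terms, the function names and the example line are assumptions made for illustration; this is not an official GO Consortium check.

```python
# Hypothetical QC sketch, not an official GO check.
# Assumed list of high-level terms, taken from the two examples on this page.
HIGH_LEVEL_RESPONSE_TERMS = {"GO:0050896", "GO:0051716"}


def flag_high_level_response(gaf_line):
    """Return True if a GAF line should be sent back to a curator for review."""
    if gaf_line.startswith("!"):  # GAF header/comment lines start with '!'
        return False
    cols = gaf_line.rstrip("\n").split("\t")
    if len(cols) < 5:  # too short to contain a GO ID
        return False
    go_id = cols[4]  # column 5: GO ID
    extension = cols[15].strip() if len(cols) > 15 else ""  # column 16
    return go_id in HIGH_LEVEL_RESPONSE_TERMS and not extension


def example_line(go_id, extension):
    """Build a made-up 17-column GAF line (all values are placeholders)."""
    cols = ["EXDB", "EX:0001", "geneX", "", go_id, "PMID:0000000", "IMP", "",
            "P", "", "", "protein", "taxon:0000", "20120101", "EXDB",
            extension, ""]
    return "\t".join(cols)


if __name__ == "__main__":
    print(flag_high_level_response(example_line("GO:0050896", "")))
```

A line that supplies any column 16 content, or that annotates to a more specific term, is not flagged. Whether the extension is actually informative still needs a curator's judgment.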

2012 working group notes on Response to terms

The following are some key points raised in a discussion between Alan Bridge, Pascale Gaudet, Rachael Huntley and Rama Balakrishnan during 2012.

Currently, the annotations to "response to" terms are really problematic, and the terms are much overused. In many cases, curators appear to make annotations to "response to X" when in fact they are really saying "responds to X". See also the Use_of_Response_To_Terms_in_Annotation page by Alex Diehl, which highlights the main problems with using these terms.

Alex's thoughts boil down to how curators want to use these terms: either 1) to annotate any gene product whose expression level changes in the presence of X to "response to X", which yields thousands of gene products annotated to "response to" terms, adds little value, and masks the gene products that are truly involved in a response; or 2) to annotate gene products that are involved in the mechanism of a response, e.g. by using more informative terms such as "response to dietary excess", which is defined in terms of the response mechanism.

In the second case, we would ideally like to annotate "gene product X is required for (cellular, organismal) response Y to stimulus Z" and also be able to say what X, Y and Z are.

These annotations could be used in situations where you have definitive experimental evidence for a requirement for a given gene product X in a specific cellular/organismal response Y to a stimulus or perturbation Z (RNAi), but when you don't know the mechanism underlying this requirement (maybe synthetic lethal data tells you it's not in a known pathway). If you did know the mechanism you could (for instance) annotate with the correct biological process term.

In some cases, these terms seem to be used with another (complementary) term that is more specific, and which provides more biological meaning. For example, A2A259 is annotated to:

GO:0001581 - detection of chemical stimulus involved in sensory perception of sour taste - IDA PubMed 16891422

GO:0071468 - cellular response to acidity - IDA PubMed 16891422

The first annotation is interesting: this protein detects chemicals that tell us food is sour. The chemical used to test this is citric acid, which is of course sour, and whose acidic nature is, I guess, the source of the second annotation. That second annotation tells you nothing about the response, though; it is really there to tell us that sour things are acidic.

If the "response to" terms are to be restricted for use with gene products that are actively contributing to the mechanism of the response, i.e. the effect does not occur when the gene product is absent, then curators should first consider what the response or cellular response to the particular stimulus is likely to be. Is the gene product you are annotating shown experimentally to be required for that response to occur, i.e. does it mediate the effect that the stimulus has?

Hypothetical example:

Annotation of gene product 'X' to 'cellular response to nitric oxide'. The cellular response in this experiment is production of gene product 'Y'. 'X' is required for this response because, in the absence of 'X', no 'Y' is produced. Therefore it is acceptable to say that 'X' has a role in the cellular response to nitric oxide.

If we follow this rule, then we need:

1. To have in vivo experimental data before using a 'cellular response to' term

2. To have a biological readout corresponding to the process being annotated to (cell differentiation, DNA damage, etc).
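The rule and its two requirements can be written down as a simple checklist. The sketch below is only an illustration: the `Evidence` class, its field names and the function name are hypothetical, and a real decision still depends on the curator's reading of the paper.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what an experiment shows (names are assumptions)."""
    in_vivo: bool                       # requirement 1: in vivo experimental data
    readout: str                        # requirement 2: biological readout, "" if none
    response_lost_without_gene_product: bool  # the gene product is required


def supports_response_to_annotation(ev):
    """True only when all three conditions from the discussion are met."""
    return (ev.in_vivo
            and bool(ev.readout.strip())
            and ev.response_lost_without_gene_product)


# The hypothetical nitric oxide example: without 'X', no 'Y' is produced.
nitric_oxide_case = Evidence(
    in_vivo=True,
    readout="production of gene product Y",
    response_lost_without_gene_product=True,
)

# An expression-only observation: 'X' changes level, but no requirement is shown.
expression_only_case = Evidence(
    in_vivo=True,
    readout="",
    response_lost_without_gene_product=False,
)
```

Under this checklist the nitric oxide case qualifies for a 'cellular response to' annotation, while the expression-only case does not.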

One possibility to avoid over-use of these terms is to create more descriptive "response to" term names and definitions.

GO:0072432 response to mitotic cell cycle G1/S transition DNA damage checkpoint signal - "A process that acts directly to delay or stop progression through the cell cycle in response to signals generated as a result of mitotic cell cycle G1/S transition DNA damage checkpoint signaling; contributes to a mitotic cell cycle G1/S transition DNA damage checkpoint."

For this example we could introduce more specific terms such as, 'DNA repair in response to mitotic cell cycle G1/S transition DNA damage checkpoint' and 'cell cycle arrest in response to mitotic cell cycle G1/S transition DNA damage checkpoint'. This would help curators to make sensible use of these terms.


An opinion by Alexander Diehl

The general definition of the "response to" terms is "A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus," where a stimulus is any conceivable stimulus a living organism might face, for example, "a stimulus from a yeast species," "a stimulus by molecules of oomycetes origin," "a misfolded protein stimulus," "an inactivity stimulus," "a stimulus indicating the organism is under stress," or "a nutrient stimulus."

This definition is incredibly broad, with no temporal limitation, and with the allowance of basically any measurable change following a stimulus as being part of the response to the stimulus. The definition provides no requirement for proof of cause and effect and no requirement for distinguishing direct responses from indirect responses.

Some annotators may thus interpret these "response to X" definitions to annotate any gene that changes in its expression level in the presence of X to "response to X." This is despite the fact that many stimuli produce global changes in mRNA or protein levels that are only indirectly connected to the stimulus through cascading cause and effect relationships. A given stimulus in a microarray experiment may affect the levels of hundreds or even thousands of genes -- should these all be annotated to "response to X" even though the sheer numbers imply lack of specificity and yield little if any information about mechanism? A separate experiment in a different cell type might yield another hundred or thousand of genes not seen in the first cell type. Pretty soon half the genome is annotated to "response to X" and what have we really learned? The ultimate result of annotating changes in expression alone to "response to" terms is a mass of annotations that provide little of value to the end user, and destroy the utility of the annotated GO terms for finding the gene products that genuinely are involved in interpreting a stimulus.

In contrast, a more limited type of "response to" definition is illustrated by the term "response to dietary excess," defined as "The physiological process by which dietary excess is sensed by the central nervous system and results in a reduction in food intake and increased energy expenditure." This definition is much more concerned with the mechanism of response, not the products of response.

This second type of definition, one concerned with the mechanism of a response, is probably a more useful approach for the GO. As things stand now, I am reluctant to use "response to" terms for expression data, as I prefer to reserve "response to" terms for the gene products involved in the actual detection, signal transduction, gene transcription, or (occasionally) the effector mechanisms that take place in response to a particular stimulus, and not simply for the mRNA or gene product that a cell produces following a stimulus. This limits my "response to" annotations to those gene products that actively interpret a stimulus and provides a clearer interpretation of what "response to" annotations mean. However, I don't know how others are using these terms.

For those who prefer the broad use of "response to" terms for annotating expression data, I can understand how, in this new age where metrics are paramount, the opportunity of adding hundreds or thousands of annotations based on a single expression experiment is highly desirable. But we ought to clarify their usage in any case.


I recommend the following:

1) We need to do some kind of survey of how annotators are currently using these terms, and how different MODs want to use these terms.

2) Depending on the results of the survey, the GO consortium should decide as a whole whether we want such a broad definition for these terms, or whether a revised, more mechanistic definition is appropriate.

3) Whether or not we change the definitions, we should also do an analysis of what types of data are appropriate to annotate to the "response to X" terms, to ensure consistency in annotation practice.




Notes from working group, fall 2008

  • Group Members: Tanya Berardini, Emily Dimmer, Pascale Gaudet, Ruth Lovering, Alexander Diehl

1. Historical Issue

Ruth: The 'response' terms have been in GO since the very beginning. Examples:

   GO:0006950	OS	stress response
   GO:0006951	OS	heat shock response

The definitions of the 'response to x' terms are quite broad and have been this way since they were first introduced (as far as I can tell from going back through the GO.defs files to Sept. 2001).

"A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of ..."

The broadness of the definition leads to questions like these:

  • When do we annotate to these terms?
    • When a gene product is up-regulated following a stimulation should it be annotated to 'response to this stimulus'?

For example, should cell cycle genes up-regulated by insulin be included in 'response to insulin stimulus'?

Possibly the definition is so broad that all up-regulated genes could be included. But is this the intention of GO?

    • How far down a series of events do we annotate a protein to the process?

For example GO:0032868 response to insulin stimulus Definition: A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of an insulin stimulus.

If we look at Wikipedia's comments on the effects of insulin (http://en.wikipedia.org/wiki/Insulin), then we would expect insulin stimulation to cause changes to a wide variety of metabolic pathways and transport systems, and increases in pathways associated with cell division (replication, protein synthesis, cell cycle, etc.).

I would have thought that the proteins associated with 'response to insulin stimulus' would be the receptors detecting the stimulus and the transducers ensuring that the cell initiates the response to insulin, and that proteins involved in the change of the cell itself would not be included, i.e. not proteins involved in the storage of glucose in liver (and muscle) cells in the form of glycogen.

Should insulin be associated with GO:0032868 response to insulin stimulus?

    • Is there inconsistency in the annotation of proteins to these terms? If there is, is this related to whether the annotation is to a multicellular organism or a single-celled organism?
  • Are there certain evidence codes that should/should not be used when annotating to these terms?
    • Review of existing evidence codes used


From GO database:

   +------+----------+
   | code | count(*) |
   +------+----------+
   | IEA  |    82721 |
   | IEP  |     2849 |
   | IMP  |     2246 |
   | ISS  |     2492 |
   | IDA  |     1233 |
   | NAS  |      207 |
   | RCA  |      443 |
   | TAS  |      469 |
   | IGI  |      394 |
   | IC   |      126 |
   | IPI  |       21 |
   | ISO  |       13 |
   | NR   |        8 |
   +------+----------+
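To put the counts above in proportion, the following sketch computes each evidence code's share of the total. The numbers are copied from the table; the variable names are arbitrary, and the grouping of codes into other categories is left out on purpose.

```python
# Annotation counts per evidence code, copied from the table above.
counts = {
    "IEA": 82721, "IEP": 2849, "IMP": 2246, "ISS": 2492, "IDA": 1233,
    "NAS": 207, "RCA": 443, "TAS": 469, "IGI": 394, "IC": 126,
    "IPI": 21, "ISO": 13, "NR": 8,
}

total = sum(counts.values())
shares = {code: n / total for code, n in counts.items()}

# Print the codes from most to least used, with their share of the total.
for code, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code:<4}{n:>7}{shares[code]:>8.1%}")
```

On these figures, electronic (IEA) annotations account for roughly 89% of all annotations to these terms, with IEP the most common of the remaining codes.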

TAIR case: mostly IEP (majority are from Northern or RT-PCR experiments, fewer microarray), then about equal amounts of IMP (treat the mutant with a substance, don't get a response where you get one in the wild type or the reverse), ISS (TIGR inherited), IEA (INTERPRO2GO), fewer annotations with IGI, TAS, IDA, NAS

Ruth: I am starting to change my mind a bit on this one. I wonder whether what is needed is more child terms under the response_to terms, with appropriate regulation terms included, so that it is clearer whether the gene is involved in signal transduction from receptor to transcription or on the effector side, covering morphological changes, apoptosis, cytokine synthesis etc. At present the usefulness of the 'response_to' terms is being lost because such a large number of genes are being annotated to them. If we have slightly more specific terms (but still quite general, otherwise we will just be duplicating all the signal transduction terms etc.) then the association of (for example) a gene product involved in sensing a specific stress will not get lost amongst apoptotic genes.

I realise that GO terms often only get created when the term is needed, but I think it would be useful to try to standardise the available options for these terms. So how about all response to terms have the following set-up (where appropriate):

  • response to x
  • > detection of x
  • > cellular response to x
  • >> x mediated signaling pathway
  • >> regulation of transcription in response to (or by) x
  • >> regulation of protein synthesis/post-transcriptional modifications in response to x
  • >> change in cell state in response to x (eg morphology, apoptosis, activation, rigidity, motility, cell growth)
  • > regulation of response to x
  • > negative regulation of response to x
  • > positive regulation of response to x
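The proposed set-up is effectively a naming template over a stimulus x. As a minimal sketch, the function below expands that template for a given stimulus; the function name is hypothetical, the depth values simply mirror the '>' markers in the list, and the generated names are not guaranteed to match existing GO terms.

```python
def standard_response_terms(x):
    """Expand the proposed template for a stimulus name x.

    Returns (depth, term name) pairs, where depth counts the '>' markers.
    """
    template = [
        (0, "response to {x}"),
        (1, "detection of {x}"),
        (1, "cellular response to {x}"),
        (2, "{x} mediated signaling pathway"),
        (2, "regulation of transcription in response to {x}"),
        (2, "regulation of protein synthesis/post-transcriptional "
            "modifications in response to {x}"),
        (2, "change in cell state in response to {x}"),
        (1, "regulation of response to {x}"),
        (1, "negative regulation of response to {x}"),
        (1, "positive regulation of response to {x}"),
    ]
    return [(depth, name.format(x=x)) for depth, name in template]


if __name__ == "__main__":
    # Print the expanded hierarchy for one stimulus, indented by depth.
    for depth, name in standard_response_terms("insulin"):
        print("  " * depth + name)
```

Applying the template to every stimulus would multiply the number of terms by ten, which is the growth problem this proposal runs into.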

I guess the problem is that this would create a massive increase in terms if we took this to all possible regulation terms etc., or even if we just added these children to all the major response to terms. (The majority of these terms already exist for at least one of the current response to terms and some are just listed under the response to extracellular stimulus term.)

The advantage would be that it would encourage curators to consider what part of the response process their protein is involved in.

Emily: I agree that the standard definition of 'response to' terms is too broad, and enables a wide range of gene products to be annotated.

- producing annotations such as: IDA annotations where curators have read evidence of their protein seen to change subcellular location (movement) as a result of a specific stimulus, IDA annotations where investigators have measured the increase of a particular protein's catalytic activity in response to some stimulus.

- perhaps such annotations should only be created if there is additional information on the roles the proteins play in such a response. So, in a similar vein to Ruth, I'd be tempted to encourage curators to annotate more often to more descriptive terms underneath a 'response to x' term, and for the 'response to x' term itself to be used only by external users for slimming purposes. Requiring annotation to such granular terms would, I imagine, reduce the frequency with which IEP annotations could be made to such terms en masse?

Alex: (I am transferring and expanding upon my comments from the SF entry to here. Also please read my earlier wiki post on this topic, Use_of_Response_To_Terms_in_Annotation of April 2008.) I have long argued for limits on the use of "response to" terms. I prefer to reserve "response to" terms for the gene products involved in the actual detection, signal transduction, gene transcription, or (occasionally) the effector mechanisms that take place in response to a particular stimulus, and not simply for any mRNA or gene product that varies in its abundance following a stimulus. This limits my "response to" annotations to those gene products that actively interpret a stimulus and provides a clearer interpretation of what "response to" annotations mean. From speaking to other curators at MGI, this is largely the approach we have used regarding these terms and will continue to use.

The alternative approach of annotating any gene product that varies in its expression level following a stimulus (as measured, for instance, by microarray) would allow huge numbers of genes to be annotated in many high-throughput experiments without providing any clear understanding of what those genes contribute in the process of responding, or where they function in the sequence of events following the receipt of a stimulus. Many stimuli ultimately result in the activation of the same set of downstream genes involved, for instance, in cell proliferation, and annotating those downstream genes provides no real information. The ultimate result of annotating changes in expression or protein abundance alone to "response to" terms is a mass of annotations that provide little of value to the end user, and destroy the utility of using the annotated GO terms for finding the gene products that genuinely are involved in interpreting a stimulus, rather than those that are generically involved in processes such as cell proliferation or division.

Looking at Ruth's scheme of terms above, it is clear that the terms under "cellular response to X" are ideally part_of the parent term, not is_a. We should not introduce more multiple is_a parentage here. I am not opposed to the creation of these terms where and when needed, but I do oppose the wholesale creation of terms in this area without demonstrated need for annotation. The universe of X is the whole universe, and I see no need to burden the GO editors with creating these terms before there is a demonstrated need for annotation. I also wonder how many of the "regulation of response to X" terms will truly be needed. There are times when a particular gene product can modulate the action of a particular signaling pathway, but again we should create these terms only as needed.

Also "defense response to X" is a type of "response to X" that should be used in all cases where a defensive effector mechanism is triggered by the stimulus.

Ruth: I can see that the additional work for GO editors to create lots of child terms below the response_to parents may be unnecessary. However, I am concerned that some curators have other focuses to their work and for them requesting GO terms is not a high priority. Consequently, if there was even just a small set of child terms which provide at least a statement about the start and finish to the process term then there is a good chance these terms would be used eg:

  1. x mediated signaling pathway
  2. >> regulation of transcription in response to (or by) x
  3. >> regulation of protein synthesis/post-transcriptional modifications in response to x
  4. >> change in cell state in response to x (eg morphology, apoptosis, activation, rigidity, motility, cell growth)

I disagree with Alex that the response to terms should be limited to the signal transduction aspect of a response. I think that the proteins involved in mediating the actual changes to the cell phenotype are relevant, although I appreciate that many annotations will be doubled up: genes involved in apoptosis, secretion of cytokines, cell motility etc. will be annotated to those GO terms as well as to 'change in cell state in response to x'. But perhaps there are some genes that are involved in apoptosis (for example) but not in the apoptosis triggered by some stimuli? If this is the case then the above structure would enable the different pathways to be identified. (see SF item)

Alex: If you read my earlier comment above, you will see that I do not limit my annotations to simply the signal transduction aspects of a response, but include "gene transcription, or (occasionally) the effector mechanisms that take place in response to a particular stimulus." This certainly could include genes involved in the induction of apoptosis in response to a particular stimulus, if those genes are part of a specific pathway rather than those that are involved in general mechanisms of apoptosis. A key phrase is "..gene products that actively interpret a stimulus." Obviously, this requires a bit of curator judgment.

2. Proposed solution(s)

Review the intention of the 'response to' GO terms. Consider identifying start and finish of the 'response to' process. More broadly, refine the scope of these terms to be more clear.

Come up with guidelines of what experimental assays would provide supporting evidence for annotations using these terms.

Should this go to the annotation group? Should this go on the Montreal agenda?

3. Comments/counter arguments

SourceForge issue opened: 'response to stimulus' http://sourceforge.net/tracker/index.php?func=detail&aid=2094943&group_id=36855&atid=440764 (Alex: I have now added a comment to this entry.)

Other related SF items:

'response to stimulus' http://sourceforge.net/tracker/index.php?func=detail&aid=1601609&group_id=36855&atid=440764

'protein stimulus' http://sourceforge.net/tracker/index.php?func=detail&aid=1601557&group_id=36855&atid=440764

'regulation of response to tumor necrosis factor' http://sourceforge.net/tracker/index.php?func=detail&aid=2129906&group_id=36855&atid=440764