Annotation consistency: 'Response to' terms

From GO Wiki

Group Members

(anyone can add themselves)

  • Tanya Berardini, Emily Dimmer, Pascale Gaudet, Ruth Lovering, Alexander Diehl

1. Issue

Ruth: The 'response' terms have been in GO since the very beginning. Examples:

   GO:0006950	OS	stress response
   GO:0006951	OS	heat shock response

The definitions of the 'response to x' terms are quite broad and have been this way since they were first introduced (as far as I can tell from going back through the GO.defs files to Sept. 2001):

"A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of ..."

The broadness of the definition leads to questions like these:

  • When do we annotate to these terms?
    • When a gene product is up-regulated following a stimulation should it be annotated to 'response to this stimulus'?

For example, should cell cycle genes up-regulated by insulin be included in 'response to insulin stimulus'?

Possibly the definition is so broad that all up-regulated genes could be included. But is this the intention of GO?

    • How far down a series of events do we annotate a protein to the process?

For example, GO:0032868 'response to insulin stimulus' has the definition: A change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of an insulin stimulus.

If we look at Wikipedia's description of the effects of insulin, we would expect insulin stimulation to cause changes to a wide variety of metabolic pathways and transport systems, and increases in pathways associated with cell division (replication, protein synthesis, cell cycle, etc.).

I would have thought that the proteins associated with 'response to insulin stimulus' would be the receptors detecting the stimulus and the transducers ensuring that the response to insulin is initiated by the cell, and that proteins involved in the change of the cell itself would not be included, i.e. not the proteins involved in storing glucose as glycogen in liver (and muscle) cells.

Should insulin itself be annotated to GO:0032868 'response to insulin stimulus'?

    • Is there inconsistency in the annotation of proteins to these terms? If there is, is this related to whether the annotation is to a multicellular or a single-celled organism?
  • Are there certain evidence codes that should/should not be used when annotating to these terms?
    • Review of existing evidence codes used

Annotations to 'response to' terms in the GO database, by evidence code:

   | code | count(*) |
   | IEA  |    82721 |
   | IEP  |     2849 |
   | IMP  |     2246 |
   | ISS  |     2492 |
   | IDA  |     1233 |
   | NAS  |      207 |
   | RCA  |      443 |
   | TAS  |      469 |
   | IGI  |      394 |
   | IC   |      126 |
   | IPI  |       21 |
   | ISO  |       13 |
   | NR   |        8 |
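
The table is easier to weigh as shares of the total. A minimal Python sketch, assuming the counts are transcribed exactly from the table above (the snapshot date is not recorded) and grouping the codes GO classes as experimental (EXP, IDA, IPI, IMP, IGI, IEP); the names COUNTS, EXPERIMENTAL and percent are illustrative only:

```python
# Counts transcribed from the evidence-code table above (snapshot date not recorded).
COUNTS = {
    "IEA": 82721, "IEP": 2849, "IMP": 2246, "ISS": 2492, "IDA": 1233,
    "NAS": 207, "RCA": 443, "TAS": 469, "IGI": 394, "IC": 126,
    "IPI": 21, "ISO": 13, "NR": 8,
}

# GO's experimental evidence codes; EXP does not occur in the table.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def percent(n: int, total: int) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


total = sum(COUNTS.values())
experimental = sum(n for code, n in COUNTS.items() if code in EXPERIMENTAL)

if __name__ == "__main__":
    print(f"total annotations: {total}")
    # List codes from most to least used, with their share of the total.
    for code, n in sorted(COUNTS.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{code:4} {n:6} {percent(n, total):5.1f}%")
    print(f"experimental codes: {experimental} ({percent(experimental, total)}%)")
```

Run as a script, it reports 93222 annotations in total, with IEA at 88.7% and the experimental codes together at 6743 (7.2%).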

TAIR case: mostly IEP (the majority from Northern blot or RT-PCR experiments, fewer from microarrays), then roughly equal numbers of IMP (the mutant is treated with a substance and shows no response where the wild type does, or the reverse), ISS (inherited from TIGR) and IEA (INTERPRO2GO), with fewer IGI, TAS, IDA and NAS annotations.

Ruth: I am starting to change my mind a bit on this one. I wonder whether what is needed is more child terms under the response_to terms, so that it is clearer whether a gene is involved in the signal transduction from receptor to transcription or on the effector side, covering morphological changes, apoptosis, cytokine synthesis etc., with appropriate regulation terms included. At present the value of the 'response_to' terms is being lost because such a large number of genes are annotated to them. If we had slightly more specific terms (but still quite general, otherwise we would just be duplicating all the signal transduction terms), then a gene product involved in sensing a specific stress, for example, would not get lost amongst the apoptotic genes.

I realise that GO terms often only get created when the term is needed, but I think it would be useful to try to standardise the available options for these terms. So how about all 'response to' terms having the following set-up (where appropriate):

  • response to x
  • > detection of x
  • > cellular response to x
  • >> x mediated signaling pathway
  • >> regulation of transcription in response to (or by) x
  • >> regulation of protein synthesis/post-transcriptional modifications in response to x
  • >> change in cell state in response to x (eg morphology, apoptosis, activation, rigidity, motility, cell growth)
  • > regulation of response to x
  • > negative regulation of response to x
  • > positive regulation of response to x
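
The set-up above can be read as a per-stimulus template. A minimal Python sketch, assuming each '>' or '>>' item hangs under the nearest less-indented item and using slightly simplified labels; relationship types (is_a, part_of, regulates) are deliberately not modelled, and TEMPLATE, expand_template and missing_terms are hypothetical names, not part of any GO tool:

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical encoding of the proposed set-up: (label pattern, parent pattern).
# Parents follow the > / >> indentation of the list; relation types are omitted.
TEMPLATE = [
    ("response to {x}", None),
    ("detection of {x}", "response to {x}"),
    ("cellular response to {x}", "response to {x}"),
    ("{x} mediated signaling pathway", "cellular response to {x}"),
    ("regulation of transcription in response to {x}", "cellular response to {x}"),
    ("regulation of protein synthesis/post-transcriptional modifications in response to {x}",
     "cellular response to {x}"),
    ("change in cell state in response to {x}", "cellular response to {x}"),
    ("regulation of response to {x}", "response to {x}"),
    ("negative regulation of response to {x}", "response to {x}"),
    ("positive regulation of response to {x}", "response to {x}"),
]


def expand_template(x: str) -> Dict[str, Optional[str]]:
    """Instantiate the template for stimulus x as {term label: parent label}."""
    return {
        label.format(x=x): (parent.format(x=x) if parent else None)
        for label, parent in TEMPLATE
    }


def missing_terms(stimuli: Iterable[str], existing: Set[str]) -> List[str]:
    """Return template labels for each stimulus that are not already in `existing`."""
    return [
        label
        for x in stimuli
        for label in expand_template(x)
        if label not in existing
    ]


if __name__ == "__main__":
    present = {"response to insulin", "response to heat"}
    todo = missing_terms(["insulin", "heat"], present)
    print(len(todo))  # 18: nine template terms per stimulus beyond 'response to x'
```

Each stimulus that has only its 'response to x' term gains nine further terms, which is the growth in term count at stake if the set-up were applied to every stimulus; checking against existing labels reflects that many of these terms already exist for some stimuli.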

I guess the problem is that this would create a massive increase in the number of terms if we extended it to all possible regulation terms, or even just added these children to all the major 'response to' terms. (The majority of these terms already exist for at least one of the current 'response to' terms, and some are just listed under the 'response to extracellular stimulus' term.)

The advantage would be that it would encourage curators to consider what part of the response process their protein is involved in.

Emily: I agree that the standard definition of 'response to' terms is too broad, and enables a wide range of gene products to be annotated.

- This produces annotations such as IDA annotations where curators have read evidence that their protein changes subcellular location (movement) as a result of a specific stimulus, or IDA annotations where investigators have measured an increase in a particular protein's catalytic activity in response to some stimulus.

- Perhaps such annotations should only be created if there is additional information on the roles the proteins play in such a response. So, in a similar vein to Ruth, I'd be tempted to encourage curators to annotate more often to more descriptive terms underneath a 'response to x' term, and for the 'response to x' term itself to be used by external users only for slimming purposes. Requiring annotation to such granular terms would, I imagine, reduce the frequency with which IEP annotations could be made to such terms en masse..?

Alex: (I am transferring my comments from the SF entry to here and expanding on them. Please also read my earlier wiki post on this topic, Use_of_Response_To_Terms_in_Annotation, from April 2008.) I have long argued for limits on the use of "response to" terms. I prefer to reserve "response to" terms for the gene products involved in the actual detection, signal transduction, gene transcription, or (occasionally) the effector mechanisms that take place in response to a particular stimulus, and not simply for any mRNA or gene product that varies in its abundance following a stimulus. This limits my "response to" annotations to those gene products that actively interpret a stimulus, and provides a clearer interpretation of what "response to" annotations mean. From speaking to other curators at MGI, this is largely the approach we have used regarding these terms and will continue to use.

The alternative approach of annotating any gene product that varies in its expression level following a stimulus (as measured, for instance, by microarray) would allow huge numbers of genes to be annotated in many high-throughput experiments without providing any clear understanding of what those genes contribute in the process of responding, or where they function in the sequence of events following the receipt of a stimulus. Many stimuli ultimately result in the activation of the same set of downstream genes involved, for instance, in cell proliferation, and annotating those downstream genes provides no real information. The ultimate result of annotating changes in expression or protein abundance alone to "response to" terms is a mass of annotations that provide little of value to the end user, and destroy the utility of using the annotated GO terms for finding the gene products that genuinely are involved in interpreting a stimulus, rather than those that are generically involved in processes such as cell proliferation or division.

Looking at Ruth's scheme of terms above, it is clear that the terms under "cellular response to X" should ideally be related to the parent term by part_of, not is_a. We should not introduce more multiple is_a parentage here. I am not opposed to the creation of these terms where and when needed, but I do oppose the wholesale creation of terms in this area without a demonstrated need for annotation. The universe of X is the whole universe, and I see no need to burden the GO editors with creating these terms before such a need is demonstrated. I also wonder how many of the "regulation of response to X" terms will truly be needed. There are times when a particular gene product can modulate the action of a particular signaling pathway, but again we should create these terms only as needed.

Also "defense response to X" is a type of "response to X" that should be used in all cases where a defensive effector mechanism is triggered by the stimulus.

Ruth: I can see that the additional work for GO editors in creating lots of child terms below the response_to parents may be unnecessary. However, I am concerned that some curators have other priorities in their work, and for them requesting GO terms is not high on the list. Consequently, if there were even just a small set of child terms that at least mark the start and finish of the process, there is a good chance these terms would be used, e.g.:

  1. x mediated signaling pathway
  2. >> regulation of transcription in response to (or by) x
  3. >> regulation of protein synthesis/post-transcriptional modifications in response to x
  4. >> change in cell state in response to x (eg morphology, apoptosis, activation, rigidity, motility, cell growth)

I disagree with Alex that the 'response to' terms should be limited to the signal transduction aspect of a response. I think that the proteins involved in mediating the actual changes to the cell phenotype are relevant, although I appreciate that many annotations will be doubled up, with genes involved in apoptosis, cytokine secretion, cell motility etc. being annotated to these GO terms as well as to 'change in cell state in response to x'. But perhaps there are some genes involved in apoptosis (for example) that are not involved in the apoptosis triggered by some stimuli? If so, the above structure would enable the different pathways to be identified. (See SF item.)

Alex: If you read my earlier comment above, you will see that I do not limit my annotations to the signal transduction aspects of a response, but include "gene transcription, or (occasionally) the effector mechanisms that take place in response to a particular stimulus." This could certainly include genes involved in the induction of apoptosis in response to a particular stimulus, if those genes are part of a specific pathway rather than involved in general mechanisms of apoptosis. A key phrase is "..gene products that actively interpret a stimulus." Obviously, this requires a bit of curator judgment.

2. Proposed solution(s)

Review the intention of the 'response to' GO terms. Consider identifying the start and finish of the 'response to' process. More broadly, refine the scope of these terms to make it clearer.

Come up with guidelines on which experimental assays provide supporting evidence for annotations to these terms.

Should this go to the annotation group? Should this go on the Montreal agenda?

3. Comments/counter arguments

SourceForge issue opened: 'response to stimulus' (Alex: I have now added comment to this entry.)

Other related SF items:

'response to stimulus'

'protein stimulus'

'regulation of response to tumor necrosis factor'

4. Proposed resolution
