Difference between revisions of "2010 GO camp downstream effect"

From GO Wiki

Revision as of 16:13, 28 April 2010

1. Background

2. Review of current GO annotation practices

  • Annotating signaling biological processes to transcription factors
  • When not to capture phenotypes: from the 22nd Feb Jamboree call [1], Tanya: It's not uncommon for the initial publications to describe a mutant phenotype, with a developmental defect, and then for later publications to describe much more explicit functions or processes. You should always annotate based on whatever evidence is available. Once you've done that, the question becomes, "When do we keep or remove the phenotype-based annotations?" TAIR's policy is to keep the developmental terms if they think that their users would expect to see them. Some participants suggested that one would expect all orthologs to have the same development-type annotations, across organisms. Others disagreed with this expectation.
  • [From Karen] This is fairly anecdotal. I wasn't able to find papers about this, but it's been known for a long time so that isn't necessarily surprising.

In S. cerevisiae, there are a number of genes which are components of the spliceosome and which, when mutated, produce strains with defects in protein production/accumulation. Early on, some of these genes were thought to be involved in translation. It was later determined that these genes are components of the spliceosome which are involved in mRNA splicing and not directly in translation at all. The reason why splicing defects produce translation defects is related to the distribution of introns in S. cerevisiae. Out of about 6000 genes, only about 270 contain introns. Many of the intron-containing genes are ribosomal protein genes. Because ribosomal protein genes are also highly transcribed, splicing defects have a disproportionate effect on production of ribosomal proteins and thus on translation.

So, while it is true that mutations in many spliceosomal genes produce a phenotype of defects in protein production, it is very clear that this is a downstream effect related to the fact that the majority of mRNAs to be spliced come from ribosomal protein genes. Thus, we do not use the mutant phenotype of a defect in protein production to annotate these genes to GO terms related to translation.
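As a quick check of the numbers in Karen's example, the share of S. cerevisiae genes that contain introns can be worked out as below. This is only a back-of-the-envelope sketch using the approximate counts quoted above; the variable names are made up for illustration.

```python
# Approximate counts quoted in the text above (not exact genome statistics).
total_genes = 6000
intron_genes = 270

# Fraction of genes whose transcripts need splicing at all.
intron_fraction = intron_genes / total_genes

print(f"{intron_fraction:.1%} of genes contain introns")
```

So only a small minority of genes depend on splicing, and the phenotype is dominated by whichever highly transcribed genes fall into that minority.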

  • Ranjana from WormBase: Checking for embryonic lethality or for larval stages that do not develop further are very common assays in the C. elegans field, such that it may feel like our genes are over-annotated to the terms "embryonic development ending in birth or egg-hatching" and/or "nematode larval development". Also, as Tanya pointed out, we too annotate to the paper: if you want to say you have annotated every paper that talks about a gene, then you record everything. Sometimes it's hard to tell whether something is a downstream effect. We have no mechanism in place to go back and remove these high-level development terms once the core process/function is known.

Annotating to downstream processes

Could at least one person from each group fill in this survey to give an idea of how much discrepancy there is between groups/annotators? This will give us an idea of the issues that are controversial and need to be discussed. In the first conference call (see discussions page for minutes), these five areas were identified as being used for downstream process annotations:

  • 1. development
  • 2. ageing
  • 3. signalling - when does a process start/end
  • 4. IMPs from large-scale mutant screens and from individual experiments
  • 5. gene products with few papers available (e.g. a phenotype is all you have to annotate with)

It would be useful to know which groups annotate to any of these areas and if you DON'T annotate to a certain area, why not? If your examples are too long for the table, please insert them into the main body of section 2. Review of current GO annotation practices.

Survey table, one entry per respondent. The columns are Name, Group, and the answers to these questions:

  • Q1. Do you always annotate to downstream processes?
  • Q2. Do you never annotate to downstream processes?
  • Q3. Do you sometimes annotate to downstream processes?* (If yes, please answer the next question.)
  • Q4. *Give examples of when you would AND wouldn't annotate to a downstream (DS) process.
  • Q5. Do you go back and remove annotations if you find out more specific information about a gene product?

Rachael (GOA)
Q1: No. Q2: No. Q3: Yes.
Q4: General: If it adds information that may be useful to users I would usually annotate DS, but not at the cost of diluting what the gene product (gp) is centrally involved in. I would not annotate to examples such as Karen's above, which clearly demonstrates that the effect on translation is a side effect of the mutation of the spliceosome.
  1. Development - I think this is useful when users want to know all genes involved in e.g. eye development. I would annotate if a gp was repeatedly shown to have an effect on 'x' development (and, if the evidence was convincing, a single paper reporting an effect on development, as this may be previously unknown information), but I maybe wouldn't annotate if there were pleiotropic effects.
  3. Signalling - If a gp was repeatedly shown to have an effect on a particular pathway I would annotate.
  4. I wouldn't annotate large-scale screens, but I do annotate individual experiments.
  5. I would annotate if few papers were available and the evidence was good.
Q5: If they are old NAS/TAS evidence annotations I'll try to improve the evidence/granularity of the term, or if there are high-level terms without much meaning I will probably delete them. I wouldn't look at a paper that had already been curated with experimental evidence codes. If a previous annotation is noticeably different to what I have read about the gp, I will go and check the original paper.

Fiona (AgBase)
Q1: No. Q2: No. Q3: Yes.
Q4: General: There are a lot of chicken papers where they do IMP on very early embryonic stages and see many phenotypic effects. If the gene is well studied we can sometimes tease apart what processes we should annotate to, but oftentimes this may be the only gene product information we have.
Q5: No.

Rebecca (GOA)
Q1: No. Q2: No. Q3: Yes.
Q4: General: A lot of frog annotation is development based, and a role in a particular part of development is annotated based on mutant phenotype. It's not always possible to see what is direct and what is downstream, so I would record what the paper shows with IMP. Even if it is a downstream effect, if it isn't an artefact of the assay, I think it's still valid to annotate it. Transcription factors, for example, can regulate a number of different genes and so have a whole load of downstream effects, which I would curate if the paper showed them. I probably wouldn't curate downstream effects if the authors explicitly say they are knock-on effects. I tend to be more generous than cautious in annotating downstream events.
Q5: Not often. Generally I curate on a protein-by-protein basis, so unless I'm adding in a new paper, I don't revisit the same protein often. I would replace IC, TAS or NAS with an experimental evidence tag if possible, and relook at annotations if there are glaring inconsistencies, but otherwise no.

Ruth & Varsha (BHF-UCL)
Q1: No. Q2: No. Q3: Yes.
Q4: General: We would aim to capture what the gp is 'centrally' involved in, but also to recognise what the overall gp 'function' is, i.e. a digestive enzyme plays a role in digestion, not just substrate catabolism. In Karen's example, we would aim to annotate to spliceosome rather than translation and rely on author statements to clarify the function/process of the gp.
  1. Development - we are aware of possible problems with over-annotating to development terms. We aim to annotate gps involved in the process of actually 'determining' development rather than gps which could be considered as (e.g.) structural or essential constituents of a differentiated cell type. However, we are concerned by the ontology structure in the development domain, where the process of differentiation is a parent to development. Consequently, gps involved in development are being 'grouped' with gps involved in differentiation.
  2. Ageing - we don't have much experience of this.
  3. Signaling - We find that one of the main problems with annotating to signaling is to capture the impact of a gp function/intent on the cell/organism. For example, the growth factor BMP2 is instrumental in cardiac cell differentiation; consequently, a microarray analysis would show large numbers of genes up- and down-regulated following stimulation with BMP2. Ideally we would like to see GO terms such as 'regulation of transcription involved in cardiac cell differentiation' created (with parents regulation of transcription and cardiac cell differentiation). This would identify BMP2's 'intent' to change the 'state' of the cell. Please see note below on 'transcription' problems.
  4. IMPs from large-scale mutant screens and from individual experiments - We don't have the opportunity to annotate large-scale mutant screens, but we would like 'rules' to ensure that 'structural or essential constituents of a differentiated cell type' are not annotated to developmental terms.
  5. Gene products with few papers available (e.g. a phenotype is all you have to annotate with) - We don't have this problem very often. Before annotating, we would aim to look at whether there were any known functional domains within a protein which might substantiate any phenotype observations. Would the creation of a MOD phenotype database prevent the 'need' to 'overannotate' in these cases?
  6. Transcription - We have identified an inconsistency in the use of the transcription process terms that exists between different MODs. The definitions themselves are to blame, and deciding how to improve these definitions to enable more consistent use of these terms should be a major priority.
  • GO:0006366 transcription from RNA polymerase II promoter: The synthesis of RNA from a DNA template by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. Includes transcription of messenger RNA (mRNA) and certain small nuclear RNAs (snRNAs). This sounds like a function term for Pol II.
  • The child term, GO:0032569 gene-specific transcription from RNA polymerase II promoter: The specifically regulated synthesis of RNA from DNA encoding a specific gene or set of genes by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. In addition to RNA polymerase II and the general transcription factors, specific transcription requires one or more specific factors that bind to specific DNA sequences or interact with the general transcription machinery. This seems to cover more transcription factors than the parent; however, it has pretty much only been used in annotation by SGD and pombe, which may just reflect lack of TF annotation.
  • It looks like other MODs prefer to use regulation terms such as GO:0045449 regulation of transcription (Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA) or GO:0010551 regulation of gene-specific transcription from RNA polymerase II promoter. In many cases hormones/growth factors and receptors are annotated to the very specific term GO:0010551.
  • Ideally the terms should be defined so that we can capture that hormones regulate transcription, but also so that we can create discrete 'transcription factor/cofactor' gene groups populated only by genes which are at the 'coal face' of transcription, either directly involved in transcription or considered as 'direct' regulators of transcription.
Q5: Only on a protein-by-protein basis. We would replace IC, TAS or NAS with an experimental evidence tag if possible, and relook at annotations if there are glaring inconsistencies.

Pascale (dictyBase)
Q1: No. Q2: No. Q3: In the absence of other information, probably.
Q4:
  • When I do annotate: I annotate to downstream processes especially when there is no other information. I annotate to terms like 'cell proliferation' and some development terms, usually by IMP, and in some cases IEP.
  • When I don't annotate: I am not comfortable annotating to terms such as pathways (signaling or metabolic) when the evidence is indirect. For example: ACA is part of the GPCR signaling pathway leading to chemotaxis in response to cAMP. If a protein or a gene is identified that causes levels of ACA to be changed, and that causes changes in the chemotactic response, I am okay with annotating to 'chemotaxis in response to cAMP'. However, I will probably not capture 'GPCR signaling pathway' or 'regulation of GPCR signaling pathway'.
Q5: We have mixed feelings about removing annotations, due to concerns about not capturing the entire literature or appearing not to be neutral about which authors to cite.

Tim (RGD)
Q1: No. Q2: No. Q3: Yes.
Q4: General: We annotate to gene function/process, as assessed by the assay used in the experiment. We do not annotate "expression only" papers. We try to find the most specific GO term to match the assay. For example, a paper looked at the adenosine receptor subtypes that were known to be up-regulated by spinal cord injury. They showed that if they blocked one particular receptor subtype, they blocked the increased response to (thermal) pain. So the receptor (gp) was annotated to "detection of temperature stimulus involved in sensory perception of pain". In Karen's example, we would likely have annotations to both protein production terms and splicing terms.
Q5: We have recently developed the ability to edit our annotations. This might be used to "correct" an obviously wrong annotation, but that was not the intent when developing the tool. I am sure there are "conflicting" annotations in RGD, where one is likely wrong, but that simply reflects the literature. Notes can be added to clarify conflicts.

Rama (SGD)
Q1: No. Q2: No. Q3: In the absence of other information, probably.
Q4:
  • When I do annotate: We annotate to downstream processes if that is the only information available. If a direct role is published later, we remove the downstream effect. We have a robust phenotype system to capture that data so that it is not lost.
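The removal policy Rama describes (drop a downstream-effect annotation once a direct role has been published) could be sketched roughly as below. This is only an illustration under stated assumptions: the Annotation record, its downstream flag, the prune_downstream function and the gene names GENE_A and GENE_B are all hypothetical and do not correspond to SGD's actual tools or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str          # hypothetical gene identifier
    go_term: str       # GO term name
    evidence: str      # GO evidence code, e.g. IMP or IDA
    downstream: bool   # assumed curator flag: True if judged a downstream effect


def prune_downstream(annotations):
    """Drop downstream-effect annotations for genes that also have a direct one."""
    genes_with_direct = {a.gene for a in annotations if not a.downstream}
    return [
        a for a in annotations
        if not (a.downstream and a.gene in genes_with_direct)
    ]


# Made-up example data, loosely modelled on the spliceosome case above.
annotations = [
    Annotation("GENE_A", "translation", "IMP", downstream=True),
    Annotation("GENE_A", "mRNA splicing, via spliceosome", "IDA", downstream=False),
    Annotation("GENE_B", "cell proliferation", "IMP", downstream=True),
]

kept = prune_downstream(annotations)
for a in kept:
    print(a.gene, a.go_term, a.evidence)
```

In this sketch GENE_B keeps its phenotype-based annotation because nothing more direct is known, which corresponds to the "only information available" case. Groups that prefer to keep downstream annotations would simply not apply such a step.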

3. Proposed annotation policy

4. Examples (papers) and discussion of GO annotation issues

  • Submitted by Pascale: There are several SF items about growth/cell growth/cell proliferation. I know some of the terms were done to accommodate experiments done in Dicty - often people look at the rate of cell proliferation as a general phenotype, and we have been capturing this. It's very high level and usually IMP, but in the absence of other information it seems relevant (otherwise people would not bother testing it).

What do people think about this? To me it's similar to the issue of annotating from IEP or to high level developmental terms by IMP. The question is, what data are too general to be useful to capture?

  • Organismal behaviors are always quite controversial. For example, the rat lonp gene is annotated to aging.

5. Suggestions for Quality Control procedures

Back to 2010_GO_camp_Meeting_Agenda