2010 GO camp downstream effect


Revision as of 10:46, 4 May 2010

1. Background

2. Review of current GO annotation practices

  • Annotating signaling biological processes to transcription factors
  • When not to capture phenotypes: from the 22nd Feb Jamboree call [1], Tanya: It's not uncommon for the initial publications to describe a mutant phenotype, with a developmental defect, and then for later publications to describe much more explicit functions or processes. You should always annotate based on whatever evidence is available. Once you've done that, the question becomes, "When do we keep or remove the phenotype-based annotations?" At TAIR, the policy is to keep the developmental terms if curators think that users would expect to see them. Some participants suggested that one would expect all orthologs to have the same development-type annotations, across organisms. Others disagreed with this expectation.
  • [From Karen] This is fairly anecdotal. I wasn't able to find papers about this, but it's been known for a long time so that isn't necessarily surprising.

In S. cerevisiae, there are a number of genes which are components of the spliceosome and which, when mutated, produce strains with defects in protein production/accumulation. Early on, some of these genes were thought to be involved in translation. It was later determined that these genes are components of the spliceosome which are involved in mRNA splicing and not directly in translation at all. The reason why splicing defects produce translation defects is related to the distribution of introns in S. cerevisiae. Out of about 6000 genes, only about 270 contain introns. Many of the intron-containing genes are ribosomal protein genes. Combined with the fact that ribosomal protein genes are highly transcribed, splicing defects have a disproportionate effect on production of ribosomal proteins and thus on translation.

So, while it is true that mutations in many spliceosomal genes produce a phenotype of defects in protein production, it is very clear that this is a downstream effect related to the fact that the majority of mRNAs to be spliced are transcripts of ribosomal protein genes. Thus, we do not use the mutant phenotype of a defect in protein production to annotate these genes to GO terms related to translation.

  • Ranjana from WormBase: Checking for embryonic lethality or for larval stages that do not develop further are very common assays in the C. elegans field, so it may feel as though our genes are over-annotated to the terms "embryonic development ending in birth or egg-hatching" and/or "nematode larval development". Also, as Tanya pointed out, we too annotate to the paper: if you want to be able to say that you have annotated every paper that talks about a gene, then you record everything. Sometimes it's hard to tell whether something is a downstream effect. We have no mechanism in place to go back and remove these high-level development terms once the core process/function is known.

Annotating to downstream processes

Could at least one person from each group fill in this survey to give an idea of how much discrepancy there is between groups/annotators? This will give us an idea of the issues that are controversial and need to be discussed. In the first conference call (see discussions page for minutes), these five areas were identified as being used for downstream process annotations:

  • 1. development
  • 2. ageing
  • 3. signalling - when does a process start/end
  • 4. IMPs from large-scale mutant screens and from individual experiments
  • 5. gene products with few papers available (e.g. a phenotype is all you have to annotate with)

It would be useful to know which groups annotate to any of these areas and if you DON'T annotate to a certain area, why not? If your examples are too long for the table, please insert them into the main body of section 2. Review of current GO annotation practices.

Name | Group | Do you always annotate to downstream processes? | Do you never annotate to downstream processes? | Do you sometimes annotate to downstream processes?* (If yes, please answer next question) | *Give examples of when you would AND wouldn't annotate to a DS process | Do you go back and remove annotations if you find out more specific information about a gene product?
Rachael | GOA | No | No | Yes | General: If it adds information that may be useful to users I would usually annotate DS, but not at the cost of diluting what the gp is centrally involved in. I would not annotate to examples such as Karen's above, which clearly demonstrates that the effect on translation is a side effect of the mutation of the spliceosome. 1. Development - I think this is useful when users want to know all genes involved in e.g. eye development. I would annotate if a gp was repeatedly shown to have an effect on 'x' development (and, if the evidence was convincing, a single paper reporting an effect on development, as this may be previously unknown information), but I maybe wouldn't annotate if there were pleiotropic effects. 3. Signalling - If a gp was repeatedly shown to have an effect on a particular pathway I would annotate. 4. I wouldn't annotate large-scale screens, but I do annotate individual experiments. 5. I would annotate if few papers are available and the evidence is good. | If they are old NAS/TAS evidence annotations I'll try to improve the evidence/granularity of the term, or if there are high-level terms without much meaning I will probably delete them. I wouldn't look at a paper that had already been curated with experimental evidence codes. If a previous annotation is noticeably different to what I have read about the gp, I will go and check the original paper.
Fiona | AgBase | No | No | Yes | General: There are a lot of chicken papers where they do IMP on very early embryonic stages and see many phenotypic effects. If the gene is well studied we can sometimes tease apart what processes we should annotate to, but oftentimes this may be the only gene product information we have. | No
Rebecca | GOA | No | No | Yes | General: A lot of frog annotation is development based, and a role in a particular part of development is annotated based on mutant phenotype. It's not always possible to see what is direct and what is downstream, so I would record what the paper shows with IMP. Even if it is a downstream effect, if it isn't an artefact of the assay, I think it's still valid to annotate it. Transcription factors, for example, can regulate a number of different genes and so have a whole load of downstream effects, which I would curate if the paper showed them. I probably wouldn't curate downstream effects if the authors explicitly say they are knock-on effects. I tend to be more generous than cautious in annotating downstream events. | Not often. Generally I curate on a protein-by-protein basis, so unless I'm adding in a new paper, I don't revisit the same protein often. I would replace IC, TAS or NAS with an experimental evidence tag if possible, and relook at annotations if there are glaring inconsistencies, but otherwise no.
Ruth & Varsha | BHF-UCL | No | No | Yes | General: We would aim to capture what the gp is 'centrally' involved in, but also to recognise what the overall gp 'function' is, i.e. a digestive enzyme plays a role in digestion, not just substrate catabolism. In Karen's example we would aim to annotate to spliceosome rather than translation and rely on author statements to clarify the function/process of the gp.
  1. Development - we are aware of possible problems with over annotating to development terms, we aim to annotate gps involved in the process of actually 'determining' development rather than gps which could be considered as (eg) structural or essential constituents of a differentiated cell type.
  2. Ageing - we don't have much experience of this.
  3. Signaling - We find that one of the main problems with annotating to signaling is capturing the impact of a gp function/intent on the cell/organism. For example, the growth factor BMP2 is instrumental in cardiac cell differentiation; consequently, a microarray analysis would show large numbers of genes up- and down-regulated following stimulation with BMP2. Ideally we would like to see GO terms such as 'regulation of transcription involved in cardiac cell differentiation' created (with parents regulation of transcription and cardiac cell differentiation). This would identify BMP2's 'intent' to change the 'state' of the cell. Please see note below on 'transcription' problems.
  4. IMPs from large-scale mutant screens and from individual experiments - We don't have the opportunity to annotate large scale mutant screens, but we would like 'rules' to ensure that 'structural or essential constituents of a differentiated cell type' are not annotated to developmental terms.
  5. gene products with few papers available (e.g. a phenotype is all you have to annotate with) - We don't have this problem very often. We would aim to look whether there were any known functional domains within a protein which might substantiate any phenotype observations, before annotating. Would creation of MOD phenotype database prevent the 'need' to 'overannotate' in these cases?
  6. Transcription - We have identified an inconsistency in the use of the transcription process terms that exists between different MODs. The definitions themselves are to blame, and deciding how to improve these definitions to enable more consistent use of these terms should be a major priority.
  • GO:0006366 transcription from RNA polymerase II promoter: The synthesis of RNA from a DNA template by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. Includes transcription of messenger RNA (mRNA) and certain small nuclear RNAs (snRNAs). This sounds like a function term for Pol II.
  • The child term: GO:0032569 gene-specific transcription from RNA polymerase II promoter: The specifically regulated synthesis of RNA from DNA encoding a specific gene or set of genes by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. In addition to RNA polymerase II and the general transcription factors, specific transcription requires one or more specific factors that bind to specific DNA sequences or interact with the general transcription machinery. This seems to cover more transcription factors than the parent; however, it has pretty much only been used in annotation by SGD and Pombi, which may just reflect a lack of TF annotation.
  • It looks like other MODs prefer to use regulation terms such as GO:0045449 regulation of transcription (Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA) or GO:0010551 regulation of gene-specific transcription from RNA polymerase II promoter. In many cases hormones/growth factors and receptors are annotated to the very specific term GO:0010551.
  • Ideally the terms should be defined so that we can capture that hormones regulate transcription but enable us also to create discrete 'transcription factor/cofactor' gene groups only populated by genes which are at the 'coal face' of transcription either directly involved in transcription or considered as 'direct' regulators of transcription.
| Only on a protein-by-protein basis. We would replace IC, TAS or NAS with an experimental evidence tag if possible, and relook at annotations if there are glaring inconsistencies.
Pascale | dictyBase | No | No | In the absence of other information, probably |
  • When I do annotate: I annotate to downstream processes especially when there is no other information. I annotate to terms like 'cell proliferation' and some development terms, usually by IMP, and in some cases IEP.
  • When I don't annotate: I am not comfortable annotating to terms such as pathways (signaling or metabolic) when the evidence is indirect. For example: ACA is part of the GPCR signaling pathway leading to chemotaxis in response to cAMP. If a protein or a gene is identified that causes levels of ACA to be changed, and that causes changes in the chemotactic response, I am okay with annotating to 'chemotaxis in response to cAMP'. However, I will probably not capture 'GPCR signaling pathway' or 'regulation of GPCR signaling pathway'.
| We have mixed feelings about removing annotations, due to concerns about not capturing the entire literature or appearing not to be neutral about which authors to cite.
Tim | RGD | No | No | Yes | General: We annotate to gene function/process, as assessed by the assay used in the experiment. We do not annotate "expression only" papers. We try to find the most specific GO term to match the assay. For example, a paper looked at the adenosine receptor subtypes that were known to be up-regulated by spinal cord injury. They showed that if they blocked one particular receptor subtype, they blocked the increased response to (thermal) pain. So the receptor (gp) was annotated to "detection of temperature stimulus involved in sensory perception of pain". In Karen's example, we would likely have annotations to both protein production terms and splicing terms. | We have recently developed the ability to edit our annotations. This might be used to "correct" an obviously wrong annotation, but that was not the intent when developing the tool. I am sure there are "conflicting" annotations in RGD, where one is likely wrong, but that simply reflects the literature. Notes can be added to clarify conflicts.
Rama | SGD | No | No | In the absence of other information, probably | When we annotate: We annotate to downstream processes if that is the only information available. | If a direct role is published later, we remove the downstream effect. We have a robust phenotype system to capture that data so that it is not lost.
Doug | ZFIN | No | No | Yes | We prefer to annotate exactly what the paper shows, with a few exceptions. Many zebrafish papers characterize the phenotype of mutants. If that is a new mutant with little known about it, we would curate processes that may be affected, like retina development or neural crest cell migration. If it later becomes clear that the mutated gene is really involved in the process of laying down the extracellular matrix on which cells migrate, we would make new annotations to that effect, but we would generally not go back and remove or update annotations unless they were patently false as agreed by the authors or they were incorrectly curated in the first place. Our view is that as annotations accumulate, the whole set of annotations will tell the story of what the gene does. In my case above, the mutant gene is perhaps involved in establishing the extracellular matrix on which the neural crest cells migrate. The establishment of extracellular matrix may be a more proximal process in which the gene plays a role, but both annotations are true. This is analogous to Rebecca's comments above. | In general we only remove annotations that prove to be untrue or incorrectly annotated. We do not generally remove existing annotations if a more specific or more proximal process (transcription factor activity vs. cell migration, for example) is annotated.
Val | PomBase | No | No | Yes | I would not annotate to a "downstream process" if the gene was known to be directly involved in a "core process". For example, I would NOT annotate a component of the spliceosome, or the core RNA pol II, or the translation machinery to a downstream process.

However, I WOULD annotate a specific transcriptional regulator to a downstream process. For example, I have annotated the transcription factor sre1 to +ve reg of heme biosynthesis and positive regulation of phospholipid biosynthesis, as it has been found to transcriptionally control these pathways. I agree with Ruth's comment above that ideally these would be regulation of transcription involved in process (these could be on-the-fly cross products?). I would also annotate a "protein modification" process to the downstream processes which it regulates; for example, acetylation/ubiquitination/phosphorylation events would not only be annotated to the modification terms but also to the regulation of X process for the cellular pathways which are affected (this is probably the main way in which my protocol differs from SGD). For splicing, if a general defect in splicing was shown to lead to a downstream effect I would not annotate, but in the case where, for example, there is regulated splicing to retain the introns in genes specific for meiosis during mitotic growth, I would consider this to be a way that the cell regulates a downstream process and I would annotate accordingly.

| Yes, if the gene is involved in a core process I would remove the annotation. However, if the gene has been shown to act upstream but affects the rate of the downstream process, and this regulation appears to be biologically relevant, I will keep it, but I will change the original annotation to "regulation of " (curator judgement). This is because we now know that the gene does not have direct involvement in the process, but it does contribute to the regulation of the process.

For example, trx1 in fission yeast used to be annotated to DNA replication by ISS to SGD; more information about the cellular role has resulted in this annotation being moved to "regulation of DNA replication", and the ISS has been updated accordingly.

Donghui | TAIR | No | No | Yes (but the problem is that, based on the information in the single article we are annotating, we don't know for sure whether we are annotating to a 'downstream effect' or to the process that the protein is directly responsible for) | 1) I would annotate to a downstream process if the author states clearly that the protein has a role in such a process. For example, PMID 20202164 (A plant-specific histone H3 lysine 4 demethylase represses the floral transition in Arabidopsis): the protein PKDM7B is a H3-K4 demethylase; it represses FT and TSF expression to inhibit flowering (FT and TSF are genes regulating flowering). Both in the article title and results, the author states that PKDM7B ‘represses the floral transition’. I therefore annotate to the following terms: histone demethylase activity (H3-K4 specific) (GO:0032453); histone H3-K4 demethylation (GO:0034720); negative regulation of flower development (GO:0009910). The first two are ‘direct’ terms; the third one is a downstream process term. 2) I would exercise my own judgment in cases where a phenotype analysis has shown that a defect in a gene affects general development processes. The annotation would be done on a case-by-case basis, based on the information presented in the paper and my own knowledge of the gene function. Some are annotated to the downstream development processes; some are not (in these cases, I capture the information by phenotype annotation). | No
Li Ni | MGI | No | No | Yes | MGI always annotates papers based on whatever evidence/granularity of the term is available. In terms of development, much of the time we annotate what the paper shows based on mutant phenotype. If it later becomes clear that the mutated gene is involved in some specific process, we add in new annotations. Right now, we generally do not go back and remove or update annotations unless they were incorrectly curated in the first place. | No
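
Several respondents above say they would replace IC, TAS or NAS annotations with an experimental evidence code where possible. As a rough illustration only, the sketch below lists IC/TAS/NAS annotations for which the same gene product and term already carry an experimental annotation, so a curator could review them. The record layout, the function name `review_candidates`, the gene and term names, and the choice of experimental codes (EXP, IDA, IPI, IMP, IGI, IEP) are assumptions made for this example; it does not describe any group's actual tooling.

```python
from dataclasses import dataclass

# Experimental evidence codes assumed for this sketch.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
# Codes the respondents say they would try to replace.
REPLACEABLE = {"IC", "TAS", "NAS"}


@dataclass(frozen=True)
class Annotation:
    # Hypothetical minimal record; real annotation files carry more fields.
    gene_product: str
    go_term: str
    evidence: str
    reference: str


def review_candidates(annotations):
    """Return IC/TAS/NAS annotations whose (gene product, term) pair
    also has at least one experimental annotation."""
    supported = {
        (a.gene_product, a.go_term)
        for a in annotations
        if a.evidence in EXPERIMENTAL
    }
    return [
        a
        for a in annotations
        if a.evidence in REPLACEABLE
        and (a.gene_product, a.go_term) in supported
    ]


# Made-up example data: geneA has both a TAS and an IMP annotation
# to the same term; geneB has only a NAS annotation.
annotations = [
    Annotation("geneA", "term_x", "TAS", "ref_1"),
    Annotation("geneA", "term_x", "IMP", "ref_2"),
    Annotation("geneB", "term_y", "NAS", "ref_3"),
]

for a in review_candidates(annotations):
    print(a.gene_product, a.go_term, a.evidence, a.reference)
```

The sketch only flags candidates; whether to delete, keep or update a flagged annotation remains a curator decision, which is exactly where the groups above differ.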

3. Proposed annotation policy

4. Examples (papers) and discussion of GO annotation issues

  • Submitted by Pascale: There are several SF items about growth/cell growth/cell proliferation. I know some of the terms were done to accommodate experiments done in Dicty - often people look at the rate of cell proliferation as a general phenotype, and we have been capturing this. It's very high level and usually IMP, but in the absence of other information it seems relevant (otherwise people would not bother testing it).

What do people think about this? To me it's similar to the issue of annotating from IEP or to high level developmental terms by IMP. The question is, what data are too general to be useful to capture?

  • Organismal behaviors are always quite controversial. For example, the rat lonp gene is annotated to aging.

5. Suggestions for Quality Control procedures

Back to 2010_GO_camp_Meeting_Agenda