2010 GO camp downstream effect
2. Review of current GO annotation practices
- Annotating signaling biological processes to transcription factors
- When not to capture phenotypes: from the 22nd Feb Jamboree call, Tanya: It's not uncommon for the initial publications to describe a mutant phenotype, with a developmental defect, and then for later publications to describe much more explicit functions or processes. You should always annotate based on whatever evidence is available. Once you've done that, the question becomes, "When do we keep or remove the phenotype-based annotations?" TAIR's policy is to keep the developmental terms if they think that their users would expect to see them. Some participants suggested that one would expect all orthologs to have the same development-type annotations, across organisms. Others disagreed with this expectation.
- [From Karen] This is fairly anecdotal. I wasn't able to find papers about this, but it's been known for a long time so that isn't necessarily surprising.
In S. cerevisiae, there are a number of genes which are components of the spliceosome, which when mutated produce strains with defects in protein production/accumulation. Early on, some of these genes were thought to be involved in translation. It was later determined that these genes are components of the spliceosome which are involved in mRNA splicing and not directly in translation at all. The reason why splicing defects produce translation defects is related to the distribution of introns in S. cerevisiae. Out of about 6000 genes, only about 270 contain introns. Many of the intron-containing genes are ribosomal protein genes. Combined with the fact that ribosomal protein genes are highly transcribed, splicing defects have a disproportionate effect on production of ribosomal proteins and thus on translation.
So, while it is true that mutations in many spliceosomal genes produce a phenotype of defects in protein production, it is very clear that this is a downstream effect related to the fact that the majority of mRNAs to be spliced are transcribed from ribosomal protein genes. Thus, we do not use the mutant phenotype of a defect in protein production to annotate these genes to GO terms related to translation.
- Ranjana from WormBase: Checking for embryonic lethality or larval stages that do not develop further are very common assays in the C. elegans field, such that it may feel like our genes are over-annotated to the terms "embryonic development ending in birth or egg-hatching" and/or "nematode larval development". Also, as Tanya pointed out, we too annotate to the paper: if you want to say you have annotated every paper that talks about a gene, then you record everything. Sometimes it's hard to tell whether something is a downstream effect. We have no mechanism in place to go back and remove these high-level development terms once the core process/function is known.
Annotating to downstream processes
Could at least one person from each group fill in this survey to give an idea of how much discrepancy there is between groups/annotators? This will give us an idea of the issues that are controversial and need to be discussed. In the first conference call (see discussions page for minutes), these five areas were identified as being used for downstream process annotations:
- 1. development
- 2. ageing
- 3. signalling - when does a process start/end
- 4. IMPs from large-scale mutant screens and from individual experiments
- 5. gene products with few papers available (e.g. a phenotype is all you have to annotate with)
It would be useful to know which groups annotate to any of these areas and if you DON'T annotate to a certain area, why not? If your examples are too long for the table, please insert them into the main body of section 2. Review of current GO annotation practices.
|Name||Group||Do you always annotate to downstream processes?||Do you never annotate to downstream processes?||Do you sometimes annotate to downstream processes?* (If yes, please answer next question)||*Give examples of when you would AND wouldn't annotate to a DS process||Do you go back and remove annotations if you find out more specific information about a gene products?|
|Rachael||GOA||No||No||Yes||General: If it adds information that may be useful to users I would usually annotate DS, but not at the cost of diluting what the gp is centrally involved in. I would not annotate to examples such as Karen's above, which clearly demonstrates that the effect on translation is a side effect of the mutation of the spliceosome. 1. Development - I think this is useful when users want to know all genes involved in e.g. eye development. I would annotate if a gp was repeatedly shown to have an effect on 'x' development (and, if the evidence was convincing, a single paper reporting an effect on development as this may be previously unknown information), but I maybe wouldn't annotate if there were pleiotropic effects. 3. Signalling - If a gp was repeatedly shown to have an effect on a particular pathway I would annotate. 4. I wouldn't annotate large-scale screens, but I do annotate individual experiments. 5. I would annotate if few papers were available and the evidence was good.||If they are old NAS/TAS evidence annotations I'll try to improve the evidence/granularity of the term, or if there are high-level terms without much meaning I will probably delete them. I wouldn't look at a paper that had already been curated with experimental evidence codes. If a previous annotation is noticeably different to what I have read about the gp, I will go and check the original paper.|
|Fiona||AgBase||No||No||Yes||General: There are a lot of chicken papers where they do IMP on very early embryonic stages and see many phenotypic effects. If the gene is well studied we can sometimes tease apart what processes we should annotate to, but oftentimes this may be the only gene product information we have.||No|
|Rebecca||GOA||No||No||Yes||General: A lot of frog annotation is development-based, and a role in a particular part of development is annotated based on mutant phenotype; it's not always possible to see what is direct and what is downstream, so I would record what the paper shows with IMP. Even if it is a downstream effect, if it isn't an artefact of the assay, I think it's still valid to annotate it. Transcription factors, for example, can regulate a number of different genes and so have a whole load of downstream effects, which I would curate if the paper showed them. I probably wouldn't curate downstream effects if the authors explicitly say they are knock-on effects. I tend to be more generous than cautious in annotating downstream events.||Not often. Generally I curate on a protein-by-protein basis, so unless I'm adding in a new paper, I don't revisit the same protein often. I would replace IC, TAS or NAS with an experimental evidence tag if possible, and relook at annotations if there are glaring inconsistencies, but otherwise no.|
|Ruth & Varsha||BHF-UCL||No||No||Yes||General: We would aim to capture what the gp is 'centrally' involved in, but also to recognise what the overall gp 'function' is, i.e. a digestive enzyme plays a role in digestion, not just substrate catabolism. For Karen's example, we would aim to annotate to spliceosome rather than translation and rely on author statements to clarify the function/process of the gp.||Only on a protein-by-protein basis; we would replace IC, TAS or NAS with an experimental evidence tag if possible, and relook at annotations if there are glaring inconsistencies.|
|Pascale||dictyBase||No||No||In the absence of other information, probably||* When I do annotate: I annotate to downstream processes especially when there is no other information. I annotate to terms like 'cell proliferation' and some development terms, usually by IMP, and in some cases IEP.||We have mixed feelings about removing annotations, due to concerns about not capturing the entire literature or appearing not to be neutral about which authors to cite.|
|Tim||RGD||No||No||Yes||General: We annotate to gene function/process, as assessed by the assay used in the experiment. We do not annotate "expression only" papers. We try to find the most specific GO term to match the assay. For example, a paper looked at the adenosine receptor subtypes that were known to be up-regulated by spinal cord injury. They showed that if they blocked one particular receptor subtype, they blocked the increased response to (thermal) pain. So the receptor (gp) was annotated to "detection of temperature stimulus involved in sensory perception of pain". In Karen's example, we would likely have annotations to both protein production terms and splicing terms.||We have recently developed the ability to edit our annotations. This might be used to "correct" an obviously wrong annotation, but that was not the intent when developing the tool. I am sure there are "conflicting" annotations in RGD, where one is likely wrong, but that simply reflects the literature. Notes can be added to clarify conflicts.|
|Rama||SGD||No||No||In the absence of other information, probably||* When we annotate: We annotate to downstream processes if that is the only information available.||If a direct role is published later, we remove the downstream effect. We have a robust phenotype system to capture that data so that it is not lost.|
|Doug||ZFIN||No||No||Yes||We prefer to annotate exactly what the paper shows, with a few exceptions. Many zebrafish papers characterize the phenotype of mutants. If that is a new mutant with little known about it, we would curate processes that may be affected like retina development or neural crest cell migration. If it later becomes clear that the mutated gene is really involved in the process of laying down the extracellular matrix on which cells migrate, we would make new annotations to that effect, but we would generally not go back and remove or update annotations unless they were patently false as agreed by the authors or they were incorrectly curated in the first place. Our view is that as annotations accumulate, the whole set of annotations will tell the story of what the gene does. In my case above, the mutant gene is involved in establishing the extracellular matrix on which the neural crest cells migrate perhaps. The establishment of extracellular matrix may be a more proximal process in which the gene plays a role, but both annotations are true. This is analogous to Rebecca's comments above.||In general we only remove annotations that prove to be untrue or incorrectly annotated. We do not generally remove existing annotations if a more specific or more proximal process, like transcription factor activity vs. cell migration, for example is annotated|
|Val||PomBase||No||No||Yes||I would not annotate to a "downstream process" if the gene was known to be directly involved in a "core process". For example, I would NOT annotate a component of the spliceosome, the core RNA pol II or the translation machinery to a downstream process.
However, I WOULD annotate a specific transcriptional regulator to a downstream process. For example, I have annotated the transcription factor sre1 to positive regulation of heme biosynthesis and positive regulation of phospholipid biosynthesis, as it has been found to transcriptionally control these pathways. I agree with Ruth's comment above that ideally these would be regulation of transcription involved in process (these could be on-the-fly cross products?). I would also annotate a "protein modification" process to the downstream processes which it regulates; for example, acetylation/ubiquitination/phosphorylation events would not only be annotated to the modification terms but also to the regulation of X process for the cellular pathways which are affected (this is probably the main way in which my protocol differs from SGD). For splicing, if a general defect in splicing was shown to lead to a downstream effect I would not annotate, but in the case where, for example, there is regulated splicing to retain the introns in genes specific for meiosis during mitotic growth, I would consider this to be a way that the cell regulates a downstream process and I would annotate accordingly.
||Yes, if the gene is involved in a core process I would remove the annotation. However, if the gene has been shown to act upstream but affects the rate of the downstream process, and this regulation appears to be biologically relevant, I will keep it, but I will change the original annotation to "regulation of " (curator judgement). This is because we now know that the gene does not have direct involvement in the process, but it does contribute to the regulation of the process.
For example, trx1 in fission yeast used to be annotated to DNA replication by ISS to SGD; more information about the cellular role has resulted in this annotation being moved to "regulation of DNA replication", and the ISS has been updated accordingly.|
|Donghui||TAIR||No||No||yes (but the problem is we don't know for sure whether we are annotating to 'downstream effect' or the process that the protein is directly responsible for based on the information in a single article we are annotating)||1) I would annotate to a downstream process if the author states clearly that the protein has a role in such process. For example, PMID 20202164 (A plant-specific histone H3 lysine 4 demethylase represses the floral transition in Arabidopsis): the protein PKDM7B is a H3-K4 demethylase, it represses FT and TSF expression to inhibit flowering – FT and TSF are genes regulating flowering. Both in the article title and results, the author states that PKDM7B ‘represses the floral transition’. I therefore annotate to the following terms: histone demethylase activity (H3-K4 specific) (GO:0032453), histone H3-K4 demethylation (GO:0034720); negative regulation of flower development (GO:0009910). The first two are ‘direct’ processes; the third one is a downstream process term. 2) I would exercise my own judgment in cases where a phenotype analysis has shown that a defect in a gene affects general development processes. The annotation would be done on a case-by-case basis based on the information presented in the paper and my own knowledge of the gene function, some are annotated to the downstream development processes; some are not (in these cases, I capture the information by phenotype annotation).||No|
|Li Ni||MGI||No||No||Yes||MGI always annotates papers based on whatever evidence/granularity of the term is available. In terms of development, we often annotate what the paper shows based on mutant phenotype. If it later becomes clear that the mutated gene is involved in some specific process, we add in new annotations. Right now, we generally do not go back and remove or update annotations unless they were incorrectly curated in the first place.||No|
|Ranjana Kishore||WormBase||No||No||Yes||Generally we annotate to whatever experimental evidence is available in the paper. C. elegans research is heavily phenotype-driven: it is very common for authors to check for developmental effects, and it is not always known what core process the gene may be involved in. As such, most of the time we tend to capture all the information available. We don't have any mechanism in place to go back and remove more general effect processes once the core process the gene is involved in is known.||No|
3. Proposed annotation policy
Draft guidelines. Will be updated after our latest discussions.
- For small scale experiments, in general curators should annotate to the experimental evidence in the paper. However, curator judgement should be used, taking into account what the curator knows about the gene product and the quality of the experimental assays performed in the paper. Quite often it is the case that the most relevant GO term will not exist. It is acceptable to request terms which describe the involvement of a process in another process if that will give more specificity to the annotation, for example, 'regulation of transcription involved in cardiac cell differentiation'. This may be preferable to annotating to the separate terms 'regulation of transcription' and 'cardiac cell differentiation' as it will be clear how the gene product is involved in cardiac cell differentiation. See separate guidelines for annotation of high-throughput experiments.
Example 1. Gene product involved in core process. If a gene product such as yeast RNA polymerase II subunit RPB2 (UniProtKB:P08518) is known to have a core function of RNA polymerase activity, then the protein is likely to have an effect on a large number of processes that are not necessarily related to its function as an RNA polymerase. Most curators would agree that annotations should only be made to the core process of transcription rather than any downstream processes which are likely to be indirectly associated with the function of the gene product. Another example of this is in S. cerevisiae. There are a number of genes which are components of the spliceosome, which when mutated produce strains with defects in protein production/accumulation. Early on, some of these genes were thought to be involved in translation. It was later determined that these genes are components of the spliceosome which are involved in mRNA splicing and not directly in translation at all. The reason why splicing defects produce translation defects is related to the distribution of introns in S. cerevisiae. Out of about 6000 genes, only about 270 contain introns. Many of the intron-containing genes are ribosomal protein genes. Combined with the fact that ribosomal protein genes are highly transcribed, splicing defects have a disproportionate effect on production of ribosomal proteins and thus on translation. So, while it is true that mutations in many spliceosomal genes produce a phenotype of defects in protein production, it is very clear that this is a downstream effect related to the fact that the majority of mRNAs to be spliced are transcribed from ribosomal protein genes. Thus, we do not use the mutant phenotype of a defect in protein production to annotate these genes to GO terms related to translation. [hopefully Karen can find a specific example of this]
Example 2. Gene product involved in a specific process(es). Sometimes gene products have a core function such as RNA polymerase II transcription factor activity, but the processes they are involved in are limited to a few specific ones. For example, the S. pombe gene Sre1 (UniProtKB:Q9UUD1) is a direct transcriptional regulator of genes which have a role in heme and lipid biosynthesis (PMID:16537923). The curator judged this to be important information for this gene product and so, in addition to annotating to 'specific RNA polymerase II transcription factor activity' and 'regulation of transcription', also made annotations to 'positive regulation of heme biosynthesis' and 'positive regulation of lipid biosynthesis'.
- If a gene product has limited experimental literature, such as a newly characterised protein, it is acceptable to annotate to more general 'downstream' process terms that may represent a phenotype. For example the C. elegans gene RBP2 (UniProtKB:Q10578) was annotated to processes such as 'reproduction' and 'positive regulation of growth rate' using evidence from mutation experiments, but at the time of writing this gene product does not have annotations to its more specific role of transcription, potentially as the evidence for this is not yet available. As more functional information is published about a gene product, these annotations to potential downstream processes may be removed if they are deemed by the annotating group as indirect. However, several annotating groups prefer to leave these annotations in the gene product record as giving a more complete view of the role of the gene product (comment: this will be expanded to document in more detail what the different groups do and why).
- The majority of groups in the GO Consortium do not systematically revise annotation sets, simply because they do not have the man-power to do this. However, groups such as SGD and PomBase, who have a well-annotated proteome, are able to tailor each gene product record so that it reflects only the essential roles and functions of that gene rather than any downstream or indirect effects. Additionally, these groups have a system of recording phenotype data, so GO annotations that have been made based on mutant phenotype data are removed and recorded in their phenotype databases. It is worth noting that, for any annotating group, if a GO annotation is found to be incorrect when more information about the gene product is published, then it is recommended that these older annotations are deleted.
Annotation example for ligand-receptor pathway (slide) Comment: This slide will also be updated from our latest discussions
4. Examples (papers) and discussion of GO annotation issues
a. Would people want to start creating more specific terms such as ‘regulation of transcription involved in [X process]’? (Ruth/Val suggestion). Ruth’s example: One of the main problems with annotating to signaling is to capture the impact of a gene product function/intent on the cell/organism. For example a growth factor BMP2 is instrumental in cardiac cell differentiation, consequently a microarray analysis would show large numbers of genes up and down regulated following stimulation with BMP2. Ideally we would like to see GO terms such as 'regulation of transcription involved in cardiac cell differentiation' created (with parents regulation of transcription and cardiac cell differentiation). This would identify BMP2's 'intent' to change the 'state' of the cell.
b. Val example 1: PMID: 16537923 “A second comparison of wild-type and sre1del cells grown in the absence of oxygen allowed the assignment of Sre1p-dependent genes. Genes whose expression was significantly different in wild-type cells from that in sre1del cells in the absence of oxygen were designated Sre1p dependent.” “Sre1p activated multiple genes required for nonrespiratory oxygen consumption pathways such as ergosterol, heme, and sphingolipid synthesis (Table 1)” “Sre1p was required for expression of every anaerobically upregulated enzyme downstream of lanosterol, including the oxygen-requiring enzymes SPAC1687.16c, erg25+, erg11+, and erg5+.” “Most genes required for oxygen-dependent heme biosynthesis were also upregulated anaerobically (Fig. 3). Indeed, all of the late enzymes were Sre1p dependent, including the oxygen requiring enzymes hem13+ and hem14+. hem13+ is a rate-limiting enzyme in heme biosynthesis under oxygen-limiting conditions and was the most highly upregulated gene (14-fold) in our analysis (Table 1)” “To test whether genes identified by expression profiling represented direct transcriptional targets of Sre1p, we assayed Sre1p binding to gene promoters using chromatin immunoprecipitation.” “Sre1p bound to promoters of all four anaerobically upregulated Sre1p-dependent target genes tested (sre1+, hem13+, erg3+, and osm1+) (Fig. 4B).”
Which of these terms would you annotate Sre1p to:
1. GO:0003700 transcription factor activity
2. GO:0070455 positive regulation of heme biosynthetic process
3. GO:0046889 positive regulation of lipid biosynthetic process
Only the core process term, or the downstream terms as well because Sre1 is a specific transcriptional regulator for these processes?
c. Val example 2: Would you annotate to a "downstream process" if the gene was known to be directly involved in a "core process"? E.g. for an RNA polymerase II subunit, would you annotate to RNA polymerase activity alone, or also include e.g. reproduction?
e. Submitted by Val: Downstream effect - Sc CDC8
f. Submitted by Pascale: There are several SF items about growth/cell growth/cell proliferation. I know some of the terms were done to accommodate experiments done in Dicty - often people look at the rate of cell proliferation as a general phenotype, and we have been capturing this. It's very high level and usually IMP, but in the absence of other information it seems relevant (otherwise people would not bother testing it).
What do people think about this? To me it's similar to the issue of annotating from IEP or to high level developmental terms by IMP. The question is, what data are too general to be useful to capture?
g. (Ruth): Annotation to human Insulin (P01308), has this been over-annotated with process terms? See link http://www.ebi.ac.uk/QuickGO/GProtein?ac=P01308 (have to increase sample size to 80 to see all of the annotations)
h. Organismal behaviors are always quite controversial. For example, the rat lonp gene is annotated to aging.
i. (Pascale) Dicty nucleoside diphosphate kinase (ndkC-1)
Catalyzes the phosphorylation of nucleoside or deoxynucleoside diphosphates into the corresponding triphosphates; acts on all di-nucleotides. From the Dictybase summary: Besides this role in intermediate nucleotide metabolism, NDP kinase has been implicated in a large number of cellular functions. Part of the cellular NDP kinase is associated with the plasma membrane and stimulated by cell surface cAMP receptors. The GTP produced by the action of NDP kinase is capable of activating G-proteins as monitored by altered G-protein-receptor interaction and the activation of the effector enzyme phospholipase C (G-protein coupled receptor protein signaling pathway) (Bominaar et al. 1993). Nucleoside diphosphate kinase is also associated with the cytoskeleton where it channels ATP to the myosin molecule (cytoskeleton organization and biogenesis) (Aguado-Velasco et al. 1996). Yet another role for NDP kinase is to feed GTP into the translation machinery (translational elongation), as inferred by its association with active ribosomes (Sonnemann and Mutzel 1995).
- GO:0007010 cytoskeleton organization (PMID:7733916)
- GO:0007186 G-protein coupled receptor protein signaling pathway (PMID:8389692)
- GO:0006414 translational elongation (PMID:7733916)
Do other curators agree with those annotations?
5. Suggestions for Quality Control procedures
1. Check for co-annotation of a less-granular term with a more-granular term in the same path
Any action from this check is optional for each group, as it may still be appropriate to keep both annotations. For example, it is acceptable to retain the less-granular annotation if:
- It has a ‘better’ evidence code
- The curator feels it adds weight to the more-granular annotation
- Both annotations add value, e.g. ‘histone methylation’ and ‘protein amino acid methylation’
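The co-annotation check in item 1 could be sketched roughly as follows. This is only an illustration under stated assumptions: the `PARENTS` graph is a hand-written toy fragment (the parent 'protein modification process' and the gene product names are hypothetical), and the function names are made up for this sketch rather than taken from any GO tool. It only flags pairs for curator review and removes nothing, since any action is optional.

```python
from collections import defaultdict

# Toy term graph: child term -> set of direct parent terms.
# Hypothetical fragment for illustration; not loaded from the real ontology.
PARENTS = {
    "histone methylation": {"protein amino acid methylation"},
    "protein amino acid methylation": {"protein modification process"},
    "protein modification process": set(),
    "translation": set(),
}


def ancestors(term, parents):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def co_annotated_pairs(annotations, parents):
    """Flag (gene product, less-granular term, more-granular term) triples.

    `annotations` is an iterable of (gene_product, term, evidence_code).
    Nothing is deleted; the result is a list for curators to review.
    """
    terms_by_gp = defaultdict(set)
    for gene_product, term, _evidence in annotations:
        terms_by_gp[gene_product].add(term)

    flagged = []
    for gene_product in sorted(terms_by_gp):
        terms = terms_by_gp[gene_product]
        for specific in sorted(terms):
            # A less-granular co-annotation is any ancestor also annotated.
            for general in sorted(ancestors(specific, parents) & terms):
                flagged.append((gene_product, general, specific))
    return flagged


# Hypothetical annotation set for two made-up gene products.
annotations = [
    ("geneA", "histone methylation", "IDA"),
    ("geneA", "protein amino acid methylation", "IMP"),
    ("geneB", "translation", "IMP"),
]
flagged = co_annotated_pairs(annotations, PARENTS)
for row in flagged:
    print(row)
```

Whether a flagged pair should be kept (better evidence code, added weight, or both terms adding value) remains a curator decision; the sketch deliberately ignores evidence codes when flagging.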