Annotation Conf. Call, July 26, 2011
Examples for discussion at electronic Transcription Jamboree
Present: Rama, Julie, Jodi, Karen, Rob Nash, Dianna (SGD)
Kimberly (WB), Lakshmi (AgBase), Emily, Michele (UniProt), Val, Midori (PomBase), Becky, Jane, Paola (GO editorial), Ruth, Varsha (BHF-UCL), Susan (FlyBase), Doug (ZFIN), Stan, Tim, Tom, Shur-Jen (RGD)
Resources for curators
- Karen's curation manual File:TxnOHreannotationGuide.xls.pdf
When do we have enough data for a MF annotation?
i) PMID 11486045 [VK] - Fig 2 - this data would have been previously annotated to the now obsolete term GO:0016564 transcription repressor activity.
- can we annotate to anything more than the biological process 'GO:0045892; negative regulation of transcription, DNA-dependent', and 'DNA binding' as the molecular function?
- There is no longer any indication in the available MF terms of a protein's repressor activity; there are concerns about losing this information.
- if we can describe the binding of a protein as activating transcription in the molecular function ontology (GO:0001102 RNA polymerase II activating transcription factor binding), why are we no longer able to state that a transcription factor is activating or repressing in its activity?
- The reason we don't have a term for "transcription repressor activity" in MF is that it does not describe how the protein acts, only that it has a negative effect on transcription. This is equivalent to saying "negative regulation of transcription", which is information that belongs in the BP ontology. Thus, we should use the term 'GO:0045892; negative regulation of transcription, DNA-dependent' (or one of its child terms for a specific RNA polymerase) to represent this role.
- From looking at the whole paper, HERP1 could be annotated to appropriately granular child terms (probably RNAP II specific) of 'corepressor binding' (a term that will need to be created by Karen and David), 'GO:0043565 sequence-specific DNA binding' and the BP term 'GO:0045892; negative regulation of transcription, DNA-dependent'.
- How can we have a term corepressor binding when we don't have a clear idea of what a repressor is?
- Karen: it was felt that users of the GO would find this more granular binding term useful. This is similar to the case with the basal TF binding, where there is no corresponding basal TF term in MF, but where the additional descriptiveness of the binding term was thought to be of use.
- Both Midori and Karen felt that we don't need to have all the pieces defined with corresponding MF terms in order to refer to them with binding terms.
- A composite term that together describes these 3 terms is also appropriate (Varsha/Karen)
- perhaps something similar to sequence-specific DNA binding, transcription activity, co-repressor recruiting, negative regulation of transcription DNA dependent (but using a snappier turn-of-phrase) could be created, which would describe in the MF ontology different types of transcription repressors
- Ruth and Varsha: would find these proposed, granular MF terms useful.
- Ruth: however, this assumes that a protein shown to bind a co-repressor complex (e.g. the Sin3A complex) also recruits that complex to the promoter. The binding data alone may not demonstrate this, as no increase in the amount of co-repressor at the promoter is shown.
- Karen disagreed, Karen and Ruth to discuss further on SourceForge
How much curator judgement is appropriate?
i) PMID 12270142 [VK] - as K/O of TWIST decreases expression of CBFA1, from figs 1 and 2 could we annotate to
GO:0045944 positive regulation of transcription from RNA polymerase II promoter
- More generally, if a curator is trying to describe the up-regulation of a protein-encoding gene, is it correct to assume that regulation of expression is necessarily via RNA polymerase II? Do we not need direct evidence demonstrating the involvement of RNA polymerase II in the experiment?
- Similarly, based on fig 3 could we annotate to:
GO:2000679 positive regulation of transcription regulatory region DNA binding
Is this correct or are there other terms that should have been used instead?
- How can we support curators in judging the part of the DNA region that is being bound, e.g. so that curators can more easily identify a proximal region? Could we include in the curation manuals a diagram of the promoter/regulatory regions which bind proteins, and identify the proteins that bind to distinct regions? (perhaps from Karen's presentation, linked to this page?)
- How do you know where an element is located - if the authors do not specifically state it is located upstream? For instance, in this case (PMID:19342457) the element is stated as NOT being upstream, but instead in an intron - how should we capture this data?
- Yes, for almost all protein coding genes in eukaryotes (there are a few rare exceptions, like trypanosomes, where RNAP I transcribes a few of the protein coding genes), you can confidently use the RNA Pol II terms even if the paper doesn't explicitly mention that detail. People working on a gene transcribed by RNAP I or III will tell you that explicitly. Maybe there should be a comment to this effect in the curation manual and/or Comment field in the ontology?
- Can I annotate to 'GO:2000679 positive regulation of transcription regulatory region DNA binding' based on Fig 3?
- No, because this might be an indirect effect (less protein and hence less DNA binding). Annotation to 'positive regulation of transcription from RNA polymerase II promoter' would be good.
- How do curators figure out which region is the proximal region, the core promoter, etc.?
- Brief summary of usage of the word "promoter"
- The core promoter is where the basal transcription factors and the RNA pol II machinery bind.
- Proximal region is proximal to the core promoter
- The word promoter has been used in multiple ways. In E. coli, promoters were originally defined as the region bound by the RNAP and basal TFs. In eukaryotic transcription research, some researchers used the word promoter to include both the core promoter and the entire upstream regulatory region. This is why the promoter binding GO term has been removed and replaced by 'core promoter binding' and 'core promoter proximal region DNA binding'.
- If a curator is not confident of identifying the core promoter, then best to just annotate to a term like "RNA polymerase II regulatory region DNA binding".
- What are enhancers?
- The definition of an "enhancer" suffers from a similar ambiguity in the literature as that of "promoter". In mammals, distal enhancers are seen to act independently of location or distance; in E. coli, however, enhancers are common in sigma-54 type promoters and act at a specific distance from the core promoter.
- Distal enhancers are present in mammalian systems. Yeast (pombe and cerevisiae) do not have distal enhancers and maybe there should be a taxon constraint on a term for "distal enhancer" (esp. as some GO annotations from PomBase and SGD currently exist).
- Karen and David are currently revisiting the enhancer GO terms.
- All these definitions should make it to the curation manual.
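The fallback guidance on promoter regions discussed above can be summarized as a small decision helper. This is only a sketch for the curation manual: the function name and the region labels are hypothetical, and the returned strings are the term names as they are quoted in these minutes, not labels checked against the ontology.

```python
from typing import Optional

# Term names as quoted in the minutes (not checked against the ontology).
TERM_BY_REGION = {
    "core": "core promoter binding",
    "proximal": "core promoter proximal region DNA binding",
}
FALLBACK_TERM = "RNA polymerase II regulatory region DNA binding"


def suggest_binding_term(region: Optional[str]) -> str:
    """Return a DNA-binding term name for the region a curator identified.

    `region` is "core", "proximal", or None when the curator is not
    confident which part of the regulatory region is bound.
    """
    if region is None:
        return FALLBACK_TERM
    # Unrecognized labels also fall back to the general term.
    return TERM_BY_REGION.get(region, FALLBACK_TERM)


print(suggest_binding_term("proximal"))
print(suggest_binding_term(None))
```

The point of the fallback is that a less specific annotation is preferred over a guess about which part of the regulatory region is bound.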
How do I annotate to one of the granular MF transcription factor activity terms when the evidence comes from 2 (or more) papers?
The gene CUP9 is described in the literature as a transcriptional repressor, but there is no single paper that contains evidence that allows annotation to the term that seems like the best description of its function. Note that these papers have a lot of information, but for this discussion, only short selections of each, indicated below, need to be looked at.
PMID:9427760 shows that:
- deletion of the CUP9 gene derepresses transcription of the PTR2 gene (Fig 2A and supporting text, right hand column of p 271 above the header). From this experiment, we can make the annotation negative regulation of transcription from RNA polymerase II promoter (GO:0000122) by IMP.
- CUP9 binds DNA (Fig 2B & brief discussion of figure on pp 271-2 under the heading "Cup9p is a repressor of the PTR2 gene"). From this experiment, we can make the annotation: RNA polymerase II core promoter proximal region sequence-specific DNA binding (GO:0000978) by IDA
PMID:18708352 shows that CUP9 interacts with CYC8(SSN6)/TUP1 (Fig 2B and text beginning with "To address this experimentally, we carried out a coimmunoprecipitation assay, using epitope-tagged SSN6 and CUP9 (Fig. 2B)."). From this, we can make the annotation RNA polymerase II repressing transcription factor binding (GO:0001103) by IPI with CYC8 (aka SSN6).
Using these pieces of evidence in combination, the term sequence-specific transcription regulatory region DNA binding RNA polymerase II transcription factor recruiting transcription factor activity (GO:0001133) would be a good description of the overall activity of Cup9. However, neither these papers nor any others available for CUP9 show all the evidence for this term in a single paper, so how do you annotate to a term like this?
- SGD would make 2 annotations to the composite term GO:0001133, one with IC from GO:0000122 (citing PMID:9427760) and another annotation to the same GO:0001133 with IC from GO:0000978 (citing PMID:18708352).
- Some felt that this is sub-optimal because the evidence is from 2 papers and no single IC annotation can support the composite term. However, a way of capturing this information needs to be put in place so that it can be used by LEGO in future.
- Val mentioned that in a case like this she tends to make one IC annotation from all 3 GO IDs (which are piped together in the 'with' field) and use an unpublished PomBase internal reference in the reference field. The source information should be traceable just from the GO identifiers included in the 'with' field.
- Emily: I'd prefer if these annotations did not use a generic unpublished reference, as it seems to downgrade the quality of the annotation.
- Many of us were hearing the use of the unpublished reference for the first time. We ran out of time. We will discuss this at the next call. Curators will think about this case.
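To help curators think about this case before the next call, the two approaches described above can be laid side by side in a short sketch. It is illustrative only: the `Annotation` record and its field names are hypothetical and do not follow any official annotation file format, and the `internal-ref-placeholder` string stands in for the unpublished internal reference, whose actual identifier is not given in these minutes. The GO IDs and PMIDs are those recorded above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, simplified annotation record (not an official format)."""
    gene: str
    go_id: str
    evidence: str
    reference: str
    with_ids: tuple = ()

    def with_field(self) -> str:
        # Join the supporting identifiers with pipes, as described for
        # the 'with' field in the discussion.
        return "|".join(self.with_ids)


COMPOSITE = "GO:0001133"

# Option A (SGD, as recorded in the minutes): two IC annotations to the
# composite term, each inferred from one GO ID and citing one paper.
sgd_style = [
    Annotation("CUP9", COMPOSITE, "IC", "PMID:9427760", ("GO:0000122",)),
    Annotation("CUP9", COMPOSITE, "IC", "PMID:18708352", ("GO:0000978",)),
]

# Option B (Val): a single IC annotation inferred from all three GO IDs,
# citing an internal reference (placeholder string, an assumption here).
val_style = [
    Annotation(
        "CUP9", COMPOSITE, "IC", "internal-ref-placeholder",
        ("GO:0000122", "GO:0000978", "GO:0001103"),
    ),
]

for ann in sgd_style + val_style:
    print(ann.go_id, ann.evidence, ann.reference, ann.with_field())
```

Option A keeps a published reference on every annotation, but no single line carries the full inference. Option B carries the full inference in one line, but relies on the 'with' field to trace back to the published evidence.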
How do I find the correct transcription factor activity from these terms?
GO:0000982 RNA polymerase II core promoter proximal region sequence-specific DNA binding transcription factor activity
Definition: Interacting selectively and non-covalently with a sequence of DNA that is in cis with and relatively close to a core promoter for RNA polymerase II (RNAP II) in order to modulate transcription by RNAP II.
and GO:0001133 sequence-specific transcription regulatory region DNA binding RNA polymerase II transcription factor recruiting transcription factor activity
Definition: Interacting selectively and non-covalently with a specific sequence of DNA that is part of a regulatory region that controls transcription of that section of the DNA by RNA polymerase II and recruiting another transcription factor to the DNA in order to modulate transcription by RNAP II.
They sound the same, but they are not even parent/child. (Note that the only annotations to the second term are S. cerevisiae GAL4 (IC only), CUP9 (IC only) and SFL1 (IPI only).)
See QuickGO ancestor chart view here: http://www.ebi.ac.uk/QuickGO/GMultiTerm#a=64%2400FM00Hj&c&tab=chart
Original SourceForge request here: https://sourceforge.net/tracker/?func=detail&aid=3377031&group_id=36855&atid=440764
- Karen will add some missing has_part relationships, will comment on SF