Proposals to overhaul transcription in GO - 2010

From GO Wiki

Philosophy of Overhaul

Over the last few years, GO has changed how we talk about and create Function terms so that they now represent how something occurs, e.g. binding, catalytic activities, etc. We are now also avoiding Function terms that duplicate Process terms, that is, functions that do not describe how the gene product acts but only specify that it is involved in a process. However, although we have always said that Function and Process should represent non-overlapping aspects, many older Function terms essentially duplicate a Process term. Compare, for example, the Function term transcription regulator activity with the Process term regulation of transcription: both mean essentially the same thing. In addition, the Function term transcription regulator activity groups the terms below it not on the basis of similar functions, but on the basis of involvement in the same process. This lack of clarity in the distinction between Function and Process generates confusion, both for annotators and for users. One researcher at the meeting told me that she only uses GO occasionally and can never remember whether the term she wants is in Function or Process.

One of the major goals of this overhaul is to clarify the distinction between the Function terms and the Process terms for transcription. We are proposing to eliminate some Function terms that are equivalent to Process terms and that cannot be converted into a description of the molecular activity, or activities, involved. In other cases, we are proposing changes to Function terms so that they actually describe molecular activities.

With respect to annotation, these changes mean that where experiments indicate that a gene product is involved in regulating transcription, but give no indication of how it acts, it would be appropriate to annotate only with a Process term and not with a Function term. With the recently developed method of creating links between Function and Process terms, the old motivations for having terms like transcription regulator activity should be addressed anyway, since terms representing functions involved in regulation of transcription will have a relationship to that Process term, or to a more specific child term as appropriate.

Molecular Function

transcription factor activity - GO:0003700 & promoter binding - GO:0010843

The term "transcription factor activity" has parentage under "DNA binding", rather than under "sequence-specific DNA-binding", even though it is defined as binding to a specific DNA sequence. The majority of things in this class are "regulatory" transcription factors, which bind a specific DNA sequence present in a relatively limited set of promoters. There are also basal transcription factors, which bind to specific core promoter elements in a sequence-specific way.

However, we also need to account for the fact that not all basal transcription factors bind DNA. Currently, we also have terms like "RNA polymerase II transcription factor activity" that do not have parentage under "DNA binding" and which really mean anything involved in regulating RNAP II transcription, i.e. basically a process definition. This problem has been reported in SourceForge multiple times because the current structure makes it impossible to indicate that a transcription factor is a sequence-specific DNA binding factor for a specific polymerase; you can either indicate that it binds DNA, or that it is a factor for RNAP I, II, or III, but not that it binds a specific sequence for a specific RNAP.

We would like to have function terms that indicate what type of DNA sequence element is being bound, e.g. a basal promoter element versus the binding site for a regulatory transcription factor, such as Gal4 in yeast. Additionally, we would like to be able to indicate binding to enhancer sites. So the distinction in function will be by the type of DNA site bound.

Proposal: With that in mind, here is a proposed structure (where no number is indicated, the term would be new) and some specific changes proposed for some of the existing terms.

Changes to existing terms:

  1. transcription factor activity - GO:0003700
    • Change name to "sequence-specific DNA-binding transcription factor activity", as shown in structure below, or similar
    • Change position to reflect current definition that this indicates sequence specific binding to DNA. Currently, this term is directly under DNA-binding, but the definition specifies a specific sequence, so we propose to move it to be a direct child of sequence-specific DNA binding (GO:0043565).
  2. promoter binding - GO:0010843
    • Change either name or definition. Currently this term is defined so narrowly that it only includes the core promoter elements, while the binding sites for regulatory transcription factors are also considered to be promoter elements. We recommend changing the definition to match the name. The other possibility is changing the name to core promoter binding so that it matches the current definition, but if people have annotated based on the broader term name, annotations will become incorrect with this option.
    • The definition should also avoid specifying that the binding sites are for complexes; not all transcription factors are complexes.

Structure showing new terms and relationship to existing terms:

- DNA binding - GO:0003677
-- (i) sequence-specific DNA-binding - GO:0043565
--- (i) sequence-specific DNA-binding transcription factor activity - GO:0003700
---- (i) sequence-specific promoter binding - GO:0010843 or GO:new
----- (i) sequence-specific core promoter binding  - GO:0010843 or GO:new
------ (i) sequence-specific RNA polymerase I core promoter binding
------ (i) sequence-specific RNA polymerase II core promoter binding
------ (i) sequence-specific RNA polymerase III core promoter binding
----- (i) sequence-specific regulatory transcription factor site* binding
------ (i) sequence-specific promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase I promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase II promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase III promoter transcription factor site binding
---- (i) specific enhancer transcription factor site binding
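
As an illustration only, the proposed is_a structure can be sketched as a child-to-parent mapping that is walked upward to list a term's ancestors. This is a minimal sketch, not a GO tool: term names are used as keys because most of the proposed terms have no identifiers yet, and the names IS_A and ancestors are hypothetical.

```python
# Proposed is_a links from the structure above, keyed child -> parent.
# Term names are used as keys because most proposed terms have no GO ID yet.
# IS_A and ancestors() are hypothetical names used only for this sketch.
TF = "sequence-specific DNA-binding transcription factor activity"
PROMOTER = "sequence-specific promoter binding"
CORE = "sequence-specific core promoter binding"
REG_SITE = "sequence-specific regulatory transcription factor site binding"
PROM_SITE = "sequence-specific promoter transcription factor site binding"

IS_A = {
    "sequence-specific DNA-binding": "DNA binding",
    TF: "sequence-specific DNA-binding",
    PROMOTER: TF,
    "specific enhancer transcription factor site binding": TF,
    CORE: PROMOTER,
    REG_SITE: PROMOTER,
    PROM_SITE: REG_SITE,
}
for pol in ("I", "II", "III"):
    IS_A[f"sequence-specific RNA polymerase {pol} core promoter binding"] = CORE
    IS_A[f"sequence-specific RNA polymerase {pol} promoter transcription factor site binding"] = PROM_SITE


def ancestors(term):
    """Return the is_a ancestors of a term, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain
```

Under this arrangement, an annotation to a polymerase-specific term such as "sequence-specific RNA polymerase II core promoter binding" also implies sequence-specific DNA binding, which is the combination the current structure cannot express.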

Proposed obsoletions:

We propose that all children of promoter binding (GO:0010843) be obsoleted. All of these represent binding to specific sequence motifs, and it is beyond the scope of GO to capture the thousands of individual sequence motifs that exist. We are working with Karen Eilbeck of SO to make sure that GO and SO are in sync with respect to how promoter motifs are defined, and even SO is planning to represent only general classes of motifs, not every specific one. We therefore feel that the appropriate level of detail for GO to capture for promoters is whether the site bound is a core promoter element, a binding site for a regulatory transcription factor, or an enhancer binding site, as indicated in the structure above. The existing terms below, which indicate very specific motifs, should in our view be obsoleted.

  • cAMP response element binding - GO:0035497
    def: "Interacting selectively and non-covalently with the cyclic AMP response element (CRE), a short palindrome-containing sequence found in the promoters of genes whose expression is regulated in response to cyclic AMP." [PMID:2875459, PMID:2900470]
  • carbohydrate response element binding - GO:0035538
    def: "Interacting selectively and non-covalently with the carbohydrate response element (ChoRE) found in the promoters of genes whose expression is regulated in response to carbohydrates, such as the triglyceride synthesis genes." [GOC:BHF, PMID:20001964]
  • E-box binding - GO:0070888
    def: "Interacting selectively and non-covalently with an E-box, a DNA motif with the consensus sequence CANNTG that is found in the promoters of a wide array of genes in neurons, muscle and other tissues." [GOC:BHF, GOC:vk, PMID:11812799]
  • estrogen response element binding - GO:0034056
    def: "Interacting selectively and non-covalently with the estrogen response element (ERE), a conserved sequence found in the promoters of genes whose expression is regulated in response to estrogen." [GOC:ecd, PMID:15036253, PMID:17975005]
  • juvenile hormone response element binding - GO:0070594
    def: "Interacting selectively and non-covalently with the juvenile hormone response element (JHRE), a conserved sequence found in the promoters of genes whose expression is regulated in response to juvenile hormone." [GOC:sart, PMID:17956872]
  • mitochondrial heavy strand promoter anti-sense binding - GO:0070362
    def: "Interacting selectively and non-covalently with the anti-sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • mitochondrial heavy strand promoter sense binding - GO:0070364
    def: "Interacting selectively and non-covalently with the sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • mitochondrial light strand promoter anti-sense binding - GO:0070361
    def: "Interacting selectively and non-covalently with the anti-sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • mitochondrial light strand promoter sense binding - GO:0070363
    def: "Interacting selectively and non-covalently with the sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • serum response element binding - GO:0010736
    def: "Interacting selectively and non-covalently with the serum response element (SRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum." [GOC:BHF, GOC:dph, GOC:rl, GOC:tb]
  • sterol response element binding - GO:0032810
    def: "Interacting selectively and non-covalently with the sterol response element (SRE), a nonpalindromic sequence found in the promoters of genes involved in lipid metabolism." [GOC:vk, PMID:11994399]
  • vitamin D response element binding - GO:0070644
    def: "Interacting selectively and non-covalently with the vitamin D response element (VDRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum." [GOC:BHF, GOC:vk, PMID:17426122]

transcription regulator activity - GO:0030528

The term transcription regulator activity (GO:0030528) is the highest-level Function term for transcription, and it is essentially identical to a Process term. It conveys exactly the same information as the Process term regulation of transcription and it does NOT convey any information about the molecular nature of the regulator activity. In addition, it groups its child terms by involvement in a common Process rather than by a common Function: it currently includes both functions based on binding DNA and functions based on interacting with other proteins.

MF: transcription regulator activity - GO:0030528
Current definition: Plays a role in regulating transcription; may bind a promoter or enhancer DNA sequence or interact with a DNA-binding transcription factor.

BP: regulation of transcription - GO:0045449
Current definition: Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA.

Proposal (change to existing term): We propose to merge this Function term (GO:0030528) into the equivalent Process term (GO:0045449). [There is precedent for this type of merge with the merge of the Function term splicing factor activity into the equivalent Process term.]
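
To make the effect of such a merge on existing annotations concrete, here is a minimal sketch that assumes annotations are simple (gene, term) pairs. Real annotation records also carry evidence codes and references, which are ignored here; the gene names, MERGED_INTO, and apply_merge are hypothetical.

```python
# Hypothetical merge table: the Function term is folded into the Process term.
MERGED_INTO = {"GO:0030528": "GO:0045449"}


def apply_merge(annotations):
    """Rewrite merged term IDs and drop duplicates the merge creates.

    `annotations` is a list of (gene, term_id) pairs; input order is kept.
    """
    seen = set()
    result = []
    for gene, term in annotations:
        term = MERGED_INTO.get(term, term)  # unchanged if not merged
        if (gene, term) not in seen:
            seen.add((gene, term))
            result.append((gene, term))
    return result
```

A gene product annotated to both terms ends up with a single Process annotation, which is why a merge, unlike an obsoletion, needs no manual re-annotation.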

children of transcription regulator activity (GO:0030528)

Most of these terms have essentially the same problem as transcription regulator activity (GO:0030528), because their definitions hinge on the definition of transcription regulator activity, which cannot be defined in terms of a single function (see above).

  • transcription activator activity - GO:0016563
    def: "Any transcription regulator activity required for initiation or upregulation of transcription."
  • transcription repressor activity - GO:0016564
    def: "Any transcription regulator activity that prevents or downregulates transcription."
  • specific transcriptional repressor activity - GO:0016566
    def: "Any activity that stops or downregulates transcription of specific genes or sets of genes."
  • basal transcription repressor activity - GO:0017163
    def: "Any transcription regulator activity that prevents or downregulates basal transcription. Basal transcription results from transcription that is controlled by the minimal complement of proteins necessary to reconstitute transcription from a minimal promoter."
  • general transcriptional repressor activity - GO:0016565
    def: "Any activity that stops or downregulates transcription of genes globally, and is not specific to a particular gene or gene set."
  • transcription initiation factor activity - GO:0016986
    def:
  • sigma factor activity - GO:0016987
    def: "A sigma factor is the promoter specificity subunit of eubacterial-type multisubunit RNA polymerases, those whose core subunit composition is often described as alpha(2)-beta-beta-prime. (This type of multisubunit RNA polymerase complex is known to be found in eubacteria and plant plastids). Although sigma does not bind DNA on its own, when combined with the core to form the holoenzyme, this binds specifically to promoter sequences, with the sigma factor making sequence specific contacts with the promoter elements. The sigma subunit is released from the elongating form of the polymerase and is thus free to act catalytically for multiple RNA polymerase core enzymes."

  • mitochondrial transcription initiation factor activity - GO:0034246
    def: "A transcription factor activity that confers promoter specificity upon mitochondrial RNA polymerase, in a manner analogous to eubacterial sigma factors."
  • transcription initiation factor antagonist activity - GO:0016988
    def: "The function of binding to a transcription factor and stopping, preventing or reducing the rate of its transcriptional activity."
  • sigma factor antagonist activity - GO:0016989
    def: "The function of binding to a sigma factor and stopping, preventing or reducing the rate of its transcriptional activity."


Proposal (changes to existing terms):

  1. transcription activator activity - GO:0016563
    • As currently defined this is equivalent to the process term positive regulation of transcription (GO:0045941).
    • There may be specific types of activators that can be described in terms of molecular activities, perhaps sequence-specific RNA polymerase II promoter transcription factor site binding involved in activation of transcription (see the section on transcription factor activity).
    • We may want to obsolete this term so that we can suggest multiple consider terms, the process term positive regulation of transcription (GO:0045941) as well as any function terms that can be created of the type suggested above.
  2. transcription repressor activity - GO:0016564
    • As currently defined this is equivalent to the process term negative regulation of transcription (GO:@@@@)
    • I think this term can be dealt with similarly to transcription activator activity above
  3. specific transcriptional repressor activity - GO:0016566
  4. basal transcription repressor activity - GO:0017163 & general transcriptional repressor activity - GO:0016565
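
The obsolete-with-consider option mentioned for transcription activator activity differs from a merge in that nothing is remapped automatically: each affected annotation is flagged for review together with the suggested terms. A minimal sketch under that assumption follows; the record layout, gene names, and review_queue are hypothetical, and only the Process term named above is listed because the candidate Function terms do not exist yet.

```python
# Hypothetical record for a term that has been obsoleted with "consider" suggestions.
OBSOLETE = {
    "GO:0016563": {
        "name": "transcription activator activity",
        "consider": ["GO:0045941"],  # positive regulation of transcription
    },
}


def review_queue(annotations):
    """Pair each annotation to an obsolete term with its suggested terms.

    `annotations` is a list of (gene, term_id) pairs. Nothing is rewritten:
    a curator chooses among the suggestions for each flagged annotation.
    """
    return [
        (gene, term, OBSOLETE[term]["consider"])
        for gene, term in annotations
        if term in OBSOLETE
    ]
```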