Proposals to overhaul transcription in GO - 2010

From GO Wiki

Philosophy of Overhaul

Over the last few years, GO has changed how we talk about and create Function terms. We have been moving towards an approach in which Function terms represent how something occurs, e.g. binding, catalytic activities, etc. In this process, we have begun to avoid creating Function terms, and also to eliminate existing ones, that duplicate Process terms, that is, functions that do not describe how the gene product acts but only specify that it is involved in a process. Despite the fact that we have always said that Function and Process should represent non-overlapping aspects, we have many older terms in Function that essentially duplicate a Process term. Compare, for example, the Function term transcription regulator activity with the Process term regulation of transcription. Both terms essentially mean the same thing. In addition, the Function term transcription regulator activity is not grouping the terms below it on the basis of having similar functions, but rather on the basis of being involved in the same process. This lack of clarity in the distinction between Function and Process generates confusion, both for annotators and for users. One researcher at the Gene Transcription in Yeast meeting told me that she only uses GO occasionally and she can never remember whether the term she wants is in Function or Process.

One of the major goals of this overhaul is to generate clarity between the function terms and the process terms for transcription. We are proposing to eliminate some Function terms that are equivalent to Process terms and which cannot be converted into a description of the molecular activity, or activities, involved. In other cases, we are proposing changes to Function terms so that they actually describe molecular activities. With the recently developed method of creating links between Function and Process terms, the old motivations to have terms like transcription regulator activity in order to be able to group functions by process should be addressed anyway, since terms representing functions involved in regulation of transcription will have a relationship to that Process term, or to a more specific child term as appropriate.

With respect to annotation, these changes will mean that in cases where the experiments indicate that a gene product is involved in regulating transcription, but give no indication as to how it acts, it would be appropriate to annotate with a specific Process term but only with the root Function term.
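
The annotation rule above can be sketched in Python. This is only an illustration under stated assumptions, not part of any GO tool: the `annotate` function and the gene name `geneX` are hypothetical, GO:0045449 is the Process term regulation of transcription cited on this page, and GO:0003674 is the root molecular_function term.

```python
# Hypothetical sketch of the annotation rule described above.
ROOT_FUNCTION = "GO:0003674"                # molecular_function (root Function term)
REGULATION_OF_TRANSCRIPTION = "GO:0045449"  # Process: regulation of transcription


def annotate(gene, process_term, function_term=None):
    """Return (gene, aspect, term) annotations for one gene product.

    If the experiments give no indication of how the gene product acts,
    function_term is None and only the root Function term is used.
    """
    return [
        (gene, "P", process_term),
        (gene, "F", function_term or ROOT_FUNCTION),
    ]


# Mechanism unknown: specific Process term, root Function term.
unknown_mechanism = annotate("geneX", REGULATION_OF_TRANSCRIPTION)

# Mechanism known, here sequence-specific DNA binding (GO:0043565).
known_mechanism = annotate("geneX", REGULATION_OF_TRANSCRIPTION, "GO:0043565")
```

The point of the sketch is that the Process annotation stays specific in both cases, while the Function annotation only becomes specific once the mechanism is known.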

Molecular Function

transcription factor activity - GO:0003700 & promoter binding - GO:0010843

The term "transcription factor activity" has parentage under "DNA binding", rather than under "sequence-specific DNA-binding", even though it is defined as binding to a specific DNA sequence. The majority of things in this class are "regulatory" transcription factors which bind a specific DNA sequence present in a relatively limited set of promoters. There are also basal transcription factors which bind to specific core promoter elements in a sequence-specific way.

However, we also need to account for the fact that not all basal transcription factors bind DNA. Currently, we also have terms like "RNA polymerase II transcription factor activity" that do not have parentage under "DNA binding" and which really mean anything involved in regulating RNAP II transcription, i.e. basically a process definition. This problem has been reported in SourceForge multiple times because the current structure makes it impossible to indicate that a transcription factor is a sequence-specific DNA binding factor for a specific polymerase; you can either indicate that it binds DNA, or that it is a factor for RNAP I, II, or III, but not that it binds a specific sequence for a specific RNAP.

We would like to have function terms that indicate what type of DNA sequence element is being bound, e.g. a basal promoter element versus the binding site for a regulatory transcription factor, such as Gal4 in yeast. Additionally, we would like to be able to indicate binding to enhancer sites. So the distinction in function will be by the type of DNA site bound.

Proposal: With that in mind, here is a proposed structure (where no number is indicated, the term would be new) and some specific changes proposed for some of the existing terms.

Changes to existing Function terms:

  1. transcription factor activity - GO:0003700
    • Change name to "sequence specific DNA-binding transcription factor activity", as shown in structure below, or similar
    • Change position to reflect current definition that this indicates sequence specific binding to DNA. Currently, this term is directly under DNA-binding, but the definition specifies a specific sequence, so we propose to move it to be a direct child of sequence-specific DNA binding (GO:0043565).
  2. promoter binding - GO:0010843
    • Change either name or definition. Currently this term is defined too narrowly such that it only includes the core promoter elements, while the binding sites for regulatory transcription factors are also considered to be promoter elements. In addition, the child terms are not all core promoter elements. We recommend changing the definition to match the name. The other possibility is changing the name to core promoter binding so that it matches the current definition, but if people have annotated based on the broader term name, annotations will become incorrect with this option.
    • The def should also avoid specifying that the binding sites are for complexes; not all transcription factors are.


Structure showing types of new terms and relationship to existing Function terms:

- DNA binding - GO:0003677
-- (i) sequence-specific DNA-binding - GO:0043565
--- (i) sequence-specific DNA-binding transcription factor activity - GO:0003700
---- (i) sequence-specific promoter binding - GO:0010843 or GO:new
----- (i) sequence-specific core promoter binding  - GO:0010843 or GO:new
------ (i) sequence-specific RNA polymerase I core promoter binding
------ (i) sequence-specific RNA polymerase II core promoter binding
------- (i) sequence-specific RNA polymerase II core promoter binding involved in preinitiation complex formation (link to Process)
------- (i) sequence-specific RNA polymerase II core promoter binding involved in negative regulation of preinitiation complex formation (link to Process)
------ (i) sequence-specific RNA polymerase III core promoter binding
----- (i) sequence-specific regulatory transcription factor site binding
------ (i) sequence-specific promoter regulatory transcription factor site binding
------- (i) sequence-specific RNA polymerase I promoter regulatory transcription factor site binding
------- (i) sequence-specific RNA polymerase II promoter regulatory transcription factor site binding
------- (i) sequence-specific RNA polymerase III promoter regulatory transcription factor site binding
---- (i) specific enhancer transcription factor site binding

Note that there will also be links to Process terms, e.g. a Function term like sequence-specific RNA polymerase II promoter transcription factor site binding will have a relationship to an appropriate Process term about gene-specific transcription from RNA polymerase II promoter.
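
The effect of the proposed re-parenting can be sketched as a small child-to-parent map in Python. This is an illustrative sketch only: the `NEW:*` keys are placeholders invented here for terms that have no GO id in the proposal (the structure above lists promoter binding as "GO:0010843 or GO:new", and a placeholder is used for it too), and only part of the tree is included.

```python
# Child -> parent (is_a) links for part of the proposed structure.
# NEW:* keys are placeholders invented for this sketch, not real identifiers.
PROPOSED_IS_A = {
    "GO:0043565": "GO:0003677",  # sequence-specific DNA binding -> DNA binding
    "GO:0003700": "GO:0043565",  # sequence-specific DNA-binding transcription factor activity
    "NEW:promoter_binding": "GO:0003700",
    "NEW:core_promoter_binding": "NEW:promoter_binding",
    "NEW:RNAP_II_core_promoter_binding": "NEW:core_promoter_binding",
}

# Current placement: GO:0003700 sits directly under DNA binding.
CURRENT_IS_A = {
    "GO:0043565": "GO:0003677",
    "GO:0003700": "GO:0003677",
}


def ancestors(term, is_a):
    """Follow is_a links from term to the top and return the terms passed."""
    path = []
    while term in is_a:
        term = is_a[term]
        path.append(term)
    return path


current_path = ancestors("GO:0003700", CURRENT_IS_A)
proposed_path = ancestors("GO:0003700", PROPOSED_IS_A)
rnap2_path = ancestors("NEW:RNAP_II_core_promoter_binding", PROPOSED_IS_A)
```

In the proposed map, an annotation to the RNA polymerase II core promoter binding term implies both sequence-specific DNA binding and the polymerase involved, which is the combination the current structure cannot express.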

Proposed obsoletions:

We propose that all children of promoter binding (GO:0010843) should be obsoleted. All of these represent binding to specific sequence motifs. It is beyond the scope of GO to capture the thousands of individual sequence motifs that exist. We are working with Karen Eilbeck of SO to make sure that GO and SO are in sync with respect to how promoter motifs are defined, and even SO is planning only to represent general classes of motifs, not every specific one. Thus, we feel that the appropriate level of detail for GO to capture for promoters is whether the site bound is a core promoter element (generally bound by a basal transcription factor), a binding site for a regulatory transcription factor, or an enhancer binding site (also bound by regulatory transcription factors). We would therefore like to indicate binding to general types of motifs, as shown in the structure above. We feel that the existing terms listed below, which indicate very specific motifs, should be obsoleted.

  • cAMP response element binding - GO:0035497
    def: "Interacting selectively and non-covalently with the cyclic AMP response element (CRE), a short palindrome-containing sequence found in the promoters of genes whose expression is regulated in response to cyclic AMP."
  • carbohydrate response element binding - GO:0035538
    def: "Interacting selectively and non-covalently with the carbohydrate response element (ChoRE) found in the promoters of genes whose expression is regulated in response to carbohydrates, such as the triglyceride synthesis genes."
  • E-box binding - GO:0070888
    def: "Interacting selectively and non-covalently with an E-box, a DNA motif with the consensus sequence CANNTG that is found in the promoters of a wide array of genes in neurons, muscle and other tissues."
  • estrogen response element binding - GO:0034056
    def: "Interacting selectively and non-covalently with the estrogen response element (ERE), a conserved sequence found in the promoters of genes whose expression is regulated in response to estrogen."
  • juvenile hormone response element binding - GO:0070594
    def: "Interacting selectively and non-covalently with the juvenile hormone response element (JHRE), a conserved sequence found in the promoters of genes whose expression is regulated in response to juvenile hormone."
  • mitochondrial heavy strand promoter anti-sense binding - GO:0070362
    def: "Interacting selectively and non-covalently with the anti-sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA."
  • mitochondrial heavy strand promoter sense binding - GO:0070364
    def: "Interacting selectively and non-covalently with the sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA."
  • mitochondrial light strand promoter anti-sense binding - GO:0070361
    def: "Interacting selectively and non-covalently with the anti-sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA."
  • mitochondrial light strand promoter sense binding - GO:0070363
    def: "Interacting selectively and non-covalently with the sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA."
  • serum response element binding - GO:0010736
    def: "Interacting selectively and non-covalently with the serum response element (SRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum."
  • sterol response element binding - GO:0032810
    def: "Interacting selectively and non-covalently with the sterol response element (SRE), a nonpalindromic sequence found in the promoters of genes involved in lipid metabolism."
  • vitamin D response element binding - GO:0070644
    def: "Interacting selectively and non-covalently with the vitamin D response element (VDRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum."

transcription regulator activity - GO:0030528

The term transcription regulator activity (GO:0030528) is the highest level Function term for transcription and it is essentially identical to a Process term. It conveys exactly the same information as the Process term regulation of transcription and it does NOT convey any information about the molecular nature of the regulator activity. In addition, it is grouping the child terms below it based on involvement in a common Process, not based on having a common Function and it currently includes functions based on binding DNA and functions based on interacting with other proteins.

MF: transcription regulator activity - GO:0030528
Current definition: Plays a role in regulating transcription; may bind a promoter or enhancer DNA sequence or interact with a DNA-binding transcription factor.

BP: regulation of transcription - GO:0045449
Current definition: Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA.

Proposal (changes to and obsoletions of existing terms):

  1. We propose to merge this Function term (GO:0030528) into the equivalent Process term (GO:0045449). [There is precedent for this type of merge with the merge of the Function term splicing factor activity into the equivalent Process term.]
  2. There are three regulation terms in Process based on this term which would need to be dealt with.
    • One option would be to also merge them into the Process terms like regulation of transcription and the appropriate positive and negative regulation terms. Alternately, we could obsolete them.
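
What a merge would mean for existing annotations can be sketched as follows. This is a minimal Python sketch under an assumption: that after a merge the old Function id simply resolves to the surviving Process id. The `remap` helper and the gene names are hypothetical.

```python
# Assumed merge behaviour: the merged id resolves to the surviving term.
MERGED_INTO = {
    # transcription regulator activity -> regulation of transcription
    "GO:0030528": "GO:0045449",
}


def remap(annotations, merged_into):
    """Rewrite (gene, term) pairs, following merges and dropping duplicates."""
    seen = set()
    result = []
    for gene, term in annotations:
        pair = (gene, merged_into.get(term, term))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


before = [
    ("geneX", "GO:0030528"),  # Function annotation that duplicates the next line
    ("geneX", "GO:0045449"),
    ("geneY", "GO:0030528"),
]
after = remap(before, MERGED_INTO)
```

The duplicate for `geneX` collapses into a single Process annotation, which reflects the argument above that the two terms convey the same information.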

transcription cofactor activity - GO:0003712, and child terms

The term "transcription cofactor activity" is defined quite specifically:

Definition: "The function that links a sequence-specific transcription factor to the core RNA polymerase II complex but does not bind DNA itself."

However, usage in the literature is broader than this and quite vague as to what is happening. The common elements seem to be:

  • binding to some protein component of the transcription machinery
  • NOT binding to DNA

I talked to a number of people about this. There was agreement that usage of the word "cofactor" is vague in the literature, but strong feeling that there are some types of cofactors where the functional components of the activity can be described explicitly so that GO can represent what they do functionally, rather than in terms of process. Thus, I do think we need to attempt to represent functions for cofactors that are understood. However, it is likely that at this point in time, there will be things described as "transcription cofactors" in the literature where it is not clear how they act, and for which it may not be appropriate to make any function annotation based on current understanding.

The current definition (above) is quite specific to a particular combination of binding. Amongst other things, the definitions of this term (and of its child terms "transcription coactivator activity" and "transcription corepressor activity") are all specific to RNA polymerase II, but should not be, as the same terms are used for E. coli transcription. This is one type, but not actually broad enough to encompass all the various kinds, so we will probably need a general term, and then probably a few specific subtypes. At this time, we will only add subtypes where it is understood how they act, but more subtypes can be added later as understanding advances.

Proposal:

  1. changes to transcription cofactor activity - GO:0003712
    • name change to protein-binding transcription cofactor activity and additional parentage under protein binding
      - to make it clear that you need to be sure that this is a factor that acts by binding proteins in the transcription machinery
    • definition change
      - This term has been selected for the prokaryotic GO subset, so it appears that no one has noticed that the definition is specific for a polymerase that is found only in eukaryotes even though the name is not. Thus, I think that broadening the definition to match the term name is likely the appropriate course of action. The other option is to make the term name match the definition, though this has the potential to make annotations that were based on the name incorrect.
      - new definition - Interacting selectively and non-covalently with a protein component of the transcription machinery to regulate transcription.
    • make Function-Process link to regulation of transcription since its parent term transcription activator activity should be merged into a Process term
  2. changes to transcription coactivator activity - GO:0003713
    • definition issue - The current definition is specific to RNAP II, but should be inclusive of prokaryotes
    • We can probably change this in a way parallel to transcription cofactor activity with the additional specification that the effect is positive, so this term could be called something like protein-binding transcription cofactor activity involved in activation of transcription and have a relationship with the Process term positive regulation of transcription.
  3. changes to transcription corepressor activity - GO:0003714
    • all the same issues as for transcription coactivator activity - GO:0003713 but with a negative instead of positive effect.
  4. Eventually, I think we should make subtypes of cofactor activities representing specific functions. However, the literature is so confusing, I don't have suggestions yet. I do have a review (Thomas & Chiang, 2006) and a contact from the meeting who volunteered to provide further input.

NOTE: I've said "changes" in the proposal for these terms. However, whether we can just change them, or whether we need to obsolete them and create new terms with suggested replacements to consider may depend on the existing annotations and whether they would still be true if the term changes.

Note for future:

Based on feedback from the transcription meeting, I think we will want to create subtypes of cofactors depending on exactly how they act. However, the literature here is confusing in that the word "cofactor" is used in many different ways, not always with a clear definition of what is meant. Thus, I think that some but not all things described as a "cofactor" can be represented in the function ontology, with the corresponding implication that some things described in the literature as "cofactors" but without any clear indication of how they act will receive Process annotations under regulation of transcription, but not Function annotations.

I am not proposing any more specific terms at this time though because further research into this area looks to be time consuming and beyond the time constraint for this project.

RNA polymerase I, II, or III transcription factor activity, and child terms

Proposals:

  1. Merge these three terms into the corresponding process terms
    - These three terms merely indicate involvement in transcription by a specific RNA polymerase and are equivalent in meaning to process terms like transcription from RNA polymerase II promoter.
    - They are also grouping multiple kinds of function based purely on involvement in the same process.
    • RNA polymerase I transcription factor activity - GO:0003701
      def: "Functions to initiate or regulate RNA polymerase I transcription."
    • RNA polymerase II transcription factor activity - GO:0003702
      def: "Functions to initiate or regulate RNA polymerase II transcription."
    • RNA polymerase III transcription factor activity - GO:0003709
      def: "Functions to initiate or regulate RNA polymerase III transcription."
  2. Merge these three terms into the corresponding process terms
    - These terms are equivalent to existing process terms and are grouping on the basis of process, not on function.
    - Some specific functions can be represented when they can be described in terms of how they function, see section on transcription factor activity.
    - Note that general and nonspecific are two words for the same thing, and basal is a third which seems to be the preferred word now. Thus, there will be some consolidation of the process terms so that this is not represented twice.
    - The big problem in terms of function though is that basal transcription factors include things which bind DNA and also things which do not, but instead bind other transcription factors, so these terms are process-based groupings that include multiple different kinds of functions.
    • general RNA polymerase II transcription factor activity - GO:0016251
      def: "Any function that supports basal (unregulated) transcription of genes by core RNA polymerase II. Five general transcription factors are necessary and sufficient for such basal transcription in yeast: TFIIB, TFIID, TFIIE, TFIIF, TFIIH and TATA-binding protein (TBF)."
    • nonspecific RNA polymerase II transcription factor activity - GO:0016252
      def: "Any function that supports transcription of genes by RNA polymerase II, and is not specific to a particular gene or gene set."
    • specific RNA polymerase II transcription factor activity - GO:0003704
      def: "Functions to enable the transcription of specific, or specific sets, of genes by RNA polymerase II."
      - alternate for this term: Depending on annotations, an alternate proposal for this specific term would be to convert it to the term sequence-specific RNA polymerase II promoter regulatory transcription factor site binding proposed in the section on "transcription factor activity & promoter binding", so that it specifically represents regulatory transcription factor binding to specific promoter sequences. However, if checking annotations reveals that the term has also been used for basal transcription factors, e.g. TFIID, then I am not sure that it would be safe to convert the term, as existing annotations might no longer be true. If that is the case, then perhaps people would want to obsolete this term instead of merging so that multiple alternates can be suggested.
  3. Merge this term with the corresponding complex term or obsolete so that multiple terms can be suggested.
    - This term seems to be specific for Mediator, so I think it is equivalent to the complex term for Mediator.
    - This term does not seem to be describing how the function is done, and with the word mediator in the name it seems likely to be used specifically for Mediator.
    - I think that some combinations of binding activities can be represented as specific terms under the protein binding term.
    • RNA polymerase II transcription mediator activity - GO:0016455
      def: "Functions to mediate the interaction of transcriptional activators with the RNA polymerase II-general RNA polymerase II transcription factor complex."
  4. Rename and redefine terms already including binding in their defs
    - These two terms seem to be indicating specific binding as components of their function, so I think these can be retained, in the binding tree, with some possible modifications to the name and definition of the terms.
    • RNA polymerase II transcription factor activity, enhancer binding - GO:0003705
      def: "Functions to initiate or regulate RNA polymerase II transcription by binding an enhancer region of DNA."
    • ligand-regulated transcription factor activity - GO:0003706
      def: "Combining with a steroid hormone to initiate a change in cell activity."

transcription activator activity - GO:0016563, transcription repressor activity - GO:0016564, & child terms

  • transcription activator activity - GO:0016563
    def: "Any transcription regulator activity required for initiation or upregulation of transcription."
  • transcription repressor activity - GO:0016564
    def: "Any transcription regulator activity that prevents or downregulates transcription."
  • specific transcriptional repressor activity - GO:0016566
    def: "Any activity that stops or downregulates transcription of specific genes or sets of genes."
  • basal transcription repressor activity - GO:0017163
    def: "Any transcription regulator activity that prevents or downregulates basal transcription. Basal transcription results from transcription that is controlled by the minimal complement of proteins necessary to reconstitute transcription from a minimal promoter."
  • general transcriptional repressor activity - GO:0016565
    def: "Any activity that stops or downregulates transcription of genes globally, and is not specific to a particular gene or gene set."

Proposal (changes to and obsoletions of existing terms):

  1. obsolete transcription activator activity - GO:0016563, or merge into process term positive regulation of transcription (GO:0045941)
    • As currently defined this is equivalent to the process term positive regulation of transcription (GO:0045941).
    • There may be specific types of activators that can be described in terms of molecular activities, perhaps sequence-specific RNA polymerase II promoter transcription factor site binding involved in activation of transcription (see section on transcription factor activity). However, this term is defined largely on the basis of the effect on the process of transcription and not on the basis of how that effect is accomplished. In addition, since there are numerous ways that activators can function, this term is grouping different functions on the basis of process, so I don't think it is appropriate as a function. Any appropriate individual function terms can be created as types of binding, and then they can have relationships to positive regulation of transcription.
    • We may want to obsolete this term so that we can suggest multiple consider terms, the process term positive regulation of transcription (GO:0045941) as well as any function terms that can be created of the type suggested above.
  2. transcription repressor activity - GO:0016564 and children
    1. obsolete transcription repressor activity - GO:0016564, or merge into process term negative regulation of transcription (GO:0016481)
      • As currently defined this is equivalent to the process term negative regulation of transcription (GO:0016481)
      • I think this term can be dealt with similarly to transcription activator activity above
    2. obsolete specific transcriptional repressor activity - GO:0016566
      • As defined, this term is related to the Process term negative regulation of gene-specific transcription from RNA polymerase II promoter (GO:0010553) and is defined in terms of effect on the process of transcription, rather than how it is accomplished.
      • Similarly to transcription activator activity, this term should be obsoleted and any appropriate individual function terms can be created as types of binding, and then they can have relationships to negative regulation of transcription.
      • Merging might be difficult for this term since there is no process term for specific transcription at a general level, only one for gene-specific transcription from RNA polymerase II promoter (GO:0032569)
    3. obsolete basal transcription repressor activity - GO:0017163 & general transcriptional repressor activity - GO:0016565
      • I group these two together because these two terms are identical to each other, as basal and general are two words for the same thing, at least for RNAP II. I'm aware of basal being used for E. coli RNAP where it means the same thing as for RNAP II, but have not encountered general being used in the prokaryotic context. Basal seems to be the preferred word now, over general, in the RNAP II literature, as it has become clear that the "general" factors aren't as general as once thought.
      • Similarly to other terms discussed in this section, e.g. transcription activator activity and specific transcriptional repressor activity, I think this term may represent multiple different types of binding functions, but that it is grouping based on the process, so I recommend obsoleting this term. As with others above, any appropriate individual function terms can be created as types of binding, and then they can have relationships to negative regulation of transcription.
      • Merging might also be difficult for these terms since there is no process term for basal or general at a general level, only terms at the level of RNAP II transcription.
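
The difference between obsoleting with "consider" suggestions and merging can be sketched in Python. This is a rough illustration, assuming that annotations to an obsoleted term are flagged for manual review rather than rewritten automatically. The `CONSIDER` table is one reading of the proposals in this section (the entry for GO:0016566 uses the related Process term named above), and the helper and gene names are hypothetical.

```python
# Obsoleted Function terms and the "consider" terms suggested in this section.
CONSIDER = {
    "GO:0016563": ["GO:0045941"],  # transcription activator activity
    "GO:0016564": ["GO:0016481"],  # transcription repressor activity
    "GO:0016566": ["GO:0010553"],  # specific transcriptional repressor activity
    "GO:0017163": [],  # basal transcription repressor activity (no general Process term)
    "GO:0016565": [],  # general transcriptional repressor activity (same situation)
}


def needs_review(annotations, consider):
    """Group annotated genes by obsolete term, with the suggested replacements."""
    report = {}
    for gene, term in annotations:
        if term in consider:
            entry = report.setdefault(
                term, {"genes": [], "consider": list(consider[term])}
            )
            entry["genes"].append(gene)
    return report


report = needs_review(
    [("geneX", "GO:0016563"), ("geneY", "GO:0003677"), ("geneZ", "GO:0016563")],
    CONSIDER,
)
```

Unlike a merge, nothing is rewritten here: each flagged annotation would be revisited by an annotator, who could pick the Process term or a more specific binding term once such terms exist.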

elongation regulator, termination factor, and antiterminator activity terms

  1. Elongation regulator terms
    • Terms
      • transcription elongation regulator activity - GO:0003711
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule following transcription initiation."
      • negative transcription elongation factor activity - GO:0008148
        def: "Any activity that decreases the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule following transcription initiation."
      • positive transcription elongation factor activity - GO:0008159
        def: "Any activity that increases the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule following transcription initiation."
      • RNA polymerase I transcription elongation factor activity - GO:0016943
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule catalyzed by RNA polymerase I following transcription initiation."
      • RNA polymerase II transcription elongation factor activity - GO:0016944
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule catalyzed by RNA polymerase II following transcription initiation."
      • RNA polymerase III transcription elongation factor activity - GO:0016945
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule catalyzed by RNA polymerase III following transcription initiation."
    • Proposal: Merge these terms with corresponding process terms
      - All of these terms are defined purely in terms of their effect on elongation and thus are completely equivalent to existing process terms.
      - These terms are likely to be grouping a number of different functions based on process rather than on functional similarity.
      - Where specific elongation factor activities can be described in terms of how they act, appropriate terms can be created. None are proposed as yet.
  2. Termination factor activity terms
    • Terms
      • transcription termination factor activity - GO:0003715
        def: "Any activity that brings about termination of transcription."
      • RNA polymerase I transcription termination factor activity - GO:0003716
        def: "Any activity that brings about termination of transcription by RNA polymerase I."
      • RNA polymerase II transcription termination factor activity - GO:0003717
        def: "Any activity that brings about termination of transcription by RNA polymerase II."
      • RNA polymerase III transcription termination factor activity - GO:0003718
        def: "Any activity that brings about termination of transcription by RNA polymerase III."
    • Proposal: Merge these terms with corresponding process terms
      - All of these terms are defined purely in terms of their effect on termination and thus are completely equivalent to existing process terms.
      - These terms are likely to be grouping a number of different functions based on process rather than on functional similarity.
      - Where specific termination factor activities can be described in terms of how they act, appropriate terms can be created. None are proposed as yet.
  3. Antiterminator activity term
    • Term
      • transcription antiterminator activity - GO:0030401
        def: "Functions to prevent the termination of RNA synthesis. Acts as a regulatory device, e.g. in phage lambda, enabling a terminator to be masked from RNA polymerase so that distal genes can be expressed."
    • Proposal: Rename and redefine this term & add some new subtypes
      - This term is named very broadly, and the first sentence of the definition is also very broad. However, the example in the second sentence is very specific. On the thought that this term has likely only been used for prokaryotic annotations (quick glimpse at AmiGO supports this idea), I propose to keep this term but adjust the name and definition to talk about how it acts, e.g. nucleic acid binding
      - There would be some additional child terms, depending on whether it binds a sequence in the nascent RNA or a sequence in the DNA, see Greenblatt J et al. Transcriptional antitermination. Nature. 1993 Jul 29;364(6436):401-6.

sigma factor related terms

  1. sigma factors => promoter specificity factors defined in terms of what they bind
    Terms
    1. transcription initiation factor activity - GO:0016986
      def:
    2. sigma factor activity - GO:0016987
      def: "A sigma factor is the promoter specificity subunit of eubacterial-type multisubunit RNA polymerases, those whose core subunit composition is often described as alpha(2)-beta-beta-prime. (This type of multisubunit RNA polymerase complex is known to be found in eubacteria and plant plastids). Although sigma does not bind DNA on its own, when combined with the core to form the holoenzyme, this binds specifically to promoter sequences, with the sigma factor making sequence specific contacts with the promoter elements. The sigma subunit is released from the elongating form of the polymerase and is thus free to act catalytically for multiple RNA polymerase core enzymes."
    3. mitochondrial transcription initiation factor activity - GO:0034246
      def: "A transcription factor activity that confers promoter specificity upon mitochondrial RNA polymerase, in a manner analogous to eubacterial sigma factors."
    • The first term, transcription initiation factor activity, is undefined but is the parent term of the other two terms (with no other child terms), so I want to consider these three terms together.
    • Proposal: obsolete transcription initiation factor activity - GO:0016986
      The first term transcription initiation factor activity should probably be obsoleted because the current name is much too broad and it seems that it might have been broadly used based on the name, which doesn't represent a specific function.
    • Proposal: rename and redefine sigma factor activity - GO:0016987, & merge or rename and redefine? mitochondrial transcription initiation factor activity - GO:0034246
      • The two child terms both represent essentially the same thing: a specificity factor that binds to core RNA polymerase (which is unable to recognize DNA on its own) and, while bound to the RNAP, confers sequence-specific DNA binding. We could represent this as something like core RNA polymerase binding promoter specificity factor activity and give it parentage under both core RNA polymerase binding and sequence-specific DNA binding. Possibly the existing sigma factor activity term could be redefined to be general, i.e. not specific to sigma factors. [This would be similar to the fact that we currently don't have a specific term for bacterial RNAP; annotation to the general term RNAP is sufficient since bacteria have only one RNAP complex.] While E. coli, and other bacteria, have many different sigma factors, they have only one RNAP, so the primary criterion distinguishing different sigma factors is which promoter sequence they recognize. We feel that GO should not get into the level of detail of describing specific binding sites, so one term for this type of activity is sufficient.
      • I am not sure if we need more than just one term for this. This kind of specificity factor exists for both prokaryotes, e.g. E. coli, and also for mitochondrial RNAP. However, I can't think of any examples of cells which have more than one kind of polymerase with this kind of specificity factor, so I think we may only need one term for this type of activity. I guess the argument in favor of keeping this term would be in order to make a link between it and existing process terms for mitochondrial transcription, though at the moment I think it would be an only child as I only know of this type of activity for prokaryotic and for mitochondrial transcription. If we do keep this individual term, then we'll probably want to change the name and def a bit to be consistent with the general term and the relationship with the mitochondrial transcription process term.
  2. sigma factor antagonists => redefine in terms of binding activities
    • Terms
      1. transcription initiation factor antagonist activity - GO:0016988
        def: "The function of binding to a transcription factor and stopping, preventing or reducing the rate of its transcriptional activity."
      2. sigma factor antagonist activity - GO:0016989
        def: "The function of binding to a sigma factor and stopping, preventing or reducing the rate of its transcriptional activity."
    • Proposal: obsolete transcription initiation factor antagonist activity - GO:0016988 & rename and redefine sigma factor antagonist activity - GO:0016989
      • These two terms are tied to the item above about sigma-type specificity factors, since they basically represent things that bind to sigma, or comparable, factors to prevent them from binding to core RNAP.
      • transcription initiation factor antagonist activity isn't even defined correctly based on the name of the term in that it does not specify binding to an initiation factor. Since this term is badly named and defined, I recommend obsoleting it.
      • The decision of whether we need only a single general term or specific terms that affect regulation of specific types of polymerases should follow the decision made for sigma type specificity factors.
        1. The term sigma factor antagonist activity should probably be renamed and defined to become a general term for binding to a promoter specificity factor of the sigma type so that it will be broad enough to represent this activity generally and still represent what is known for prokaryotic transcription.
        2. I do not know if such an activity is characterized in mitochondrial transcription, so even if we create a specific term for the mitochondrial promoter specificity factor, I don't yet see a need to create a mitochondrial specific term here.
  3. proteins that bind anti-sigma factors and similar
    • Term(s)
      1. anti-sigma factor antagonist activity - GO:0043856
        def: "The function of binding to an anti-sigma factor and stopping, preventing or reducing the rate of its activity."
    • Proposal: rename and redefine anti-sigma factor antagonist activity - GO:0043856
      This can be renamed and redefined so that it is general for anything that binds to a protein which itself binds to a specificity factor to prevent it from binding to RNAP core.

transcription factor binding terms

Currently, most of the child terms of "transcription factor binding" (except "transcription cofactor activity") and the children of "transcription coactivator activity" represent binding to specific proteins. We feel that this is too specific to represent with individual GO terms because there are thousands of specific transcription factors. Groups who wish to curate this level of detail should use column 16 to indicate what is being bound.

We would like to represent binding just to classes of transcription factors, such as basal or regulatory transcription factors.

Proposal: new terms

I am showing only a sample set of new terms, some general terms relevant to both prokaryotes and eukaryotes and some terms specific for RNA polymerase II. Where appropriate there will be links to Process terms.

- transcription factor binding - GO:0008134
-- basal transcription factor binding
--- basal RNA polymerase II transcription factor binding
-- regulatory transcription factor binding
--- regulatory RNA polymerase II transcription factor binding
---- regulatory RNA polymerase II transcription factor binding involved in activation of transcription (link to Process)
---- regulatory RNA polymerase II transcription factor binding involved in repression of transcription (link to Process)

Also see Function (red) terms under transcription factor binding in this diagram, which also shows links to Process (blue) terms.
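
One consequence of grouping by class is that an annotation to a specific child term also counts as an annotation to each more general parent. The following is only a minimal Python sketch of that idea using the sample terms listed above; the `IS_A` table and the `ancestors` helper are illustrative names, not part of any GO tool, and the proposed terms have no IDs yet.

```python
# Child -> parent (is_a) for the sample terms listed above.
# Only "transcription factor binding" exists today (GO:0008134);
# every other name is a proposed term without an ID.
IS_A = {
    "basal transcription factor binding":
        "transcription factor binding",
    "basal RNA polymerase II transcription factor binding":
        "basal transcription factor binding",
    "regulatory transcription factor binding":
        "transcription factor binding",
    "regulatory RNA polymerase II transcription factor binding":
        "regulatory transcription factor binding",
    "regulatory RNA polymerase II transcription factor binding involved in activation of transcription":
        "regulatory RNA polymerase II transcription factor binding",
    "regulatory RNA polymerase II transcription factor binding involved in repression of transcription":
        "regulatory RNA polymerase II transcription factor binding",
}


def ancestors(term):
    """Return the is_a ancestors of a term, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


specific = ("regulatory RNA polymerase II transcription factor binding "
            "involved in activation of transcription")
for name in ancestors(specific):
    print(name)
```

A gene product annotated to the most specific term would therefore also be retrieved by a query on regulatory transcription factor binding or on transcription factor binding itself.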

Proposal: obsoletions

Specific terms proposed to obsolete:

terms under transcription factor binding

  • aryl hydrocarbon receptor binding - GO:0017162
  • bHLH transcription factor binding - GO:0043425
  • MRF binding - GO:0043426
  • Mrf4 binding - GO:0051578
  • Myf5 binding - GO:0051576
  • MyoD binding - GO:0051577
  • myogenin binding - GO:0051579
  • NF-kappaB binding - GO:0051059
  • NFAT protein binding - GO:0051525
  • NFAT1 protein binding - GO:0051526
  • NFAT2 protein binding - GO:0051527
  • NFAT3 protein binding - GO:0051528
  • NFAT4 protein binding - GO:0051529
  • NFAT5 protein binding - GO:0051530
  • retinoic acid receptor binding - GO:0042974
  • retinoid X receptor binding - GO:0046965
  • Tat protein binding - GO:0030957
  • thyroid hormone receptor binding - GO:0046966

terms under transcription cofactor activity

  • cAMP response element binding protein binding - GO:0008140
  • ligand-dependent nuclear receptor transcription coactivator activity - GO:0030374
  • thyroid hormone receptor coactivator activity - GO:0030375

RNA polymerase binding terms

We currently have these two terms:

  • RNA polymerase binding - GO:0070063
    Def: Interacting selectively and non-covalently with an RNA polymerase.
  • RNA polymerase core enzyme binding - GO:0043175
    Def: Interacting selectively and non-covalently with the prokaryotic RNA polymerase core enzyme, the part of the RNA polymerase consisting of two alpha, one beta and one beta prime subunits.

The RNA polymerase core enzyme binding term has existed for quite some time. Although it has a very general name, the definition is quite specific for the prokaryotic type of RNA polymerase core enzyme, even though the term core enzyme is also used for the multisubunit eukaryotic nuclear RNAPs I, II, and III.

The general term RNA polymerase binding was added in response to a SF item about RNA polymerase binding. There has also been a request to add new MF terms for eukaryotic RNA polymerase binding.

I think there is good reason to add more terms for binding to various types of core RNA polymerase. However, I am very hesitant about adding holoenzyme terms for the eukaryotic enzymes because:

  1. The compositions are not well characterized. For the case of RNAP II, multiple different "holoenzymes" have been described in the literature with different compositions. They do not each have individual names, but are all just called the RNAP II holoenzyme. The single existing component term for RNAP II holoenzyme does not reflect the complexity of the situation, see issues with: DNA-directed RNA polymerase II, holoenzyme
  2. While it is true that "holoenzymes" can be purified for eukaryotic nuclear RNAPs, it is not completely clear that they are meaningful biologically.

Proposal

  1. Redefine: RNA polymerase core enzyme binding - GO:0043175
    - While the definition is clearly spelled out in terms of the composition of the prokaryotic enzyme, the existing annotations in AmiGO show that annotators have clearly used the term name rather than the definition:
    • ercc8 - WD40 repeat-containing protein, DNA excision repair protein 8 (gene from Dictyostelium discoideum)
    • PSPPH_0861- RNA polymerase-binding protein DksA (protein from Pseudomonas syringae pv. phaseolicola 1448A)
    • RBM16 - RNA-binding protein 16 (protein from Homo sapiens)
    • SPAC23A1.16c - RNA polymerase II-associated protein Rtr1 (gene from Schizosaccharomyces pombe)
    - Since the term has clearly been annotated based on the name, we propose to broaden the definition so that this term will become a general term for binding to a core RNA polymerase enzyme.
  2. New terms: for binding to types of core RNAP enzymes
    - We think there is good reason to create core enzyme binding terms for each type of RNA polymerase that can exist in a single cell, because we will be making function terms that incorporate binding to a specific type of core RNAP. Plants have the largest number of different types, with additional nuclear RNAPs and chloroplast RNAPs in addition to the standard three nuclear RNAPs and mitochondrial RNAP found in all eukaryotes.
    - I am not sure that we need to generate a new term that is specific for prokaryotic RNAP. We don't have a specific component term for prokaryotic RNAP either, because the general term is sufficient when there is only one RNAP in prokaryotes. Thus, based on the existing SF item, we are thinking about these new terms:
-enzyme binding
--RNA polymerase binding (GO:0070063)
---RNA polymerase core enzyme binding (GO:0043175)
----RNA polymerase I core enzyme binding (GO:new)
----RNA polymerase II core enzyme binding (GO:new)
----RNA polymerase III core enzyme binding (GO:new)
----RNA polymerase IV core enzyme binding (GO:new)
-----RNA polymerase IVa core enzyme binding (GO:new)
-----RNA polymerase IVb core enzyme binding (GO:new)
----plastid-encoded plastid RNA polymerase complex core enzyme binding (GO:new)
-----plastid-encoded plastid RNA polymerase complex A core enzyme binding (GO:new)
-----plastid-encoded plastid RNA polymerase complex B core enzyme binding (GO:new)
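
The dash-depth outline above can be read mechanically as a parent/child structure. The following is only a rough Python sketch of that reading; `OUTLINE`, `parse_outline`, and `PARENTS` are names made up for this illustration, and the `GO:new` placeholders are carried over unchanged from the list.

```python
import re

OUTLINE = """\
-enzyme binding
--RNA polymerase binding (GO:0070063)
---RNA polymerase core enzyme binding (GO:0043175)
----RNA polymerase I core enzyme binding (GO:new)
----RNA polymerase II core enzyme binding (GO:new)
----RNA polymerase III core enzyme binding (GO:new)
----RNA polymerase IV core enzyme binding (GO:new)
-----RNA polymerase IVa core enzyme binding (GO:new)
-----RNA polymerase IVb core enzyme binding (GO:new)
----plastid-encoded plastid RNA polymerase complex core enzyme binding (GO:new)
-----plastid-encoded plastid RNA polymerase complex A core enzyme binding (GO:new)
-----plastid-encoded plastid RNA polymerase complex B core enzyme binding (GO:new)
"""


def parse_outline(text):
    """Map each term name to its parent name (None for the root)."""
    parents = {}
    stack = []  # stack[d - 1] is the most recent term seen at depth d
    for line in text.splitlines():
        # Leading dashes give the depth; a trailing "(GO:...)" is dropped.
        match = re.match(r"(-+)\s*(.*?)\s*(?:\(GO:[^)]*\))?$", line)
        if not match:
            continue
        depth = len(match.group(1))
        name = match.group(2)
        del stack[depth - 1:]  # forget terms at this depth or deeper
        parents[name] = stack[-1] if stack else None
        stack.append(name)
    return parents


PARENTS = parse_outline(OUTLINE)
for child, parent in PARENTS.items():
    print(f"{child!r} is_a {parent!r}")
```

Reading the outline this way makes it easy to check, for example, that the IVa and IVb terms sit under the RNA polymerase IV term rather than directly under the general core enzyme binding term.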

Biological Process

Clarification of what is part of the transcription cycle (initiation, elongation, termination)

There have been SF items asking for clarification of what is, and is not, part of transcription, and of transcription initiation in particular. There have also been items asking for representation of additional parts of the transcription cycle. These two issues are connected: by representing the various steps and defining them precisely, I think it will be clearer which term is appropriate and what is actually included in each step.

These are relevant SF items:

Proposal

  1. new Process terms
    - additional terms for the transcription cycle with more specific definitions to indicate exactly what is included
    - transcription from RNA polymerase II promoter will get an additional child term in Process to represent the promoter clearance transition from initiation to elongation (see diagram)
    - transcriptional initiation from RNA polymerase II promoter will get additional child terms in Process to represent defined steps within initiation (see diagram)
    - These same steps in the transcription cycle are also observed in E. coli RNAP, where much of the basic work defining the transcription cycle was done. There will be additional higher level terms, e.g. just promoter clearance or just transcriptional open complex formation that will be appropriate for prokaryotic annotation (not shown in diagram).
    - For other specific RNA polymerases, new parallel terms will be generated as appropriate (not shown in diagram).
  2. new Function terms
    - new terms representing binding to RNA polymerase or to transcription factors as part of preinitiation complex (PIC) assembly (see diagram)
    - There are other function-process links that can be made. For example, open complex formation is an ATP-dependent step for some preinitiation complexes (depending on which polymerase, sigma factor, ...) so we probably can make a function term for ATPase activity involved in transcription open complex formation
  3. making links between Function and Process
    - With our new philosophy that Function terms should indicate how while Process terms can indicate what, we will no longer be making Function terms that group by Process. This type of grouping will now be done by making links from Function terms to Process terms. (see diagram)
    - Function-Process links are generally part_of or has_part
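
To illustrate how links can replace grouping terms, here is a small Python sketch that recovers "functions involved in a process" by following part_of links instead of relying on a Function parent term. The `LINKS` table and the `functions_by_process` helper are assumptions made for this example; the term names are taken from the proposal or are hypothetical placeholders, not final ontology content.

```python
from collections import defaultdict

# (Function term, relation, Process term). Names are illustrative only;
# the second Function term is a hypothetical placeholder.
LINKS = [
    ("ATPase activity involved in transcription open complex formation",
     "part_of", "transcriptional open complex formation"),
    ("RNA polymerase II core enzyme binding involved in PIC assembly",
     "part_of", "transcriptional preinitiation complex assembly"),
    ("basal RNA polymerase II transcription factor binding",
     "part_of", "transcriptional preinitiation complex assembly"),
]


def functions_by_process(links):
    """Group Function terms under the Process term they are part_of."""
    grouped = defaultdict(list)
    for function_term, relation, process_term in links:
        if relation == "part_of":
            grouped[process_term].append(function_term)
    return dict(grouped)


GROUPED = functions_by_process(LINKS)
for process_term, function_terms in GROUPED.items():
    print(process_term)
    for function_term in function_terms:
        print("  <- part_of:", function_term)
```

The grouping that a term like transcription regulator activity used to provide falls out of the links, so no Function term has to be defined purely by the process it takes part in.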

Diagram
A picture is worth a thousand words, but PLEASE NOTE that this diagram should be considered conceptual rather than exact in its details. It comes from my test version of the ontology, which is not complete, and I have left out some of the specifics for the sake of clarity in the diagram. For example, in the Function section, I discussed representing core promoter elements versus binding sites for regulatory transcription factors; this is not represented here.

  • shows sample new terms in both Function and Process
  • shows sample Function-Process links

Clarification of "general/non-specific/basal" vs "specific" transcription

There have been SF items requesting terms for "general/non-specific/basal" vs "specific" transcription and clarification of exactly what is meant, like this one: general and specific transcription - ID: 1590000. It has also been suggested to broaden the general vs specific distinction so that it applies to all transcription, so that these terms can be used to distinguish the genes that are involved in specific vs general transcription.

Currently there are terms for general and gene-specific transcription from RNA polymerase II promoter, but not for any other types of RNA polymerase promoters. These are the existing terms:

  • gene-specific transcription from RNA polymerase II promoter - GO:0032569
    Def: The specifically regulated synthesis of RNA from DNA encoding a specific gene or set of genes by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. In addition to RNA polymerase II and the general transcription factors, specific transcription requires one or more specific factors that bind to specific DNA sequences or interact with the general transcription machinery.
  • general transcription from RNA polymerase II promoter - GO:0032568
    Def: The basal, non-specifically regulated synthesis of RNA from a DNA template by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. Mediated by core RNA polymerase II and a set of general transcription factors; in Saccharomyces five transcription factors are necessary and sufficient for such basal transcription.

Issues

  1. The current definitions reflect outdated thinking
    - Early on it was thought that the RNAP II "general transcription factors", or GTFs, were involved in preinitiation complex (PIC) formation at ALL RNAP II promoters. However, it is now clear that there is no single type of RNAP II promoter. Thus there are probably many types of PICs and the "general" factors may not be general. Thus Sikorski & Buratowski (2009) use the term basal rather than general.
    - The current definition for general says "non-specifically regulated". This is probably not really the case. Transcription of the set of genes that requires only basal transcription factors is regulated; these genes just don't require additional non-basal factors to recognize the promoter. It is also quite a large set of genes, larger than the typical number of genes responsive to a particular activator, and they are active in "standard" growth conditions.
  2. term names also reflect outdated thinking
    - As mentioned above, Sikorski & Buratowski (2009) use the term basal rather than general because it appears that the "general" factors aren't as general as once thought.
    - There may also be a better name for gene-specific. In a comprehensive review of eukaryotic general transcription machinery, Thomas and Chiang (2006) use the phrase activator-dependent. I think this is a more accurate phrase, and also consistent with the phrase that is used in E. coli for the same situation.
  3. position of the terms
    - The current terms general and specific RNAP II transcription are siblings of the terms for initiation, elongation, and termination of RNAP II transcription (see diagram). Thus if a gene is annotated to the term for gene-specific transcription from RNA polymerase II promoter, you do not get any connection to what part of the transcription cycle (initiation, elongation, or termination) it is involved in.
    - As far as I can tell, basal vs gene-specific indicates whether the PIC can form with only basal factors or requires an "activator", a regulatory transcription factor that binds a specific promoter sequence, to recognize the promoter and recruit basal factors to the promoter.
  4. gene sets involved in basal vs specific transcription
    - It is not the case that there is one set of genes involved in basal transcription and another set involved in specific transcription.
    - Basal refers to transcription that can occur with only basal transcription (initiation) factors.
    - "Gene-specific" means that the promoter sequence is not recognized directly by the basal factors, but instead by a "gene-specific" regulatory transcription factor, which then recruits a basal transcription factor.
    - Thus, as far as I know, everything involved in basal transcription is also involved in gene-specific transcription.
  5. generality of general vs gene-specific (or basal vs activator-dependent)
    - Eukaryotic transcription by RNA polymerase II and prokaryotic transcription in E. coli share some similarities in this area, and the basal vs activator-dependent terminology has been used in E. coli for a long time, where it means the same thing as I have outlined above for RNAP II. Thus there is clearly some need to represent basal vs activator-dependent transcription at a general level so that prokaryotic annotators can make this distinction also.
    - It is not clear that this is a meaningful distinction for the eukaryotic RNA polymerases I or III. Particularly for RNAP I, which (for most organisms) only makes a single type of transcript (the large rRNA) from a single type of promoter, there isn't really any gene-specific regulation going on.

Useful reviews

Proposal

  1. Rename the two existing terms and move them to be subtypes of initiation (see diagram showing proposed term changes; existing terms are circled with arrows pointing to proposed new locations of renamed terms)
  2. Redefine existing terms to reflect current understanding
  3. Create additional higher level terms appropriate for use for prokaryotic annotation (not shown on diagram)
  4. At this time, I see no need for basal and activator-dependent terms for RNAP I or III, though the structure is such that it will be possible to add them for any polymerase/promoter type for which it is appropriate.

"Txn from promoter types" vs "transcription of classes of RNAs"

We were considering whether or not it made sense to have terms based on the type of RNA produced, e.g. rRNA transcription, since the process of transcription is really more a factor of what type of promoter, i.e. which RNA polymerase does the transcription, than which class of RNA. We were wondering if it made more sense to only have terms representing the promoter class from which the transcription originated, e.g. transcription from RNA polymerase III type 1 promoter. While there was some agreement with this idea at the transcription meeting, I have seen some papers talking about regulation of transcription of certain classes of RNAs since coming back. Thus, currently, I think the conservative approach is to add some additional terms representing distinct, well-characterized promoter classes WITHOUT removing any of the RNA class based terms.

Top level RNA class (e.g. snoRNA) based terms that currently exist:

  • mRNA transcription
  • rRNA transcription
  • tRNA transcription
  • snRNA transcription
  • snoRNA transcription

Proposal

  1. leave existing RNA class (e.g. snoRNA) based terms as is
  2. changes to existing term 5S class rRNA transcription
    - rename to use standard nomenclature for RNA polymerase III promoter type: transcription from RNA polymerase III type 1 promoter
    - existing name would become synonym
    - update definition to provide additional information about the promoter type
  3. add new terms for RNA polymerase III promoter types not currently represented
    - type 1
    - type 3

RNA elongation - GO:0006354

The standard phrase in the literature is transcription elongation, not RNA elongation.

Proposal: Term name change

  1. We propose to swap the current name, RNA elongation, with one of the existing synonyms, transcription elongation. This follows predominant usage in the literature and parallels the names of the other stages of transcription, transcription initiation and transcription termination. The change would apply to this term and to all child terms that use this nomenclature.

Additional comments

merging vs obsoleting (with terms suggested for reannotation)

There are a number of places where we have suggested merging Function terms into Process terms that represent the same thing. Another option for dealing with terms like this, at least for some of them, would be to obsolete the term, rather than merge it, so that we can provide multiple consider options to suggest possible terms. This might be appropriate in cases where the existing Function term is grouping multiple different kinds of functions based on involvement in a similar process because in addition to the Process term that is equivalent to the Function term, there may also be multiple new Function terms to be created that annotators might want to consider in reannotation. Whichever option is preferred by annotators is fine.
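
The practical difference between the two options can be sketched in Python using OBO-style tags (`alt_id` for a merged ID, `is_obsolete` and `consider` for an obsoleted term). This is only an illustration under stated assumptions: the `PROC:termination` and `FUNC:new-specific-activity` identifiers are hypothetical placeholders, and `merge_term` and `obsolete_term` are helper names invented here, not functions of any ontology editor.

```python
import copy

# GO:0003715 is the existing Function term; the other ID is a placeholder
# standing in for the equivalent Process term.
BASE_TERMS = {
    "GO:0003715": {"name": "transcription termination factor activity",
                   "namespace": "molecular_function"},
    "PROC:termination": {"name": "transcription termination",
                         "namespace": "biological_process"},
}


def merge_term(terms, old_id, target_id):
    """Merge old_id into target_id: the old ID survives only as an alt_id."""
    old = terms.pop(old_id)
    target = terms[target_id]
    target.setdefault("alt_id", []).append(old_id)
    target.setdefault("synonym", []).append(old["name"])


def obsolete_term(terms, old_id, consider):
    """Obsolete old_id in place and list terms to consider for reannotation."""
    term = terms[old_id]
    term["is_obsolete"] = True
    term["consider"] = list(consider)


# Option 1: merge. Existing annotations follow the ID to the Process term.
merged = copy.deepcopy(BASE_TERMS)
merge_term(merged, "GO:0003715", "PROC:termination")

# Option 2: obsolete. Annotators choose among several suggested terms,
# including a hypothetical new Function term.
obsoleted = copy.deepcopy(BASE_TERMS)
obsolete_term(obsoleted, "GO:0003715",
              ["PROC:termination", "FUNC:new-specific-activity"])

print(merged["PROC:termination"])
print(obsoleted["GO:0003715"])
```

A merge leaves a single destination for every old annotation, while obsoletion can point annotators at several candidate terms, which is the case described above where one Function term was grouping several distinct activities.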

Key to ontology diagrams

  • Terms:
    • red box = Function term
    • blue box = Process term
  • Links:
    • blue line = is_a relationship
    • yellow line = part_of relationship
    • green line = has_part relationship