Proposals to overhaul transcription in GO - 2010

From GO Wiki

Philosophy of Overhaul

Over the last few years, GO has changed how we talk about and create Function terms. We have been moving towards an approach in which they represent how something occurs, e.g. binding, catalytic activities, etc. In this process, we have begun to avoid creating Function terms, and also to eliminate existing ones, that duplicate Process terms, that is, functions that do not describe how the gene product acts but only specify that it is involved in a process. Despite the fact that we have always said that Function and Process should represent non-overlapping aspects, we have many older terms in Function that essentially duplicate a Process term. Compare, for example, the Function term transcription regulator activity with the Process term regulation of transcription. Both terms essentially mean the same thing. In addition, the Function term transcription regulator activity is not grouping the terms below it on the basis of having similar functions, but rather on the basis of being involved in the same process. This lack of clarity in the distinction between Function and Process generates confusion, both for annotators and for users. One researcher at the Gene Transcription in Yeast meeting told me that she only uses GO occasionally and she can never remember whether the term she wants is in Function or Process.

One of the major goals of this overhaul is to generate clarity between the function terms and the process terms for transcription. We are proposing to eliminate some Function terms that are equivalent to Process terms and which cannot be converted into a description of the molecular activity, or activities, involved. In other cases, we are proposing changes to Function terms so that they actually describe molecular activities. With the recently developed method of creating links between Function and Process terms, the old motivations to have terms like transcription regulator activity in order to be able to group functions by process should be addressed anyway, since terms representing functions involved in regulation of transcription will have a relationship to that Process term, or to a more specific child term as appropriate.
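
As a rough illustration of how Function-Process links could replace a grouping term like transcription regulator activity, here is a minimal Python sketch. The link table, the second Function label, and the helper names are assumptions made up for the example; the text above only says that such terms "will have a relationship" to the Process term or to a more specific child.

```python
# Process hierarchy (child -> parent); a tiny hand-written excerpt.
PROCESS_IS_A = {
    "regulation of transcription from RNA polymerase II promoter":
        "regulation of transcription",
    "regulation of transcription": None,
}

# Hypothetical Function -> Process links; the relationship type is not
# specified in the proposal, so it is left unnamed here.
FUNCTION_TO_PROCESS = {
    "sequence-specific DNA binding transcription factor activity":
        "regulation of transcription",
    "RNA polymerase II regulatory site binding (made-up label)":
        "regulation of transcription from RNA polymerase II promoter",
}


def is_same_or_descendant(process, ancestor, is_a):
    """Walk up the is_a chain from `process` looking for `ancestor`."""
    while process is not None:
        if process == ancestor:
            return True
        process = is_a.get(process)
    return False


def functions_for_process(process, links, is_a):
    """Group functions by process via links, without a grouping Function term."""
    return sorted(
        function
        for function, linked in links.items()
        if is_same_or_descendant(linked, process, is_a)
    )
```

Querying for regulation of transcription returns both functions, because the link to the more specific child Process term is followed up the hierarchy; querying for the RNA polymerase II child returns only the one linked there.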

With respect to annotation, these changes will mean that in cases where the experiments indicate that a gene product is involved in regulating transcription, but give no indication as to how it acts, it would be appropriate to annotate with a specific Process term but only with the root Function term.

Molecular Function

transcription factor activity - GO:0003700 & promoter binding - GO:0010843

Proposal

The term "transcription factor activity" has parentage under "DNA binding", rather than under "sequence-specific DNA-binding", even though it is defined as binding to a specific DNA sequence. The majority of things in this class are "regulatory" transcription factors which bind a specific DNA sequence present in a relatively limited set of promoters. There are also basal transcription factors which bind to specific core promoter elements in a sequence specific way.

However, we also need to account for the fact that not all basal transcription factors bind DNA. Currently, we also have terms like "RNA polymerase II transcription factor activity" that do not have parentage under "DNA binding" and which really mean anything involved in regulating RNAP II transcription, i.e. basically a process definition. This problem has been reported in SourceForge multiple times because the current structure makes it impossible to indicate that a transcription factor is a sequence-specific DNA binding factor for a specific polymerase; you can either indicate that it binds DNA, or that it is a factor for RNAP I, II, or III, but not that it binds a specific sequence for a specific RNAP.

We would like to have function terms that indicate what type of DNA sequence element is being bound, e.g. a basal promoter element versus the binding site for a regulatory transcription factor, such as Gal4 in yeast. Additionally, we would like to be able to indicate binding to enhancer sites. So the distinction in function will be by the type of DNA site bound.

Proposal: With that in mind, here is a proposed structure (where no number is indicated, the term would be new) and some specific changes proposed for some of the existing terms.

Changes to existing Function terms:

  1. transcription factor activity - GO:0003700
    • current def: The function of binding to a specific DNA sequence in order to modulate transcription. The transcription factor may or may not also interact selectively with a protein or macromolecular complex.
    • Change name to "sequence specific DNA-binding transcription factor activity", as shown in structure below, or similar
    • Change position to reflect current definition that this indicates sequence specific binding to DNA. Currently, this term is directly under DNA-binding, but the definition specifies a specific sequence, so we propose to move it to be a direct child of sequence-specific DNA binding (GO:0043565).
  2. promoter binding - GO:0010843 Changed, see below
    • current def: Interacting selectively and non-covalently with the regulatory region composed of the transcription start site and binding sites for transcription factor complexes of the basal transcription machinery.
    • Change either name or definition. Currently this term is defined too narrowly such that it only includes the core promoter elements, while the binding sites for regulatory transcription factors are also considered to be promoter elements. In addition, the child terms are not all core promoter elements. We recommend changing the definition to match the name. The other possibility is changing the name to core promoter binding so that it matches the current definition, but if people have annotated based on the broader term name, annotations will become incorrect with this option.
    • The def should also avoid specifying that the binding sites are for complexes; not all transcription factors are.


Originally proposed Structure showing types of new terms and relationship to existing Function terms:

- DNA binding - GO:0003677
-- (i) sequence-specific DNA-binding - GO:0043565
--- (i) sequence-specific DNA-binding transcription factor activity - GO:0003700
---- (i) sequence-specific promoter binding - GO:0010843 or GO:new
----- (i) sequence-specific core promoter binding  - GO:0010843 or GO:new
------ (i) sequence-specific RNA polymerase I core promoter binding
------ (i) sequence-specific RNA polymerase II core promoter binding
------- (i) sequence-specific RNA polymerase II core promoter binding involved in preinitiation complex formation (link to Process)
------- (i) sequence-specific RNA polymerase II core promoter binding involved in negative regulation of preinitiation complex formation (link to Process)
------ (i) sequence-specific RNA polymerase III core promoter binding
----- (i) sequence-specific regulatory transcription factor site binding
------ (i) sequence-specific promoter regulatory transcription factor site binding
------- (i) sequence-specific RNA polymerase I promoter regulatory transcription factor site binding
------- (i) sequence-specific RNA polymerase II promoter regulatory transcription factor site binding
------- (i) sequence-specific RNA polymerase III promoter regulatory transcription factor site binding
---- (i) specific enhancer transcription factor site binding
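
For readers who want to work with the outline above programmatically, the following is a minimal Python sketch that turns a dash-indented outline into a child-to-parent map. It assumes that the number of leading dashes gives the depth and that "(i)" marks an is_a link; the function names and the shortened sample outline are illustrative only.

```python
# Shortened copy of the proposed structure, for illustration.
OUTLINE = """\
- DNA binding
-- (i) sequence-specific DNA-binding
--- (i) sequence-specific DNA-binding transcription factor activity
---- (i) sequence-specific promoter binding
----- (i) sequence-specific core promoter binding
----- (i) sequence-specific regulatory transcription factor site binding
---- (i) specific enhancer transcription factor site binding
"""


def parse_outline(text):
    """Return {term name: parent name or None} from a dash-indented outline."""
    parents = {}
    stack = []  # stack[i] holds the most recent term seen at depth i + 1
    for line in text.splitlines():
        if not line.strip():
            continue
        dashes, _, name = line.partition(" ")
        depth = len(dashes)
        name = name.strip()
        if name.startswith("(i) "):  # drop the is_a marker
            name = name[4:]
        del stack[depth - 1:]  # forget deeper or same-depth siblings
        parents[name] = stack[-1] if stack else None
        stack.append(name)
    return parents


def ancestors(term, parents):
    """List the is_a ancestors of a term, nearest first."""
    chain = []
    while parents.get(term) is not None:
        term = parents[term]
        chain.append(term)
    return chain
```

With this sample, the ancestors of sequence-specific core promoter binding run up through sequence-specific promoter binding and the transcription factor activity term to DNA binding.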

Note that there will also be links to Process terms, e.g. a Function term like sequence-specific RNA polymerase II promoter regulatory transcription factor site binding will have a relationship to an appropriate Process term about gene-specific transcription from RNA polymerase II promoter.

Modifications to the Proposal:

Discussion brought up two issues that require some modifications to the original proposal.

  1. How to define what is part of a promoter
    Jim Hu commented that "In the prokaryotic literature, the promoter does not include the TF binding elements." In contrast, the majority of RNA polymerase II literature does seem to include sequence-specific regulatory transcription factor binding sites, e.g. those for factors like Gal4, as part of the promoter. However, David just attended a talk where a eukaryotic researcher used a definition of promoter comparable to that used for prokaryotes, so there appears to be a dichotomy in the definition of a promoter between those who include TF binding sites as part of the promoter and those who do not; this dichotomy does not seem to represent an actual functional difference.
  2. Relationships for the function of a sequence-specific DNA-binding transcription factor activity
    Ruth Lovering indicated that during the GO Annotation Camp in Geneva, a decision was made about the use of the has_part relationship for functions that include binding as part of their function. Since most transcription factors act by binding DNA and also to some protein component of the transcription machinery, we are reworking the relationships to take this decision into account. For a detailed discussion of using has_part to describe complex functions, see David's poster from St. Croix.

Our plan to proceed:

  1. How to define what is part of a promoter
    - We have effectively decided to punt on defining what constitutes a promoter. The exact meaning of the word "promoter", in terms of exactly which portions of the regulatory region constitute the "promoter" versus additional regulatory elements is not consistent across all the various RNA polymerases, nor even just within the RNA polymerase II literature alone.
    - Thus, for the "promoter binding" type terms, we are planning to use the general phrase "transcription regulatory region DNA-binding" as the general term grouping binding to any of the regulatory elements controlling transcription of a region.
    - Then for specific types of promoters, e.g. bacterial-type or RNA polymerase II, we can have specific terms that do incorporate the word "promoter" in cases where we can clearly define which types of DNA sites constitute the promoter.
  2. Use of has_part relationships to describe the binding functions of transcription factor activity terms
    - We have been working through using the has_part relationship. An example that is simple to explain, and also allows me to show the promoter issue in the same diagram, is sigma factor activity. Basically sigma factors are bacterial transcription factors that bind to core RNA polymerase to form holo RNA polymerase. Then, as part of the holoenzyme, sigma makes sequence specific contacts with DNA and confers on holoenzyme the ability to recognize promoter sequences. The mitochondrial RNA polymerase has a similar type of promoter specificity factor.
    - So, we have created a general term for RNA polymerase binding specificity activity, which is a type of RNA polymerase binding transcription factor activity. Under this term, there are two specific terms for these activities for the bacterial and mitochondrial enzymes. Each of these terms has has_part relationships to indicate binding to the appropriate type of RNA polymerase and to the appropriate type of promoter sequence.
    - For the general term sequence-specific DNA binding transcription factor activity (GO:0003700), it will have a has_part relationship to the term transcription regulatory region sequence-specific DNA binding, but will not have any has_part relationships to any form of protein binding. There are two reasons not to make any relationship to any form of protein binding for the term 3700:
    • There are understood cases such as the bacterial Integration Host Factor (IHF) which binds DNA and bends it, thus facilitating interactions between other proteins bound at sites brought into proximity with each other by the IHF induced bend, thus facilitating assembly of the preinitiation complex. However, the function of IHF in transcription does not appear to involve binding to other components of the transcription machinery.
    • While there are some cases where we understand what portion of the transcription machinery a transcription factor is contacting in order to carry out its role, there are many where we do not. Thus, we need to be able to capture the fact that a protein has sequence specific DNA binding as part of its function, even if we don't know the other components. We can add additional child terms, many of which will also be children of protein binding transcription factor activity to capture functions where we understand both the DNA binding and protein binding interactions that constitute the role of that transcription factor.
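
The has_part modelling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Term` class and `inherited_parts` helper are hypothetical, the two part labels under sigma factor activity are made up for the example, and the sketch assumes that a has_part on a parent term also applies to its is_a children.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    is_a: list = field(default_factory=list)
    has_part: list = field(default_factory=list)


TERMS = {t.name: t for t in [
    Term("RNA polymerase binding transcription factor activity"),
    Term("RNA polymerase binding specificity activity",
         is_a=["RNA polymerase binding transcription factor activity"]),
    # The two part labels below are assumed names, not actual GO terms.
    Term("sigma factor activity",
         is_a=["RNA polymerase binding specificity activity"],
         has_part=["bacterial-type RNA polymerase binding",
                   "bacterial-type promoter sequence-specific DNA binding"]),
    # GO:0003700: one DNA-binding part, deliberately no protein-binding part.
    Term("sequence-specific DNA binding transcription factor activity",
         has_part=["transcription regulatory region sequence-specific DNA binding"]),
]}


def inherited_parts(name, terms):
    """Collect has_part targets of a term and of all its is_a ancestors."""
    seen, parts, todo = set(), set(), [name]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        term = terms[current]
        parts.update(term.has_part)
        todo.extend(term.is_a)
    return parts
```

In this sketch, the general term GO:0003700 carries only the DNA-binding part, so a factor like IHF can be annotated to it without implying any protein-binding component, while sigma factor activity carries both a polymerase-binding and a promoter-binding part.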

Diagram of sigma factor activity showing both the representation of bacterial promoters and the use of has_part relationships to indicate the component functional parts that constitute sigma factor activity.

Summary of Proposed changes:

  1. transcription factor activity - GO:0003700
    • current def: The function of binding to a specific DNA sequence in order to modulate transcription. The transcription factor may or may not also interact selectively with a protein or macromolecular complex.
    • Change name to "sequence specific DNA-binding transcription factor activity", as shown in structure below, or similar
    • Change position to reflect current definition that this indicates sequence specific binding to DNA. Currently, this term is directly under DNA-binding, but the definition specifies a specific sequence, so we propose to move it to be a direct child of sequence-specific DNA binding (GO:0043565).
  2. Question relating to: the term DNA regulatory region binding (GO:0044212)
    • Is it true that all regulatory regions affect gene expression? I might have thought that DNA regions could regulate other things.
    • If it is true that DNA regulatory regions can regulate things other than gene expression, then I think we need to change the definition so that this term is for any DNA regulatory region, regardless of whether it affects transcription or something else, and add a new term for regulatory regions that affect transcription, called something like transcription regulatory region DNA binding.
    • Annotation comment:
      The handful of genes annotated directly to this term, as viewed by AmiGO, are all over the place and should probably be annotated to more specific terms, regardless of whether there is any change to this term. If people prefer, we could obsolete this existing term and suggest replacement terms.
  3. obsolete: the term promoter binding (GO:0010843)
    Reason: The word promoter is used in different ways such that it is not possible to define it consistently. In RNA polymerase II literature, most though not all researchers seem to include transcription factor binding sites; in prokaryotes, the promoter is limited to what is recognized by basal factors and thus is comparable to the RNA pol II core promoter in eukaryotes.
    Annotation impact:
    • Replacement terms will include sequence-specific basal promoter binding, sequence-specific regulatory transcription factor site binding, and children of each.
    • A glance at AmiGO suggests that much of what is annotated to the term promoter binding are sequence specific transcription factors where the appropriate annotation will be to sequence-specific regulatory transcription factor site binding or to a more specific child term.
  4. obsolete: these terms representing binding to specific DNA motifs
    All of these represent binding to specific sequence motifs. It is beyond the scope of GO to capture the thousands of individual sequence motifs that exist. We are working with Karen Eilbeck of SO to make sure that GO and SO are in synch with respect to how promoter motifs are defined and even SO is planning only to represent general classes of motifs, not every specific one. Thus, we feel that the appropriate level of detail for GO to capture for promoters is whether it is a basal/core promoter element (generally done by a basal transcription factor), a binding site for a regulatory transcription factor, or an enhancer binding site (also bound by regulatory transcription factors). Thus we would like to indicate the types of binding to general types of motifs as indicated in the structure above. These existing terms that indicate very specific motifs, we feel should be obsoleted.
    • all current children of promoter binding (GO:0010843)
      Annotation impact: Replacement terms may include sequence-specific basal promoter binding, sequence-specific regulatory transcription factor site binding, and/or children of each.
    • cAMP response element binding - GO:0035497
      def: "Interacting selectively and non-covalently with the cyclic AMP response element (CRE), a short palindrome-containing sequence found in the promoters of genes whose expression is regulated in response to cyclic AMP."
    • carbohydrate response element binding - GO:0035538
      def: "Interacting selectively and non-covalently with the carbohydrate response element (ChoRE) found in the promoters of genes whose expression is regulated in response to carbohydrates, such as the triglyceride synthesis genes."
    • E-box binding - GO:0070888
      def: "Interacting selectively and non-covalently with an E-box, a DNA motif with the consensus sequence CANNTG that is found in the promoters of a wide array of genes in neurons, muscle and other tissues."
    • estrogen response element binding - GO:0034056
      def: "Interacting selectively and non-covalently with the estrogen response element (ERE), a conserved sequence found in the promoters of genes whose expression is regulated in response to estrogen."
    • juvenile hormone response element binding - GO:0070594
      def: "Interacting selectively and non-covalently with the juvenile hormone response element (JHRE), a conserved sequence found in the promoters of genes whose expression is regulated in response to juvenile hormone."
    • mitochondrial heavy strand promoter anti-sense binding - GO:0070362
      def: "Interacting selectively and non-covalently with the anti-sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA."
    • mitochondrial heavy strand promoter sense binding - GO:0070364
      def: "Interacting selectively and non-covalently with the sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA."
    • mitochondrial light strand promoter anti-sense binding - GO:0070361
      def: "Interacting selectively and non-covalently with the anti-sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA."
    • mitochondrial light strand promoter sense binding - GO:0070363
      def: "Interacting selectively and non-covalently with the sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA."
    • serum response element binding - GO:0010736
      def: "Interacting selectively and non-covalently with the serum response element (SRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum."
    • sterol response element binding - GO:0032810
      def: "Interacting selectively and non-covalently with the sterol response element (SRE), a nonpalindromic sequence found in the promoters of genes involved in lipid metabolism."
    • vitamin D response element binding - GO:0070644
      def: "Interacting selectively and non-covalently with the vitamin D response element (VDRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum."
    • this child term of DNA regulatory region binding (GO:0044212)
      Annotation impact: A replacement term will be suggested
    • purine-rich negative regulatory element binding (GO:0032422)
      def: Interacting selectively and non-covalently with a 30-bp purine-rich negative regulatory element; the best characterized such element is found in the first intronic region of the rat cardiac alpha-myosin heavy chain gene, and contains two palindromic high-affinity Ets-binding sites (CTTCCCTGGAAG). The presence of this element restricts expression of the gene containing it to cardiac myocytes.
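
To gauge the annotation impact of these obsoletions, a curator might flag existing annotations to the obsoleted terms and list the suggested replacements for manual review. The sketch below assumes annotations are available as simple (gene, term ID) pairs; the gene names are invented, and the replacement terms are given by name because they have no IDs yet.

```python
# Suggested replacements, taken from the proposal text above.
REPLACEMENTS = [
    "sequence-specific basal promoter binding",
    "sequence-specific regulatory transcription factor site binding",
]

# Obsoleted term ID -> terms to consider instead (illustrative subset).
CONSIDER = {
    "GO:0010843": REPLACEMENTS,  # promoter binding
    "GO:0070888": REPLACEMENTS,  # E-box binding
}


def flag_for_review(annotations, consider):
    """Return (gene, obsolete term, candidate replacements) for each hit."""
    flagged = []
    for gene, term_id in annotations:
        if term_id in consider:
            flagged.append((gene, term_id, consider[term_id]))
    return flagged


# Invented example annotations.
ANNOTATIONS = [
    ("geneA", "GO:0010843"),
    ("geneB", "GO:0003677"),  # DNA binding, not obsoleted
    ("geneC", "GO:0070888"),
]
```

Because an obsoleted term can have several candidate replacements, the sketch only reports them and leaves the choice to the annotator rather than remapping automatically.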

Actions taken

  1. transcription factor activity - GO:0003700
    • - done: Term name changed and position modified as proposed above.
  2. Question relating to: the term DNA regulatory region binding (GO:0044212)
    • - done: Based on the existing children of GO:0044212, we felt it was already specific to transcription, so we changed this term name to indicate that.
    • - done: We created a new more general term for any DNA regulatory region binding (GO:0000975). Note that this is the reverse of the way we originally proposed to treat the existing and new terms.
  3. obsolete: the term promoter binding (GO:0010843)
  4. obsolete: these terms representing binding to specific DNA motifs

transcription regulator activity - GO:0030528

The term transcription regulator activity (GO:0030528) is the highest level Function term for transcription and it is essentially identical to a Process term. It conveys exactly the same information as the Process term regulation of transcription and it does NOT convey any information about the molecular nature of the regulator activity. In addition, it is grouping the child terms below it based on involvement in a common Process, not based on having a common Function and it currently includes functions based on binding DNA and functions based on interacting with other proteins.

MF: transcription regulator activity - GO:0030528
Current definition: Plays a role in regulating transcription; may bind a promoter or enhancer DNA sequence or interact with a DNA-binding transcription factor.

BP: regulation of transcription - GO:0045449
Current definition: Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA.

Proposal (changes to and obsoletions of existing terms):

  1. We propose to merge this Function term (GO:0030528) into the equivalent Process term (GO:0045449). [There is precedent for this type of merge with the merge of the Function term splicing factor activity into the equivalent Process term.]
  2. There are three regulation terms in Process based on this term which would need to be dealt with.
    • One option would be to also merge them into the Process terms like regulation of transcription and the appropriate positive and negative regulation terms. Alternately, we could obsolete them.
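
The practical effect of such a merge on annotations can be sketched as follows, assuming that a merged ID is simply redirected to the surviving term and that duplicate annotations are then collapsed. The gene names and the `apply_merges` helper are hypothetical.

```python
# Merged (secondary) ID -> surviving primary ID, as proposed above.
MERGES = {
    "GO:0030528": "GO:0045449",  # transcription regulator activity -> regulation of transcription
}


def apply_merges(annotations, merges):
    """Redirect merged IDs and drop duplicates, keeping first-seen order."""
    result = []
    seen = set()
    for gene, term_id in annotations:
        key = (gene, merges.get(term_id, term_id))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Invented example: geneA is annotated to both the Function and Process term.
ANNOTATIONS = [
    ("geneA", "GO:0030528"),
    ("geneA", "GO:0045449"),
    ("geneB", "GO:0003677"),
]
```

Note that a merge, unlike an obsoletion with suggested replacements, moves every annotation without review, so it is only safe where the Function annotation really says no more than the Process term does.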

transcription cofactor activity - GO:0003712, and child terms

The term "transcription cofactor activity" is defined quite specifically:

Definition: "The function that links a sequence-specific transcription factor to the core RNA polymerase II complex but does not bind DNA itself."

However, usage in the literature is broader than this and quite vague as to what is happening. The common elements seem to be:

  • binding to some protein component of the transcription machinery
  • NOT binding to DNA

I talked to a number of people about this. There was agreement that usage of the word "cofactor" is vague in the literature, but strong feeling that there are some types of cofactors where the functional components of the activity can be described explicitly so that GO can represent what they do functionally, rather than in terms of process. Thus, I do think we need to attempt to represent functions for cofactors that are understood. However, it is likely that at this point in time, there will be things that are described as "transcription cofactors" in the literature, where it is not clear how they act and it may not be appropriate to make any function annotation based on current understanding.

The current definition (above) is quite specific to a particular combination of binding. Among other things, the definitions of this term and of its child terms "transcription coactivator activity" and "transcription corepressor activity" are all specific to RNA polymerase II, but they should not be, as the same terms are used for E. coli transcription. The activity currently defined is one type of cofactor, but the definition is not broad enough to encompass all the various kinds, so we will probably need a general term, and then probably a few specific subtypes. At this time, we will only add subtypes where it is understood how they act, but more subtypes can be added later as understanding advances.

Proposal:

  1. changes to transcription cofactor activity - GO:0003712
    • name change to protein-binding transcription cofactor activity and additional parentage under protein binding
      - to make it clear that you need to be sure that this is a factor that acts by binding proteins in the transcription machinery
    • definition change
      - This term has been selected for the prokaryotic GO subset, so it appears that no one has noticed that the definition is specific for a polymerase that is found only in eukaryotes even though the name is not. Thus, I think that broadening the definition to match the term name is likely the appropriate course of action. The other option is to make the term name match the definition, though this has the potential to make annotations that were based on the name incorrect.
      - new definition - Interacting selectively and non-covalently with a protein component of the transcription machinery to regulate transcription.
    • make Function-Process link to regulation of transcription since its parent term transcription activator activity should be merged into a Process term
  2. changes to transcription coactivator activity - GO:0003713
    • definition issue - The current definition is specific to RNAP II, but should be inclusive of prok's
    • We can probably change this in a way parallel to transcription cofactor activity with the additional specification that the effect is positive, so this term could be called something like protein-binding transcription cofactor activity involved in activation of transcription and have a relationship with the Process term positive regulation of transcription.
  3. changes to transcription corepressor activity - GO:0003714
    • all the same issues as for transcription coactivator activity - GO:0003713 but with a negative instead of positive effect.
  4. Eventually, I think we should make subtypes of cofactor activities representing specific functions. However, the literature is so confusing, I don't have suggestions yet. I do have a review (Thomas & Chiang, 2006) and a contact from the meeting who volunteered to provide further input.

NOTE: I've said "changes" in the proposal for these terms. However, whether we can just change them, or whether we need to obsolete them and create new terms with suggested replacements to consider may depend on the existing annotations and whether they would still be true if the term changes.

Note for future:

Based on feedback from the transcription meeting, I think we will want to create subtypes of cofactors depending on exactly how they act. However, the literature here is confusing in that the word "cofactor" is used in many different ways, not always with a clear definition of what is meant. Thus, I think that some but not all things described as a "cofactor" can be represented in the function ontology, with the corresponding implication that some things described in the literature as "cofactors" but without any clear indication of how they act will receive Process annotations under regulation of transcription, but not Function annotations.

I am not proposing any more specific terms at this time though because further research into this area looks to be time consuming and beyond the time constraint for this project.

Actions taken:

  1. changes to transcription cofactor activity - GO:0003712
    • name change to protein-binding transcription cofactor activity and additional parentage under protein binding
      - done: new parentage under transcription factor binding transcription factor activity (GO:0000989)
      - question: term still has original name; should we add words "protein binding" or similar as proposed?
    • definition change
      - done: broadened definition to be general as implied by name and presence in prokaryotic subset
    • make Function-Process link to regulation of transcription since its parent term transcription activator activity should be merged into a Process term
    • new terms for specific RNAPs?
      - done: have made RNAP II specific term
      - question: should we make bacterial-type specific term?
  2. changes to transcription coactivator activity - GO:0003713
    • definition issue - The current definition is specific to RNAP II, but should be inclusive of prok's
      - done: broadened definitions to be general as implied by name and presence in prokaryotic subset
      - question: term still has original name; should we add words "protein binding" or similar as proposed?
    • new terms for specific RNAPs?
      - done: have made RNAP II specific term
      - question: should we make bacterial-type specific term?
  3. changes to transcription corepressor activity - GO:0003714
    • definition issue - The current definition is specific to RNAP II, but should be inclusive of prok's
      - done: broadened definitions to be general as implied by name and presence in prokaryotic subset
      - question: term still has original name; should we add words "protein binding" or similar as proposed?
    • new terms for specific RNAPs?
      - done: have made RNAP II specific term
      - question: should we make bacterial-type specific term?
  4. make subtypes of cofactor activities representing specific functions
    - no action to be taken now; as people reannotate, they can make suggestions

RNA polymerase I, II, or III transcription factor activity, and child terms

Proposals:

  1. Merge these three terms into the corresponding process terms
    - These three terms merely indicate involvement in transcription by a specific RNA polymerase and are equivalent in meaning to process terms like transcription from RNA polymerase II promoter.
    - They are also grouping multiple kinds of function based purely on involvement in the same process.
    • RNA polymerase I transcription factor activity - GO:0003701
      def: "Functions to initiate or regulate RNA polymerase I transcription."
    • RNA polymerase II transcription factor activity - GO:0003702
      def: "Functions to initiate or regulate RNA polymerase II transcription."
    • RNA polymerase III transcription factor activity - GO:0003709
      def: "Functions to initiate or regulate RNA polymerase III transcription."
  2. Merge these three terms into the corresponding process terms
    - These terms are equivalent to existing process terms and are grouping on the basis of process, not on function.
    - Some specific functions can be represented when they can be described in terms of how they function, see section on transcription factor activity.
    - Note that general and nonspecific are two words for the same thing, and basal is a third which seems to be the preferred word now. Thus, there will be some consolidation of the process terms so that this is not represented twice.
    - The big problem in terms of function though is that basal transcription factors include things which bind DNA and also things which do not, but instead bind other transcription factors, so these terms are process-based groupings that include multiple different kinds of functions.
    • general RNA polymerase II transcription factor activity - GO:0016251
      def: "Any function that supports basal (unregulated) transcription of genes by core RNA polymerase II. Five general transcription factors are necessary and sufficient for such basal transcription in yeast: TFIIB, TFIID, TFIIE, TFIIF, TFIIH and TATA-binding protein (TBF)."
    • nonspecific RNA polymerase II transcription factor activity - GO:0016252
      def: "Any function that supports transcription of genes by RNA polymerase II, and is not specific to a particular gene or gene set."
    • specific RNA polymerase II transcription factor activity - GO:0003704
      def: "Functions to enable the transcription of specific, or specific sets, of genes by RNA polymerase II."
      - alternate for this term: Depending on annotations, an alternate proposal for this specific term would be to convert it to the term sequence-specific RNA polymerase II promoter regulatory transcription factor site binding, proposed in the section on "transcription factor activity & promoter binding", so that it specifically represents regulatory transcription factor binding to specific promoter sequences. However, if checking annotations reveals that the term has also been used for basal transcription factors, e.g. TFIID, then I am not sure that it would be safe to convert the term, as existing annotations might no longer be true. In that case, people may prefer to obsolete this term instead of merging it, so that multiple alternates can be suggested.
  3. Merge this term with the corresponding complex term or obsolete so that multiple terms can be suggested.
    - This term seems to be specific for Mediator, so I think it is equivalent to the complex term for Mediator.
    - This term does not seem to describe how the function is done and, with the word mediator in its name, seems likely to be used specifically for Mediator.
    - I think that some combinations of binding activities can be represented as specific terms under the protein binding term.
    • RNA polymerase II transcription mediator activity - GO:0016455
      def: "Functions to mediate the interaction of transcriptional activators with the RNA polymerase II-general RNA polymerase II transcription factor complex."
  4. Rename and redefine terms already including binding in their defs
    - These two terms seem to be indicating specific binding as components of their function, so I think these can be retained, in the binding tree, with some possible modifications to the name and definition of the terms.
    • RNA polymerase II transcription factor activity, enhancer binding - GO:0003705
      def: "Functions to initiate or regulate RNA polymerase II transcription by binding an enhancer region of DNA."
    • ligand-regulated transcription factor activity - GO:0003706
      def: "Combining with a steroid hormone to initiate a change in cell activity."

Actions taken

  1. terms for RNA polymerase I, II, or III transcription factor activity (GO:0003701, GO:0003702, GO:0003709)
  2. terms for general, nonspecific, or specific RNA polymerase II transcription factor activity (GO:0016251, GO:0016252, GO:0003704)
  3. term for RNA polymerase II transcription mediator activity - GO:0016455
  4. rename and redefine RNA polymerase II transcription factor activity, enhancer binding (GO:0003705) & ligand-regulated transcription factor activity (GO:0003706)
    - GO:0003705 renamed to sequence-specific enhancer binding RNA polymerase II transcription factor activity
    - GO:0003706 is now in the purview of the signalling group; see the [SourceForge item] asking about the difference between GO:0003706 & ligand dependent nuclear receptor activity (GO:0004879)

transcription activator activity - GO:0016563, transcription repressor activity - GO:0016564, & child terms

  • transcription activator activity - GO:0016563
    def: "Any transcription regulator activity required for initiation or upregulation of transcription."
  • transcription repressor activity - GO:0016564
    def: "Any transcription regulator activity that prevents or downregulates transcription."
  • specific transcriptional repressor activity - GO:0016566
    def: "Any activity that stops or downregulates transcription of specific genes or sets of genes."
  • basal transcription repressor activity - GO:0017163
    def: "Any transcription regulator activity that prevents or downregulates basal transcription. Basal transcription results from transcription that is controlled by the minimal complement of proteins necessary to reconstitute transcription from a minimal promoter."
  • general transcriptional repressor activity - GO:0016565
    def: "Any activity that stops or downregulates transcription of genes globally, and is not specific to a particular gene or gene set."

Proposal (changes to and obsoletions of existing terms):

  1. obsolete transcription activator activity - GO:0016563, or merge into process term positive regulation of transcription (GO:0045941)
    • As currently defined this is equivalent to the process term positive regulation of transcription (GO:0045941).
    • There may be specific types of activators that can be described in terms of molecular activities, perhaps sequence-specific RNA polymerase II promoter transcription factor site binding involved in activation of transcription (see section on transcription factor activity). However, this term is defined largely on the basis of the effect on the process of transcription and not on the basis of how that effect is accomplished. In addition, since there are numerous ways that activators can function, this term is grouping different functions on the basis of process, so I don't think it is appropriate as a function term. Any appropriate individual function terms can be created as types of binding, and then they can have relationships to positive regulation of transcription.
    • We may want to obsolete this term so that we can suggest multiple consider terms, the process term positive regulation of transcription (GO:0045941) as well as any function terms that can be created of the type suggested above.
  2. transcription repressor activity - GO:0016564 and children
    1. obsolete transcription repressor activity - GO:0016564, or merge into process term negative regulation of transcription (GO:0016481)
      • As currently defined this is equivalent to the process term negative regulation of transcription (GO:0016481)
      • I think this term can be dealt with similarly to transcription activator activity above
    2. obsolete specific transcriptional repressor activity - GO:0016566
      • As defined, this term is related to the Process term negative regulation of gene-specific transcription from RNA polymerase II promoter (GO:0010553) and is defined in terms of effect on the process of transcription, rather than how it is accomplished.
      • Similarly to transcription activator activity, this term should be obsoleted and any appropriate individual function terms can be created as types of binding, and then they can have relationships to negative regulation of transcription.
      • Merging might be difficult for this term since there is no process term for specific transcription at a general level, only one for gene-specific transcription from RNA polymerase II promoter (GO:0032569)
    3. obsolete basal transcription repressor activity - GO:0017163 & general transcriptional repressor activity - GO:0016565
      • I group these two together because these two terms are identical to each other, as basal and general are two words for the same thing, at least for RNAP II. I'm aware of basal being used for E. coli RNAP where it means the same thing as for RNAP II, but have not encountered general being used in the prokaryotic context. Basal seems to be the preferred word now, over general, in the RNAP II literature, as it has become clear that the "general" factors aren't as general as once thought.
      • Similarly to other terms discussed in this section, e.g. transcription activator activity and specific transcriptional repressor activity, I think this term may represent multiple different types of binding functions, but that it is grouping based on the process, so I recommend obsoleting this term. As with others above, any appropriate individual function terms can be created as types of binding, and then they can have relationships to negative regulation of transcription.
      • Merging might also be difficult for these terms since there is no process term for basal or general at a general level, only terms at the level of RNAP II transcription.
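
The difference between the two options weighed in this section, merging a Function term into a Process term versus obsoleting it so that several alternates can be suggested, can be sketched in terms of OBO-format stanzas. This is only a minimal illustration and not the actual GO editing procedure: the helper names `merge_stanza` and `obsolete_stanza` are hypothetical, and the stanzas carry only the tags needed for the contrast (`alt_id` for a merged term, `is_obsolete` and `consider` for an obsoleted one).

```python
def merge_stanza(target_id, target_name, merged_ids):
    """Sketch of the surviving term after a merge (hypothetical helper).

    Each merged ID is kept as an alt_id, so an old annotation resolves
    to exactly one target term.
    """
    lines = ["[Term]", f"id: {target_id}", f"name: {target_name}"]
    lines += [f"alt_id: {merged}" for merged in merged_ids]
    return "\n".join(lines)


def obsolete_stanza(term_id, name, consider_ids):
    """Sketch of an obsoleted term (hypothetical helper).

    Obsoletion allows any number of 'consider' suggestions, which is the
    reason given above for preferring it when one target is not enough.
    """
    lines = ["[Term]", f"id: {term_id}", f"name: {name}", "is_obsolete: true"]
    lines += [f"consider: {suggested}" for suggested in consider_ids]
    return "\n".join(lines)


# Option A: merge transcription activator activity into the process term.
merged = merge_stanza(
    "GO:0045941", "positive regulation of transcription", ["GO:0016563"]
)

# Option B: obsolete it and suggest the process term as one alternate.
# Further function terms could be appended to the list once they exist.
obsoleted = obsolete_stanza(
    "GO:0016563", "transcription activator activity", ["GO:0045941"]
)

print(merged)
print(obsoleted)
```

With a merge, existing annotations are moved wholesale to the one target; with obsoletion, each annotation has to be reviewed against the suggested alternates, which is safer when the old term grouped several different functions.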

elongation regulator, termination factor, and antiterminator activity terms

Proposal

  1. Elongation regulator terms
    • Terms
      • transcription elongation regulator activity - GO:0003711
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule following transcription initiation."
      • negative transcription elongation factor activity - GO:0008148
        def: "Any activity that decreases the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule following transcription initiation."
      • positive transcription elongation factor activity - GO:0008159
        def: "Any activity that increases the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule following transcription initiation."
      • RNA polymerase I transcription elongation factor activity - GO:0016943
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule catalyzed by RNA polymerase I following transcription initiation."
      • RNA polymerase II transcription elongation factor activity - GO:0016944
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule catalyzed by RNA polymerase II following transcription initiation."
      • RNA polymerase III transcription elongation factor activity - GO:0016945
        def: "Any activity that modulates the rate of transcription elongation, the addition of ribonucleotides to an RNA molecule catalyzed by RNA polymerase III following transcription initiation."
    • Proposal: Merge these terms with corresponding process terms
      - All of these terms are defined purely in terms of their effect on elongation and thus are completely equivalent to existing process terms.
      - These terms are likely to be grouping a number of different functions based on process rather than on functional similarity.
      - Where specific elongation factor activities can be described in terms of how they act, appropriate terms can be created. None are proposed as yet.
  2. Termination factor activity terms
    • Terms
      • transcription termination factor activity - GO:0003715
        def: "Any activity that brings about termination of transcription."
      • RNA polymerase I transcription termination factor activity - GO:0003716
        def: "Any activity that brings about termination of transcription by RNA polymerase I."
      • RNA polymerase II transcription termination factor activity - GO:0003717
        def: "Any activity that brings about termination of transcription by RNA polymerase II."
      • RNA polymerase III transcription termination factor activity - GO:0003718
        def: "Any activity that brings about termination of transcription by RNA polymerase III."
    • Proposal: Merge these terms with corresponding process terms
      - All of these terms are defined purely in terms of their effect on termination and thus are completely equivalent to existing process terms.
      - These terms are likely to be grouping a number of different functions based on process rather than on functional similarity.
      - Where specific termination factor activities can be described in terms of how they act, appropriate terms can be created. None are proposed as yet.
  3. Antiterminator activity term
    • Term
      • transcription antiterminator activity - GO:0030401
        def: "Functions to prevent the termination of RNA synthesis. Acts as a regulatory device, e.g. in phage lambda, enabling a terminator to be masked from RNA polymerase so that distal genes can be expressed."
    • Proposal: Rename and redefine this term & add some new subtypes
      - This term is named very broadly, and the first sentence of the definition is also very broad. However, the example in the second sentence is very specific. On the thought that this term has likely only been used for prokaryotic annotations (a quick glimpse at AmiGO supports this idea), I propose to keep this term but adjust the name and definition to talk about how it acts, e.g. nucleic acid binding.
      - There would be some additional child terms, depending on whether it binds a sequence in the nascent RNA or a sequence in the DNA, see Greenblatt J et al. Transcriptional antitermination. Nature. 1993 Jul 29;364(6436):401-6.

Actions taken

  1. Elongation regulator terms
  2. Termination factor activity terms
  3. Antiterminator activity term
    - new terms created for DNA binding transcription antitermination factor activity (GO:0001073) & RNA binding transcription antitermination factor activity (GO:0001072)
    - question: should we make terms specific for types of RNAPs?

sigma factor related terms

Proposal

  1. sigma factors => promoter specificity factors defined in terms of what they bind
    Terms
    1. transcription initiation factor activity - GO:0016986
      def:
    2. sigma factor activity - GO:0016987
      def: "A sigma factor is the promoter specificity subunit of eubacterial-type multisubunit RNA polymerases, those whose core subunit composition is often described as alpha(2)-beta-beta-prime. (This type of multisubunit RNA polymerase complex is known to be found in eubacteria and plant plastids). Although sigma does not bind DNA on its own, when combined with the core to form the holoenzyme, this binds specifically to promoter sequences, with the sigma factor making sequence specific contacts with the promoter elements. The sigma subunit is released from the elongating form of the polymerase and is thus free to act catalytically for multiple RNA polymerase core enzymes."
    3. mitochondrial transcription initiation factor activity - GO:0034246
      def: "A transcription factor activity that confers promoter specificity upon mitochondrial RNA polymerase, in a manner analogous to eubacterial sigma factors."
    • The first term, transcription initiation factor activity, is undefined but is the parent term of the other two terms (with no other child terms), so I want to consider these three terms together.
    • Proposal: obsolete transcription initiation factor activity - GO:0016986
      The first term transcription initiation factor activity should probably be obsoleted because the current name is much too broad and it seems that it might have been broadly used based on the name, which doesn't represent a specific function.
    • Proposal: rename and redefine sigma factor activity - GO:0016987, & merge or rename and redefine? mitochondrial transcription initiation factor activity - GO:0034246
      • The two child terms both represent essentially the same thing, a specificity factor that binds to core RNA polymerase (which is unable to recognize DNA on its own) and while bound to the RNAP confers sequence specific DNA binding. We could represent this as something like core RNA polymerase binding promoter specificity factor activity and give it parentage under both core RNA polymerase binding and under sequence-specific DNA binding. Possibly the existing sigma factor activity term could be redefined to be general, i.e. not specific to sigma factors. [This would be similar to the fact that we currently don't have a specific term for bacterial RNAP; annotation to the general term RNAP is sufficient since bacteria have only one RNAP complex.] While E. coli, and other bacteria, have many different sigma factors, they have only one RNAP, so the primary criterion distinguishing different sigma factors is which promoter sequence they recognize. We feel that GO should not get into the level of detail of describing specific binding sites, so 1 term for this type of activity is sufficient.
      • I am not sure if we need more than just one term for this. This kind of specificity factor exists for both prokaryotes, e.g. E. coli, and also for mitochondrial RNAP. However, I can't think of any examples of cells which have more than one kind of polymerase with this kind of specificity factor, so I think we may only need one term for this type of activity. I guess the argument in favor of keeping this term would be in order to make a link between it and existing process terms for mitochondrial transcription, though at the moment I think it would be an only child as I only know of this type of activity for prokaryotic and for mitochondrial transcription. If we do keep this individual term, then we'll probably want to change the name and def a bit to be consistent with the general term and the relationship with the mitochondrial transcription process term.
  2. sigma factor antagonists => redefine in terms of binding activities
    • Terms
      1. transcription initiation factor antagonist activity - GO:0016988
        def: "The function of binding to a transcription factor and stopping, preventing or reducing the rate of its transcriptional activity."
      2. sigma factor antagonist activity - GO:0016989
        def: "The function of binding to a sigma factor and stopping, preventing or reducing the rate of its transcriptional activity."
    • Proposal: obsolete transcription initiation factor antagonist activity - GO:0016988 & rename and redefine sigma factor antagonist activity - GO:0016989
      • These two terms are tied to the item above about sigma factor type specificity factors since these basically represent things that bind to sigma, or comparable, factors to prevent them from binding to core RNAP
      • transcription initiation factor antagonist activity isn't even defined correctly based on the name of the term in that it does not specify binding to an initiation factor. Since this term is badly named and defined, I recommend obsoleting it.
      • The decision of whether we need only a single general term or specific terms that affect regulation of specific types of polymerases should follow the decision made for sigma type specificity factors.
        1. The term sigma factor antagonist activity should probably be renamed and redefined to become a general term for binding to a promoter specificity factor of the sigma type so that it will be broad enough to represent this activity generally and still represent what is known for prokaryotic transcription.
        2. I do not know if such an activity is characterized in mitochondrial transcription, so even if we create a specific term for the mitochondrial promoter specificity factor, I don't yet see a need to create a mitochondrial specific term here.
  3. proteins that bind anti-sigma factors and similar
    • Term(s)
      1. anti-sigma factor antagonist activity - GO:0043856
        def: "The function of binding to an anti-sigma factor and stopping, preventing or reducing the rate of its activity."
    • Proposal: rename and redefine anti-sigma factor antagonist activity - GO:0043856
      This can be renamed and redefined so that it is general for anything that binds to a protein which itself binds to a specificity factor to prevent that factor from binding to the RNAP core.

Actions taken

  1. sigma factors => promoter specificity factors defined in terms of what they bind
    • Proposal: obsolete (we should merge instead) transcription initiation factor activity - GO:0016986
    • Proposal: rename and redefine sigma factor activity - GO:0016987, & merge or rename and redefine? mitochondrial transcription initiation factor activity - GO:0034246
      - done: new term for core RNA polymerase binding promoter specificity activity (GO:0000996) created
      - done: sigma factor activity (GO:0016987) name and def kept as is, but moved to new place in graph and has_part relationships added
      - done: GO:0034246 renamed as mitochondrial RNA polymerase binding promoter specificity activity with new def and moved under GO:0000996
      - check for synonym using descriptive name:
  2. sigma factor antagonists => redefine in terms of binding activities
    - done: moved under transcription factor binding transcription factor activity (GO:0000989)
    - question: revisit def?
  3. proteins that bind anti-sigma factors and similar
    - done: moved under transcription factor binding transcription factor activity (GO:0000989)
    - question: revisit def?

transcription factor binding terms

Proposal

Currently, most of the child terms of "transcription factor binding" (except "transcription cofactor activity") and the children of "transcription coactivator activity" represent binding to specific proteins. We feel that this is too specific to represent with individual GO terms because there are thousands and thousands of specific transcription factors. Groups who wish to curate this level of detail should utilize column 16 to indicate what is being bound.
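
As a rough illustration of the column 16 suggestion, the sketch below builds a single tab-separated annotation row, assuming the 17-column GAF 2.0 layout in which column 16 is the annotation extension. Everything except GO:0008134 is a placeholder: the database name, gene product ID, reference, and bound-factor ID are invented for the example, the relation name `has_input` is an assumption rather than something this proposal specifies, and several columns a real row would need (taxon, date, assigned-by) are left empty.

```python
GAF_COLUMN_COUNT = 17          # assumed GAF 2.0 layout: 17 tab-separated columns
ANNOTATION_EXTENSION_COL = 16  # 1-based position of the annotation extension


def gaf_row(fields, extension=""):
    """Build one row from a {1-based column number: value} mapping."""
    row = [""] * GAF_COLUMN_COUNT
    for column, value in fields.items():
        row[column - 1] = value
    row[ANNOTATION_EXTENSION_COL - 1] = extension
    return "\t".join(row)


# A hypothetical gene product annotated to the general term
# "transcription factor binding"; the specific factor that is bound is
# named in column 16 rather than being encoded in its own GO term.
example = gaf_row(
    {
        1: "EXAMPLE_DB",     # placeholder database
        2: "GP0001",         # placeholder gene product ID
        3: "exampleGene",    # placeholder symbol
        5: "GO:0008134",     # transcription factor binding
        6: "EXAMPLE_REF:1",  # placeholder reference
        7: "IPI",            # evidence code
        9: "F",              # aspect: molecular function
        12: "protein",       # object type
    },
    extension="has_input(EXAMPLE_DB:TF0042)",  # placeholder relation and ID
)

print(example.split("\t")[ANNOTATION_EXTENSION_COL - 1])
```

The point of the design is that the ontology only needs the class-level binding terms proposed below, while the identity of the bound factor travels with the individual annotation.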

We would like to represent binding just to classes of transcription factors, such as basal or regulatory transcription factors.

Proposal: new terms

I am showing only a sample set of new terms, some general terms relevant to both prokaryotes and eukaryotes and some terms specific for RNA polymerase II. Where appropriate there will be links to Process terms.

- transcription factor binding - GO:0008134
-- basal transcription factor binding
--- basal RNA polymerase II transcription factor binding
-- regulatory transcription factor binding
--- regulatory RNA polymerase II transcription factor binding
---- regulatory RNA polymerase II transcription factor binding involved in activation of transcription (link to Process)
---- regulatory RNA polymerase II transcription factor binding involved in repression of transcription (link to Process)

Also see Function (red) terms under transcription factor binding in this diagram, which also shows links to Process (blue) terms.

Proposal: obsoletions

Specific terms proposed to obsolete:

terms under transcription factor binding

  • aryl hydrocarbon receptor binding - GO:0017162
  • bHLH transcription factor binding - GO:0043425
  • MRF binding - GO:0043426
  • Mrf4 binding - GO:0051578
  • Myf5 binding - GO:0051576
  • MyoD binding - GO:0051577
  • myogenin binding - GO:0051579
  • NF-kappaB binding - GO:0051059
  • NFAT protein binding - GO:0051525
  • NFAT1 protein binding - GO:0051526
  • NFAT2 protein binding - GO:0051527
  • NFAT3 protein binding - GO:0051528
  • NFAT4 protein binding - GO:0051529
  • NFAT5 protein binding - GO:0051530
  • retinoic acid receptor binding - GO:0042974
  • retinoid X receptor binding - GO:0046965
  • Tat protein binding - GO:0030957
  • thyroid hormone receptor binding - GO:0046966

terms under transcription cofactor activity

  • cAMP response element binding protein binding - GO:0008140
  • ligand-dependent nuclear receptor transcription coactivator activity - GO:0030374
  • thyroid hormone receptor coactivator activity - GO:0030375

Actions taken

  1. New terms
    - done: have made new terms for binding to transcription factors for RNAP II and III
    - done: have made new terms for binding to RNAP II basal transcription factors, e.g. TFIIA-class transcription factor binding
    - done: have made new terms for binding to main RNAP III transcription factors, e.g. TFIIIA-class transcription factor binding
    - question: Should we make some similar terms for RNAP I and bacterial type RNAP?
  2. Obsoletion of terms representing binding to a specific transcription factor

RNA polymerase binding terms

Proposal

We currently have these two terms:

  • RNA polymerase binding - GO:0070063
    Def: Interacting selectively and non-covalently with an RNA polymerase.
  • RNA polymerase core enzyme binding - GO:0043175
    Def: Interacting selectively and non-covalently with the prokaryotic RNA polymerase core enzyme, the part of the RNA polymerase consisting of two alpha, one beta and one beta prime subunits.

The RNA polymerase core enzyme binding term has existed for quite some time. Although it has a very general name, the definition is quite specific for the prokaryotic type of RNA polymerase core enzyme, even though the term core enzyme is also used for the multisubunit eukaryotic nuclear RNAPs I, II, and III.

The general term RNA polymerase binding was added in response to a SF item about RNA polymerase binding. There has also been a request to add new MF terms for eukaryotic RNA polymerase binding.

I think there is good reason to add more terms for binding to various types of core RNA polymerase. However, I am very hesitant about adding holoenzyme terms for the eukaryotic enzymes because:

  1. The compositions are not well characterized. For the case of RNAP II, multiple different "holoenzymes" have been described in the literature with different compositions. They do not each have individual names, but are all just called the RNAP II holoenzyme. The single existing component term for RNAP II holoenzyme does not reflect the complexity of the situation, see issues with: DNA-directed RNA polymerase II, holoenzyme
  2. While it is true that "holoenzymes" can be purified for eukaryotic nuclear RNAPs, it is not completely clear that they are meaningful biologically.

Proposal

  1. Redefine: RNA polymerase core enzyme binding - GO:0043175
    - While the definition is clearly spelled out in terms of the composition of the prokaryotic enzyme, the existing annotations in AmiGO show that annotators have clearly used the term name rather than the definition:
    • ercc8 - WD40 repeat-containing protein, DNA excision repair protein 8 (gene from Dictyostelium discoideum)
    • PSPPH_0861 - RNA polymerase-binding protein DksA (protein from Pseudomonas syringae pv. phaseolicola 1448A)
    • RBM16 - RNA-binding protein 16 (protein from Homo sapiens)
    • SPAC23A1.16c - RNA polymerase II-associated protein Rtr1 (gene from Schizosaccharomyces pombe)
    - Since the term has clearly been annotated based on the name, we propose to broaden the definition so that this term will become a general term for binding to a core RNA polymerase enzyme.
  2. New terms: for binding to types of core RNAP enzymes
    - We think there is good reason to create core binding terms for each type of RNA polymerase that can exist in a single cell, because we will be making function terms that incorporate binding to a specific type of core RNAP. Plants have the largest number of different types, with additional nuclear RNAPs and chloroplast RNAPs in addition to the standard three nuclear RNAPs and mitochondrial RNAP found in all eukaryotes.
    - I am not sure that we need to generate a new term that is specific for prokaryotic RNAP. We don't have a specific component term for prok RNAP either because the general term is sufficient since there is only one RNAP in prokaryotes. Thus based on the existing SF item, we are thinking about these new terms:
-enzyme binding
--RNA polymerase binding (GO:0070063)
---RNA polymerase core enzyme binding (GO:0043175)
----RNA polymerase I core enzyme binding (GO:new)
----RNA polymerase II core enzyme binding (GO:new)
----RNA polymerase III core enzyme binding (GO:new)
----RNA polymerase IV core enzyme binding (GO:new)
-----RNA polymerase IVa core enzyme binding (GO:new)
-----RNA polymerase IVb core enzyme binding (GO:new)
----plastid-encoded plastid RNA polymerase complex core enzyme binding (GO:new)
-----plastid-encoded plastid RNA polymerase complex A core enzyme binding (GO:new)
-----plastid-encoded plastid RNA polymerase complex B core enzyme binding (GO:new)
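
For readers who want to work with a dash-indented outline like the one above programmatically, the following is a minimal sketch that turns it into (term, parent) pairs. It assumes only the convention used on this page, that the number of leading dashes gives the depth; the function name `parse_outline` is hypothetical and the parent links it produces are not a statement about the final ontology graph.

```python
def parse_outline(lines):
    """Return (term, parent) pairs from a dash-indented outline.

    The count of leading dashes is taken as the depth; the parent of a
    term is the closest preceding term with a smaller depth. Top-level
    terms get None as their parent.
    """
    stack = []  # stack[i] holds the most recent term seen at depth i + 1
    pairs = []
    for raw in lines:
        line = raw.strip()
        depth = len(line) - len(line.lstrip("-"))
        if depth == 0:
            continue  # blank line or text that is not an outline entry
        name = line[depth:].strip()
        del stack[depth - 1:]  # drop terms at this depth or deeper
        parent = stack[-1] if stack else None
        pairs.append((name, parent))
        stack.append(name)
    return pairs


outline = [
    "-enzyme binding",
    "--RNA polymerase binding (GO:0070063)",
    "---RNA polymerase core enzyme binding (GO:0043175)",
    "----RNA polymerase I core enzyme binding (GO:new)",
    "----RNA polymerase IV core enzyme binding (GO:new)",
    "-----RNA polymerase IVa core enzyme binding (GO:new)",
    "----plastid-encoded plastid RNA polymerase complex core enzyme binding (GO:new)",
]

for term, parent in parse_outline(outline):
    print(f"{term}  <-  {parent}")
```

Only leading dashes are stripped, so hyphens inside a name such as "plastid-encoded" are left alone.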


Modifications to the Proposal

Due to the inconsistent use of the word "promoter", we will also create a type for "bacterial-type RNA polymerase activity". It will be the parent term for the plant chloroplast RNAP described as being "bacterial-type". I have also learned of more types of plant organellar RNAPs since the original proposal, for example a chloroplast enzyme of the single-subunit type.

Action taken

  1. Redefine: RNA polymerase core enzyme binding - GO:0043175
    - done: This term has been redefined to be general for any core RNAP
  2. New terms: for binding to types of core RNAP enzymes
    - done: A number of new RNAP core binding terms, including one for the bacterial type enzyme as specified in the modification
    - done: created binding terms for single-subunit type RNAPs

RNA polymerase activity terms

Proposal

We used to have terms for RNA polymerase I activity, RNA polymerase II activity, and RNA polymerase III activity. They were obsoleted, in part, because of the fact that they were defined in terms of what type of RNA is produced, and this turns out not to be consistent enough across species to be a defining characteristic.

However, with the new philosophy of making Function-Process links, we are proposing to create terms representing specific RNA polymerase activities involved in transcription from specific types of promoters, e.g. DNA-dependent RNA polymerase activity involved in transcription from RNA polymerase II promoter, where the definition will make the distinction based only on the type of promoter, and not on the class of RNA produced. Similarly to the proposal for more types of RNA polymerase binding terms, the basic idea would be to create a specific term for each type of RNA polymerase that exists in a single cell, with plant cells being the model to draw from since they have the largest number of different types and include all the types found in other eukaryotic cells. We currently do not feel that we need to create specific terms to represent RNAPs for types of cells that have only a single RNAP, e.g. prokaryotes or archaea, since there is no need to distinguish multiple RNAP types in the same genome.

Proposal

  1. Create new terms for specific types of RNAPs that need to be distinguished within a single cell.
  2. Options for names of terms
    There are two options for names of terms. Many of the Function-Process link terms use the phrase involved in, as shown in the first set of terms below. However, these names are very long, so perhaps the shorter set based on the enzyme names would be preferred.
  3. Definitions
    Regardless of the term name, the definitions will specify the distinction between these terms only in terms of what type of promoter is utilized.
- DNA-directed RNA polymerase activity - GO:0003899
-- DNA-directed RNA polymerase activity involved in transcription from RNA polymerase I promoter
-- DNA-directed RNA polymerase activity involved in transcription from RNA polymerase II promoter
-- DNA-directed RNA polymerase activity involved in transcription from RNA polymerase III promoter
-- DNA-directed RNA polymerase activity involved in transcription from RNA polymerase IVa promoter
-- DNA-directed RNA polymerase activity involved in transcription from RNA polymerase IVb promoter
-- DNA-directed RNA polymerase activity involved in transcription from mitochondrial RNA polymerase promoter
-- DNA-directed RNA polymerase activity involved in transcription from plastid RNA polymerase type A promoter
-- DNA-directed RNA polymerase activity involved in transcription from plastid RNA polymerase type B promoter

- DNA-directed RNA polymerase activity - GO:0003899
-- RNA polymerase I activity
-- RNA polymerase II activity
-- RNA polymerase III activity
-- RNA polymerase IVa activity
-- RNA polymerase IVb activity
-- mitochondrial RNA polymerase activity
-- plastid RNA polymerase A activity
-- plastid RNA polymerase B activity

Modification to Proposal

Due to the inconsistent use of the word "promoter", we will also create a type for "bacterial-type RNA polymerase activity". It will be the parent term for the plant chloroplast RNAP described as being "bacterial-type". I have also learned of more types of plant organellar RNAPs since the original proposal, for example a chloroplast enzyme of the single-subunit type.


Actions taken

  1. Create new terms for specific types of RNAPs that need to be distinguished within a single cell.
    - done: created new terms for each type of RNAP present in a plant cell
    - done: created a new term for bacterial-type RNAP as indicated in modification to proposal
  2. Options for names of terms
    There are two options for names of terms. Many of the Function-Process link terms use the phrase involved in, as shown in the first set of terms below. However, these names are very long, so perhaps the shorter set based on the enzyme names would be preferred.
    - done: we selected the shorter names
    - check that longer names are present as synonyms
  3. Definitions
    Regardless of the term name, the definitions will specify the distinction between these terms only in terms of what type of promoter is utilized.
    - done
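
The outstanding check that the longer names are present as synonyms could be scripted along the following lines. This is only a minimal sketch: the record layout (the "enzyme" and "synonyms" fields) and the sample data are assumptions made for illustration, not the actual ontology file format.

```python
PARENT = "DNA-directed RNA polymerase activity"  # GO:0003899


def long_name(enzyme):
    """Build the long 'involved in' name for a given polymerase phrase."""
    return f"{PARENT} involved in transcription from {enzyme} promoter"


# Hypothetical records for two of the new terms; "enzyme" and "synonyms"
# are illustrative field names, not the ontology's actual schema.
terms = [
    {
        "name": "RNA polymerase I activity",
        "enzyme": "RNA polymerase I",
        "synonyms": [long_name("RNA polymerase I")],
    },
    {
        "name": "mitochondrial RNA polymerase activity",
        "enzyme": "mitochondrial RNA polymerase",
        "synonyms": [],  # long name not yet recorded in this sample
    },
]


def missing_long_synonyms(terms):
    """Return the short names whose long form is not listed as a synonym."""
    return [t["name"] for t in terms if long_name(t["enzyme"]) not in t["synonyms"]]


print(missing_long_synonyms(terms))
```

Any name printed by the last line would still need its long form added as a synonym.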

Biological Process

Clarification of what is part of the transcription cycle (initiation, elongation, termination)

There have been SF items asking for clarification of what is, and is not, part of transcription, and of transcription initiation in particular. There have also been items asking for representation of additional parts of the transcription cycle. These two issues are connected: by representing the various steps and defining them precisely, I think it will be clearer which term is appropriate and what is actually included in each step.

These are relevant SF items:

Proposal

  1. new Process terms
    - additional terms for the transcription cycle with more specific definitions to indicate exactly what is included
    - transcription from RNA polymerase II promoter will get an additional child term in Process to represent the promoter clearance transition from initiation to elongation (see diagram)
    - transcriptional initiation from RNA polymerase II promoter will get additional child terms in Process to represent defined steps within initiation (see diagram)
    - These same steps in the transcription cycle are also observed in E. coli RNAP, where much of the basic work defining the transcription cycle was done. There will be additional higher level terms, e.g. just promoter clearance or just transcriptional open complex formation that will be appropriate for prokaryotic annotation (not shown in diagram).
    - For other specific RNA polymerases, new parallel terms will be generated as appropriate (not shown in diagram).
  2. new Function terms
    - new terms representing binding to RNA polymerase or to transcription factors as part of preinitiation complex (PIC) assembly (see diagram)
    - There are other function-process links that can be made. For example, open complex formation is an ATP-dependent step for some preinitiation complexes (depending on which polymerase, sigma factor, ...), so we can probably make a function term for ATPase activity involved in transcription open complex formation.
  3. making links between Function and Process
    - With our new philosophy that Function terms should indicate how while Process terms can indicate what, we will no longer be making Function terms that group by Process. This type of grouping will now be done by making links from Function terms to Process terms. (see diagram)
    - Function-Process links are generally part_of or has_part
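
To illustrate how links can replace grouping terms, here is a minimal sketch that groups Function terms by the Process term they are linked to. The triples are illustrative only: the ATPase term is the example suggested above, while the second link and the exact process names are assumptions, not entries taken from the ontology.

```python
from collections import defaultdict

# (Function term, relationship, Process term); sample links for illustration.
links = [
    (
        "ATPase activity involved in transcription open complex formation",
        "part_of",
        "transcriptional open complex formation",
    ),
    (
        "RNA polymerase II activity",
        "part_of",
        "transcription from RNA polymerase II promoter",
    ),
]


def functions_by_process(links):
    """Group Function terms under the Process term they are part_of."""
    grouped = defaultdict(list)
    for function, relation, process in links:
        if relation == "part_of":
            grouped[process].append(function)
    return dict(grouped)


for process, functions in functions_by_process(links).items():
    print(process, "<-", functions)
```

With links like these, a query on the Process term retrieves the related functions, so a Function term that exists only to group by process is no longer needed.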

Diagram
A picture is worth a thousand words, but PLEASE NOTE that this diagram should be considered conceptual rather than exact in its details. It is from my test version of the ontology, which is not complete, and I have left out some of the specifics for clarity. For example, in the Function section, I discussed representing core promoter elements versus binding sites for regulatory transcription factors; this is not represented here.

  • shows sample new terms in both Function and Process
  • shows sample Function-Process links

Clarification of "general/non-specific/basal" vs "specific" transcription

There have been SF items requesting terms for "general/non-specific/basal" vs "specific" transcription and clarification of exactly what is meant, like this one: general and specific transcription - ID: 1590000. It has also been suggested to broaden the general vs specific distinction so that it applies to all transcription, so that these terms can be used to distinguish the genes that are involved in specific vs general transcription.

Currently there are terms for general and gene-specific transcription from RNA polymerase II promoter, but not for any other types of RNA polymerase promoters. These are the existing terms:

  • gene-specific transcription from RNA polymerase II promoter - GO:0032569
    Def: The specifically regulated synthesis of RNA from DNA encoding a specific gene or set of genes by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. In addition to RNA polymerase II and the general transcription factors, specific transcription requires one or more specific factors that bind to specific DNA sequences or interact with the general transcription machinery.
  • general transcription from RNA polymerase II promoter - GO:0032568
    Def: The basal, non-specifically regulated synthesis of RNA from a DNA template by RNA polymerase II (Pol II), originating at a Pol II-specific promoter. Mediated by core RNA polymerase II and a set of general transcription factors; in Saccharomyces five transcription factors are necessary and sufficient for such basal transcription.

Issues

  1. The current definitions reflect outdated thinking
    - Early on it was thought that the RNAP II "general transcription factors", or GTFs, were involved in preinitiation complex (PIC) formation at ALL RNAP II promoters. However, it is now clear that there is no single type of RNAP II promoter. Thus there are probably many types of PICs, and the "general" factors may not be general. Accordingly, Sikorski & Buratowski (2009) use the term basal rather than general.
    - The current definition for general says "non-specifically regulated". This is probably not really the case. Transcription of the set of genes that requires only basal transcription factors is regulated; these genes just don't require additional non-basal factors to recognize the promoter. It is also quite a large set of genes, larger than the typical number of genes responsive to a particular activator, and they are active in "standard" growth conditions.
  2. term names also reflect outdated thinking
    - As mentioned above, Sikorski & Buratowski (2009) use the term basal rather than general because it appears that the "general" factors aren't as general as once thought.
    - There may also be a better name for gene-specific. In a comprehensive review of eukaryotic general transcription machinery, Thomas and Chiang (2006) use the phrase activator-dependent. I think this is a more accurate phrase, and also consistent with the phrase that is used in E. coli for the same situation.
  3. position of the terms
    - The current terms general and specific RNAP II transcription are siblings of the terms for initiation, elongation, and termination of RNAP II transcription (see diagram). Thus if a gene is annotated to the term for gene-specific transcription from RNA polymerase II promoter, you do not get any connection to what part of the transcription cycle (initiation, elongation, or termination) it is involved in.
    - As far as I can tell, basal vs gene-specific indicates whether the PIC can form with only basal factors or requires an "activator", a regulatory transcription factor that binds a specific promoter sequence, to recognize the promoter and recruit basal factors to the promoter.
  4. gene sets involved in basal vs specific transcription
    - It is not the case that there is one set of genes involved in basal transcription and another set involved in specific transcription.
    - Basal refers to transcription that can occur with only basal transcription (initiation) factors.
    - "Gene-specific" means that the promoter sequence is not recognized directly by the basal factors, but instead by a "gene-specific" regulatory transcription factor, which then recruits a basal transcription factor.
    - Thus, as far as I know, everything involved in basal transcription is also involved in gene-specific transcription.
  5. generality of general vs gene-specific (or basal vs activator-dependent)
    - Eukaryotic transcription by RNA polymerase II and prokaryotic transcription in E. coli share some similarities in this area, and the basal vs activator-dependent terminology has been used in E. coli for a long time, where it means the same thing as I have outlined above for RNAP II. Thus there is clearly some need to represent basal vs activator-dependent transcription at a general level so that prokaryotic annotators can make this distinction also.
    - It is not clear that this is a meaningful distinction for the eukaryotic RNA polymerases I or III. Particularly for RNAP I, which (for most organisms) only makes a single type of transcript (the large rRNA) from a single type of promoter, there isn't really any gene-specific regulation going on.

Useful reviews

Proposal

  1. Rename the two existing terms and move them to be subtypes of initiation (see diagram showing proposed term changes; existing terms are circled with arrows pointing to proposed new locations of renamed terms)
  2. Redefine existing terms to reflect current understanding
  3. Create additional higher level terms appropriate for use for prokaryotic annotation (not shown on diagram)
  4. At this time, I see no need for basal and activator-dependent terms for RNAP I or III, though the structure is such that it will be possible to add them for any polymerase/promoter type for which it is appropriate.
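
The effect of the proposed move can be sketched with a toy parent map. This is an illustration under stated assumptions, not the real ontology: only a few terms are included, the relationship type is ignored, and the existing term names are kept in both versions so that only the change in position is shown (the renaming is left out).

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


TXN = "transcription from RNA polymerase II promoter"
INIT = "transcriptional initiation from RNA polymerase II promoter"
GENE_SPECIFIC = "gene-specific transcription from RNA polymerase II promoter"  # GO:0032569
GENERAL = "general transcription from RNA polymerase II promoter"  # GO:0032568

# Current placement: general and gene-specific are siblings of initiation.
current = {INIT: [TXN], GENE_SPECIFIC: [TXN], GENERAL: [TXN]}

# Proposed placement: both become subtypes of initiation.
proposed = {INIT: [TXN], GENE_SPECIFIC: [INIT], GENERAL: [INIT]}

print(INIT in ancestors(GENE_SPECIFIC, current))
print(INIT in ancestors(GENE_SPECIFIC, proposed))
```

In the current arrangement an annotation to the gene-specific term never reaches initiation; in the proposed arrangement it does, while still propagating up to the general RNAP II transcription term.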

"Txn from promoter types" vs "transcription of classes of RNAs"

We were considering whether or not it made sense to have terms based on the type of RNA produced, e.g. rRNA transcription, since the process of transcription is really more a function of the type of promoter, i.e. which RNA polymerase does the transcription, than of the class of RNA. We were wondering if it made more sense to only have terms representing the promoter class from which the transcription originated, e.g. transcription from RNA polymerase III type 1 promoter. While there was some agreement with this idea at the transcription meeting, since coming back I have seen some papers talking about regulation of transcription of certain classes of RNAs. Thus, currently, I think the conservative approach is to add some additional terms representing distinct, well-characterized promoter classes WITHOUT removing any of the RNA class based terms.

Top level RNA class (e.g. snoRNA) based terms that currently exist:

  • mRNA transcription
  • rRNA transcription
  • tRNA transcription
  • snRNA transcription
  • snoRNA transcription

Proposal

  1. leave existing RNA class (e.g. snoRNA) based terms as is
  2. changes to existing term 5S class rRNA transcription
    - rename to use standard nomenclature for RNA polymerase III promoter type: transcription from RNA polymerase III type 1 promoter
    - existing name would become synonym
    - update definition to provide additional information about the promoter type
  3. add new terms
    • definitely for RNA polymerase III promoter types not currently represented
      - type 2
      - type 3
    • for other promoter types as needed

RNA elongation - GO:0006354

The standard phrase in the literature is transcription elongation, not RNA elongation.

Proposal: Term name change

  1. We propose to swap the current name, RNA elongation, with one of its existing synonyms, transcription elongation. This matches predominant usage in the literature and parallels the names of the other stages of transcription, transcription initiation and transcription termination. The change applies to this term and to all child terms that use this nomenclature.
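
As a rough illustration of the swap, the sketch below promotes an existing synonym to the term name and keeps the old name as a synonym. The dictionary layout is an assumption made for the example, not the ontology's actual storage format, and the single synonym shown is only the one mentioned above.

```python
def swap_name_with_synonym(term, new_name):
    """Return a copy of `term` with `new_name` as its name and the old name kept as a synonym."""
    if new_name not in term["synonyms"]:
        raise ValueError(f"{new_name!r} is not an existing synonym of {term['name']!r}")
    synonyms = [s for s in term["synonyms"] if s != new_name]
    synonyms.append(term["name"])
    return {**term, "name": new_name, "synonyms": synonyms}


# Illustrative record for the term discussed in this section.
rna_elongation = {
    "id": "GO:0006354",
    "name": "RNA elongation",
    "synonyms": ["transcription elongation"],
}

renamed = swap_name_with_synonym(rna_elongation, "transcription elongation")
print(renamed["name"], renamed["synonyms"])
```

Because the old name survives as a synonym, searches for RNA elongation would still find the term. The same operation would be repeated for each child term that uses this nomenclature.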

Additional comments

merging vs obsoleting (with terms suggested for reannotation)

There are a number of places where we have suggested merging Function terms into Process terms that represent the same thing. In most cases, the terms represent generic functions that are really defined by the process. Another option for dealing with terms like this, at least for some of them, would be to obsolete the term, rather than merge it. The obsolete option would be attractive in cases where we know that several bona-fide functions exist and we could use them in 'consider' tags. For example, this might be appropriate in cases where the existing Function term is grouping multiple different kinds of functions based on involvement in a similar process. There may also be multiple new Function terms to be created that annotators might want to consider in reannotation. Whichever option is preferred by annotators is fine.

Key to ontology diagrams

  • Terms:
    • red box = Function term
    • blue box = Process term
  • Links:
    • blue line = is_a relationship
    • yellow line = part_of relationship
    • green line = has_part relationship