Proposals to overhaul transcription in GO - 2010

From GO Wiki

Philosophy of Overhaul

Over the last few years, GO has changed how we talk about and create Function terms so that they now represent how something occurs, e.g. binding, catalytic activities, etc. In this process, we have begun to avoid, and sometimes eliminate, Function terms that duplicate Process terms, that is, functions that do not describe how the gene product acts but only specify that it is involved in a process. However, although we have always said that Function and Process should represent non-overlapping aspects, we still have many older terms in Function that essentially duplicate a Process term. Compare, for example, the Function term transcription regulator activity with the Process term regulation of transcription. Both terms essentially mean the same thing. In addition, the Function term transcription regulator activity does not group the terms below it on the basis of having similar functions, but rather on the basis of being involved in the same process. This lack of a clear distinction between Function and Process generates confusion, both for annotators and for users. One researcher at the meeting told me that she only uses GO occasionally and she can never remember whether the term she wants is in Function or Process.

One of the major goals of this overhaul is to generate clarity between the function terms and the process terms for transcription. We are proposing to eliminate some Function terms that are equivalent to Process terms and which cannot be converted into a description of the molecular activity, or activities, involved. In other cases, we are proposing changes to Function terms so that they actually describe molecular activities. With the recently developed method of creating links between Function and Process terms, the old motivations to have terms like transcription regulator activity in order to be able to group by process should be addressed anyway, since terms representing functions involved in regulation of transcription will have a relationship to that Process term, or to a more specific child term as appropriate.

With respect to annotation, these changes will mean that in cases where the experiments indicate that a gene product is involved in regulating transcription, but give no indication as to how it acts, it would be appropriate to annotate only with a Process term and NOT with a Function term.
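The annotation rule above can be written down as a small decision function. This is only an illustrative sketch: the function name `aspects_to_annotate` and its two boolean inputs are hypothetical and do not correspond to any existing GO tool.

```python
from typing import List


def aspects_to_annotate(regulates_transcription: bool, mechanism_known: bool) -> List[str]:
    """Return the GO aspects that the experimental evidence supports.

    Hypothetical helper: a Process annotation needs only evidence of
    involvement; a Function annotation additionally needs evidence of
    how the gene product acts (e.g. what it binds).
    """
    aspects = []
    if regulates_transcription:
        aspects.append("Process")
    if mechanism_known:
        aspects.append("Function")
    return aspects


# Involvement shown, mechanism unknown: annotate with a Process term only.
print(aspects_to_annotate(regulates_transcription=True, mechanism_known=False))
```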

Molecular Function

transcription factor activity - GO:0003700 & promoter binding - GO:0010843

The term "transcription factor activity" has parentage under "DNA binding", rather than under "sequence-specific DNA binding", even though it is defined as binding to a specific DNA sequence. The majority of things in this class are "regulatory" transcription factors which bind a specific DNA sequence present in a relatively limited set of promoters. There are also basal transcription factors which bind to specific core promoter elements in a sequence-specific way.

However, we also need to account for the fact that not all basal transcription factors bind DNA. Currently, we also have terms like "RNA polymerase II transcription factor activity" that do not have parentage under "DNA binding" and which really mean anything involved in regulating RNAP II transcription, i.e. basically a process definition. This problem has been reported in SourceForge multiple times because the current structure makes it impossible to indicate that a transcription factor is a sequence-specific DNA binding factor for a specific polymerase; you can either indicate that it binds DNA, or that it is a factor for RNAP I, II, or III, but not that it binds a specific sequence for a specific RNAP.

We would like to have function terms that indicate what type of DNA sequence element is being bound, e.g. a basal promoter element versus the binding site for a regulatory transcription factor, such as Gal4 in yeast. Additionally, we would like to be able to indicate binding to enhancer sites. So the distinction in function will be by the type of DNA site bound.

Proposal: With that in mind, here is a proposed structure (where no number is indicated, the term would be new) and some specific changes proposed for some of the existing terms.

Changes to existing terms:

  1. transcription factor activity - GO:0003700
    • Change name to "sequence specific DNA-binding transcription factor activity", as shown in structure below, or similar
    • Change position to reflect current definition that this indicates sequence specific binding to DNA. Currently, this term is directly under DNA-binding, but the definition specifies a specific sequence, so we propose to move it to be a direct child of sequence-specific DNA binding (GO:0043565).
  2. promoter binding - GO:0010843
    • Change either name or definition. Currently this term is defined too narrowly such that it only includes the core promoter elements, while the binding sites for regulatory transcription factors are also considered to be promoter elements. In addition, the child terms are not all core promoter elements. We recommend changing the definition to match the name. The other possibility is changing the name to core promoter binding so that it matches the current definition, but if people have annotated based on the broader term name, annotations will become incorrect with this option.
    • The def should also avoid specifying that the binding sites are for complexes; not all transcription factors are.


Structure showing new terms and relationship to existing terms:

- DNA binding - GO:0003677
-- (i) sequence-specific DNA binding - GO:0043565
--- (i) sequence-specific DNA-binding transcription factor activity - GO:0003700
---- (i) sequence-specific promoter binding - GO:0010843 or GO:new
----- (i) sequence-specific core promoter binding - GO:0010843 or GO:new
------ (i) sequence-specific RNA polymerase I core promoter binding
------ (i) sequence-specific RNA polymerase II core promoter binding
------ (i) sequence-specific RNA polymerase III core promoter binding
----- (i) sequence-specific regulatory transcription factor site* binding
------ (i) sequence-specific promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase I promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase II promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase III promoter transcription factor site binding
---- (i) specific enhancer transcription factor site binding
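To make the intent of this structure concrete, here is a minimal sketch that encodes part of the proposed hierarchy as a child-to-parent map and walks up the is_a chain. It assumes, for simplicity, that each term has a single parent (the real ontology is a graph that allows several), includes only the RNA polymerase II branch, and uses the hypothetical names `IS_A` and `ancestors`.

```python
from typing import Dict, List, Optional

# Child term -> its single is_a parent (None for the root of this branch).
# Term names follow the proposed structure; only a subset is shown.
IS_A: Dict[str, Optional[str]] = {
    "DNA binding": None,  # GO:0003677
    "sequence-specific DNA binding": "DNA binding",  # GO:0043565
    "sequence-specific DNA-binding transcription factor activity":
        "sequence-specific DNA binding",  # GO:0003700
    "sequence-specific promoter binding":
        "sequence-specific DNA-binding transcription factor activity",
    "sequence-specific core promoter binding":
        "sequence-specific promoter binding",
    "sequence-specific RNA polymerase II core promoter binding":
        "sequence-specific core promoter binding",
    "specific enhancer transcription factor site binding":
        "sequence-specific DNA-binding transcription factor activity",
}


def ancestors(term: str) -> List[str]:
    """Return all ancestors of a term, nearest parent first."""
    path = []
    parent = IS_A[term]
    while parent is not None:
        path.append(parent)
        parent = IS_A[parent]
    return path


for name in ancestors("sequence-specific RNA polymerase II core promoter binding"):
    print(name)
```

Under this layout, a single annotation to a polymerase-specific term implies both sequence-specific DNA binding and the polymerase involved, which is the combination the current structure cannot express.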

Proposed obsoletions:

We propose that all children of promoter binding (GO:0010843) should be obsoleted. All of these represent binding to specific sequence motifs. It is beyond the scope of GO to capture the thousands of individual sequence motifs that exist. We are working with Karen Eilbeck of SO to make sure that GO and SO are in sync with respect to how promoter motifs are defined, and even SO is planning only to represent general classes of motifs, not every specific one. Thus, we feel that the appropriate level of detail for GO to capture for promoters is whether the site bound is a core promoter element, a binding site for a regulatory transcription factor, or an enhancer binding site, as indicated in the structure above. We feel that the following existing terms, which indicate very specific motifs, should be obsoleted.

  • cAMP response element binding - GO:0035497
    def: "Interacting selectively and non-covalently with the cyclic AMP response element (CRE), a short palindrome-containing sequence found in the promoters of genes whose expression is regulated in response to cyclic AMP." [PMID:2875459, PMID:2900470]
  • carbohydrate response element binding - GO:0035538
    def: "Interacting selectively and non-covalently with the carbohydrate response element (ChoRE) found in the promoters of genes whose expression is regulated in response to carbohydrates, such as the triglyceride synthesis genes." [GOC:BHF, PMID:20001964]
  • E-box binding - GO:0070888
    def: "Interacting selectively and non-covalently with an E-box, a DNA motif with the consensus sequence CANNTG that is found in the promoters of a wide array of genes in neurons, muscle and other tissues." [GOC:BHF, GOC:vk, PMID:11812799]
  • estrogen response element binding - GO:0034056
    def: "Interacting selectively and non-covalently with the estrogen response element (ERE), a conserved sequence found in the promoters of genes whose expression is regulated in response to estrogen." [GOC:ecd, PMID:15036253, PMID:17975005]
  • juvenile hormone response element binding - GO:0070594
    def: "Interacting selectively and non-covalently with the juvenile hormone response element (JHRE), a conserved sequence found in the promoters of genes whose expression is regulated in response to juvenile hormone." [GOC:sart, PMID:17956872]
  • mitochondrial heavy strand promoter anti-sense binding - GO:0070362
    def: "Interacting selectively and non-covalently with the anti-sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • mitochondrial heavy strand promoter sense binding - GO:0070364
    def: "Interacting selectively and non-covalently with the sense strand of the heavy strand promoter, a promoter located on the heavy, or guanine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • mitochondrial light strand promoter anti-sense binding - GO:0070361
    def: "Interacting selectively and non-covalently with the anti-sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • mitochondrial light strand promoter sense binding - GO:0070363
    def: "Interacting selectively and non-covalently with the sense strand of the light strand promoter, a promoter located on the light, or cytosine-rich, strand of mitochondrial DNA." [GOC:mah, PMID:9485316]
  • serum response element binding - GO:0010736
    def: "Interacting selectively and non-covalently with the serum response element (SRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum." [GOC:BHF, GOC:dph, GOC:rl, GOC:tb]
  • sterol response element binding - GO:0032810
    def: "Interacting selectively and non-covalently with the sterol response element (SRE), a nonpalindromic sequence found in the promoters of genes involved in lipid metabolism." [GOC:vk, PMID:11994399]
  • vitamin D response element binding - GO:0070644
    def: "Interacting selectively and non-covalently with the vitamin D response element (VDRE), a short sequence with dyad symmetry found in the promoters of some of the cellular immediate-early genes, regulated by serum." [GOC:BHF, GOC:vk, PMID:17426122]
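If these obsoletions go ahead, existing annotations to the listed terms would need review. The following is a rough sketch of how such annotations could be flagged. The GO IDs are the ones listed above; the function name, the (gene product, GO ID) tuple layout, and the gene product names in the example are hypothetical.

```python
from typing import Iterable, List, Tuple

# GO IDs of the motif-specific binding terms proposed for obsoletion above.
PROPOSED_OBSOLETE = {
    "GO:0035497", "GO:0035538", "GO:0070888", "GO:0034056",
    "GO:0070594", "GO:0070362", "GO:0070364", "GO:0070361",
    "GO:0070363", "GO:0010736", "GO:0032810", "GO:0070644",
}


def annotations_to_review(
    annotations: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (gene product, GO ID) pairs that use a term proposed for obsoletion."""
    return [(gene, go_id) for gene, go_id in annotations if go_id in PROPOSED_OBSOLETE]


# Made-up gene product names, for illustration only.
example = [
    ("geneA", "GO:0070888"),  # E-box binding: flagged
    ("geneB", "GO:0003677"),  # DNA binding: not affected
]
print(annotations_to_review(example))
```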

transcription regulator activity - GO:0030528

The term transcription regulator activity (GO:0030528) is the highest level Function term for transcription and it is essentially identical to a Process term. It conveys exactly the same information as the Process term regulation of transcription and it does NOT convey any information about the molecular nature of the regulator activity. In addition, it is grouping the child terms below it based on involvement in a common Process, not based on having a common Function and it currently includes functions based on binding DNA and functions based on interacting with other proteins.

MF: transcription regulator activity - GO:0030528
Current definition: Plays a role in regulating transcription; may bind a promoter or enhancer DNA sequence or interact with a DNA-binding transcription factor.

BP: regulation of transcription - GO:0045449
Current definition: Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA.

Proposal (changes to and obsoletions of existing terms):

  1. We propose to merge this Function term (GO:0030528) into the equivalent Process term (GO:0045449). [There is precedent for this type of merge with the merge of the Function term splicing factor activity into the equivalent Process term.]
  2. There are three regulation terms in Process based on this term which would need to be dealt with.
    • One option would be to also merge them into the corresponding Process terms, i.e. regulation of transcription and the appropriate positive and negative regulation terms. Alternatively, we could obsolete them.
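The practical difference between merging and obsoleting can be sketched as follows. This sketch assumes that a merge is recorded by keeping the merged term's ID as a secondary (alternate) ID on the surviving term, so that lookups by the old ID still resolve; the `Term` class and the `merge_term` and `resolve` functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    go_id: str
    name: str
    alt_ids: List[str] = field(default_factory=list)


def merge_term(terms: Dict[str, Term], source_id: str, target_id: str) -> None:
    """Merge source into target: source disappears, its IDs become secondary IDs."""
    source = terms.pop(source_id)
    target = terms[target_id]
    target.alt_ids.append(source.go_id)
    target.alt_ids.extend(source.alt_ids)


def resolve(terms: Dict[str, Term], go_id: str) -> Term:
    """Look up a term by primary or secondary ID."""
    if go_id in terms:
        return terms[go_id]
    for term in terms.values():
        if go_id in term.alt_ids:
            return term
    raise KeyError(go_id)


terms = {
    "GO:0030528": Term("GO:0030528", "transcription regulator activity"),
    "GO:0045449": Term("GO:0045449", "regulation of transcription"),
}
merge_term(terms, "GO:0030528", "GO:0045449")
print(resolve(terms, "GO:0030528").name)
```

With an obsoletion, by contrast, the old ID would not resolve to another term automatically, and existing annotations would have to be reviewed by hand.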

transcription cofactor activity - GO:0003712, and child terms

The term "transcription cofactor activity" is defined quite specifically:

Definition: "The function that links a sequence-specific transcription factor to the core RNA polymerase II complex but does not bind DNA itself."

However, usage in the literature is broader than this and quite vague as to what is happening. The common elements seem to be:

  • binding to some protein component of the transcription machinery
  • NOT binding to DNA

I talked to a number of people about this. There was agreement that usage of the word "cofactor" is vague in the literature, but a strong feeling that there are some types of cofactors where the functional components of the activity can be described explicitly, so that GO can represent what they do functionally, rather than in terms of process. Thus, I do think we need to attempt to represent functions for cofactors that are understood. However, it is likely that at this point in time, there will be things described as "transcription cofactors" in the literature where it is not clear how they act, and it may not be appropriate to make any function annotation at all based on current understanding.

The current definition (above) is quite specific to a particular combination of binding. Amongst other things, the definitions of this term and of its child terms "transcription coactivator activity" and "transcription corepressor activity" are all specific to RNA polymerase II, but should not be, as the same terms are used for E. coli transcription. The activity currently defined is one type of cofactor, but it is not broad enough to encompass all the various kinds, so we will probably need a general term, and then probably a few specific subtypes. At this time, we will only add subtypes where it is understood how they act, but more subtypes can be added later as understanding advances.

Proposal:

  1. changes to transcription cofactor activity - GO:0003712
    • name change: to "protein-binding transcription cofactor activity"
      to make it clear that you need to be sure that this is a factor that acts by binding proteins in the transcription machinery
    • definition change
      This term has been selected for the prokaryotic GO subset, so it appears that no one has noticed that the definition is specific for a polymerase that is specific to eukaryotes. Thus, I think that broadening the definition to match the term name is likely the appropriate course of action. The other option is to make the term name match the definition, though this has the potential to make annotations that were based on the name incorrect.
      new definition - Interacting selectively and non-covalently with a protein component of the transcription machinery (to regulate transcription). ? for David: This def could definitely use some input. The current one begins with "the function of binding", a wording that I thoroughly despise. I prefer the binding definitions that begin with the "Interacting selectively and non-covalently" wording, so I've tried one using that. I'm not sure if we want to include the parenthetical expression.

More issues - not quite ready for prime time:

  1. changes to transcription coactivator activity - GO:0003713
    • The big one is whether we should even represent the positive and negative directions in function. There's a lot of commonality in the function, i.e. binding to something in the transcriptional machinery, regardless of whether it has a positive or negative effect. Basically, I'm wondering if specifying the direction is getting into process, but on the other hand, people might complain too much to eliminate them...
    • definition issue - The current definition is specific to RNAP II, but should be inclusive of prokaryotes
  2. changes to transcription corepressor activity - GO:0003714
    • all the same issues as for transcription coactivator activity (above)

RNA polymerase II transcription factor activity - GO:, and similar

other descendants of transcription regulator activity (GO:0030528)

Most of these terms have essentially the same problem as transcription regulator activity (GO:0030528) because their definitions hinge on the definition of transcription regulator activity which cannot be defined in terms of a singular function (see above).

  • transcription activator activity - GO:0016563
    def: "Any transcription regulator activity required for initiation or upregulation of transcription."
  • transcription repressor activity - GO:0016564
    def: "Any transcription regulator activity that prevents or downregulates transcription."
  • specific transcriptional repressor activity - GO:0016566
    def: "Any activity that stops or downregulates transcription of specific genes or sets of genes."
  • basal transcription repressor activity - GO:0017163
    def: "Any transcription regulator activity that prevents or downregulates basal transcription. Basal transcription results from transcription that is controlled by the minimal complement of proteins necessary to reconstitute transcription from a minimal promoter."
  • general transcriptional repressor activity - GO:0016565
    def: "Any activity that stops or downregulates transcription of genes globally, and is not specific to a particular gene or gene set."
  • transcription initiation factor activity - GO:0016986
    def:
  • sigma factor activity - GO:0016987
    def: "A sigma factor is the promoter specificity subunit of eubacterial-type multisubunit RNA polymerases, those whose core subunit composition is often described as alpha(2)-beta-beta-prime. (This type of multisubunit RNA polymerase complex is known to be found in eubacteria and plant plastids). Although sigma does not bind DNA on its own, when combined with the core to form the holoenzyme, this binds specifically to promoter sequences, with the sigma factor making sequence specific contacts with the promoter elements. The sigma subunit is released from the elongating form of the polymerase and is thus free to act catalytically for multiple RNA polymerase core enzymes."
  • mitochondrial transcription initiation factor activity - GO:0034246
    def: "A transcription factor activity that confers promoter specificity upon mitochondrial RNA polymerase, in a manner analogous to eubacterial sigma factors."
  • transcription initiation factor antagonist activity - GO:0016988
    def: "The function of binding to a transcription factor and stopping, preventing or reducing the rate of its transcriptional activity."
  • sigma factor antagonist activity - GO:0016989
    def: "The function of binding to a sigma factor and stopping, preventing or reducing the rate of its transcriptional activity."

Proposal (changes to and obsoletions of existing terms):

  1. obsolete transcription activator activity - GO:0016563, or merge into process term positive regulation of transcription (GO:0045941)
    • As currently defined this is equivalent to the process term positive regulation of transcription (GO:0045941).
    • There may be specific types of activators that can be described in terms of molecular activities, perhaps sequence-specific RNA polymerase II promoter transcription factor site binding involved in activation of transcription (see section on transcription factor activity). However, this term is defined largely on the basis of the effect on the process of transcription and not on the basis of how that effect is accomplished. In addition, since there are numerous ways that activators can function, this term is grouping different functions on the basis of process, so I don't think it is appropriate as a function. Any appropriate individual function terms can be created as types of binding, and then they can have relationships to positive regulation of transcription.
    • We may want to obsolete this term so that we can suggest multiple consider terms, the process term positive regulation of transcription (GO:0045941) as well as any function terms that can be created of the type suggested above.
  2. transcription repressor activity - GO:0016564 and children
    1. obsolete transcription repressor activity - GO:0016564, or merge into process term negative regulation of transcription (GO:0016481)
      • As currently defined this is equivalent to the process term negative regulation of transcription (GO:0016481)
      • I think this term can be dealt with similarly to transcription activator activity above
    2. obsolete specific transcriptional repressor activity - GO:0016566
      • As defined, this term is related to the Process term negative regulation of gene-specific transcription from RNA polymerase II promoter (GO:0010553) and is defined in terms of effect on the process of transcription, rather than how it is accomplished.
      • Similarly to transcription activator activity, this term should be obsoleted and any appropriate individual function terms can be created as types of binding, and then they can have relationships to negative regulation of transcription.
      • Merging might be difficult for this term since there is no process term for specific transcription at a general level, only one for gene-specific transcription from RNA polymerase II promoter (GO:0032569)
    3. obsolete basal transcription repressor activity - GO:0017163 & general transcriptional repressor activity - GO:0016565
      • I group these two together because these two terms are identical to each other, as basal and general are two words for the same thing, at least for RNAP II. I'm aware of basal being used for E. coli RNAP where it means the same thing as for RNAP II, but have not encountered general being used in the prokaryotic context. Basal seems to be the preferred word now, over general, in the RNAP II literature, as it has become clear that the "general" factors aren't as general as once thought.
      • Similarly to other terms discussed in this section, e.g. transcription activator activity and specific transcriptional repressor activity, I think this term may represent multiple different types of binding functions, but that it is grouping based on the process, so I recommend obsoleting this term. As with others above, any appropriate individual function terms can be created as types of binding, and then they can have relationships to negative regulation of transcription.
      • Merging might also be difficult for these terms since there is no process term for basal or general transcription at a general level, only terms at the level of RNAP II transcription.
  3. obsolete transcription initiation factor activity - GO:0016986, rename and redefine sigma factor activity - GO:0016987, & merge or rename and redefine? mitochondrial transcription initiation factor activity - GO:0034246
    • The first term, transcription initiation factor activity, is undefined but is the parent term of the other two terms (with no other child terms), so I want to consider these three terms together.
    • The first term transcription initiation factor activity should probably be obsoleted because the current name is much too broad and it seems that it might have been broadly used based on the name, which doesn't represent a specific function.
    • The two child terms both represent essentially the same thing, a specificity factor that binds to core RNA polymerase (which is unable to recognize DNA on its own) and while bound to the RNAP confers sequence specific DNA binding. We could represent this as something like core RNA polymerase binding promoter specificity factor activity and give it parentage under both core RNA polymerase binding and under sequence-specific DNA binding. Possibly the existing sigma factor activity term could be redefined to be general, i.e. not specific to sigma factors. [This would be similar to the fact that we currently don't have a specific term for bacterial RNAP; annotation to the general term RNAP is sufficient since bacteria have only one RNAP complex.] While E. coli, and other bacteria, have many different sigma factors, they have only one RNAP, so the primary criterion distinguishing different sigma factors is which promoter sequence they recognize. We feel that GO should not get into the level of detail of describing specific binding sites, so 1 term for this type of activity is sufficient.
    • I am not sure if we need more than just one term for this. This kind of specificity factor exists for both prokaryotes, e.g. E. coli, and also for mitochondrial RNAP. However, I can't think of any examples of cells which have more than one kind of polymerase with this kind of specificity factor, so I think we may only need one term for this type of activity. I guess the argument in favor of keeping this term would be in order to make a link between it and existing process terms for mitochondrial transcription, though at the moment I think it would be an only child as I only know of this type of activity for prokaryotic and for mitochondrial transcription. If we do keep this individual term, then we'll probably want to change the name and def a bit to be consistent with the general term and the relationship with the mitochondrial transcription process term.
  4. obsolete transcription initiation factor antagonist activity - GO:0016988 & rename and redefine sigma factor antagonist activity - GO:0016989
    • These two terms are tied to the item above about sigma factor type specificity factors since these basically represent things that bind to sigma, or comparable, factors to prevent them from binding to core RNAP
    • transcription initiation factor antagonist activity isn't even defined correctly based on the name of the term in that it does not specify binding to an initiation factor. Since this term is badly named and defined, I recommend obsoleting it.
    • The decision of whether we need only a single general term or specific terms that affect regulation of specific types of polymerases should follow the decision made for sigma type specificity factors.
    • The term sigma factor antagonist activity should be renamed and redefined to become a general term for binding to a promoter specificity factor of the sigma type so that it will be broad enough to represent this activity generally and still represent what is known for prokaryotic transcription.
    • I do not know if such an activity is characterized in mitochondrial transcription, so even if we create a specific term for the mitochondrial promoter specificity factor, I don't yet see a need to create a mitochondrial specific term here.