Proposals to overhaul transcription in GO - 2010

From GO Wiki
Revision as of 15:10, 8 July 2010 by Kchris (talk | contribs) (transcription factor activity - GO:0003700)

Philosophy of Overhaul

Over the last few years, GO has changed how we talk about and create Function terms so that they now represent how something occurs, e.g. binding, catalytic activities, etc. We are now also avoiding Function terms that duplicate Process terms, that is, functions that do not describe how the gene product acts but only specify that it is involved in a process. However, despite the fact that we have always said that Function and Process should represent non-overlapping aspects, we have many older terms in Function that essentially duplicate a Process term. Compare, for example, the Function term transcription regulator activity with the Process term regulation of transcription. Both terms essentially mean the same thing. In addition, the Function term transcription regulator activity is not grouping the terms below it on the basis of having similar functions, but rather on the basis of being involved in the same process. This lack of clarity in the distinction between Function and Process generates confusion, both for annotators and for users. One researcher at the meeting told me that she only uses GO occasionally and she can never remember whether the term she wants is in Function or Process.

One of the major goals of this overhaul is to draw a clear distinction between the Function terms and the Process terms for transcription. We are proposing to eliminate some Function terms that are equivalent to Process terms and that cannot be converted into a description of the molecular activity, or activities, involved. In other cases, we are proposing changes to Function terms so that they actually describe molecular activities.

With respect to annotation, these changes will mean that in cases where the experiments indicate that a gene product is involved in regulating transcription, but give no indication as to how it acts, it would be appropriate to annotate only with a Process term and not with a Function term. With the recently developed method of creating links between Function and Process terms, the old motivations to have terms like transcription regulator activity should be addressed anyway, since terms representing functions involved in regulation of transcription will have a relationship to that Process term.

Molecular Function

transcription regulator activity - GO:0030528

This is the highest level Function term for transcription and it is essentially identical to a Process term. It conveys exactly the same information as the Process term regulation of transcription and it does NOT convey any information about the molecular nature of the regulator activity. In addition, it is grouping the child terms below it based on involvement in a common Process, not based on having a common Function.

transcription regulator activity - GO:0030528
Current definition: Plays a role in regulating transcription; may bind a promoter or enhancer DNA sequence or interact with a DNA-binding transcription factor.

regulation of transcription - GO:0045449
Current definition: Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA.

Proposal (change to existing term): We propose to merge this Function term (GO:0030528) into the equivalent Process term (GO:0045449). [There is precedent for this type of merge with the merge of the Function term splicing factor activity into the equivalent Process term.]
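
To make the effect of such a merge on annotations concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how GO tooling performs merges: the `Annotation` record, the `apply_merge` helper, and the gene product names are hypothetical, and the aspect codes "F" and "P" are assumed to stand for Molecular Function and Biological Process. Only the two GO IDs come from the proposal above.

```python
from dataclasses import dataclass, replace

# IDs taken from the proposal; everything else here is hypothetical.
MERGED_FROM = "GO:0030528"  # transcription regulator activity (Function)
MERGED_INTO = "GO:0045449"  # regulation of transcription (Process)


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # hypothetical identifier
    go_id: str
    aspect: str  # assumed codes: "F" = Function, "P" = Process


def apply_merge(annotations):
    """Repoint annotations from the merged Function term to the Process term.

    Duplicates created by the merge are dropped, keeping first occurrences.
    """
    seen = set()
    result = []
    for ann in annotations:
        if ann.go_id == MERGED_FROM:
            ann = replace(ann, go_id=MERGED_INTO, aspect="P")
        if ann not in seen:
            seen.add(ann)
            result.append(ann)
    return result


annotations = [
    Annotation("geneA", "GO:0030528", "F"),
    Annotation("geneA", "GO:0045449", "P"),
    Annotation("geneB", "GO:0030528", "F"),
]
merged = apply_merge(annotations)
```

In this sketch, a gene product annotated to both terms ends up with a single Process annotation, which is the outcome one would expect if the two terms really do carry the same information.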

transcription factor activity - GO:0003700

The term "transcription factor activity" has parentage under "DNA binding", rather than under "sequence-specific DNA-binding", even though it is defined as binding to a specific DNA sequence. The majority of things in this class are "regulatory" transcription factors, which bind a specific DNA sequence present in a relatively limited set of promoters. There are also basal transcription factors, which bind to specific core promoter elements in a sequence-specific way.

However, we also need to account for the fact that not all basal transcription factors bind DNA. Currently, we also have terms like "RNA polymerase II transcription factor activity" that do not have parentage under "DNA binding" and which really mean anything involved in regulating RNAP II transcription, i.e. basically a process definition. This problem has been reported in SourceForge multiple times because the current structure makes it impossible to indicate that a transcription factor is a sequence-specific DNA binding factor for a specific polymerase; you can either indicate that it binds DNA, or that it is a factor for RNAP I, II, or III, but not that it binds a specific sequence for a specific RNAP.

We would like to have function terms that indicate what type of DNA sequence element is being bound, e.g. a basal promoter element versus the binding site for a regulatory transcription factor, such as Gal4 in yeast. Additionally, we would like to be able to indicate binding to enhancer sites. So the distinction in function will be by the type of DNA site bound.

Proposal: With that in mind, here is a proposed structure (where no number is indicated, the term would be new) and some specific changes proposed for some of the existing terms.

Changes to existing terms:

  1. transcription factor activity - GO:0003700
    • Change name to "sequence-specific DNA-binding transcription factor activity", as shown in structure below, or similar
    • Change position to match the current definition, which specifies binding to a specific DNA sequence. Currently, this term is directly under DNA binding; we propose to move it under sequence-specific DNA-binding
  2. promoter binding - GO:0010843
    • Change either name or definition. Currently this term is defined too narrowly, such that it only includes the core promoter elements, while the binding sites for regulatory transcription factors are also considered to be promoter elements. We recommend changing the definition to match the name. The other possibility is changing the name to core promoter binding so that it matches the current definition, but if people have annotated based on the broader term name, annotations will become incorrect with this option.
    • The definition should also avoid specifying that the binding sites are for complexes; not all transcription factors are complexes.

Structure showing new terms and relationship to existing terms:

- DNA binding - GO:0003677
-- (i) sequence-specific DNA-binding - GO:0043565
--- (i) sequence-specific DNA-binding transcription factor activity - GO:0003700
---- (i) sequence-specific promoter binding - GO:0010843 or GO:new
----- (i) sequence-specific core promoter binding - GO:0010843 or GO:new
------ (i) sequence-specific RNA polymerase I core promoter binding
------ (i) sequence-specific RNA polymerase II core promoter binding
------ (i) sequence-specific RNA polymerase III core promoter binding
----- (i) sequence-specific regulatory transcription factor site* binding
------ (i) sequence-specific promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase I promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase II promoter transcription factor site binding
------- (i) sequence-specific RNA polymerase III promoter transcription factor site binding
---- (i) specific enhancer transcription factor site binding
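
The practical consequence of this structure is that a single annotation to one of the polymerase-specific terms would imply both sequence-specific DNA binding and the polymerase involved. A minimal Python sketch of that idea follows. It assumes that the number of leading dashes in the outline gives the depth and that each term has a single parent; the functions `parse_outline` and `ancestors` are hypothetical helpers, and the outline is an abbreviated copy of the proposed structure with GO IDs and relationship markers left out.

```python
# Abbreviated copy of the proposed outline; depth = number of leading dashes.
OUTLINE = """\
- DNA binding
-- sequence-specific DNA-binding
--- sequence-specific DNA-binding transcription factor activity
---- sequence-specific promoter binding
----- sequence-specific core promoter binding
------ sequence-specific RNA polymerase II core promoter binding
----- sequence-specific regulatory transcription factor site binding
------ sequence-specific promoter transcription factor site binding
------- sequence-specific RNA polymerase II promoter transcription factor site binding
---- specific enhancer transcription factor site binding
"""


def parse_outline(text):
    """Return {term: parent}, with None as the parent of the root term."""
    parents = {}
    stack = []  # stack[d - 1] is the most recent term seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        dashes, _, name = line.partition(" ")
        depth = len(dashes)
        del stack[depth - 1:]  # discard terms at this depth or deeper
        parents[name.strip()] = stack[-1] if stack else None
        stack.append(name.strip())
    return parents


def ancestors(term, parents):
    """Walk parent links from a term up to the root, nearest ancestor first."""
    chain = []
    parent = parents[term]
    while parent is not None:
        chain.append(parent)
        parent = parents[parent]
    return chain


parents = parse_outline(OUTLINE)
chain = ancestors(
    "sequence-specific RNA polymerase II promoter transcription factor site binding",
    parents,
)
```

Under these assumptions, `chain` contains both "sequence-specific DNA-binding" and "DNA binding", so the one annotation carries the information that the current structure forces annotators to choose between.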