Proposals to overhaul transcription in GO - 2010

From GO Wiki
Revision as of 18:23, 6 July 2010 by Kchris (talk | contribs) (transcription factor activity - GO:0003700)


Philosophy of Overhaul

Over the last few years, GO has changed how we talk about and create Function terms so that they now represent how something occurs, e.g. binding, catalytic activities, etc. We are also now avoiding Function terms that duplicate Process terms, that is, functions that do not describe how the gene product acts but only specify that it is involved in a process. However, although we have always said that Function and Process should represent non-overlapping aspects, many older Function terms essentially duplicate a Process term. Compare, for example, the Function term transcription regulator activity with the Process term regulation of transcription: the two mean essentially the same thing. In addition, the Function term transcription regulator activity does not group the terms below it on the basis of similar functions, but on the basis of involvement in the same process. This blurred distinction between Function and Process confuses both annotators and users. One researcher at the meeting told me that she uses GO only occasionally and can never remember whether the term she wants is in Function or Process.

One of the major goals of this overhaul is to generate clarity between the function terms and the process terms for transcription. We are proposing to eliminate some Function terms that are equivalent to Process terms and which cannot be converted into a description of the molecular activity, or activities, involved. In other cases, we are proposing changes to Function terms so that they actually describe molecular activities.

With respect to annotation, these changes mean that where experiments indicate that a gene product is involved in regulating transcription, but give no indication as to how it acts, it would be appropriate to annotate only with a Process term and not with a Function term. The recently developed method of creating links between Function and Process terms should also address the old motivations for having terms like transcription regulator activity, since terms representing functions involved in regulation of transcription will have a relationship to that Process term.

Molecular Function

transcription regulator activity - GO:0030528

This is the highest-level Function term for transcription and it is essentially identical to a Process term. It conveys exactly the same information as the Process term regulation of transcription and it does NOT convey any information about the molecular nature of the regulator activity. In addition, it groups the child terms below it based on involvement in a common Process, not on having a common Function. We propose to merge this Function term (GO:0030528) into the equivalent Process term (GO:0045449). [There is precedent for this type of merge: the Function term splicing factor activity was merged into the equivalent Process term.]

transcription regulator activity - GO:0030528
Current definition: Plays a role in regulating transcription; may bind a promoter or enhancer DNA sequence or interact with a DNA-binding transcription factor.

regulation of transcription - GO:0045449
Current definition: Any process that modulates the frequency, rate or extent of the synthesis of either RNA on a template of DNA or DNA on a template of RNA.
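
To illustrate what such a merge could mean for existing annotations, here is a minimal sketch in Python. It assumes annotations are simple (gene product, term ID) pairs and that a merge is applied by re-pointing the merged ID at the surviving one; the gene product names and the remap_annotations helper are hypothetical and are not part of any GO tool.

```python
# Hypothetical merge map: merged term ID -> surviving term ID.
# The two IDs are the ones named in the proposal above.
MERGED_INTO = {"GO:0030528": "GO:0045449"}


def remap_annotations(annotations, merged_into=MERGED_INTO):
    """Return annotations with merged term IDs replaced by the surviving ID.

    Duplicates created by the remapping are dropped, keeping first-seen order.
    """
    seen = set()
    result = []
    for gene_product, term_id in annotations:
        pair = (gene_product, merged_into.get(term_id, term_id))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# Placeholder gene products, for illustration only.
example = [
    ("geneA", "GO:0030528"),  # annotated to the Function term
    ("geneA", "GO:0045449"),  # already annotated to the Process term
    ("geneB", "GO:0030528"),
]
remapped = remap_annotations(example)
```

In this sketch a gene product annotated to both terms ends up with a single annotation to regulation of transcription. How evidence codes and references would be reconciled is not addressed here.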

transcription factor activity - GO:0003700

The term "transcription factor activity" has parentage under "DNA binding" and was probably intended to represent the type of transcription factor that binds a specific DNA sequence present in a relatively limited set of promoters (as opposed to core promoter motifs, which are bound by basal factors in many promoters) to activate transcription when the basal factors alone are not sufficient to drive it.

We also need to account for the fact that some basal transcription factors bind specific sequences, but not all basal transcription factors bind DNA. This is why we also have terms like "RNA polymerase II transcription factor activity" that do not have parentage under "DNA binding" and which really mean anything involved in regulating RNAP II transcription, i.e. basically a process definition.

We would like to have function terms that indicate what type of DNA sequence element is being bound, e.g. a basal promoter element versus the binding site for a regulatory transcription factor, such as Gal4 in yeast, that binds a specific sequence that is not a core promoter sequence element. Additionally, we would like to be able to indicate binding to enhancer sites. With that in mind, here is a proposed structure and some specific changes proposed for some of the existing terms.

- sequence specific DNA-binding transcription factor activity - GO:0003700
-- promoter binding - GO:0010843 or GO:new
--- sequence specific core promoter binding - GO:0010843 or GO:new
---- sequence specific RNA polymerase I core promoter binding
---- sequence specific RNA polymerase II core promoter binding
---- sequence specific RNA polymerase III core promoter binding
--- sequence specific regulatory transcription factor site* binding
---- sequence specific promoter transcription factor site binding
----- sequence specific RNA polymerase I promoter transcription factor site binding
----- sequence specific RNA polymerase II promoter transcription factor site binding
----- sequence specific RNA polymerase III promoter transcription factor site binding
-- sequence specific enhancer transcription factor site binding
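
The dash-prefixed outline above can be read as a tree in which the number of leading dashes gives the depth. As a rough sketch, the following Python turns such an outline into (term, parent) pairs so the proposed parentage can be checked mechanically. The parse_outline helper is hypothetical, the term names are taken from the outline with GO IDs omitted, and the relationship type (is_a or otherwise) is not specified by the proposal, so none is assumed here.

```python
OUTLINE = """\
- sequence specific DNA-binding transcription factor activity
-- promoter binding
--- sequence specific core promoter binding
---- sequence specific RNA polymerase I core promoter binding
---- sequence specific RNA polymerase II core promoter binding
---- sequence specific RNA polymerase III core promoter binding
--- sequence specific regulatory transcription factor site* binding
---- sequence specific promoter transcription factor site binding
----- sequence specific RNA polymerase I promoter transcription factor site binding
----- sequence specific RNA polymerase II promoter transcription factor site binding
----- sequence specific RNA polymerase III promoter transcription factor site binding
-- sequence specific enhancer transcription factor site binding
"""


def parse_outline(text):
    """Return (term, parent) pairs; parent is None for a top-level term."""
    stack = []  # stack[d - 1] holds the most recent term seen at depth d
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        dashes, _, name = line.partition(" ")
        depth = len(dashes)
        name = name.strip()
        del stack[depth - 1:]  # discard terms at this depth or deeper
        parent = stack[-1] if stack else None
        pairs.append((name, parent))
        stack.append(name)
    return pairs


parents = dict(parse_outline(OUTLINE))
```

Looking up a term in parents returns its immediate proposed parent. For example, the enhancer term sits directly under the top-level term as the outline is written.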

Changes to existing terms

  1. transcription factor activity - GO:0003700
    • Change name, as suggested above or similar
    • Change position to reflect the current definition, which indicates sequence specific binding to DNA. Currently, this term is directly under DNA-binding, but the definition specifies a specific sequence, so we propose to move it.
  2. promoter binding - GO:0010843
    • Change either name or definition. Currently this term is defined too narrowly such that it only includes the core promoter elements, while the binding sites for regulatory transcription factors are also considered to be promoter elements. I recommend changing the definition to match the name. The other possibility is changing the name to core promoter binding so that it matches the current definition, but if people have annotated based on the broader term name, annotations will become incorrect with this option.
    • The definition should also avoid specifying that the binding sites are for complexes; not all transcription factors are complexes.