Agenda: Coordinating SO and GO for transcription

 

Revision as of 20:07, 10 June 2010

Coordinating GO and SO

The main goal here is to make sure that when GO invokes a sequence-based element, say in a GO function term, we mean the same thing that SO does when it defines that particular sequence element. In comparing SO against how GO thinks about these sequence elements, I have come across three kinds of issues: places where I would like to suggest changes in how SO represents things, places where I have questions about the goal and scope of SO, and a couple of places where I think the current representation in SO is incorrect.

SO issues

Designating core promoter elements

  • I'd like to be able to specifically designate the core promoter elements for RNA polymerase II that are recognized by basal transcription factors. A good source is the Thomas and Chiang review (PMID:16858867).
  • So, I'd like a grouping term for core promoter elements, something like the one below. This one is specific to RNAP II, but a more generally applicable parent term would also be appropriate, since I think this distinction is also relevant for E. coli.
    > [Term]
    > id: tmp:0001001
    > name: RNApol_II_promoter_core_element
    > def: "Characteristic DNA sequences required for proper assembly and orientation of the RNA polymerase II transcription preinitiation complex (PIC) and recognized by basal RNA polymerase II transcription factors." [GOC:krc, PMID:16858867 "Thomas & Chiang 2006"]
    > relationship: part_of SO:0000170 ! RNApol_II_promoter
    I just realized that I've phrased this term as "core promoter element" but given the child terms "part_of" relationships. If this term contains the word "element", then the child terms should probably get is_a relationships; alternatively, if this term is phrased something like "RNApol_II_core_promoter", without the word "element", then the "part_of" relationship probably holds. I'm not sure which is better or more consistent with SO practice.
  • Some of the existing motifs currently directly under "RNApol_II_promoter" are these core elements, so I'd like to move them to be direct children of a new term such as "RNApol_II_core_promoter". One, "BRE motif", may merit some changes, since the Thomas & Chiang review specifies two different BRE motifs (upstream and downstream) that are both core elements. There would also be two more elements (BREd and DCE) to add.

True Path Violation regarding TATA-box

Currently both "RNApol_II_promoter" and "RNApol_III_promoter" are related to the term "TATA_box" (SO:0000174) via the has_part relationship. This is not true for all promoters: both RNAP II and RNAP III have TATA-containing as well as TATA-less promoters. There seem to be a couple of ways to deal with this.

  • One would be to make subtypes of TATA elements based on the kind of promoter they are within, e.g. "TATA_box in RNApol_II_promoter", or however that would be phrased in SO. This would allow the appropriate specific kinds of TATA elements to be part of "RNApol_II_core_promoter" or "RNApol_III_promoter_type_3", as appropriate.
  • Alternatively, you could have subclasses of RNAP II promoters that are TATA-containing and TATA-less. It would be true to say that the former has_part "TATA_box", but I'm not sure this is really the way to go. There are several other core promoter elements that are recognized by basal transcription factors, and being a core promoter element versus a regulatory transcription factor binding site seems to be the more significant distinction. In addition, for RNAP III type 1 and type 2 promoters, TATA elements seem to be optional and their presence or absence does not seem to be a distinguishing characteristic, whereas TATA sites are a key element of type 3 RNAP III promoters.
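
The has_part problem described above can be checked mechanically. The following is a minimal Python sketch, not SO tooling: the instance names and their part lists are made-up toy data, and `violates_has_part` is a hypothetical helper. It reads "X has_part Y" as a claim that every instance of X contains a Y, and lists the instances that break that claim.

```python
# Toy data only: the instance names and their part lists are illustrative
# assumptions, not curated promoter annotations.
promoter_instances = {
    "toy_RNAP_II_promoter_with_TATA": {
        "class": "RNApol_II_promoter",
        "parts": {"TATA_box", "INR_motif"},
    },
    "toy_RNAP_II_promoter_TATA_less": {
        "class": "RNApol_II_promoter",
        "parts": {"INR_motif", "DPE_motif"},
    },
}


def violates_has_part(instances, cls, part):
    """Return the names of instances of `cls` that lack `part`.

    An all-instances reading of "cls has_part part" holds only if the
    returned list is empty.
    """
    return sorted(
        name
        for name, inst in instances.items()
        if inst["class"] == cls and part not in inst["parts"]
    )


offenders = violates_has_part(promoter_instances, "RNApol_II_promoter", "TATA_box")
print(offenders)
```

Any non-empty result means the has_part assertion is too strong at that level of the hierarchy, which is the situation both options above are trying to repair.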

Issues relating to "TF_binding_site"

  • The definition may be incorrect. The current def is "A region of a molecule that binds a TF complex [GO:0005667].", but I don't think that all transcription factors are complexes.
  • What is the intended scope of children of this term? Is the goal to list consensus sites of all transcription factors?

Detail on promoter types?

How much detail does SO want to cover regarding specific types of promoters? For some RNA polymerases, e.g. eukaryotic RNAP II and bacterial RNAP, the number of promoter types could range from large to enormous. For others, e.g. RNA polymerase I or III, there are one or a small number of common types, though RNAP III may have numerous atypical promoter structures.

  • Regarding "RNApol_III_promoter_type_1": in S. cerevisiae, only the C box is required, but in Xenopus there are a couple of additional elements, the A box and the intermediate element (IE) [Schramm L, Hernandez N. 2002, PMID:12381659]. If SO is going for comprehensive detail, the A and IE sites should probably be added.
  • Regarding the children of "bacterial_RNApol_promoter" ("minus_10_signal" and "minus_35_signal"): are the -10 and -35 elements general to all bacterial promoters? I thought they were the hallmarks specifically of sigma70 promoters, particularly since you have specified the actual consensus sequences, and that other sigma factors have other recognition sequences.

Inconsistency between usage of "TF_binding_site" or "DNA_motif" in parentage?

Some promoter motifs, e.g. "A_box", which is part_of "RNApol_III_promoter_type_2", are is_a children of "TF_binding_site", which is a descendant of "protein_binding_site". Other promoter motifs, e.g. "DMv1_motif", which is part_of "RNApol_II_promoter", are is_a children of "DNA_motif".

  • What is the basis for the difference in parentage?
  • Why is "TF_binding_site" not a type of "DNA_motif"?
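
The parentage difference can be made concrete by walking is_a ancestors. This is a rough Python sketch over a toy graph: it contains only the relations named in this section, any intermediate SO terms between "TF_binding_site" and "protein_binding_site" are collapsed into one edge (an assumption), and `ancestors` is a hypothetical helper rather than part of any SO library.

```python
# Toy is_a edges: only the relations stated in the text are modeled, and
# intermediate SO terms between a term and its ancestor are collapsed
# into a single edge (an assumption made for brevity).
IS_A = {
    "A_box": ["TF_binding_site"],
    "TF_binding_site": ["protein_binding_site"],
    "DMv1_motif": ["DNA_motif"],
}


def ancestors(term, is_a=IS_A):
    """Collect every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


for motif in ("A_box", "DMv1_motif"):
    print(motif, "is a DNA_motif:", "DNA_motif" in ancestors(motif))
```

In this toy graph a query for all "DNA_motif" descendants would return "DMv1_motif" but miss "A_box", even though both are promoter motifs, which is the practical consequence of the split parentage.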

GO issues

  • Should we represent every type of polymerase that exists in the same cell?
    We already have process terms for things like "transcription from mitochondrial promoter" and similar terms for transcription done by the other RNAPs (I, II, and III). Different gene products are definitely involved in transcription by these different polymerases.
    I'm wondering if this leads to having corresponding binding terms for all of these too...
  • I have some terms in for the transcription cycle in the process ontology, some of them with links to function terms.
    This seems to lead to having parallel sets of terms for the transcription cycle for each different type of RNA polymerase found in the same cell. Based on plants, the minimum set would be at least these: RNAP I, RNAP II, RNAP III, RNAP IV, mito RNAP, plastid RNAP.
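
To get a feel for how quickly the parallel sets grow, here is a small Python sketch that crosses the polymerase list above with a set of transcription cycle stages. The three stages and the label pattern are assumptions for illustration only; they are not existing GO term names, and the real cycle terms may be more finely divided.

```python
from itertools import product

# Polymerases listed in the text (minimum set, based on plants).
polymerases = ["RNAP I", "RNAP II", "RNAP III", "RNAP IV", "mito RNAP", "plastid RNAP"]

# Assumed stages of the transcription cycle; the actual GO term set may differ.
stages = ["initiation", "elongation", "termination"]

# Hypothetical labels, not existing GO term names.
labels = [
    f"transcription {stage} by {pol}"
    for pol, stage in product(polymerases, stages)
]

print(len(labels))
for label in labels[:3]:
    print(label)
```

Every additional stage or polymerase multiplies the count, and a matching set of binding terms in function would double it again.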