Agenda: Coordinating SO and GO for transcription

From GO Wiki

Date: Friday, June 11th, 2010
Time: 9-10 am Pacific
Present: Karen Christie, Karen Eilbeck, David Hill

Coordinating GO and SO

The main goal here is to make sure that when GO invokes a sequence-based element, say in a GO function term, we mean the same thing that SO does when it defines that sequence element. In reviewing SO with this in mind, I have come across some places where I would like to suggest changes to the SO representation, some places where I have questions about the goal and scope of SO, and a couple of places where I think the current representation in SO is incorrect.

Agenda

SO issues

Designating core promoter elements

  • I'd like to be able to specifically designate the core promoter elements for RNA polymerase II that are recognized by basal transcription factors. A good source is the Thomas and Chiang review (PMID:16858867).
  • So, I'd like a grouping term for core promoter elements, something like below. This one is specific for RNAP II, but a parent term that would be generally applicable would be appropriate since I think this distinction is also relevant for E. coli.
    > id: tmp:0001001
    > name: RNApol_II_promoter_core_element
    > def: "Characteristic DNA sequences required for proper assembly and orientation of the RNA polymerase II transcription preinitiation complex (PIC) and recognized by basal RNA polymerase II transcription factors." [GOC:krc, PMID:16858867 "Thomas & Chiang 2006"]
    > relationship: part_of SO:0000170 ! RNApol_II_promoter
    I just realized that I've phrased this term as "core promoter element" but given the child terms "part_of" relationships to it. If this term contains the word "element", then the child terms should probably get is_a relationships; alternatively, if this term is phrased something like "RNApol_II_core_promoter", without the word "element", then the "part_of" relationship probably holds. I'm not sure which is better or more consistent with SO practice.
  • Some of the existing motifs currently directly under "RNApol_II_promoter" are these core elements, so I'd like to move them to be direct children of a new term for "RNApol_II_core_promoter", or something similar. One of them, "BRE motif", may merit some changes, since the Thomas & Chiang review specifies two different BRE motifs (upstream and downstream) that are both core elements. There would also be two more elements (BREd and DCE) to add.

True Path Violation regarding TATA-box

Currently both "RNApol_II_promoter" and "RNApol_III_promoter" are related to the term "TATA_box" SO:0000174 via the has_part relationship. This is not always true: both RNAP II and RNAP III have TATA-containing as well as TATA-less promoters. There seem to be a couple of ways to deal with this.

  • One would be to make subtypes of TATA elements based on what kind of promoter they are within, e.g. "TATA_box in RNApol_II_promoter", or however that would be phrased in SO. This would allow the appropriate specific kinds of TATA elements to be part of "RNApol_II_core_promoter" or "RNApol_III_promoter_type_3", as appropriate.
  • Alternatively, you could have subclasses of RNAP II promoters that are TATA-containing and TATA-less. It would be true to say that the former has_part "TATA_box", but I'm not sure this is really the way to go. There are several other core promoter elements that are recognized by basal txn factors, and being a core promoter element versus a regulatory txn factor binding site seems to be the more significant distinction. In addition, for RNAP III type 1 and 2 promoters, TATA elements seem to be optional, but their presence or absence does not seem to be a distinguishing characteristic, while TATA sites are a key element of type 3 RNAP III promoters.
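
The true path problem described above can be illustrated with a minimal Python sketch. It reads a class-level has_part assertion as a claim about every instance of the class and lists the instances that break it. The two promoter instances and their part lists are hypothetical examples made up for illustration, not SO data, and the function name `violations` is likewise an assumption.

```python
# Toy check of a class-level has_part assertion (hypothetical example data).
# "X has_part Y" at the class level is read here as:
# every instance of X contains a Y.

def violations(instances, required_part):
    """Return the names of instances that lack the required part."""
    return [name for name, parts in instances.items()
            if required_part not in parts]

# Hypothetical RNAP II promoters: one TATA-containing, one TATA-less.
rnapol_ii_promoters = {
    "promoter_with_TATA": {"TATA_box", "INR_motif"},
    "promoter_without_TATA": {"INR_motif", "DPE_motif"},
}

# Asserting has_part TATA_box for all RNAP II promoters fails
# for the TATA-less instance.
broken = violations(rnapol_ii_promoters, "TATA_box")
print(broken)
```

Under either proposal above, the assertion would be attached only to a class whose instances all contain a TATA element, so a check like this would come back empty.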

Issues relating to "TF_binding_site"

  • The definition may be incorrect. The current def is "A region of a molecule that binds a TF complex [GO:0005667].", but I don't think that all transcription factors are complexes.
  • What is the intended scope of children of this term? Is the goal to list consensus sites of all transcription factors?

detail on promoter types?

How much detail does SO want to cover regarding specific types of promoters? For some RNA polymerases, e.g. eukaryotic RNAP II and bacterial RNAP, the number of types could range from large to enormous. For others, e.g. RNA polymerase I or III, there are one or a small number of common types, though RNAP III may have numerous atypical promoter structures.

  • Regarding "RNApol_III_promoter_type_1": in Sc, only the C box is required, but in Xenopus there are a couple of additional elements, the A box and the intermediate element (IE) [Schramm L, Hernandez N. 2002, PMID:12381659]. If SO is going for comprehensive detail, the A and IE sites should probably be added.
  • Regarding the children of "bacterial_RNApol_promoter" ("minus_10_signal" and "minus_35_signal"), are the -10 and -35 elements general to all bacterial promoters? I thought they were the hallmarks specifically of sigma70 promoters, particularly since you have specified the actual consensus sequence, and I thought that other sigmas have other recognition sequences.

inconsistency between usage of "TF_binding_site" or "DNA_motif" in parentage?

Some promoter motifs, e.g. "A_box", which is part_of "RNApol_III_promoter_type_2", are is_a children of "TF_binding_site", which is a descendant of "protein_binding_site". Other promoter motifs, e.g. "DMv1_motif", which is part_of "RNApol_II_promoter", are is_a children of "DNA_motif".

  • What is the basis for the difference in parentage?
  • Why is "TF_binding_site" not a type of "DNA_motif"?
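
The difference in parentage can be made concrete with a small Python sketch that walks is_a edges in a toy graph. The graph contains only the relationships mentioned above; as a simplifying assumption, any intermediate terms between "TF_binding_site" and "protein_binding_site" are collapsed into one edge, and the graph stops at those terms rather than continuing up to the SO root. The names `IS_A` and `ancestors` are illustrative.

```python
# Toy is_a graph built only from the relationships mentioned above.
# Assumption: intermediate terms between TF_binding_site and
# protein_binding_site are collapsed into a single edge.
IS_A = {
    "A_box": ["TF_binding_site"],
    "TF_binding_site": ["protein_binding_site"],
    "DMv1_motif": ["DNA_motif"],
}

def ancestors(term, is_a=IS_A):
    """Collect every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in is_a.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

a_box_ancestors = ancestors("A_box")
dmv1_ancestors = ancestors("DMv1_motif")

# Within this toy graph the two motifs share no is_a ancestor.
shared = a_box_ancestors & dmv1_ancestors
print(sorted(a_box_ancestors), sorted(dmv1_ancestors), sorted(shared))
```

Within this toy graph the two promoter motifs end up in separate branches, which is the inconsistency the two questions above are asking about.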

GO issues

represent every type of polymerase that exists in the same cell?

  • We already have process terms for things like "transcription from mitochondrial promoter" and similar terms for transcription done by the other RNAPs (I, II, and III). Different gene products are definitely involved in transcription by these different polymerases.
    I'm wondering if this leads to having corresponding binding terms for all of these too...
  • I have put in some terms for the transcription cycle in Process, and some with links to Function.
    This seems to lead to having parallel sets of terms for the transcription cycle for each different type of RNA polymerase found in the same cell, so based on plants, the minimum set would be at least these: RNAP I, RNAP II, RNAP III, RNAP IV, mito RNAP, plastid RNAP.
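
To get a feel for how quickly the parallel term sets grow, here is a rough Python sketch that crosses the polymerases listed above with a set of transcription cycle stages. The three stages and the generated labels are assumptions for illustration only; they are not existing or proposed GO term names.

```python
from itertools import product

# Polymerases from the plant-based minimum set listed above.
polymerases = ["RNAP I", "RNAP II", "RNAP III", "RNAP IV",
               "mito RNAP", "plastid RNAP"]

# Assumption: a three-stage cycle; the real set of stages is still open.
stages = ["initiation", "elongation", "termination"]

# Hypothetical placeholder labels, one per (polymerase, stage) pair.
terms = [f"transcription {stage} by {pol}"
         for pol, stage in product(polymerases, stages)]

print(len(terms))
for label in terms[:3]:
    print(label)
```

Every extra stage or polymerase adds a whole row or column to this grid, and a matching set of binding terms in Function would add another full copy of it.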

Discussion

SO issues

Karen E thought Karen C's suggestions were generally sensible and appreciated the effort to keep GO and SO in sync. For specific details of new or changed terms, Karen C will submit SourceForge (SF) items and send her SO file to Karen E.

Designating core promoter elements

  • Representing core promoter regions is fine. Of the two possible ways to do this, we preferred "RNApol_II_core_promoter", without the word "element", with the elements given part_of relationships to that term.

True Path Violation regarding TATA-box

  • Of the two suggestions for dealing with this, we all preferred creating terms that represent subtypes of TATA elements, e.g. "TATA_box in RNApol_II_promoter", as appropriate for promoter classes that should have a TATA element as part of that type of region.

Issues relating to "TF_binding_site"

  • Karen C will confirm at the meeting and/or with people like Jim Hu the existence of transcription factors that act as single polypeptides rather than complexes.
  • Regarding the scope of this term, it is fine to represent sites in core promoters that represent major classes of promoters. However, there is no intention for SO to have a complete representation of the specific consensus sequences of all regulatory transcription factors.

detail on promoter types?

  • For conserved classes of promoters, e.g. the three types of conserved RNAP III promoters, it is appropriate for SO to represent the sequence elements. Neither GO nor SO is interested in trying to represent every little detail of every promoter.
  • Karen C will submit an SF item with some suggested additions.

inconsistency between usage of "TF_binding_site" or "DNA_motif" in parentage?

  • There was agreement that there are some problems here in SO. Karen C will submit an SF item and let Karen E and the SO people figure out what is best with respect to SO practice.

GO issues

represent every type of polymerase that exists in the same cell?

  • Yes, it seems appropriate to represent every type of polymerase that exists in the same cell. For E. coli, we can probably stay at the top level without needing to create specific terms since there is only one RNAP in E. coli.

work in Process, trying to represent the transcription cycle

  • Yes, we will end up duplicating the transcription cycle in Process for each different type of RNAP that we instantiate in Function.
  • Karen has some additional specific questions to get input on at the meeting, e.g. whether promoter escape is part of initiation, part of elongation, or a step of its own between the two.