Proposal for fate of "transcription" and corresponding regulation terms

Revision as of 18:11, 17 February 2011

There are a couple of problems with the term transcription (GO:0006350) and with its three associated regulation terms:

  • GO:0006350 transcription
  • GO:0045449 regulation of transcription
  • GO:0045941 positive regulation of transcription
  • GO:0016481 negative regulation of transcription

This page presents the two possible options for resolution, followed by a detailed discussion of the issues and some supporting data.

Proposal

There are 2 possible ways to go with these terms:

  1. Merge the 4 transcription terms into the 4 corresponding transcription, DNA-dependent terms (pairs shown below). Since we think that essentially all of the annotations should have been made to the more granular transcription, DNA-dependent terms, we feel this might be the conservative approach with respect to annotations.
  2. Obsolete the 4 transcription terms and suggest replacement terms. The vast majority of annotations are IEAs, which would be fixed by fixing the mappings rather than by manually evaluating each annotation. The group may therefore prefer this option over merging, which risks inappropriately moving to the transcription, DNA-dependent term a small number of annotations that actually belong under reverse transcription or transcription, RNA-dependent.

Regardless of which option is chosen, we are not planning to create a replacement grouping term to group transcription, DNA-dependent and transcription, RNA-dependent. We feel that the existing term RNA biosynthetic process is sufficient, and unlikely to produce confusion in the way that having a grouping term named transcription has.

Problems

  1. Transcription (GO:0006350) is defined incorrectly.
    • Current definition: The cellular synthesis of either RNA on a template of DNA or DNA on a template of RNA.
    • As written, the definition specifically groups "normal" (DNA-dependent) transcription from DNA templates with reverse transcription, which is actually a type of DNA synthesis and should NOT be grouped with transcription at all. It also excludes production of RNA transcripts from RNA templates, which is a kind of transcription and which occurs in some viruses.
  2. Confusing use in annotations
    • The term transcription (GO:0006350) is being used for annotations as if it were equivalent to transcription, DNA-dependent (GO:0006351), and the regulation of transcription terms are similarly being used in place of the regulation of transcription, DNA-dependent terms. This affects thousands of annotations, about 20% of the annotations made to transcription or its child terms (numbers from AmiGO below). David and I have scanned a small subset of these annotations and strongly suspect that almost all of them should have been made to the equivalent transcription, DNA-dependent terms, rather than just to transcription.
    • In addition to thousands of annotations to these 4 top-level transcription terms, there are over 500 mappings (numbers of dbxref mappings below). These mappings are probably the main source of annotations, as the vast majority of annotations to these high-level grouping terms are by IEA (annotation counts for "transcription" below). So we will want to consider the fate of the mappings in this decision as well.

Fate of direct child terms of transcription

  • already included in proposal
    - GO:0045449 regulation of transcription
    - GO:0045941 positive regulation of transcription
    - GO:0016481 negative regulation of transcription
    These terms should be handled in the same manner as transcription.
  • input requested
    - GO:0000988 protein binding transcription factor activity
    - GO:0001071 nucleic acid binding transcription factor activity
    These are both new terms that were added recently as part of the transcription overhaul. I put these in thinking only of DNA-dependent transcription. However, at the moment, nothing about the names or definitions of these terms distinguishes between DNA-dependent and RNA-dependent processes. If anyone knowledgeable about the RNA-dependent process thinks these functions would be needed for the RNA-dependent process, we can make these general terms and create new terms that are subtypes. If no one speaks up, though, I would be inclined to just make these terms specific to DNA-dependent transcription, since I know almost nothing about RNA-dependent transcription.
  • cross product relationships to be changed
    - GO:0034401 regulation of transcription by chromatin organization
    This term has transcription as part of its cross product definition, but you have to have DNA to have chromatin, so this relationship seems like it would be more accurately made to transcription, DNA-dependent.
  • new parentage needed
    - GO:0019083 viral transcription
    Similarly to transcription, DNA-dependent, we think that RNA biosynthetic process would be a sufficient is_a parent for this term. It is not currently a parent, so it would need to be added when transcription is removed.
  • no change needed
    - GO:0006351 transcription, DNA-dependent
    This already has an additional is_a parent of RNA biosynthetic process. We feel this is sufficient.
  • to be dealt with separately
    - GO:0006410 transcription, RNA-dependent
    - GO:0032199 transcription involved in RNA-mediated transposition


-Karen & David

Supporting Information

transcription & transcription, DNA-dependent term pairs

GO:0006350 transcription
 => GO:0006351 transcription, DNA-dependent

GO:0045449 regulation of transcription
 => GO:0006355 regulation of transcription, DNA-dependent

GO:0045941 positive regulation of transcription
 => GO:0045893 positive regulation of transcription, DNA-dependent

GO:0016481 negative regulation of transcription
 => GO:0045892 negative regulation of transcription, DNA-dependent
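
To make option 1 concrete, the four pairs above can be treated as a lookup table from each general term to its DNA-dependent counterpart. The following is only an illustrative sketch: the names MERGE_TARGETS, remap_term and remap_annotations, and the (gene product, GO ID) tuple layout, are hypothetical and do not correspond to any existing GO tool.

```python
# General term -> DNA-dependent counterpart, copied from the pairs listed above.
MERGE_TARGETS = {
    "GO:0006350": "GO:0006351",  # transcription
    "GO:0045449": "GO:0006355",  # regulation of transcription
    "GO:0045941": "GO:0045893",  # positive regulation of transcription
    "GO:0016481": "GO:0045892",  # negative regulation of transcription
}


def remap_term(go_id):
    """Return the merge target for a general transcription term, else the ID unchanged."""
    return MERGE_TARGETS.get(go_id, go_id)


def remap_annotations(annotations):
    """Rewrite (gene_product, go_id) pairs; the tuple layout is assumed for illustration."""
    return [(gene, remap_term(go_id)) for gene, go_id in annotations]
```

Terms outside the four pairs, such as GO:0006410 transcription, RNA-dependent, pass through untouched, which mirrors the intent that only annotations made directly to the general terms would move.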

Counts for dbxref mappings to these terms

# of
dbxrefs GO term name
*225    transcription
  43    transcription, DNA-dependent
*231    regulation of transcription
 708    regulation of transcription, DNA-dependent
* 71    positive regulation of transcription
   8    positive regulation of transcription, DNA-dependent
* 57    negative regulation of transcription
  19    negative regulation of transcription, DNA-dependent
1362    Grand Total

* mappings which need to be changed
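
As a quick arithmetic check on the table above, the starred rows can be summed to see how many mappings would need to change. This is a minimal sketch; the list name DBXREF_COUNTS and its (starred, count, name) layout are made up for the example, and the numbers are copied from the table.

```python
# (needs_change, count, term name) rows copied from the table above.
DBXREF_COUNTS = [
    (True, 225, "transcription"),
    (False, 43, "transcription, DNA-dependent"),
    (True, 231, "regulation of transcription"),
    (False, 708, "regulation of transcription, DNA-dependent"),
    (True, 71, "positive regulation of transcription"),
    (False, 8, "positive regulation of transcription, DNA-dependent"),
    (True, 57, "negative regulation of transcription"),
    (False, 19, "negative regulation of transcription, DNA-dependent"),
]

# Mappings to the four general terms (the starred rows).
to_change = sum(count for starred, count, _ in DBXREF_COUNTS if starred)
# All mappings in the table.
grand_total = sum(count for _, count, _ in DBXREF_COUNTS)

print(to_change, grand_total)  # 584 1362
```

The starred rows sum to 584, which is the "over 500 mappings" figure cited in the Problems section, and the overall sum matches the Grand Total row.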

Gene product Annotation counts from AmiGO

GO:0006350 : transcription [28093 gene products]
- GO:0006351 : transcription, DNA-dependent [21637 gene products]
- GO:0006410 : transcription, RNA-dependent [79 gene products]
- GO:0019083 : viral transcription [265 gene products]

GO:0045449 : regulation of transcription [25298 gene products]
- GO:0006355 : regulation of transcription, DNA-dependent [19684 gene products]
- GO:0046782 : regulation of viral transcription [80 gene products]

GO:0045941 : positive regulation of transcription [3897 gene products]
- GO:0045893 : positive regulation of transcription, DNA-dependent [3099 gene products]
- GO:0050434 : positive regulation of viral transcription [58 gene products]

GO:0016481 : negative regulation of transcription [4008 gene products]
- GO:0075182 : negative regulation of symbiont transcription in response to host [1 gene product]
- GO:0045892 : negative regulation of transcription, DNA-dependent [2852 gene products]
- GO:0032897 : negative regulation of viral transcription [25 gene products]
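
One rough way to arrive at a figure near the "about 20%" quoted in the Problems section is to subtract the child counts above from each parent count. This sketch is not necessarily how the authors computed their number. It assumes that no gene product is counted under more than one listed child and that the listed children account for all indirect annotations; neither assumption is confirmed by the data here. The names AMIGO_COUNTS and direct_fraction are hypothetical.

```python
# term name -> (gene products for the parent, gene products for each listed child),
# copied from the AmiGO counts above.
AMIGO_COUNTS = {
    "transcription": (28093, [21637, 79, 265]),
    "regulation of transcription": (25298, [19684, 80]),
    "positive regulation of transcription": (3897, [3099, 58]),
    "negative regulation of transcription": (4008, [1, 2852, 25]),
}


def direct_fraction(total, child_counts):
    """Estimate the share of gene products not accounted for by the listed children.

    Assumes the listed children do not overlap, so this is only an approximation.
    """
    return (total - sum(child_counts)) / total


for name, (total, children) in AMIGO_COUNTS.items():
    print(f"{name}: {direct_fraction(total, children):.1%}")
```

Under these assumptions the estimates for the four terms fall between roughly 19% and 28%, which is consistent with the approximate figure given in the text.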

Annotation counts for terms by Source and Evidence code

transcription

GO:0006350	transcription	AspGD	IEA	2
GO:0006350	transcription	AspGD	IMP	4
GO:0006350	transcription	AspGD	ISA	1
GO:0006350	transcription	CGD	IEA	6
GO:0006350	transcription	dictyBase	IC	1
GO:0006350	transcription	dictyBase	IEA	29
GO:0006350	transcription	dictyBase	ISS	1
GO:0006350	transcription	EcoCyc	IDA	11
GO:0006350	transcription	EcoCyc	IEA	470
GO:0006350	transcription	EcoCyc	IEP	17
GO:0006350	transcription	EcoCyc	IGI	3
GO:0006350	transcription	EcoCyc	IMP	11
GO:0006350	transcription	EcoCyc	ISS	53
GO:0006350	transcription	EcoCyc	RCA	1
GO:0006350	transcription	Ensembl	IEA	24
GO:0006350	transcription	FB	IDA	7
GO:0006350	transcription	FB	IEA	11
GO:0006350	transcription	FB	IEP	2
GO:0006350	transcription	FB	IMP	3
GO:0006350	transcription	FB	ISS	13
GO:0006350	transcription	FB	NAS	8
GO:0006350	transcription	FB	TAS	4
GO:0006350	transcription	GeneDB_Pfalciparum	ISS	1
GO:0006350	transcription	GeneDB_Spombe	IEA	3
GO:0006350	transcription	GeneDB_Spombe	ISS	5
GO:0006350	transcription	GeneDB_Spombe	NAS	1
GO:0006350	transcription	GeneDB_Tbrucei	IDA	1
GO:0006350	transcription	GeneDB_Tbrucei	ISS	13
GO:0006350	transcription	GR_protein	IC	1
GO:0006350	transcription	GR_protein	RCA	3
GO:0006350	transcription	JCVI_CMR	ISS	54
GO:0006350	transcription	MGI	IDA	4
GO:0006350	transcription	MGI	IEA	1594
GO:0006350	transcription	MGI	IGI	1
GO:0006350	transcription	MGI	IMP	3
GO:0006350	transcription	MGI	IPI	1
GO:0006350	transcription	MGI	ISA	1
GO:0006350	transcription	MGI	ISO	5
GO:0006350	transcription	MGI	ISS	1
GO:0006350	transcription	MGI	NAS	2
GO:0006350	transcription	MGI	TAS	6
GO:0006350	transcription	NCBI	ISS	2
GO:0006350	transcription	PDB	IEA	2653
GO:0006350	transcription	PseudoCAP	RCA	6
GO:0006350	transcription	RefSeq	IEA	22
GO:0006350	transcription	RGD	IEA	43
GO:0006350	transcription	RGD	ISO	13
GO:0006350	transcription	RGD	ISS	1
GO:0006350	transcription	SGD	IDA	6
GO:0006350	transcription	SGD	IEA	574
GO:0006350	transcription	SGD	IGI	6
GO:0006350	transcription	SGD	IMP	8
GO:0006350	transcription	SGD	IPI	8
GO:0006350	transcription	SGD	ISA	1
GO:0006350	transcription	SGD	ISS	3
GO:0006350	transcription	SGD	RCA	2
GO:0006350	transcription	SGD	TAS	4
GO:0006350	transcription	SGN	IEP	1
GO:0006350	transcription	TAIR	IEA	57
GO:0006350	transcription	TAIR	ISS	42
GO:0006350	transcription	TIGR_CMR	ISS	71
GO:0006350	transcription	UniProtKB	IDA	11
GO:0006350	transcription	UniProtKB	IEA	467041
GO:0006350	transcription	UniProtKB	ISS	7
GO:0006350	transcription	UniProtKB	NAS	7
GO:0006350	transcription	UniProtKB	TAS	204
GO:0006350	transcription	WB	IDA	1
GO:0006350	transcription	WB	IEA	30
GO:0006350	transcription	WB	IMP	1
GO:0006350	transcription	ZFIN	IEA	592
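
To check the claim that most annotations to these terms are IEA, rows in the format above can be summed per evidence code. The sketch below assumes tab-separated rows of GO ID, term name, source, evidence code and count, as in the table; the function name counts_by_evidence is hypothetical, and only four rows are copied in as a sample.

```python
from collections import Counter


def counts_by_evidence(rows):
    """Sum annotation counts per evidence code from tab-separated table rows."""
    totals = Counter()
    for row in rows:
        # Columns: GO ID, term name, source, evidence code, count.
        _go_id, _term, _source, evidence, count = row.split("\t")
        totals[evidence] += int(count)
    return totals


# A few rows copied from the table above; pass every row to get the full totals.
sample = [
    "GO:0006350\ttranscription\tMGI\tIEA\t1594",
    "GO:0006350\ttranscription\tUniProtKB\tIEA\t467041",
    "GO:0006350\ttranscription\tUniProtKB\tTAS\t204",
    "GO:0006350\ttranscription\tSGD\tIDA\t6",
]

totals = counts_by_evidence(sample)
iea_share = totals["IEA"] / sum(totals.values())
```

Running the same function over every row of the table gives the per-evidence-code totals, from which the IEA share for GO:0006350 can be read directly.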