XP:biological process xp sequence molecule
This page describes the (ongoing, nascent) work on defining certain BP terms using SO cross-products.
Methodology
Obol was used to generate the initial version of the file. This was then vetted by cjm.
The current version has many false negatives (i.e. BP terms that could be defined using SO but are not yet), since there are many terms Obol cannot parse. These can be added later, either manually or through refinements to Obol.
ontologies used
- GO
- SO
- RO
- ro_proposed
- CHEBI (for 3 terms; eg tRNA acetylation = metabolic process that results_in_addition_of acetyl group to tRNA)
How to get it
- go/scratch/xps/
OBO-Edit users: point OE at the -imports.obo file.
or just browse
issues raised
The meaning of SO terms
For this work we take each SO term as representing a type instantiated by actual molecule and molecule-region instances. These are the actual entities that participate in biological processes as represented in GO.
An alternative interpretation of SO is to treat the terms as representing abstract sequences. However, this interpretation renders SO useless for defining concrete terms such as:
[Term]
id: GO:0030452 ! group I intron catabolic process
intersection_of: GO:0009056 ! catabolic process
intersection_of: OBO_REL:results_in_breakdown_of SO:0000587 ! group_I_intron
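To make the shape of such a cross-product definition explicit, here is a minimal Python sketch that splits the stanza above into its genus (the bare intersection_of) and its differentia (the relation plus SO filler). The function name parse_xp and the simple line-based parsing are illustrative assumptions, not part of Obol or OBO-Edit, and the sketch handles only the tags shown here.

```python
# Stanza copied from the text above.
STANZA = """\
[Term]
id: GO:0030452 ! group I intron catabolic process
intersection_of: GO:0009056 ! catabolic process
intersection_of: OBO_REL:results_in_breakdown_of SO:0000587 ! group_I_intron
"""

def parse_xp(stanza):
    """Return (term_id, genus_id, [(relation, filler_id), ...]).

    Hypothetical helper: only understands 'id' and 'intersection_of'.
    """
    term_id = None
    genus = None
    differentia = []
    for line in stanza.splitlines():
        # Drop the trailing "! label" comment; it is informative only.
        body = line.split("!", 1)[0].strip()
        if body.startswith("id:"):
            term_id = body[len("id:"):].strip()
        elif body.startswith("intersection_of:"):
            parts = body[len("intersection_of:"):].split()
            if len(parts) == 1:
                # A bare class ID is the genus.
                genus = parts[0]
            elif len(parts) == 2:
                # "relation filler" is a differentia.
                differentia.append((parts[0], parts[1]))
    return term_id, genus, differentia

term_id, genus, differentia = parse_xp(STANZA)
```

Read this way, the definition says: a catabolic process (genus) that results_in_breakdown_of some group_I_intron (differentia). The filler must be something that can actually be broken down, which is why the molecule reading of SO is needed.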
There is also some ambiguity as to whether SO terms such as mRNA represent:
- a specific kind of RNA molecule
- a region of DNA that encodes / acts as template for the construction of a (spatiotemporally distinct) mRNA molecule
It is clear that GO terms such as "mRNA modification" refer to the former and not the latter; thus the file includes:
[Term]
id: GO:0016556 ! mRNA modification
intersection_of: GO:0008152 ! metabolic process
intersection_of: OBO_REL:results_in_change_to SO:0000234 ! mRNA
However, in SO, mRNA is in the following hierarchy:
./ SO:0000000 Sequence_Ontology
..is_a SO:0000110 sequence_feature [DEF: "An extent of biological sequence."]
...m_of SO:0000704 gene [DEF: "A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript......]
....is_a SO:0000831 gene_member_region [DEF: "A region of a gene."]
.....is_a SO:0000673 transcript [DEF: "An RNA synthesized on a DNA or RNA template by an RNA polymerase."]
......is_a SO:0000233 mature_transcript [DEF: "A transcript which has undergone the necessary modifications, if any, for its function. In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5' and/or the 3' ends, other than addition of bases. In bacteria functional mRNAs are usually not modified."]
.......is_a SO:0000234 mRNA [DEF: "Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. It does not contain introns."]
Here m_of is short for SO:member_of. This is undefined, so it's not clear how results_in_change_to composes with this relation. Given the definition of gene_member_region, the implication would be that any instance of mRNA modification necessarily results in change to a gene. This is obviously not the case!
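The is_a part of this argument can be checked mechanically. The Python sketch below assumes the usual existential reading of the differentia ("results_in_change_to some mRNA"), under which the filler may be weakened to any of its is_a ancestors. Only the three is_a links between mRNA and gene_member_region from the hierarchy above are encoded; the member_of link is deliberately left out because its semantics are undefined. The function names are hypothetical.

```python
# is_a edges copied from the hierarchy quoted above (child -> parent).
IS_A = {
    "SO:0000234": "SO:0000233",  # mRNA is_a mature_transcript
    "SO:0000233": "SO:0000673",  # mature_transcript is_a transcript
    "SO:0000673": "SO:0000831",  # transcript is_a gene_member_region
}

def is_a_ancestors(term):
    """Follow is_a links upward and return every ancestor reached, in order."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found

def entailed_fillers(relation, filler):
    """Weaken 'relation some filler' to each is_a ancestor of the filler.

    Assumption: the differentia is read existentially, so
    'R some X' together with 'X is_a Y' entails 'R some Y'.
    """
    return [(relation, ancestor) for ancestor in is_a_ancestors(filler)]

# Differentia of GO:0016556 (mRNA modification).
entailed = entailed_fillers("OBO_REL:results_in_change_to", "SO:0000234")
```

Under these assumptions the list includes results_in_change_to SO:0000831 (gene_member_region, "A region of a gene."), so the unwanted conclusion already follows from is_a alone, before member_of is considered.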
SO/RNAO overlaps
The vast majority of BP_xp_SO are definitions for various kinds of RNA processing, covering the full RNA hierarchy: mRNA, snoRNA, etc.
We use SO terms for these BP xp defs. As the RNAO does not yet exist, it is not clear whether there is an overlap problem here. Perhaps the RNAO will only include RNA structures. Or perhaps SO represents RNA sequences and not RNA molecules, in which case, as mentioned, SO is not particularly useful for defining BP terms, whose processes have molecules as participants.
CHEBI/SO overlaps
Bases
- inosine
etc
- peptide (SO has active_peptide as an EXACT synonym. mistake?)
SO/CC overlaps
Centromere
- SO:0000577 centromere [DEF: "A region of chromosome where the spindle fibers attach during mitosis and meiosis."]
- GO:0005698 centromere [DEF: "OBSOLETE. The region of a eukaryotic chromosome that is attached to the spindle during nuclear division. It is defined genetically as the region of the chromosome that always segregates at the first division of meiosis; the region of the chromosome in which no crossing over occurs. At the start of M phase, each chromosome consists of two sister chromatids with a constriction at a point which forms the centromere. During late prophase two kinetochores assemble on each centromere, one kinetochore on each sister chromatid."] comment: This term was made obsolete because it is a genetically defined region and not a specific subcellular localization.
- GO:0000775 chromosome, pericentric region [DEF: "The central region of a chromosome that includes the centromere and associated proteins. In monocentric chromosomes, this region corresponds to a single area of the chromosome, whereas in holocentric chromosomes, it is evenly distributed along the chromosome."]
GO has the following:
GO:0030702 chromatin silencing at centromere
GO:0031055 chromatin remodeling at centromere
GO:0031059 histone deacetylation at centromere
GO:0031066 regulation of histone deacetylation at centromere
GO:0031067 negative regulation of histone deacetylation at centromere
GO:0031068 positive regulation of histone deacetylation at centromere
GO:0034080 DNA replication-independent nucleosome assembly at centromere
GO:0043505 centromere-specific nucleosome
GO:0051756 meiotic sister chromatid centromere separation
Should GO use the string "pericentric region" in all primary term names? This would seem too onerous (but then obsoleting "centromere" may itself have been too extreme).
Telomere
Similar comments apply.
Here there are two GO terms to choose from:
- GO:0000781 chromosome, telomeric region (related synonym: telomere)
- GO:0000782 telomere cap complex
The former is what is usually meant in terms containing the string "telomere" or "telomeric".
Telomere-related terms in GO are currently not defined in BP_xp_SO.
Chromosome
Both SO and CC have a chromosome term:
- SO:0000340 chromosome [DEF: "Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication."]
- GO:0005694 chromosome [DEF: "A structure composed of a very long molecule of DNA and associated proteins (e.g. histones) that carries hereditary information."]
APPROACH: terms such as
GO:0050000 ! chromosome localization
are defined in XP:biological_process_xp_cellular_component,
so we ignore them here.