XP:biological process xp sequence molecule

From GO Wiki

This page describes the (ongoing, nascent) work on defining certain BP terms using SO cross-products.


Obol was used to generate the initial version of the file. This was then vetted by cjm. Obol grammar: biological_process_xp_sequence_molecule

The current version has many false negatives (i.e. BP terms that could be defined using SO but are not), since there are many terms Obol cannot parse. These can be added later, either manually or through refinements to Obol.

Ontologies used

  • GO
  • SO
  • RO
  • ro_proposed
  • CHEBI (for 3 terms; e.g. tRNA acetylation = metabolic process that results_in_addition_of acetyl group to tRNA)


See this directory

OBO-Edit users: point OE at the import file.

Casual browsing: see biological_process_xp_sequence_molecule.obo

This will be made available on the OBO cross-products page after vetting.



62 terms currently have xp defs. This is an underrepresentation. We can add more.

No missing links were found in GO.

The meaning of SO terms

For this work we take each SO term as representing a type instantiated by actual molecule and molecule-region instances. These are the actual entities that participate in biological processes as represented in GO.

An alternative interpretation of SO is to treat the terms as representing abstract sequences. However, this interpretation renders SO useless for defining concrete terms such as:

id: GO:0030452 ! group I intron catabolic process
intersection_of: GO:0009056 ! catabolic process
intersection_of: OBO_REL:results_in_breakdown_of SO:0000587 ! group_I_intron
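
As a rough illustration of how such a cross-product definition decomposes, the following Python sketch splits the stanza above into a genus (the bare class) and a list of differentia (relation, filler pairs). The function name parse_xp and the tuple layout are assumptions made for this example; they are not part of Obol or OBO-Edit.

```python
# Stanza copied from the text above.
STANZA = """\
id: GO:0030452 ! group I intron catabolic process
intersection_of: GO:0009056 ! catabolic process
intersection_of: OBO_REL:results_in_breakdown_of SO:0000587 ! group_I_intron
"""


def parse_xp(stanza):
    """Split a cross-product stanza into (id, genus, differentia).

    Hypothetical helper for illustration only.
    """
    term_id, genus, differentia = None, None, []
    for line in stanza.splitlines():
        # Drop the trailing "! label" comment, then split the tag from its value.
        body = line.split("!", 1)[0].strip()
        if not body:
            continue
        tag, _, value = body.partition(": ")
        parts = value.split()
        if tag == "id":
            term_id = parts[0]
        elif tag == "intersection_of":
            if len(parts) == 1:
                genus = parts[0]  # a bare class is the genus
            else:
                differentia.append((parts[0], parts[1]))  # (relation, filler)
    return term_id, genus, differentia


print(parse_xp(STANZA))
```

Read this way, the GO term is the catabolic process whose results_in_breakdown_of filler is an SO type, which only makes sense if that type has concrete molecules as instances.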

There is also some ambiguity as to whether SO terms such as mRNA represent:

  1. a specific kind of RNA molecule
  2. a region of DNA that encodes / acts as template for the construction of a (spatiotemporally distinct) mRNA molecule

It is clear that GO terms such as "mRNA modification" refer to the former and not the latter; thus the file includes:

id: GO:0016556 ! mRNA modification
intersection_of: GO:0008152 ! metabolic process
intersection_of: OBO_REL:results_in_change_to SO:0000234 ! mRNA

However, in SO, mRNA is in the following hierarchy:

./ SO:0000000 Sequence_Ontology
..is_a SO:0000110 sequence_feature [DEF: "An extent of biological sequence."]
...m_of SO:0000704 gene [DEF: "A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript......]
....is_a SO:0000831 gene_member_region [DEF: "A region of a gene."]
.....is_a SO:0000673 transcript [DEF: "An RNA synthesized on a DNA or RNA template by an RNA polymerase."]
......is_a SO:0000233 mature_transcript [DEF: "A transcript which has undergone the necessary modifications, if any, for its function. In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5' and/or the 3' ends, other than addition of bases. In bacteria functional mRNAs are usually not modified."]
.......is_a SO:0000234 mRNA [DEF: "Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. It does not contain introns."]

Here m_of is short for SO:member_of. This is undefined, so it's not clear how results_in_change_to composes with this relation. Given the definition of gene_member_region, the implication would be that any instance of mRNA modification necessarily results in change to a gene. This is obviously not the case!
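
The problem can be made concrete with a small Python sketch that walks the parent links from mRNA up to gene and records which relations are crossed. It assumes the member_of link is read as gene_member_region member_of gene, as the discussion above does; the listing itself is ambiguous on this point, and the helper names are invented for this example.

```python
# Parent links for the SO fragment quoted above: child -> (relation, parent).
# ASSUMPTION: member_of is read as "gene_member_region member_of gene".
PARENT = {
    "SO:0000234": ("is_a", "SO:0000233"),       # mRNA -> mature_transcript
    "SO:0000233": ("is_a", "SO:0000673"),       # mature_transcript -> transcript
    "SO:0000673": ("is_a", "SO:0000831"),       # transcript -> gene_member_region
    "SO:0000831": ("member_of", "SO:0000704"),  # gene_member_region -> gene
}


def path_to(start, target):
    """Return the relations crossed going from start up to target, or None."""
    relations, node = [], start
    while node != target:
        if node not in PARENT:
            return None
        rel, node = PARENT[node]
        relations.append(rel)
    return relations


def is_a_only(relations):
    """True when a path exists and consists solely of is_a links."""
    return relations is not None and all(r == "is_a" for r in relations)


mrna_to_gene = path_to("SO:0000234", "SO:0000704")
print(mrna_to_gene)             # relations crossed between mRNA and gene
print(is_a_only(mrna_to_gene))  # False: the path depends on member_of
```

Any inference from mRNA to gene therefore depends on how member_of is defined, which is exactly what is missing.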

GO/SO synonymy

  • GO has "microRNA biosynthetic processing"; SO has "micro RNA" as a synonym, but not "microRNA"
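
One way a matcher could tolerate this kind of spelling variation is to normalise labels before comparing them. The Python sketch below is only an illustration of that idea and is not how Obol works; the function names are hypothetical, and the synonym list contains only the one SO synonym mentioned above.

```python
def normalise(label):
    """Lowercase and drop spaces, hyphens and underscores so spelling variants compare equal."""
    return "".join(ch for ch in label.lower() if ch not in " -_")


def matches_any(go_token, so_labels):
    """True if go_token equals any SO label after normalisation."""
    key = normalise(go_token)
    return any(normalise(label) == key for label in so_labels)


# Only "micro RNA" comes from the text; this is not a dump of SO's synonyms.
so_labels = ["micro RNA"]

print("microRNA" in so_labels)             # False: exact string match fails
print(matches_any("microRNA", so_labels))  # True: match after normalisation
```

Normalisation this loose could also produce false matches, so any hits would still need vetting.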

SO/RNAO overlaps

The vast majority of BP_xp_SO are definitions for various kinds of RNA processing, including the full RNA hierarchy: mRNA, snoRNA, etc.

We use SO terms for these BP xp defs. As the RNAO does not yet exist, it is not clear whether there is an overlap problem here. Perhaps the RNAO will only include RNA structures. Or perhaps SO represents RNA sequences and not RNA molecules, in which case, as mentioned, SO is not particularly useful for defining BP terms, in which processes have molecules as participants.

SO includes what I think is an error:

./ SO:0000000 Sequence_Ontology
..is_a SO:0000400 sequence_attribute
...is_a SO:0000443 polymer_attribute
....is_a SO:0000348 nucleic_acid
.....is_a SO:0000356 RNA

./ SO:0000000 Sequence_Ontology
..is_a SO:0000110 sequence_feature
...is_a SO:0000001 region
....is_a SO:0000831 gene_member_region
.....is_a SO:0000673 transcript
......is_a SO:0000233 mature_transcript
.......is_a SO:0000234 mRNA
.......is_a SO:0000655 ncRNA

Surely mRNA and ncRNA should each be is_a RNA?
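
The gap can be checked mechanically. The Python sketch below encodes only the is_a links from the two hierarchies quoted above and computes the is_a ancestors of mRNA and ncRNA; the dictionary layout and function name are assumptions made for this example.

```python
# is_a links transcribed from the two SO hierarchies quoted above: child -> parents.
IS_A = {
    "SO:0000356": ["SO:0000348"],  # RNA -> nucleic_acid
    "SO:0000348": ["SO:0000443"],  # nucleic_acid -> polymer_attribute
    "SO:0000443": ["SO:0000400"],  # polymer_attribute -> sequence_attribute
    "SO:0000400": ["SO:0000000"],  # sequence_attribute -> Sequence_Ontology
    "SO:0000234": ["SO:0000233"],  # mRNA -> mature_transcript
    "SO:0000655": ["SO:0000233"],  # ncRNA -> mature_transcript
    "SO:0000233": ["SO:0000673"],  # mature_transcript -> transcript
    "SO:0000673": ["SO:0000831"],  # transcript -> gene_member_region
    "SO:0000831": ["SO:0000001"],  # gene_member_region -> region
    "SO:0000001": ["SO:0000110"],  # region -> sequence_feature
    "SO:0000110": ["SO:0000000"],  # sequence_feature -> Sequence_Ontology
}


def ancestors(term):
    """Return every term reachable from `term` by following is_a links."""
    seen, stack = set(), [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


for term in ("SO:0000234", "SO:0000655"):
    # Prints False for both: RNA (SO:0000356) is not an is_a ancestor.
    print(term, "SO:0000356" in ancestors(term))
```

In this fragment the two branches only meet at the root, so nothing links mRNA or ncRNA to RNA.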

CHEBI/SO overlaps


Both CHEBI and SO have terms such as inosine and pseudouridine.

We defer to CHEBI here; these terms are, or will be, included in XP:biological_process_xp_chebi.


CHEBI has "peptide".

SO has

 ./ SO:0000000 Sequence_Ontology
 ..is_a SO:0000110 sequence_feature [SYNONYM: "located sequence feature" (related)] [SYNONYM: "located_sequence_feature" (exact)] [DEF: "An extent of biological sequence."]
 ...is_a SO:0000001 region [SYNONYM: "sequence" (exact)] [DEF: "A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids."]
 ....is_a SO:0000839 polypeptide_region [SYNONYM: "positional" (exact)] [SYNONYM: "positional polypeptide feature" (exact)] [SYNONYM: "region or site annotation" (exact)] [DEF: "Biological sequence region that can be assigned to a specific subsequence of a polypeptide."]
 .....is_a SO:0001063 immature_peptide_region [SYNONYM: "immature_peptide_region" (exact)] [DEF: "An immature_peptide_region is the extent of the peptide after it has been translated and before any processing occurs."]
 ......is_a SO:0000419 mature_protein_region [SYNONYM: "chain" (exact)] [SYNONYM: "mature peptide" (related)] [SYNONYM: "mature_protein_region" (exact)] [DEF: "The extent of a polypeptide chain in the mature protein."]
 .......is_a SO:0001064 active_peptide [SYNONYM: "active_peptide" (exact)] [SYNONYM: "peptide" (exact)] [DEF: "Active peptides are proteins which are biologically active, released from a precursor molecule. Hormones, neuropeptides, antimicrobial peptides, are active peptides. They are typically short (<40 amino acids) in length."]
  • should "peptide" be a synonym for "active_peptide"?
  • what is the relation between SO:active_peptide and CHEBI:peptide?

For defining terms such as:

 GO:0006518 peptide metabolic process [DEF: "The chemical reactions and pathways involving peptides, compounds of two or more amino acids where the alpha carboxyl group of one is bound to the alpha amino group of another."]

A CHEBI term seems more appropriate (??)

Although CHEBI has this:

  • is_a CHEBI:33243 natural product classes
    • is_a CHEBI:16670 peptides [DEF: "Amides derived from two or more amino carboxylic acid molecules (the same or different) by formation of a covalent bond from the carbonyl carbon of one to the nitrogen atom of another with formal loss of water. The term is usually applied to structures formed from alpha-amino acids, but it includes those derived from any amino carboxylic acid."]

What is "natural product classes"?

For defining the following:

GO:0006465 signal peptide processing [DEF: "The proteolytic removal of a signal peptide from a protein during or after transport to a specific location in the cell."]

We use a SO term:

id: GO:0006465 ! signal peptide processing
intersection_of: GO:0008152 ! metabolic process
intersection_of: OBO_REL:results_in_change_to SO:0000418 ! signal_peptide

Is this totally arbitrary?


There is an overlap between SO and CHEBI for nucleic acids. However, the CHEBI hierarchy is a little odd:

is_a CHEBI:36976 nucleotides
 is_a CHEBI:15986 polynucleotides
  is_a CHEBI:33696 nucleic acids [DEF: "Macromolecules, the major organic matter of the nuclei of biological cells, made up of nucleotide units, and hydrolysable into certain pyrimidine or purine bases (usually adenine, cytosine, guanine, thymine, uracil), D-ribose or 2-deoxy-D-ribose and phosphoric acid."]
   is_a CHEBI:16991 deoxyribonucleic acids [DEF: "High molecular weight, linear polymers, composed of nucleotides containing deoxyribose and linked by phosphodiester bonds; DNA contain the genetic information of organisms."]
   is_a CHEBI:33697 ribonucleic acids [DEF: "Naturally occurring polyribonucleotides."]
   is_a CHEBI:48010 locked nucleic acids [DEF: "Nucleic acid polymers where the residues contain 'locked' deoxyribose units and are linked by phosphodiester bonds. The deoxyribose unit conformation is 'locked' by a 2'-C,4'-C-oxymethylene link."]
   is_a CHEBI:48015 glycol nucleic acids [DEF: "Nucleic acid polymers where the residues have an acyclic three-carbon propylene glycol phosphodiester backbone."]
   is_a CHEBI:48019 threose nucleic acids [DEF: "Nucleic acids that have threose instead of ribose or deoxyribose in their sugar-phosphate backbones."]
   is_a CHEBI:48021 peptide nucleic acids [DEF: "Nucleic acids where the sugar-phosphate backbone has been replaced by a neutral polyamide backbone such as N-(2-aminoethyl)glycine units."]

RNA is_a nucleotide?

We use SO for all BPs involving RNA.

SO/CC overlaps


  • SO:0000577 centromere [DEF: "A region of chromosome where the spindle fibers attach during mitosis and meiosis."]
  • GO:0005698 centromere [DEF: "OBSOLETE. The region of a eukaryotic chromosome that is attached to the spindle during nuclear division. It is defined genetically as the region of the chromosome that always segregates at the first division of meiosis; the region of the chromosome in which no crossing over occurs. At the start of M phase, each chromosome consists of two sister chromatids with a constriction at a point which forms the centromere. During late prophase two kinetochores assemble on each centromere, one kinetochore on each sister chromatid."] comment: This term was made obsolete because it is a genetically defined region and not a specific subcellular localization.
  • GO:0000775 chromosome, pericentric region [DEF: "The central region of a chromosome that includes the centromere and associated proteins. In monocentric chromosomes, this region corresponds to a single area of the chromosome, whereas in holocentric chromosomes, it is evenly distributed along the chromosome."]

GO has the following:

GO:0030702 chromatin silencing at centromere 
GO:0031055 chromatin remodeling at centromere 
GO:0031059 histone deacetylation at centromere 
GO:0031066 regulation of histone deacetylation at centromere 
GO:0031067 negative regulation of histone deacetylation at centromere 
GO:0031068 positive regulation of histone deacetylation at centromere 
GO:0034080 DNA replication-independent nucleosome assembly at centromere 
GO:0043505 centromere-specific nucleosome 
GO:0051756 meiotic sister chromatid centromere separation

Should GO use the string "pericentric region" in all primary term names? This would seem too onerous (but then obsoleting "centromere" may have been extreme??)


Similar comments apply to the telomere.

Here there are two GO terms to choose from:

  • GO:0000781 chromosome, telomeric region (related synonym: telomere)
  • GO:0000782 telomere cap complex

The former is what is usually meant in terms with the string "telomere" or "telomeric".

Telomere-related terms in GO are currently not defined in BP_xp_SO.


Both SO and CC have chromosome:

  • SO:0000340 chromosome [DEF: "Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication."]
  • GO:0005694 chromosome [DEF: "A structure composed of a very long molecule of DNA and associated proteins (e.g. histones) that carries hereditary information."]

APPROACH: terms such as

  GO:0050000 ! chromosome localization

are defined in XP:biological_process_xp_cellular_component, so we ignore them here.

Next steps

  • reach closure on above issues
  • increase coverage: bp_xp_so should have a caretaker and more xps should be added. This will gradually become part of the curation process for GO
  • produce report for Oct GO meeting

Future steps:

  • use reasoning for more advanced tasks beyond filling missing links.