The Gene Ontology Consortium is undertaking an effort to provide computable logical definitions for all composite terms in the GO, complementing the text definitions which are currently opaque to computers. Tools will be able to use these logical definitions to provide additional services, such as automatic ontology management, enhanced de-tangled displays and cross-ontology annotation queries.
These logical definitions follow a set pattern. They are also known as genus-differentia definitions (though the word genus is used in the pre-Linnaean sense). To illustrate, the definition for germ cell migration can be constructed in natural language as:
"A cell migration which has a germ cell as central_participant"
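The same pattern can be written down as a small data structure. The following is only a minimal sketch: the class name `LogicalDefinition` and its field names are assumptions made for illustration and are not part of any GO tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """A genus-differentia definition with a single differentia (hypothetical class)."""
    term: str      # the composite term being defined
    genus: str     # the more general term
    relation: str  # the relation used in the differentia
    filler: str    # the term the relation points to

    def to_text(self) -> str:
        # Render the definition in the natural-language pattern used above.
        return f"A {self.genus} which has a {self.filler} as {self.relation}"


germ_cell_migration = LogicalDefinition(
    term="germ cell migration",
    genus="cell migration",
    relation="central_participant",
    filler="germ cell",
)
print(germ_cell_migration.to_text())
```

Keeping the genus, relation and filler as separate fields is what lets a tool work with the definition, rather than with opaque text.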
Many of these logical definitions refer to terms from other ontologies - for example, there are many GO terms that refer to cell types. Previously there was no way to obtain OBO Cell ontology (http://www.bioontology.org/wiki/index.php/CL:Main_Page) IDs from GO terms. This work aims to rectify that. We will then move on to other areas.
The plan is to start with Cell and carry on from there. We have already made considerable progress with GO-CL, but there are some challenges there.
The Sequence Ontology already has logical definitions (see genomic entities, below)
The Disease Ontology will include them soon
- 1 Logical Definitions using CellO
- 1.1 Background
- 1.2 Mail List
- 1.3 Current Status
- 1.4 OBO Files
- 1.5 Viewing the logical definitions
- 2 Logical Definitions referring to chemical types
- 3 Logical Definitions referring to anatomical entities
- 4 Other Composite Terms
- 5 Post-coordinating annotations
- 6 Implementation
- 7 FAQ and glossary
Logical Definitions using CellO
(Much of what is below is project admin material for those closely involved in the process; more detailed summaries will be written later.)
- GO:Obol - Obol
- http://www.bioontology.org/wiki/index.php/XP:Main_Page -- OBO wiki
Received curated feedback from 3rd round. I have now merged the obo files from each round into a single file, made some further edits and pruned a lot of unnecessary anonymous classes.
We previously used has_participant (currently in RO). This is not sufficiently specific for our purposes.
For example, if we define cardioblast cell fate specification as: a cell fate specification which has_participant cardioblast, our definition is not sufficiently strict to exclude specification processes in which a cardioblast is participating at the beginning, and is actually fated to become something else.
There was some discussion on the obo-relations mail list, see: http://www.bioontology.org/wiki/index.php/RO:Main_Page
The core relations that have now been introduced are:
An anonymous class is a class without a stable OBO identifier that is created in order to define some other term
Anonymous classes introduce an extra piece of complexity which we would rather avoid. There are 3 sources of anonymous classes in the current logical definitions:
- Terms that need to be added to CL
- Composite terms that may never be added to CL
- High-level process terms
The first category is simple - we can generate a report of all new CL terms we need. An example is immune cell, required for the definition of immune cell activation. For some we'll find CL already contains a suitable term - it's just a matter of finding the CL ID and changing the intersection_of lines. (Changing the names/synonyms to be in sync between GO and CL is optional.)
The second category is harder. See http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/anon-xps.obo?view=log for a list. As an example, we need to define "sperm aster formation", which requires "sperm aster"; if that term existed, it would be found in the GO cellular component ontology. Adding such terms may explode GO too much (see further on in this document). The current solution is to create an anonymous term "sperm aster" defined as "an aster that is part_of a sperm" (we essentially have a nested defined term).
For many of these, we could flatten the definition to: a formation which has_specific_outcome aster and located_in sperm.
The third category refers to cases like "sperm individualization", where the definition requires a generic high-level genus like "individualization" that would only be used for defining a single term. We would end up with a lot of inappropriate terms in GO.
For these it may be possible to rewrite the logical definitions [TODO]
All the logical definitions and additional anonymous classes have been collected into a single file, go_xp_cell.obo.
This obo file and supporting files are all in the geneontology.sf.net cvs project in the directory
The files can be downloaded via a browser here:
If you want to access the files via cvs (recommended), follow the instructions here:
Viewing the logical definitions
The logical definitions can be viewed in OBO Edit. You will need to load multiple ontologies at once, see below
Loading and viewing in OBO-Edit
To load any of these in OBO edit, you will need to load:
- go_xp_cell.obo -- http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/go_xp_cell.obo?view=log
- extra_relations.obo -- http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/extra_relations?view=log
- cell.obo -- http://obo.sourceforge.net/cgi-bin/detail.cgi?cell
- relationship.obo -- RO -- http://obo.sourceforge.net/relationship/relationship.obo
You will have to use the 'advanced' option in the oboedit load menu. You may have to check 'allow dangling references'
This is a lot of files to load! It has the advantage of keeping everything modular. If you don't want all the bother, the above ontologies have been combined for your convenience, at:
The extra_relations.obo file contains some candidates for inclusion in the obo relations ontology; this list is not stable and will most likely be whittled down to a smaller set, after which definitions will be provided. The final set of extra relations will most likely include a minimal set of extensions to has_participant; see the discussion above.
Always use the latest oboedit version
Make sure you turn the reasoner on (may require lots of memory -- the reasoner isn't essential to browse the logical defs, but it does provide advantages, see below)
Here is a screenshot focused on microglial cell activation
You can see the genus-differentia definition (in the center, in the box marked "cross-products"). The logical definition accessible to the computer is:
- A cell_activation which has_central_participant microglial_cell
(note: screenshot may be out of date and show a different relation used in the differentia)
On the explorer view (left panel) you should be able to see a blue squiggly line linking "microglial cell activation" to "macrophage activation". This link is not actually asserted in the ontology - a curator never made this call. The reasoner has inferred it from the combination of the logical definitions and the is_a link between microglial cell and macrophage in the OBO Cell ontology. This is a sign the curator should either assert this link in GO, work with the Cell curators to fix the corresponding link there, or amend the logical definition.
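The inference can be illustrated with a toy version of the rule. This is a minimal sketch and not the OBO-Edit reasoner: the dictionaries and function names are assumptions, and the definition of "macrophage activation" is assumed to follow the same pattern as the microglial one.

```python
# Asserted is_a links (child -> parents), limited to the link mentioned above.
IS_A = {"microglial cell": {"macrophage"}}

# Logical definitions: term -> (genus, relation, filler).
# The "macrophage activation" entry is an assumption for illustration.
DEFS = {
    "microglial cell activation": ("cell activation", "has_central_participant", "microglial cell"),
    "macrophage activation": ("cell activation", "has_central_participant", "macrophage"),
}


def ancestors(term):
    """Return the term itself plus everything reachable through is_a."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_is_a(defs):
    """Return (child, parent) pairs implied by the logical definitions."""
    links = []
    for child, (c_genus, c_rel, c_filler) in defs.items():
        for parent, (p_genus, p_rel, p_filler) in defs.items():
            if child == parent:
                continue
            # The child is subsumed when its genus and its filler are each
            # the same as, or more specific than, those of the parent.
            if (p_genus in ancestors(c_genus)
                    and c_rel == p_rel
                    and p_filler in ancestors(c_filler)):
                links.append((child, parent))
    return links


print(inferred_is_a(DEFS))
```

The only link returned is the one from "microglial cell activation" to "macrophage activation", which corresponds to the unasserted link described above.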
The oboedit explanation will explain how this is done, although the explanations are not very clear right now.
Viewing in other ontology browsers
For comparison, here is the same thing in SWOOP:
SWOOP is aimed more at computer scientists, hence the logical symbols (see below for how to obtain the OWL to use in SWOOP or Protege-OWL)
Viewing raw obo files
So far we have been manually editing the obo files. We have been interspersing comment: lines in amongst the stanzas
Eventually we will move to a pure oboedit approach (Real Soon Now), but the raw approach is useful as a first pass since we can get chatty in the comments - we lack a good way of linking discussion threads in oboedit right now. The comments will eventually be distilled into a summary document and removed from the final version, since they are mostly of a discursive nature (some important comments may remain)
The files contain a subset of the information in the normal GO file, with the addition of the logical definition, as intersections. Here is a section from the underlying obo 1.2 file
[Term]
id: GO:0001774
name: microglial cell activation
namespace: biological_process
def: "The change in morphology and behavior of a microglial cell resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor." [GOC:mgi_curators, PMID:10626665, PMID:10695728, PMID:12580336, PMID:9893949]
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775 ! cell activation
intersection_of: has_central_participant CL:0000129 ! microglial cell
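A tool could read the genus and differentia from the intersection_of lines of such a stanza. The following is a minimal sketch, assuming one tag per line; the function name `parse_logical_definition` is hypothetical, and a real parser would need to handle the full obo 1.2 syntax.

```python
# Abbreviated copy of the stanza above, one tag per line.
STANZA = """\
[Term]
id: GO:0001774
name: microglial cell activation
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775 ! cell activation
intersection_of: has_central_participant CL:0000129 ! microglial cell
"""


def parse_logical_definition(stanza):
    """Return (genus_id, [(relation, filler_id), ...]) for one [Term] stanza."""
    genus = None
    differentia = []
    prefix = "intersection_of:"
    for line in stanza.splitlines():
        if not line.startswith(prefix):
            continue
        # Drop the trailing "! name" comment, then split into tokens.
        tokens = line[len(prefix):].split("!")[0].split()
        if len(tokens) == 1:
            genus = tokens[0]                            # a bare ID is the genus
        elif len(tokens) == 2:
            differentia.append((tokens[0], tokens[1]))   # relation + filler ID
    return genus, differentia


print(parse_logical_definition(STANZA))
```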
Important note on the obo file
It is possible to strip the intersection_of lines from the .obo file and have an .obo file that is no different in structure from the go.obo file that has been made available to the public for the last few years. In other words, it is possible to ignore the logical definitions and have the DAG remain intact. This means that the inferrable is_a lines must remain in the public ontology, even though sometimes they seem redundant with the intersection_of lines. This is absolutely crucial for tools that depend on the go.obo file. GO will remain committed to computing the full DAG (also known as "classifying") for the consumers of this file indefinitely.
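Stripping the logical definitions amounts to a simple line filter. The following is a minimal sketch; the function name `strip_logical_definitions` is hypothetical, and it assumes each intersection_of tag sits on its own line.

```python
def strip_logical_definitions(obo_text):
    """Remove intersection_of lines and leave every other line unchanged."""
    kept = [
        line
        for line in obo_text.splitlines(keepends=True)
        if not line.lstrip().startswith("intersection_of:")
    ]
    return "".join(kept)


example = (
    "[Term]\n"
    "id: GO:0001774\n"
    "name: microglial cell activation\n"
    "is_a: GO:0045321 ! immune cell activation\n"
    "intersection_of: GO:0001775 ! cell activation\n"
    "intersection_of: has_central_participant CL:0000129 ! microglial cell\n"
)
print(strip_logical_definitions(example))
```

The is_a line survives the filter, which is why the asserted is_a links have to stay in the file for the DAG to remain intact.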
Viewing the Genus-differentia matrix
The combinations of genus (cell differentiation, development, cell fate commitment, migration) and differentia class (oocyte, neuron, T-cell etc) can be imagined as a 2D matrix (see original Hill paper). Each cell (sensu grid) represents a composite term, with the genus and differentia forming the rows and columns. We have no good way of visualising this yet, we'd like to add this to oboedit and perhaps even AmiGO.
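Until a proper visualisation exists, the matrix can be approximated as a nested mapping and dumped as tab-separated text. This is a minimal sketch; the three rows of input are illustrative only and the function names are assumptions.

```python
from collections import defaultdict

# (composite term, genus, differentia class); illustrative entries only.
DEFINITIONS = [
    ("oocyte differentiation", "cell differentiation", "oocyte"),
    ("neuron differentiation", "cell differentiation", "neuron"),
    ("neuron migration", "migration", "neuron"),
]


def build_matrix(definitions):
    """Map genus -> differentia class -> composite term."""
    matrix = defaultdict(dict)
    for term, genus, differentia in definitions:
        matrix[genus][differentia] = term
    return matrix


def render(matrix):
    """Render the matrix as tab-separated text; empty cells have no term."""
    columns = sorted({d for row in matrix.values() for d in row})
    lines = ["\t".join(["genus"] + columns)]
    for genus in sorted(matrix):
        cells = [matrix[genus].get(col, "") for col in columns]
        lines.append("\t".join([genus] + cells))
    return "\n".join(lines)


print(render(build_matrix(DEFINITIONS)))
```

Empty cells correspond to genus and differentia combinations for which no composite term has been defined.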
For now you can view an excel file generated from the August version of the logical defs:
Browsing in Obol
The Obol browser can be found on the main Obol pages.
From there, find the link to the Obol browser (an AmiGO clone written in Prolog), which may or may not be up and running. Obol may also be loaded with a previous version of the logical definitions.
Obol will allow you to browse existing definitions. In addition, Obol can "Obolify" terms without definitions, i.e. parse the name to get the definition.
Browsing in AmiGO
We are working on allowing the display of genus-differentia definitions to be ready some time after the GO-CL cross-products become part of the live GO. The logical definitions will give us the option of 'disentangling' complex DAGs, showing the CL hierarchy separately, and allow the querying of GO annotations by CL terms.
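Querying GO annotations by CL terms could work along the following lines. This is only a sketch under stated assumptions: the gene names and annotations are hypothetical, the function names are invented for illustration, and nothing here reflects how AmiGO is or will be implemented.

```python
# is_a links in the cell ontology (child -> parents); only the link
# between microglial cell and macrophage is included.
CL_IS_A = {"microglial cell": {"macrophage"}}

# GO term -> cell type named in the differentia of its logical definition.
GO_TERM_CELL_TYPE = {
    "microglial cell activation": "microglial cell",
    "macrophage activation": "macrophage",
}

# Hypothetical annotations: (gene product, GO term).
ANNOTATIONS = [
    ("geneA", "microglial cell activation"),
    ("geneB", "macrophage activation"),
]


def is_same_or_subtype(cell_type, target):
    """True if cell_type is target or reaches it through is_a links."""
    if cell_type == target:
        return True
    return any(is_same_or_subtype(parent, target)
               for parent in CL_IS_A.get(cell_type, ()))


def genes_for_cell_type(target):
    """Return genes annotated to GO terms whose definition refers to target or a subtype."""
    return sorted(
        gene
        for gene, go_term in ANNOTATIONS
        if is_same_or_subtype(GO_TERM_CELL_TYPE[go_term], target)
    )


print(genes_for_cell_type("macrophage"))
```

A query for macrophage picks up the gene annotated to microglial cell activation as well, because the cell type hierarchy is followed separately from the GO DAG.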
Screenshots to come - see Obol browser for prototype for now
Logical Definitions referring to chemical types
- cysteine biosynthesis -- a biosynthesis which has_specific_output cysteine
This is the next step. Mike Bada from the Hunter lab has done a lot of preliminary work here
ChEBI must first conform to OBO Foundry principles before it can be used to create logical definitions for GO terms. ChEBI is doing well in a number of respects, but the use of pluralisation as the sole lexical disambiguator between generic forms and specific forms of chemicals violates univocity and will cause many practical problems for the use of these terms in GO definitions. There may be a deeper underlying problem to do with a failure to distinguish types from instances - ontologies should contain only types, never instances.
This has yet to be resolved.
Logical Definitions referring to anatomical entities
We are tackling cells first, see above. We may tackle cell parts (ie GO Cellular Component).
CARO may help
Other Composite Terms
Eventually the entire OBO Foundry will be linked by definitions that span ontological boundaries and levels of granularity
The OBO Disease ontology will define terms using FMA and/or CARO.
- Ovarian Cancer
- Genus: DO:cancer
- Differentia: has_location FMA:Ovary
Obol can be used to retrofit many of these
Parts of specific cell types
- aster of spermatocyte
- plasma membrane of sperm
There is a potential explosion here. Even though certain combinations will be excluded (eg nucleus of erythrocyte), the intersection matrix is dense rather than sparse. When we further combine these with processes (eg sperm plasma membrane assembly) things get unmanageable quickly, even with reasoner support.
Our current recommendation here is to post-coordinate. When these are needed for definitions of other terms, we can coordinate anonymous classes as required (see issues in CellO section).
Cell types of specific anatomical entities
- forebrain neuron
The combinatorial explosion may not be so large here. Many cell types are restricted to certain anatomical entities (eg Purkinje cells are always in the Cerebellum - see http://www.bioontology.org/wiki/index.php/CL:Aligning_species-specific_anatomy_ontologies_with_CL)
- thoracic bristle (a bristle which is part_of a thorax)
- dorsal fin (a fin which is located dorsally)
- dorsal ectoderm (a region which is located dorsally and a region_of an ectoderm)
Pre-coordination recommended (most AOs do this. The FMA does it extensively. The redundancy can easily be managed with reasoners)
Process and Function
Defining processes by the functionings that are necessarily enacted
Definitions that refer to qualities
This is a more unusual case, but I believe it is useful. PATO does not just have to be used for mutant phenotypes - it can be used for qualities in general.
- diploid cell (a cell which has_quality diploid)
- pluripotent cell (a cell which has_quality pluripotency)
SO already has logical definitions for composite terms http://www.bioontology.org/wiki/index.php/SO:Composite_Terms
These do not reference any ontology external to SO, but some of the definitions may eventually reference PATO
Composite terms in ontologies like MP and plant_trait can be defined using PATO and some ontology of bearer qualities; for example
- hypertrophy of kidney
- Genus: PATO:hypertrophy
- Differentia: inheres_in MA:kidney
See the Obol page for results on plant_trait
- What is the genus: the quality or the bearer entity
- eg is it "hypertrophy of kidney" or "a kidney that is hypertrophied"? These terms refer to different but related entities. I believe it is the former and plant_trait has it right
Not all biological entity types need be pre-composed (aka pre-coordinated) in an ontology. Even with reasoner support, this approach is not scalable, and we end up with ICD-9 (can someone add the monkey on a tricycle example here???)
We stress that supporting logical definitions is entirely optional for database administrators, tool implementors, etc. The GOC will continue to provide the full DAG; logical defs can be ignored and tools will work the same. No action is required on the part of databases, organisations and groups that consume the GO, or on the part of their end-users.
However, making tools logical-definition aware can lead to enhancements, and we will provide some support for the technical teams who make use of the GO obo files to populate their databases or extend their tool functionality.
Obo file format implementation
Refer also to obo format 1.2 documentation http://www.geneontology.org/GO.format.obo-1_2.shtml
The oboedit API can also be used to access the logical definitions computationally.
Storing logical definitions in a database
Chado and GODB
See also the document gmod/schema/chado/modules/cv/doc/cv-advanced-usage.tex, also available from GMOD CVS: http://gmod.cvs.sourceforge.net/gmod/schema/chado/modules/cv/doc/cv-advanced-usage.tex?view=log
The corresponding APIs need to be extended to fully support this.
GODB and Chado also support post-coordination at the schema level.
We can't support other schemas. However, it is worth noting that OWL-compatible databases (e.g. Sesame+OWLIM, Instancestore) should be capable of representing logical definitions; see OWL below.
OWL and Semantic Web Tools
Many OWL aware tools will do the right thing with the logical definitions. Reasoners like Pellet can be used to compute the subsumption path (we use the oboedit reasoner as it is nicely integrated with the oboedit UI, and is fast because it doesn't have to deal with cases we don't care about).
We support the use of all such 3rd party tools by providing OWL transforms of all obo files
FAQ and glossary
A logical definition, aka cross-product, aka "Aristotelian definition", aka genus-differentia definition, aka necessary and sufficient conditions, aka complete definitions...
This is a definition that can be used by a computer as well as a human. For this project, the logical definition for a specific term always takes the form of a genus (generic term) and differentia (discriminating characteristics which mark instances of the specific term as being different from is_a sibling terms).
Obol creates anonymous terms if it can't find existing terms. Sometimes this is a term we need to add, eg to the cell ontology. Sometimes it is a type that we would never create a term for in an ontology but we would like to refer to that class of things.