Logical Definitions
Revision as of 20:14, 21 May 2008
The Gene Ontology Consortium is undertaking an effort to provide computable logical definitions for all composite terms in the GO, complementing the text definitions which are currently opaque to computers. Tools will be able to use these logical definitions to provide additional services, such as automatic ontology management, enhanced de-tangled displays and cross-ontology annotation queries.
These logical definitions follow a set pattern. They are also known as genus-differentia definitions (though the word genus is used in the pre-Linnaean sense). Within GO, we also refer to these as Cross Products. To illustrate, the definition for germ cell migration can be constructed in natural language as:
"A cell migration which results_in_the_movement_of a germ cell"
Many of these logical definitions refer to terms from other ontologies - for example, there are many GO terms that refer to cell types. Previously there was no way to obtain OBO Cell ontology (http://www.bioontology.org/wiki/index.php/CL:Main_Page) IDs from GO terms. This work aims to rectify that. We will then move on to other areas.
The plan is to start with Cell and carry on from there. We have already made considerable progress with GO-CL, but there are some challenges there.
The Sequence Ontology already has logical definitions (see Genomic entities, below).
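The germ cell migration definition given earlier on this page can be captured in a small data structure: one genus plus a list of (relation, filler) differentia. The following is a minimal Python sketch, not part of any GO tool; the class name `LogicalDefinition` and its `to_text` method are hypothetical and exist only to show the shape of a genus-differentia definition.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LogicalDefinition:
    """A genus-differentia definition: a genus class plus (relation, filler) pairs."""
    term: str
    genus: str
    differentia: Tuple[Tuple[str, str], ...] = ()

    def to_text(self) -> str:
        # With no differentia the definition is just the genus.
        if not self.differentia:
            return f"A {self.genus}"
        # Each differentium becomes a "<relation> a <filler>" clause.
        clauses = [f"{rel} a {filler}" for rel, filler in self.differentia]
        return f"A {self.genus} which " + " and ".join(clauses)


germ_cell_migration = LogicalDefinition(
    term="germ cell migration",
    genus="cell migration",
    differentia=(("results_in_the_movement_of", "germ cell"),),
)
print(germ_cell_migration.to_text())
# prints: A cell migration which results_in_the_movement_of a germ cell
```

Keeping the differentia as a list of pairs (rather than a single pair) leaves room for definitions with more than one discriminating characteristic.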
Resources
Reading Material
The basic methodology is to avoid explicitly managing complex tangled polyhierarchies - instead only manage the hierarchy for the core parts of the ontology and use automated techniques to manage the polyhierarchy when the trees are combined together.
Alan Rector has written about this extensively, under the heading of "ontology normalization".
If you are mathematically inclined, you may want to read up on Formal Concept Analysis.
Downloading
Vetted:
Unvetted:
Proposed new relations:
Cross products page on bioontology.org:
Those with access to GO CVS can find the current Obol results in:
go/scratch/obol_results
See also
Mail List
https://lists.sourceforge.net/lists/listinfo/obo-crossproduct
Biological Process and the OBO Cell Ontology
(Much of what is below is project admin material for those closely involved in the process; more detailed summaries will be written later.)
Current Status
Meetings:
Next Meeting: 2007/07/26
We will discuss an update to the (now out-of-date) xp defs described below, and a move to more specific relations as defined in ro_proposed.
Tracker Items
A new category has been added to both GO and CL trackers to help organise mutual ontology requests:
- In the GO Tracker, set the Group to be "Cell-XP"
- In the Cell Ontology tracker, set the Group to be GO-Cell-XP
Issues
CL is undergoing a reorganization. However, this can happen in parallel with the xp work.
Using the xp files in oboedit
Cross-products are best viewed in oboedit2. See cross-products in OE2
All vetted xp files can be downloaded from
Always use the latest oboedit version
Make sure you turn the reasoner on (may require lots of memory -- the reasoner isn't essential to browse the logical defs, but it does provide advantages, see below)
Here is a screenshot focused on microglial cell activation
You can see the genus-differentia definition (in the center, in the box marked "cross-products"). The logical definition accessible to the computer is:
- A cell_activation which has_central_participant microglial_cell
(note: screenshot may be out of date and show a different relation used in the differentia)
On the explorer view (left panel) you should be able to see a blue squiggly line linking "microglial cell activation" to "macrophage activation". This link is not actually asserted in the ontology - a curator never made this call. The reasoner has inferred it from the combination of the logical definitions and the is_a link between microglial cell and macrophage in the OBO Cell Ontology. This is a sign that the curator should either assert this link in GO, work with the Cell curators to fix the corresponding link there, or amend the logical definition.
The oboedit explanation feature shows how the link was inferred, although the explanations are not very clear right now.
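To show why the blue link appears, here is a deliberately simplified Python sketch of the inference. It is not how the oboedit reasoner works internally: it only performs a structural check (same relation, with genus and filler related by is_a) and ignores relation hierarchies, transitive relations and everything else a real reasoner handles. The names `IS_A`, `DEFS`, `ancestors` and `subsumed_by` are hypothetical; the is_a table holds only the one CL link mentioned above, and the definition of macrophage activation is assumed to follow the same pattern as microglial cell activation.

```python
from typing import Dict, List, Set, Tuple

# Asserted is_a links (child -> parents): only the CL link mentioned above.
IS_A: Dict[str, Set[str]] = {
    "microglial cell": {"macrophage"},
}

# Logical definitions: term -> (genus, set of (relation, filler) differentia).
# The macrophage activation definition is an assumption for this sketch.
DEFS: Dict[str, Tuple[str, Set[Tuple[str, str]]]] = {
    "microglial cell activation": ("cell activation", {("has_central_participant", "microglial cell")}),
    "macrophage activation": ("cell activation", {("has_central_participant", "macrophage")}),
}


def ancestors(cls: str) -> Set[str]:
    """Reflexive-transitive closure of is_a over IS_A."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in IS_A.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_by(a: str, b: str) -> bool:
    """True if a's genus is_a* b's genus and every differentium of b is matched
    by one of a with the same relation and a filler that is_a* b's filler."""
    genus_a, diff_a = DEFS[a]
    genus_b, diff_b = DEFS[b]
    if genus_b not in ancestors(genus_a):
        return False
    return all(
        any(rel_a == rel_b and filler_b in ancestors(filler_a) for rel_a, filler_a in diff_a)
        for rel_b, filler_b in diff_b
    )


def infer_links() -> List[Tuple[str, str]]:
    """All (child, parent) pairs of defined terms implied by the definitions."""
    return sorted((a, b) for a in DEFS for b in DEFS if a != b and subsumed_by(a, b))


print(infer_links())
# prints: [('microglial cell activation', 'macrophage activation')]
```

Note that the link is only inferred because both GO terms carry logical definitions; if macrophage activation had no definition, nothing could be concluded. Fixing the CL is_a link or amending either definition changes the result, which mirrors the three curator options described above.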
Viewing in other ontology browsers
For comparison, here is the same thing in SWOOP:
SWOOP is aimed more at computer scientists, hence the logical symbols (see below for how to obtain the OWL to use in SWOOP or Protege-OWL)
Viewing raw obo files
So far we have been manually editing the obo files, interspersing comment: lines among the stanzas.
Eventually we will move to a pure oboedit approach (Real Soon Now), but the raw approach is useful as a first pass since we can get chatty in the comments; we lack a good way of linking discussion threads in oboedit right now. The comments will eventually be distilled into a summary document and removed from the final version, since they are mostly discursive (some important comments may remain).
The files contain a subset of the information in the normal GO file, with the addition of the logical definition, as intersections. Here is a section from the underlying obo 1.2 file
[Term]
id: GO:0001774
name: microglial cell activation
namespace: biological_process
def: "The change in morphology and behavior of a microglial cell resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor." [GOC:mgi_curators, PMID:10626665, PMID:10695728, PMID:12580336, PMID:9893949]
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775 ! cell activation
intersection_of: has_central_participant CL:0000129 ! microglial cell
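The logical definition sits entirely in the intersection_of tags: a line with a bare ID gives the genus, and a line with a relation and an ID gives a differentium. As a rough illustration, this Python sketch pulls those out of the stanza for microglial cell activation. It is an assumption-laden toy (it ignores escaped `\!` characters and trailing qualifier blocks, for example), and the function name `parse_intersections` is hypothetical; for real work, the oboedit API is the better route.

```python
from typing import List, Optional, Tuple

# The microglial cell activation stanza, with the def: line omitted for brevity.
STANZA = """[Term]
id: GO:0001774
name: microglial cell activation
namespace: biological_process
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775 ! cell activation
intersection_of: has_central_participant CL:0000129 ! microglial cell
"""


def parse_intersections(stanza: str) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Return (genus, [(relation, filler), ...]) from the intersection_of tags of one stanza."""
    genus: Optional[str] = None
    differentia: List[Tuple[str, str]] = []
    for line in stanza.splitlines():
        tag, sep, value = line.partition(":")
        if not sep or tag.strip() != "intersection_of":
            continue
        # Drop the trailing "! name" comment, keeping only IDs and relation names.
        tokens = value.split("!", 1)[0].split()
        if len(tokens) == 1:
            genus = tokens[0]  # a bare ID is the genus
        elif len(tokens) == 2:
            differentia.append((tokens[0], tokens[1]))  # relation + filler
    return genus, differentia


print(parse_intersections(STANZA))
```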
Important note on the obo file
It is possible to strip the intersection_of lines from the .obo file and have an .obo file that is no different in structure from the go.obo file that has been made available to the public for the last few years. In other words, it is possible to ignore the logical definitions and have the DAG remain intact. This means that the inferrable is_a lines must remain in the public ontology, even though sometimes they seem redundant with the intersection_of lines. This is absolutely crucial for tools that depend on the go.obo file. GO will remain committed to computing the full DAG (also known as "classifying") for the consumers of this file indefinitely.
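Stripping the logical definitions is a purely line-based operation. A minimal Python sketch, assuming one tag per line as in the obo 1.2 stanza above (the function name is hypothetical):

```python
# Sample stanza for illustration (the def: line is omitted for brevity).
STANZA = """[Term]
id: GO:0001774
name: microglial cell activation
namespace: biological_process
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775 ! cell activation
intersection_of: has_central_participant CL:0000129 ! microglial cell
"""


def strip_logical_definitions(obo_text: str) -> str:
    """Remove intersection_of lines, leaving every other line (including is_a) intact."""
    kept = [
        line
        for line in obo_text.splitlines(keepends=True)
        if not line.lstrip().startswith("intersection_of:")
    ]
    return "".join(kept)


print(strip_logical_definitions(STANZA))
```

The is_a line to immune cell activation survives the stripping, which is exactly why such lines must stay in the public file even when they look redundant with the intersection_of lines.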
Viewing the Genus-differentia matrix
The combinations of genus (cell differentiation, development, cell fate commitment, migration) and differentia class (oocyte, neuron, T-cell etc) can be imagined as a 2D matrix (see original Hill paper). Each cell (sensu grid) represents a composite term, with the genus and differentia forming the rows and columns. We have no good way of visualising this yet; we'd like to add this to oboedit and perhaps even AmiGO.
For now you can view an excel file generated from the August version of the logical defs:
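The matrix can be approximated by pivoting (term, genus, differentia class) triples into a table. This Python sketch is illustrative only: the triples are a handful of example composite terms, not an export of the real definitions, and the tab-separated output is just one format a spreadsheet could open. Empty cells ("-") are genus/differentia combinations with no composite term.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Example (term, genus, differentia class) triples, chosen for illustration.
DEFS: List[Tuple[str, str, str]] = [
    ("germ cell migration", "cell migration", "germ cell"),
    ("neuron migration", "cell migration", "neuron"),
    ("neuron differentiation", "cell differentiation", "neuron"),
    ("oocyte differentiation", "cell differentiation", "oocyte"),
]


def build_matrix(defs: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Map genus (row) -> {differentia class (column) -> composite term}."""
    matrix: Dict[str, Dict[str, str]] = defaultdict(dict)
    for term, genus, filler in defs:
        matrix[genus][filler] = term
    return matrix


def render(matrix: Dict[str, Dict[str, str]]) -> str:
    """Tab-separated grid; '-' marks a combination with no composite term."""
    columns = sorted({f for row in matrix.values() for f in row})
    lines = ["\t".join(["genus"] + columns)]
    for genus in sorted(matrix):
        cells = [matrix[genus].get(c, "-") for c in columns]
        lines.append("\t".join([genus] + cells))
    return "\n".join(lines)


print(render(build_matrix(DEFS)))
```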
Browsing in Obol
The Obol browser can be found on the main Obol pages
http://www.berkeleybop.org/obol
Find the link to the Obol browser, which may or may not be up and running. Obol may also be loaded with a previous version of the logical definitions.
Obol lets you browse existing definitions; in addition, Obol can "Obolify" terms without definitions, i.e. parse the name to get the definition.
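To give a feel for what "Obolify" means, here is a toy Python sketch that parses names of the form "<cell type> <process>" into a genus and a differentium. Obol itself uses a far richer grammar; this is not its algorithm. The lexicons `GENERA` and `CELL_TYPES` and the `anonymous(...)` notation are hypothetical. Unknown cell types are turned into anonymous classes, in the spirit of the Anonymous Terms entry in the glossary.

```python
from typing import List, Optional, Tuple

# Hypothetical lexicon: the last word of a name selects a genus and the
# relation used in its differentium (relations as used on this page).
GENERA = {
    "activation": ("cell activation", "has_central_participant"),
    "migration": ("cell migration", "results_in_the_movement_of"),
}
# Hypothetical set of cell type names known to the parser.
CELL_TYPES = {"microglial cell", "macrophage", "germ cell"}


def obolify(name: str) -> Optional[Tuple[str, List[Tuple[str, str]]]]:
    """Toy parse of '<cell type> <process>' names into (genus, differentia)."""
    words = name.split()
    if len(words) < 2 or words[-1] not in GENERA:
        return None  # name does not fit the pattern
    genus, relation = GENERA[words[-1]]
    cell = " ".join(words[:-1])
    # Unknown cell types become anonymous classes rather than failing.
    filler = cell if cell in CELL_TYPES else f"anonymous({cell})"
    return genus, [(relation, filler)]


print(obolify("microglial cell activation"))
print(obolify("hepatocyte migration"))
```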
Browsing in AmiGO
We are working on displaying genus-differentia definitions in AmiGO; this should be ready some time after the GO-CL cross-products become part of the live GO. The logical definitions will give us the option of 'disentangling' complex DAGs, showing the CL hierarchy separately, and allowing GO annotations to be queried by CL terms.
Screenshots to come - see Obol browser for prototype for now
Biological Process / Molecular function to CHEBI cross products
Examples:
- cysteine biosynthesis -- a biosynthesis which has_specific_output cysteine
This is the next step. Mike Bada from the Hunter lab has done a lot of preliminary work here
Issues
Pluralisation
ChEBI must first conform to OBO Foundry principles before it can be used to create logical definitions for GO terms. ChEBI is doing well in a number of respects, but the use of pluralisation as the sole lexical disambiguator between generic and specific forms of chemicals violates univocity and will cause many practical problems for the use of these terms in GO definitions. There may be a deeper underlying problem: a failure to distinguish types from instances. Ontologies should contain only types, never instances.
This has yet to be resolved.
Functions/Roles and CHEBI
is_a is overloaded in ChEBI; see:
http://sourceforge.net/tracker/index.php?func=detail&aid=1695784&group_id=36855&atid=440764
Biological Process and Ontologies of gross anatomy
We are tackling cells first, see above. We may tackle cell parts (ie GO Cellular Component).
Issues
Species-specificity
CARO may help
Other Composite Terms
Eventually the entire OBO Foundry will be linked by definitions that span ontological boundaries and levels of granularity
Diseases
The OBO Disease ontology will define terms using FMA and/or CARO.
For example:
- Ovarian Cancer
- Genus: DO:cancer
- Differentia: has_location FMA:Ovary
Obol can be used to retrofit many of these
Issues
species-specificity
Parts of specific cell types
Examples:
- aster of spermatocyte
- plasma membrane of sperm
There is a potential combinatorial explosion here. Even though certain combinations will be excluded (eg nucleus of erythrocyte), the intersection matrix is dense rather than sparse. When we further combine these with processes (eg sperm plasma membrane assembly), things get unmanageable quickly, even with reasoner support.
Our current recommendation here is to post-coordinate. When these are needed for definitions of other terms, we can coordinate anonymous classes as required (see issues in CellO section).
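A rough count makes the explosion concrete. Writing P for the set of cell part classes, C for the set of cell types and G for the set of process genera (symbols introduced here for illustration only), the number of candidate pre-coordinated terms is bounded by:

```latex
N_{\text{part} \times \text{cell}} \le |P|\,|C|, \qquad
N_{\text{process} \times \text{part} \times \text{cell}} \le |G|\,|P|\,|C|
```

Because the matrix is dense, the real counts sit close to these bounds (only a few combinations such as nucleus of erythrocyte drop out), so each additional process genus adds roughly another |P||C| terms.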
Cell types of specific anatomical entities
Examples:
- forebrain neuron
The combinatorial explosion may not be so large here. Many cell types are restricted to certain anatomical entities (eg Purkinje cells are always in the Cerebellum - see http://www.bioontology.org/wiki/index.php/CL:Aligning_species-specific_anatomy_ontologies_with_CL)
Anatomical
Examples:
- thoracic bristle (a bristle which is part_of a thorax)
- dorsal fin (a fin which is located dorsally)
- dorsal ectoderm (a region which is located dorsally and a region_of an ectoderm)
Pre-coordination recommended (most AOs do this. The FMA does it extensively. The redundancy can easily be managed with reasoners)
Process and Function
Defining processes by the functionings that are necessarily enacted
Definitions that refer to qualities
This is a more unusual case, but I believe it is useful. PATO need not be used only for mutant phenotypes; it can be used for qualities in general.
Example:
- diploid cell (a cell which has_quality diploid)
- pluripotent cell (a cell which has_quality pluripotency)
Genomic entities
SO already has logical definitions for composite terms http://www.bioontology.org/wiki/index.php/SO:Composite_Terms
These do not reference any ontology external to SO, but some of the definitions may eventually reference PATO
Many GO terms can be defined via SO:
Phenotypes
Composite terms in ontologies like MP and plant_trait can be defined using PATO and some ontology of bearer qualities; for example
- hypertrophy of kidney
- Genus: PATO:hypertrophy
- Differentia: inheres_in MA:kidney
See the Obol page for results on plant_trait
Issues
- What is the genus: the quality or the bearer entity
- e.g. is it "hypertrophy of kidney" or "a kidney that is hypertrophied"? These terms refer to different but related entities. I believe it is the former, and plant_trait has it right.
- Species-specificity
Post-coordinating annotations
Not all biological entity types need be pre-composed (aka pre-coordinated) in an ontology. Pre-composing every combination is not scalable, even with reasoner support, and we end up with ICD-9 (can someone add the monkey on a tricycle example here?)
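With post-coordination, an annotation points at a genus-differentia class expression instead of a pre-composed term ID. The Python sketch below, built on hypothetical names (`ClassExpression`, `Annotation`, `resolve`, and the gene product ID "GENE:1"), shows one consequence: when a pre-composed term with the same logical definition exists, the annotation can be mapped onto it; otherwise it stays anonymous. Only exact matches are found here; a reasoner would also find the subsuming terms.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ClassExpression:
    """An anonymous genus-differentia class used directly as an annotation target."""
    genus: str
    differentia: FrozenSet[Tuple[str, str]]


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # hypothetical identifier, e.g. "GENE:1"
    target: ClassExpression


# Pre-composed terms indexed by their logical definition (taken from the
# microglial cell activation stanza on this page).
PRECOMPOSED: Dict[ClassExpression, str] = {
    ClassExpression("GO:0001775", frozenset({("has_central_participant", "CL:0000129")})): "GO:0001774",
}


def resolve(annotation: Annotation) -> Optional[str]:
    """Return an equivalent pre-composed term ID, or None if the target stays post-coordinated."""
    return PRECOMPOSED.get(annotation.target)


ann = Annotation("GENE:1", ClassExpression("GO:0001775", frozenset({("has_central_participant", "CL:0000129")})))
print(resolve(ann))
```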
Implementation
We stress that supporting logical definitions is entirely optional for database administrators, tool implementors, etc. The GOC will continue to provide the full DAG, so logical definitions can be ignored and tools will work as before. No action is required on the part of databases, organisations and groups that consume the GO, or on the part of their end-users.
However, making tools logical-definition aware can lead to enhancements, and we will provide some support for the technical teams who make use of the GO obo files to populate their databases or extend their tool functionality.
Obo file format implementation
Refer also to obo format 1.2 documentation http://www.geneontology.org/GO.format.obo-1_2.shtml
The oboedit API can also be used to access the logical definitions computationally.
Storing logical definitions in a database
Chado and GODB
Both Chado and The GO:Database support storing of logical definitions. For details, see XSL Transforms
and the document gmod/schema/chado/modules/cv/doc/cv-advanced-usage.tex, also available from GMOD CVS
The corresponding APIs need to be extended to fully support this.
GODB and Chado also support post-coordination at the schema level.
Other databases
We can't support other schemas. However, it is worth noting that OWL-compatible databases (e.g. Sesame+OWLIM, Instancestore) should be capable of representing logical definitions; see OWL below.
OWL and Semantic Web Tools
Many OWL-aware tools will handle the logical definitions correctly. Reasoners such as Pellet can be used to compute the subsumption hierarchy (we use the oboedit reasoner because it is nicely integrated with the oboedit UI, and it is fast because it doesn't have to deal with cases we don't care about).
We support the use of all such third-party tools by providing OWL transforms of all obo files.
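Under the usual reading, the intersection_of lines of a term together give necessary and sufficient conditions, which in OWL becomes an equivalence between the term and the intersection of the genus with one existential restriction per differentium. The Python sketch below renders that axiom in OWL 2 functional syntax. It is not the GO's OWL transform; the `obo:` prefix and the ID-to-IRI convention (GO:0001774 to obo:GO_0001774) are assumptions made for illustration.

```python
from typing import List, Tuple


def to_owl_functional(term: str, genus: str, differentia: List[Tuple[str, str]],
                      prefix: str = "obo:") -> str:
    """Render a genus-differentia definition as an OWL 2 functional-syntax axiom."""
    if not differentia:
        raise ValueError("a logical definition needs at least one differentium")

    def iri(curie: str) -> str:
        # Assumed IRI convention: GO:0001774 -> obo:GO_0001774
        return prefix + curie.replace(":", "_")

    # Each (relation, filler) pair becomes an existential restriction.
    restrictions = " ".join(
        f"ObjectSomeValuesFrom({prefix}{rel} {iri(filler)})" for rel, filler in differentia
    )
    # Necessary and sufficient conditions: an equivalence, not a subclass axiom.
    return f"EquivalentClasses({iri(term)} ObjectIntersectionOf({iri(genus)} {restrictions}))"


print(to_owl_functional("GO:0001774", "GO:0001775", [("has_central_participant", "CL:0000129")]))
```

Using an equivalence rather than SubClassOf is what lets a reasoner such as Pellet classify other terms under microglial cell activation, in the same way as the blue inferred link in oboedit.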
FAQ and glossary
Logical Definitions
A logical definition, aka cross-product, aka "Aristotelian definition", aka genus-differentia definition, aka necessary and sufficient conditions, aka complete definitions...
This is a definition that can be used by a computer as well as a human. For this project, the logical definition for a specific term always takes the form of a genus (generic term) and differentia (discriminating characteristics which mark instances of the specific term as different from its is_a sibling terms).
Anonymous Terms
Obol creates anonymous terms if it can't find existing terms. Sometimes this is a term we need to add, eg to the cell ontology. Sometimes it is a type that we would never create a term for in an ontology but we would like to refer to that class of things.