Obol
Obol is software for automatically generating cross-product definitions (also known as genus-differentia definitions) from the names of terms/classes in OBO ontologies. It relies on (a) a consistent grammatical style in the naming of terms or exact synonyms, and (b) consistent naming across ontologies.
Obol generates results in .obo format, which can also be converted to OWL. The results take the form of genus-differentia definitions, which can be fed into a reasoner to check for consistency between and within ontologies, or used for cross-ontology queries.
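As an illustration of what a genus-differentia definition can look like in .obo format, the sketch below renders one as a [Term] stanza using the `intersection_of` tag from the OBO format. It is not Obol's own output routine: the function name `xp_stanza` and the `EX:` identifiers are placeholders invented for this example.

```python
def xp_stanza(term_id, name, genus, differentia):
    """Render a cross-product definition as an OBO-format [Term] stanza.

    genus is an (id, label) pair; differentia is a list of
    (relation, id, label) triples. All identifiers are placeholders.
    """
    lines = ["[Term]", f"id: {term_id}", f"name: {name}"]
    genus_id, genus_label = genus
    # The genus is written as a bare class reference.
    lines.append(f"intersection_of: {genus_id} ! {genus_label}")
    # Each differentia is written as a relation followed by a class reference.
    for relation, filler_id, filler_label in differentia:
        lines.append(f"intersection_of: {relation} {filler_id} ! {filler_label}")
    return "\n".join(lines)


stanza = xp_stanza(
    "EX:0000003",
    "germ cell nucleus",
    ("EX:0000001", "nucleus"),
    [("part_of", "EX:0000002", "germ cell")],
)
print(stanza)
```

A reasoner reading such a stanza treats the term as equivalent to the intersection of the genus and the relational restriction, which is what makes consistency checks and cross-ontology queries possible.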
Obol also generates synonyms for terms, based on the synonyms of the cross-product component terms. To a lesser extent, it also generates textual definitions, based on the cross-product (xp) definition and the standard definition forms used in GO.
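The idea of deriving synonyms from component terms can be sketched as follows. This is a minimal illustration, not Obol's algorithm: the name lists, the `generate_synonyms` helper, and the simple "continuant followed by process" naming pattern are all assumptions made for the example.

```python
from itertools import product

# Hypothetical names (label first, then synonyms) for two component terms.
CONTINUANT_NAMES = ["interleukin-1", "IL-1"]
PROCESS_NAMES = ["biosynthesis", "synthesis"]


def generate_synonyms(continuant_names, process_names, primary_name):
    """Combine every continuant name with every process name.

    The primary term name is removed so that only new synonyms remain.
    """
    combos = {f"{c} {p}" for c, p in product(continuant_names, process_names)}
    combos.discard(primary_name)
    return sorted(combos)


synonyms = generate_synonyms(
    CONTINUANT_NAMES, PROCESS_NAMES, "interleukin-1 biosynthesis"
)
print(synonyms)
```

In practice the naming pattern would come from the same grammar rule that parsed the term, so that generated synonyms follow the ontology's naming style.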
Reference
Christopher J. Mungall. Obol: integrating language and meaning in bio-ontologies. Comparative and Functional Genomics, Volume 5, Issue 6-7, 2004, pages 509-520.
Methods
Systematically named GO terms can be parsed and generated using context-free grammars. For example, the following production rule matches a token that can be identified as denoting a cell, followed by a token that can be identified as denoting a cellular_component, and interprets the pair as a cellular component:
cellular_component --> cell, cellular_component
For example, it matches the token sequence [germ cell], [nucleus].
Obol grammars are definite clause grammars (DCGs), in which each symbol can carry arguments. Variables are denoted by a leading upper-case character. The matched tokens are parsed as the expression given as the argument of the symbol on the left-hand side; for example:
cellular_component(CC that part_of(Cell)) --> cell(Cell),cellular_component(CC).
This rule would parse "germ cell nucleus" as a nucleus that is part_of a germ cell.
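To make the behaviour of this rule concrete, here is a rough Python analogue. Obol itself is written as Prolog DCGs; the lexicons `CELLS` and `COMPONENTS`, the greedy `tokenize` helper, and the tuple representation of the result are assumptions made for this sketch. Unlike the grammar rule, the sketch does not recurse on the right-hand cellular_component.

```python
# Hypothetical mini-lexicons; a real run would draw labels from the ontologies.
CELLS = {"germ cell", "muscle cell"}
COMPONENTS = {"nucleus", "membrane"}


def tokenize(label):
    """Greedy longest-match of known labels, left to right.

    Returns a list of tokens, or None if some word cannot be matched.
    """
    words = label.split()
    vocabulary = CELLS | COMPONENTS
    tokens, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in vocabulary:
                tokens.append(candidate)
                i = j
                break
        else:
            return None  # no known label starts at position i
    return tokens


def parse_cellular_component(tokens):
    """Mimic: cellular_component(CC that part_of(Cell)) --> cell(Cell), cellular_component(CC).

    Returns a (genus, relation, filler) tuple, or None when the rule does not match.
    """
    if tokens is None or len(tokens) != 2:
        return None
    cell, component = tokens
    if cell in CELLS and component in COMPONENTS:
        return (component, "part_of", cell)
    return None


result = parse_cellular_component(tokenize("germ cell nucleus"))
print(result)
```

The tuple reads as "genus, relation, filler": the component is the genus and the cell is the differentiating part_of filler, mirroring `CC that part_of(Cell)`.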
Production rules can be combined to parse even complex nested terms such as:
- GO:0051813 active evasion of immune response of other organism via regulation of antigen processing and presentation in other organism during symbiotic interaction
Results
- http://www.berkeleybop.org/obol (This page is not kept up to date)
See also
- http://berkeleybop.org/ontologies#logical_definitions -- many of the XPs here were seeded using obol
Most of the current results are directly integrated into Category:Cross Products .obo files
Grammars
Availability
The current version of obol is a sub-package in the blipkit svn repository
See INSTALL instructions
Previous versions
Obol has changed considerably since the 2004 CFG paper.
In the 2004 version, a generic English grammar constructs a syntactic parse tree, and a second transformation step translates the syntax tree into a semantic DL-style class expression.
In the current version, there is no generic English grammar. Instead, grammars are individually developed and tuned for specific ontologies, and the generation of the class expression is folded into the parsing step.
For example, a parse of "negative regulation of interleukin-1 biosynthesis" would previously have generated (prep_phrase:of (adj:negative (n:regulation)) ((biosynthesis) ((interleukin) (1)))), and this would have been translated to regulation and negatively_regulates(biosynthesis and has_output(il1)).
Now this happens more directly via grammar rules such as:
process(regulation and negatively_regulates(P)) --> [negative,regulation,of],process(P).
process(biosynthesis and has_output(C)) --> continuant(C),[biosynthesis].
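The way these two rules build the class expression during parsing can be imitated in Python. This is only a sketch of the idea, not the Obol implementation: the `CONTINUANTS` lexicon, the nested-tuple representation, and the `render` helper are assumptions introduced for the example.

```python
# Hypothetical continuant lexicon: token tuple -> symbol used in the expression.
CONTINUANTS = {("interleukin-1",): "il1"}


def parse_process(tokens):
    """Return a nested-tuple class expression for a process label, or None."""
    tokens = tuple(tokens)
    # process(regulation and negatively_regulates(P)) --> [negative,regulation,of], process(P)
    if tokens[:3] == ("negative", "regulation", "of"):
        inner = parse_process(tokens[3:])
        if inner is not None:
            return ("and", "regulation", ("negatively_regulates", inner))
    # process(biosynthesis and has_output(C)) --> continuant(C), [biosynthesis]
    if len(tokens) >= 2 and tokens[-1] == "biosynthesis":
        continuant = CONTINUANTS.get(tokens[:-1])
        if continuant is not None:
            return ("and", "biosynthesis", ("has_output", continuant))
    return None


def render(expr):
    """Render nested tuples in the 'A and rel(B)' style used in the text."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "and":
        return f"{render(expr[1])} and {render(expr[2])}"
    relation, filler = expr
    return f"{relation}({render(filler)})"


expression = parse_process(
    "negative regulation of interleukin-1 biosynthesis".split()
)
print(render(expression))
```

Because each rule returns its class expression directly, no intermediate syntax tree is needed; the recursive call plays the role of the nested process(P) symbol.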
This lessens the need for the obol atomic vocabulary. However, the vocabulary is still used for mapping relational adjectives to their noun forms, and for term generation.
The version of obol used in the 2004 CFG paper is still available in the obol directory in the geneontology CVS repository on SourceForge. See geneontology sf page. The atomic vocabulary is also available in CVS.