Obol

Obol is software for automatically generating cross-product definitions (also known as genus-differentia definitions) from the names of terms/classes in OBO ontologies. It relies on (a) a consistent grammatical style in the naming of terms or exact synonyms, and (b) consistent naming across ontologies.

Obol generates results in .obo format, which can also be converted to OWL. The results take the form of genus-differentia definitions, which can be fed into a reasoner to check for consistency between and within ontologies, or used for cross-ontology queries.

Obol also generates synonyms for terms, based on the synonyms of the cross-product component terms. To a lesser extent, it generates textual definitions, based on the cross-product (xp) definition and the standard definition forms used in GO.
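One way to picture synonym generation is as a recombination of the component terms' labels. The following Python sketch is illustrative only and is not Obol's Prolog implementation: the synonym table is made up, the function name generate_synonyms is hypothetical, and it assumes the composed name is simply the cell label followed by the component label.

```python
from itertools import product

# Made-up synonym table for illustration; Obol would read synonyms
# from the loaded ontologies instead.
SYNONYMS = {
    "germ cell": ["germline cell"],
    "nucleus": ["nuclear compartment"],
}


def generate_synonyms(cell, component):
    """Combine every label of the cell with every label of the component.

    Assumes the composed name has the form "<cell> <component>"; the
    primary name itself is excluded from the returned synonyms.
    """
    cell_labels = [cell] + SYNONYMS.get(cell, [])
    component_labels = [component] + SYNONYMS.get(component, [])
    primary = f"{cell} {component}"
    combined = [f"{c} {cc}" for c, cc in product(cell_labels, component_labels)]
    return [label for label in combined if label != primary]


if __name__ == "__main__":
    for synonym in generate_synonyms("germ cell", "nucleus"):
        print(synonym)
```

Candidates produced this way would still need review, since not every combination of component synonyms reads naturally.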

Reference

Christopher J. Mungall. Obol: integrating language and meaning in bio-ontologies. Comparative and Functional Genomics, Volume 5, Issue 6-7, 2004, pages 509-520.

Methods

Systematically named GO terms can be parsed and generated using context-free grammars. For example, the following production rule states that a token identified as denoting a cell, followed by a token identified as denoting a cellular_component, is interpreted as a cellular_component:

  cellular_component --> cell, cellular_component

For example, the rule matches the token sequence [germ cell], [nucleus].

Obol grammars are definite clause grammars (DCGs), in which each symbol can carry variables as arguments. Variables are denoted by a leading upper-case character. The tokens matched on the right-hand side are parsed as the expression on the left-hand side; for example:

 cellular_component(CC that part_of(Cell)) --> cell(Cell),cellular_component(CC).

This rule would parse "germ cell nucleus" as a nucleus that is part_of a germ cell.
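The behaviour of this rule can be imitated outside Prolog. The Python sketch below is a minimal illustration and not Obol's implementation: the two vocabularies are hypothetical stand-ins for ontology lookups, and the names match_prefix, cellular_component and parse are invented for this example. Generators play the role of Prolog backtracking.

```python
# Hypothetical vocabularies; Obol would look these names up in the ontologies.
CELLS = {("germ", "cell"): "germ cell"}
COMPONENTS = {("nucleus",): "nucleus"}


def match_prefix(tokens, vocab):
    """Yield (term, remaining_tokens) for each vocabulary entry that starts tokens."""
    for words, term in vocab.items():
        if tuple(tokens[:len(words)]) == words:
            yield term, tokens[len(words):]


def cellular_component(tokens):
    """Yield (expression, remaining_tokens) for each cellular component parse."""
    # Base case: the tokens start with a known cellular component name.
    yield from match_prefix(tokens, COMPONENTS)
    # cellular_component(CC that part_of(Cell)) --> cell(Cell), cellular_component(CC).
    for cell, rest in match_prefix(tokens, CELLS):
        for cc, remaining in cellular_component(rest):
            yield f"{cc} that part_of({cell})", remaining


def parse(name):
    """Return the expressions for parses that consume every token of the name."""
    return [expr for expr, rest in cellular_component(name.split()) if not rest]


if __name__ == "__main__":
    print(parse("germ cell nucleus"))  # ['nucleus that part_of(germ cell)']
```

A name with no component token at the end, such as "germ cell" on its own, yields no parse under this sketch.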

Production rules can be combined to parse even complex nested terms such as:

  • GO:0051813 active evasion of immune response of other organism via regulation of antigen processing and presentation in other organism during symbiotic interaction

Results

See also

Most of the current results are integrated directly into the Category:Cross Products .obo files.

Grammars

XPs                                               Prolog Grammar
XP:cellular_component_xp_self                     grammar
XP:cellular_component_xp_go                       grammar grammar
XP:cellular_component_xp_cell                     grammar
XP:biological_process_xp_self                     grammar
XP:biological_process_xp_cellular_component       grammar
XP:biological_process_xp_molecular_function       grammar
XP:biological_process_xp_cell                     grammar
XP:biological_process_xp_chebi                    grammar
XP:biological_process_xp_anatomy                  grammar
XP:biological_process_xp_stimulus                 grammar
XP:biological_process_xp_multi_organism_process   grammar
XP:molecular_function_xp_chebi                    grammar
XP:molecular_function_xp_cellular_component       grammar

Availability

The current version of Obol is a sub-package in the blipkit SVN repository.

See the INSTALL instructions.

Previous versions

Obol has changed considerably since the 2004 Comparative and Functional Genomics (CFG) paper.

In the 2004 version, a generic English grammar constructs a syntactic parse tree, and a second transform translates the syntax tree into a semantic DL-style class expression.

In the current version, there is no generic English grammar. Instead, grammars are individually developed and tuned for specific ontologies, and the generation of the class expression is folded into the parsing step.

For example, a parse of "negative regulation of interleukin-1 biosynthesis" would previously have generated (prep_phrase:of (adj:negative (n:regulation)) ((biosynthesis) ((interleukin) (1)))), and this would then have been translated to regulation and negatively_regulates(biosynthesis and has_output(il1)).

Now this happens more directly via grammar rules such as:

 process(regulation and negatively_regulates(P)) --> [negative,regulation,of],process(P).
 process(biosynthesis and has_output(C)) --> continuant(C),[biosynthesis].
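
To show how these two rules nest, here is a small Python sketch that mirrors them. It is illustrative rather than a port of Obol: the continuant table mapping "interleukin-1" to il1 is a hypothetical stand-in for an ontology lookup, and the function names are invented for this example. Each function yields (expression, remaining tokens) pairs, so the class expression is built during parsing.

```python
# Hypothetical continuant vocabulary; Obol would consult an ontology instead.
CONTINUANTS = {"interleukin-1": "il1"}


def continuant(tokens):
    """Yield (continuant, remaining_tokens) if the first token is a known continuant."""
    if tokens and tokens[0] in CONTINUANTS:
        yield CONTINUANTS[tokens[0]], tokens[1:]


def process(tokens):
    """Yield (class_expression, remaining_tokens) for each process parse."""
    # process(regulation and negatively_regulates(P)) --> [negative,regulation,of],process(P).
    if tokens[:3] == ["negative", "regulation", "of"]:
        for p, rest in process(tokens[3:]):
            yield f"regulation and negatively_regulates({p})", rest
    # process(biosynthesis and has_output(C)) --> continuant(C),[biosynthesis].
    for c, rest in continuant(tokens):
        if rest[:1] == ["biosynthesis"]:
            yield f"biosynthesis and has_output({c})", rest[1:]


def parse_process(name):
    """Return the class expressions for parses that consume the whole name."""
    return [expr for expr, rest in process(name.split()) if not rest]


if __name__ == "__main__":
    print(parse_process("negative regulation of interleukin-1 biosynthesis"))
```

Because the first rule calls process recursively, repeated prefixes such as "negative regulation of negative regulation of ..." would also parse without further rules.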

This lessens the need for the Obol atomic vocabulary. However, the vocabulary is still used for mapping relational adjectives to their noun forms, and for term generation.

The version of Obol used in the 2004 CFG paper is still available in the obol directory of the geneontology CVS repository on SourceForge. See the geneontology SourceForge page. The atomic vocabulary is also available in CVS.