Logical Definitions

The Gene Ontology Consortium is undertaking an effort to provide computable logical definitions for all composite terms in the GO, complementing the text definitions which are currently opaque to computers. Tools will be able to use these logical definitions to provide additional services, such as automatic ontology management, enhanced de-tangled displays and cross-ontology annotation queries.

These logical definitions follow a set pattern. They are also known as genus-differentia definitions (though the word genus is used in the pre-Linnaean sense). Within GO, we also refer to these as Cross Products. To illustrate, the definition for germ cell migration can be constructed in natural language as:

"A cell migration which results_in_the_movement_of a germ cell"
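The same pattern can be written down as a small data structure. The following is a minimal sketch, not part of any GO tool: the class name `LogicalDefinition` and its fields are assumptions chosen for illustration, and the article "a" is inserted naively.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """A genus-differentia definition with a single differentia (hypothetical helper)."""

    genus: str     # the generic term, e.g. "cell migration"
    relation: str  # the relation used in the differentia
    filler: str    # the term the relation points at, e.g. "germ cell"

    def to_text(self) -> str:
        # Render the definition in the natural-language pattern used above.
        return f"A {self.genus} which {self.relation} a {self.filler}"


germ_cell_migration = LogicalDefinition(
    genus="cell migration",
    relation="results_in_the_movement_of",
    filler="germ cell",
)
print(germ_cell_migration.to_text())
```

Keeping the genus, relation and filler as separate fields is what makes the definition usable by a program; the sentence is only one rendering of it.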

Many of these logical definitions refer to terms from other ontologies - for example, there are many GO terms that refer to cell types. Previously there was no way to obtain OBO Cell ontology (http://www.bioontology.org/wiki/index.php/CL:Main_Page) IDs from GO terms. This work aims to rectify that. We will then move on to other areas.

The plan is to start with Cell and carry on from there. We have already made considerable progress with GO-CL, but there are some challenges there.

The Sequence Ontology already has logical definitions (see genomic entities, below)

Resources

Reading Material

The basic methodology is to avoid explicitly managing complex tangled polyhierarchies - instead only manage the hierarchy for the core parts of the ontology and use automated techniques to manage the polyhierarchy when the trees are combined together.

Alan Rector has written about this extensively, under the heading of "ontology normalization"

If you are mathematically inclined, you may want to read up on Formal Concept Analysis - see for example this page

Downloading

Vetted:

Unvetted:

Proposed new relations:

Cross products page on bioontology.org:

Those with access to GO CVS can find the current obol results in

 go/scratch/obol_results

see also

Mail List

https://lists.sourceforge.net/lists/listinfo/obo-crossproduct

Biological Process and the OBO Cell Ontology

(much of what is below is project admin stuff for those closely involved in the process; more detailed summaries will be written later)

Current Status

Meetings:

Next Meeting: 2007/07/26

We will discuss an update to the (now out of date) xp defs described below, and a move to more specific relations as defined in ro_proposed

Tracker Items

A new category has been added to both GO and CL trackers to help organise mutual ontology requests:

Issues

CL is undergoing a reorganization. However, this can happen in parallel with the xp work

Using the xp files in oboedit

Cross-products are best viewed in oboedit2. See cross-products in OE2

All vetted xp files can be downloaded from

Always use the latest oboedit version

Make sure you turn the reasoner on (may require lots of memory -- the reasoner isn't essential to browse the logical defs, but it does provide advantages, see below)

Here is a screenshot focused on microglial cell activation

Oboedit-microglial.jpg

source: http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/obol/go-ext/docs/oboedit-microglial.jpg

You can see the genus-differentia definition (in the center, in the box marked "cross-products"). The logical definition accessible to the computer is:

- A cell_activation which has_central_participant microglial_cell

(note: screenshot may be out of date and show a different relation used in the differentia)

On the explorer view (left panel) you should be able to see a blue squiggly line linking "microglial cell activation" to "macrophage activation". This link is not actually asserted in the ontology - a curator never made this call. The reasoner has inferred it from the combination of the logical definition and the is_a link between microglial cell and macrophage in the OBO Cell ontology. This is a sign that the curator should either assert this link in GO, work with the Cell curators to fix the corresponding link there, or amend the logical definition.
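The kind of inference described here can be sketched in a few lines. This is an illustration only, not the oboedit reasoner: it handles definitions with a single differentia, and the logical definition given for "macrophage activation" is an assumption made for the example (it is not shown on this page).

```python
from typing import Dict, List, Set, Tuple

# Asserted is_a links. The microglial cell -> macrophage link is the
# Cell ontology link mentioned in the text above.
IS_A: Dict[str, Set[str]] = {
    "microglial cell": {"macrophage"},
}

# term -> (genus, relation, filler)
LOGICAL_DEFS: Dict[str, Tuple[str, str, str]] = {
    "microglial cell activation": (
        "cell activation", "has_central_participant", "microglial cell"),
    # Assumed definition, for illustration only.
    "macrophage activation": (
        "cell activation", "has_central_participant", "macrophage"),
}


def ancestors_or_self(term: str) -> Set[str]:
    """Return the term plus everything reachable over asserted is_a links."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_is_a() -> List[Tuple[str, str]]:
    """Infer child is_a parent when genus and filler are both subsumed."""
    links = []
    for child, (c_genus, c_rel, c_filler) in LOGICAL_DEFS.items():
        for parent, (p_genus, p_rel, p_filler) in LOGICAL_DEFS.items():
            if child == parent:
                continue
            if (p_genus in ancestors_or_self(c_genus)
                    and c_rel == p_rel
                    and p_filler in ancestors_or_self(c_filler)):
                links.append((child, parent))
    return links


print(inferred_is_a())
```

The link is produced only because both definitions share a genus and relation and the Cell ontology says a microglial cell is a macrophage; removing that one is_a link makes the inferred link disappear.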

The oboedit explanation feature shows how an inferred link was derived, although the explanations are not very clear right now.

Viewing in other ontology browsers

For comparison, here is the same thing in SWOOP:

Swoop-microglial.jpg

source: http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/obol/go-ext/docs/swoop-microglial.jpg

SWOOP is aimed more at computer scientists, hence the logical symbols (see below for how to obtain the OWL to use in SWOOP or Protege-OWL)

Viewing raw obo files

So far we have been manually editing the obo files. We have been interspersing comment: lines in amongst the stanzas

Eventually we will move to a pure oboedit approach (Real Soon Now), but the raw approach is useful as a first pass since we can get chatty in the comments - we lack a good way of linking discussion threads in oboedit right now. The comments will eventually be distilled into a summary document and removed from the final version, since they are mostly of a discursive nature (some important comments may remain)

The files contain a subset of the information in the normal GO file, with the addition of the logical definition, as intersections. Here is a section from the underlying obo 1.2 file

 [Term]
 id: GO:0001774
 name: microglial cell activation
 namespace: biological_process
 def: "The change in morphology and behavior of a microglial cell resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor." [GOC:mgi_curators, PMID:10626665, PMID:10695728, PMID:12580336, PMID:9893949]
 is_a: GO:0045321 ! immune cell activation
 intersection_of: GO:0001775     ! cell activation
 intersection_of: has_central_participant CL:0000129     ! microglial cell
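For readers who want to pull the logical definition out of such a stanza programmatically, here is a minimal sketch. It is a hand-rolled line scan, not a full obo 1.2 parser, and the function name `parse_logical_definition` is hypothetical. The `def:` line is omitted from the sample for brevity.

```python
from typing import Dict, List, Optional, Tuple

STANZA = """\
[Term]
id: GO:0001774
name: microglial cell activation
namespace: biological_process
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775     ! cell activation
intersection_of: has_central_participant CL:0000129     ! microglial cell
"""


def parse_logical_definition(stanza: str) -> Dict[str, object]:
    """Extract the genus and differentiae from one [Term] stanza."""
    term_id: Optional[str] = None
    genus: Optional[str] = None
    differentiae: List[Tuple[str, str]] = []
    for line in stanza.splitlines():
        # Drop the trailing "! human-readable name" comment.
        body = line.split("!", 1)[0].strip()
        if body.startswith("id:"):
            term_id = body[len("id:"):].strip()
        elif body.startswith("intersection_of:"):
            parts = body[len("intersection_of:"):].split()
            if len(parts) == 1:
                # A bare ID is the genus.
                genus = parts[0]
            elif len(parts) == 2:
                # "relation ID" is a differentia.
                differentiae.append((parts[0], parts[1]))
    return {"id": term_id, "genus": genus, "differentiae": differentiae}


definition = parse_logical_definition(STANZA)
print(definition)
```

The genus is the `intersection_of` line with a bare ID, and each differentia is an `intersection_of` line with a relation followed by an ID.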

Important note on the obo file

It is possible to strip the intersection_of lines from the .obo file and have an .obo file that is no different in structure from the go.obo file that has been made available to the public for the last few years. In other words, it is possible to ignore the logical definitions and have the DAG remain intact. This means that the inferrable is_a lines must remain in the public ontology, even though sometimes they seem redundant with the intersection_of lines. This is absolutely crucial for tools that depend on the go.obo file. GO will remain committed to computing the full DAG (also known as "classifying") for the consumers of this file indefinitely.
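Stripping the logical definitions is a simple line filter. A minimal sketch, assuming the file is small enough to hold in memory; the function name `strip_logical_definitions` is hypothetical and not part of any GO tool.

```python
def strip_logical_definitions(obo_text: str) -> str:
    """Return the obo text with every intersection_of line removed."""
    kept = [
        line
        for line in obo_text.splitlines()
        if not line.lstrip().startswith("intersection_of:")
    ]
    return "\n".join(kept) + "\n"


SAMPLE = """\
[Term]
id: GO:0001774
name: microglial cell activation
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775     ! cell activation
intersection_of: has_central_participant CL:0000129     ! microglial cell
"""

stripped = strip_logical_definitions(SAMPLE)
print(stripped)
```

The asserted is_a line survives the filter, which is why the DAG stays intact for tools that ignore logical definitions.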

Viewing the Genus-differentia matrix

The combinations of genus (cell differentiation, development, cell fate commitment, migration) and differentia class (oocyte, neuron, T-cell etc) can be imagined as a 2D matrix (see original Hill paper). Each cell (sensu grid) represents a composite term, with the genus and differentia forming the rows and columns. We have no good way of visualising this yet, but we'd like to add this to oboedit and perhaps even AmiGO.

For now you can view an excel file generated from the August version of the logical defs:

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/obol/go-ext/docs/matrix-microglial.xls
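Until a proper view exists in the tools, the matrix idea can be sketched directly from a list of definitions. This is an illustration only: the three entries are a tiny hand-picked sample (the macrophage activation definition is an assumption), and `build_matrix` and `render` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# (composite term, genus, differentia class)
DEFS: List[Tuple[str, str, str]] = [
    ("germ cell migration", "cell migration", "germ cell"),
    ("microglial cell activation", "cell activation", "microglial cell"),
    ("macrophage activation", "cell activation", "macrophage"),  # assumed
]


def build_matrix(defs: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Index composite terms by genus (row) and differentia (column)."""
    matrix: Dict[str, Dict[str, str]] = {}
    for term, genus, differentia in defs:
        matrix.setdefault(genus, {})[differentia] = term
    return matrix


def render(matrix: Dict[str, Dict[str, str]]) -> str:
    """Render the matrix as tab-separated text; '-' marks an empty cell."""
    columns = sorted({d for row in matrix.values() for d in row})
    lines = ["\t".join(["genus"] + columns)]
    for genus in sorted(matrix):
        cells = [matrix[genus].get(col, "-") for col in columns]
        lines.append("\t".join([genus] + cells))
    return "\n".join(lines)


matrix = build_matrix(DEFS)
print(render(matrix))
```

Empty cells are combinations for which no composite term exists in the sample, which is often the interesting information for a curator.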

Browsing in Obol

The Obol browser can be found on the main Obol pages

http://www.berkeleybop.org/obol

Find the link to the Obol browser there; it may or may not be up and running, and it may be loaded with a previous version of the logical definitions.

Obol will allow you to browse existing definitions. In addition, Obol can "Obolify" terms without definitions - ie parse the name to get the definition.

Browsing in AmiGO

We are working on allowing the display of genus-differentia definitions to be ready some time after the GO-CL cross-products become part of the live GO. The logical definitions will give us the option of 'disentangling' complex DAGs, showing the CL hierarchy separately, and allow the querying of GO annotations by CL terms.

Screenshots to come - see Obol browser for prototype for now
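To make the idea of querying GO annotations by CL terms concrete, here is a minimal sketch. It is not AmiGO code: the gene product names (`geneA`, `geneB`, `geneC`) and the annotations are invented for illustration, and the macrophage activation definition is an assumption.

```python
from typing import Dict, List, Set, Tuple

# is_a links in the Cell ontology (only the one mentioned on this page).
CL_IS_A: Dict[str, Set[str]] = {"microglial cell": {"macrophage"}}

# GO term -> cell type named in its logical definition.
GO_DIFFERENTIA: Dict[str, str] = {
    "microglial cell activation": "microglial cell",
    "macrophage activation": "macrophage",  # assumed definition
    "germ cell migration": "germ cell",
}

# Hypothetical annotations: (gene product, GO term).
ANNOTATIONS: List[Tuple[str, str]] = [
    ("geneA", "microglial cell activation"),
    ("geneB", "germ cell migration"),
    ("geneC", "macrophage activation"),
]


def cl_ancestors_or_self(cell_type: str) -> Set[str]:
    """Return the cell type plus all of its is_a ancestors in CL."""
    seen = {cell_type}
    stack = [cell_type]
    while stack:
        current = stack.pop()
        for parent in CL_IS_A.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotations_for_cell_type(cell_type: str) -> List[Tuple[str, str]]:
    """Find annotations to GO terms defined via the cell type or a subtype."""
    hits = []
    for gene_product, go_term in ANNOTATIONS:
        filler = GO_DIFFERENTIA.get(go_term)
        if filler is not None and cell_type in cl_ancestors_or_self(filler):
            hits.append((gene_product, go_term))
    return hits


print(annotations_for_cell_type("macrophage"))
```

A query for "macrophage" also returns the annotation to microglial cell activation, because the Cell ontology hierarchy is consulted rather than the GO term names.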

Biological Process / Molecular function to CHEBI cross products

Examples:

cysteine biosynthesis -- a biosynthesis which has_specific_output cysteine

This is the next step. Mike Bada from the Hunter lab has done a lot of preliminary work here

Issues

Pluralisation

ChEBI must first conform to OBO Foundry principles before it can be used to create logical definitions for GO terms. ChEBI is doing well in a number of respects, but the use of pluralisation as the sole lexical disambiguator between generic and specific forms of chemicals violates univocity and will cause many practical problems for the use of these terms in GO definitions. There may be a deeper underlying problem to do with a failure to distinguish types from instances - ontologies should contain only types, never instances.

This has yet to be resolved.

Functions/Roles and CHEBI

is_a is overloaded in chebi; see:

http://sourceforge.net/tracker/index.php?func=detail&aid=1695784&group_id=36855&atid=440764

Biological Process and Ontologies of gross anatomy

We are tackling cells first, see above. We may tackle cell parts (ie GO Cellular Component).

Issues

Species-specificity

CARO may help

Other Composite Terms

Eventually the entire OBO Foundry will be linked by definitions that span ontological boundaries and levels of granularity

Diseases

The OBO Disease ontology will define terms using FMA and/or CARO.

For example:

Ovarian Cancer
Genus: DO:cancer
Differentia: has_location FMA:Ovary

Obol can be used to retrofit many of these

Issues

species-specificity

Parts of specific cell types

Examples:

  • aster of spermatocyte
  • plasma membrane of sperm

There is a potential explosion here. Even though certain combinations will be excluded (eg nucleus of erythrocyte), the intersection matrix is dense rather than sparse. When we further combine these with processes (eg sperm plasma membrane assembly) things get unmanageable quickly, even with reasoner support.

Our current recommendation here is to post-coordinate. When these are needed for definitions of other terms, we can coordinate anonymous classes as required (see issues in CellO section)

Cell types of specific anatomical entities

Examples:

  • forebrain neuron

The combinatorial explosion may not be so large here. Many cell types are restricted to certain anatomical entities (eg Purkinje cells are always in the Cerebellum - see http://www.bioontology.org/wiki/index.php/CL:Aligning_species-specific_anatomy_ontologies_with_CL)

Anatomical

Examples:

  • thoracic bristle (a bristle which is part_of a thorax)
  • dorsal fin (a fin which is located dorsally)
  • dorsal ectoderm (a region which is located dorsally and a region_of an ectoderm)

Pre-coordination recommended (most AOs do this. The FMA does it extensively. The redundancy can easily be managed with reasoners)

Process and Function

Defining processes by the functionings that are necessarily enacted

Definitions that refer to qualities

This is a more unusual case, but I believe it is useful. PATO does not just have to be used for mutant phenotypes - it can be used for qualities in general.

Example:

  • diploid cell (a cell which has_quality diploid)
  • pluripotent cell (a cell which has_quality pluripotency)

Genomic entities

SO already has logical definitions for composite terms http://www.bioontology.org/wiki/index.php/SO:Composite_Terms

These do not reference any ontology external to SO, but some of the definitions may eventually reference PATO

Many GO terms can be defined via SO:

Phenotypes

Composite terms in ontologies like MP and plant_trait can be defined using PATO and some ontology of the entities that bear the qualities; for example

hypertrophy of kidney
Genus: PATO:hypertrophy
Differentia: inheres_in MA:kidney

See the Obol page for results on plant_trait

Issues

  • What is the genus: the quality or the bearer entity
    • eg is it "hypertrophy of kidney" or "a kidney that is hypertrophied"? These terms refer to different but related entities. I believe it is the former and plant_trait has it right
  • Species-specificity

Post-coordinating annotations

Not all biological entity types need be pre-composed (aka pre-coordinated) in an ontology. Even with reasoner support, this approach is not scalable, and we end up with ICD-9 (can someone add the monkey on a tricycle example here???)

Implementation

We stress that supporting logical definitions is entirely optional for database administrators, tool implementors, etc. The GOC will continue to provide the full DAG; logical defs can be ignored and tools will work the same. No action is required on the part of databases, organisations and groups that consume the GO, or on the part of their end-users.

However, making tools logical-definition aware can lead to enhancements, and we will provide some support for the technical teams who make use of the GO obo files to populate their databases or extend their tool functionality.

Obo file format implementation

Refer also to obo format 1.2 documentation http://www.geneontology.org/GO.format.obo-1_2.shtml

The oboedit API can also be used to access the logical definitions computationally

Storing logical definitions in a database

Chado and GODB

Both Chado and The GO:Database support storing of logical definitions. For details, see XSL Transforms

and the document gmod/schema/chado/modules/cv/doc/cv-advanced-usage.tex, also available from GMOD CVS

The corresponding APIs need to be extended to fully support this.

GODB and Chado also support post-coordination at the schema level.

Other databases

We can't support other schemas. However, it is worth noting that OWL-compatible databases (e.g. Sesame+OWLIM, Instancestore) should be capable of representing logical definitions; see OWL below.

OWL and Semantic Web Tools

Many OWL aware tools will do the right thing with the logical definitions. Reasoners like Pellet can be used to compute the subsumption path (we use the oboedit reasoner as it is nicely integrated with the oboedit UI, and is fast because it doesn't have to deal with cases we don't care about).

We support the use of all such 3rd party tools by providing OWL transforms of all obo files

FAQ and glossary

Logical Definitions

A logical definition, aka cross-product, aka "Aristotelian definition", aka genus-differentia definition, aka necessary and sufficient conditions, aka complete definitions...

This is a definition that can be used by a computer as well as a human. For this project, the logical definition for a specific term always takes the form of a genus (generic term) and differentia (discriminating characteristics which mark instances of the specific term as being different from is_a sibling terms)

Anonymous Terms

Obol creates anonymous terms if it can't find existing terms. Sometimes this is a term we need to add, eg to the cell ontology. Sometimes it is a type that we would never create a term for in an ontology but we would like to refer to that class of things.