Logical Definitions

From GO Wiki
Revision as of 10:32, 7 January 2007 by Cjm (talk | contribs)

The Gene Ontology Consortium is undertaking an effort to provide computable logical definitions for all composite terms in the GO, complementing the text definitions which are currently opaque to computers. Tools will be able to use these logical definitions to provide additional services, such as automatic ontology management, enhanced de-tangled displays and cross-ontology annotation queries.

These logical definitions follow a set pattern. They are also known as genus-differentia definitions (though the word genus is used in the pre-Linnaean sense). To illustrate, the definition for germ cell migration can be constructed in natural language as:

"A cell migration which has a germ cell as central_participant"
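As a minimal sketch of the pattern (not part of any GO tooling), a genus-differentia definition can be held as a genus term plus a relation and a differentiating term. The class name LogicalDefinition and its to_text method are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalDefinition:
    """Hypothetical container for one genus-differentia definition."""
    genus: str        # the generic term being specialised
    relation: str     # the relation used in the differentia
    differentia: str  # the term that discriminates this class from its siblings

    def to_text(self) -> str:
        # Render the definition in a fixed natural-language template.
        return f"A {self.genus} which {self.relation} {self.differentia}"


germ_cell_migration = LogicalDefinition(
    genus="cell migration",
    relation="has_central_participant",
    differentia="germ cell",
)
print(germ_cell_migration.to_text())
```

The template wording differs slightly from the hand-written sentence above; the point is only that the three parts are enough to regenerate a readable definition.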

Many of these logical definitions refer to terms from other ontologies - for example, there are many GO terms that refer to cell types. Previously there was no way to obtain OBO Cell ontology (http://www.bioontology.org/wiki/index.php/CL:Main_Page) IDs from GO terms. This work aims to rectify that. We will then move on to other areas.

The plan is to start with Cell and carry on from there. We have already made considerable progress with GO-CL, but there are some challenges there.

The Sequence Ontology already has logical definitions (see genomic entities, below)

The Disease Ontology will include them soon

Logical Definitions using CellO

(much of what is below is project admin stuff for those closely involved in the process; more detailed summaries will be written later)

Background

See http://www.fruitfly.org/~cjm/obol

GO:Obol - Obol
http://www.bioontology.org/wiki/index.php/XP:Main_Page -- OBO wiki

Mail List

https://lists.sourceforge.net/lists/listinfo/obo-crossproduct

Current Status

2006/08/25

Received curated feedback from 3rd round. I have now merged the obo files from each round into a single file, made some further edits and pruned a lot of unnecessary anonymous classes.

Outstanding Issues

Relations

We previously used has_participant (currently in RO). This is not sufficiently specific for our purposes.

For example, if we define cardioblast cell fate specification as: a cell fate specification which has_participant cardioblast, our definition is not sufficiently strict to exclude specification processes in which a cardioblast is participating at the beginning, and is actually fated to become something else.

There was some discussion on the obo-relations mail list, see: http://www.bioontology.org/wiki/index.php/RO:Main_Page

The core relations that have now been introduced are:

  • has_central_participant
  • has_specific_outcome
  • acts_on_population

definition: ...

Anonymous classes

An anonymous class is a class without a stable OBO identifier that is created in order to define some other term

Anonymous classes introduce an extra piece of complexity which we would rather avoid. There are 3 sources of anonymous classes in the current logical definitions:

  • Terms that need to be added to CL
  • Composite terms that may never be added to CL
  • High-level process terms

The first category is simple - we can generate a report of all new CL terms we need. An example is immune cell, which is required for the definition of immune cell activation. For some we'll find CL already contains a suitable term - it's just a matter of finding the CL ID and changing the intersection_of lines. (Changing the names/synonyms to be in sync between GO and CL is optional.)

The second category is harder. See http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/anon-xps.obo?view=log for a list. As an example, we need to define "sperm aster formation", which requires "sperm aster". If "sperm aster" were an existing term, it would be found in the GO cellular component ontology. Adding such terms may explode GO too much (see further on in this document). The current solution is to create an anonymous term "sperm aster" defined as "an aster that is part_of a sperm" (we essentially have a nested defined term).

For many of these, we could flatten the definition to: a formation which has_specific_outcome aster and located_in sperm.

The third category refers to cases like "sperm individualization", where the definition requires a generic high level genus like "individualization" that would only be used for defining a single term. We would end up with a lot of inappropriate terms in GO.

For these it may be possible to rewrite the logical definitions [TODO]

Other issues

TODO: summary

OBO Files

All the logical definitions and additional anonymous classes have been collected into a single file, go_xp_cell.obo.


This obo file and supporting files are all in the geneontology.sf.net cvs project in the directory

 cvs/go-dev/obol/go-ext

The files can be downloaded via a browser here:

http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/

If you want to access the files via cvs (recommended), follow the instructions here:

http://sourceforge.net/cvs/?group_id=36855

Viewing the logical definitions

The logical definitions can be viewed in OBO Edit. You will need to load multiple ontologies at once, see below

Loading and viewing in OBO-Edit

To load any of these in OBO edit, you will need to load:

go_xp_cell.obo -- http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/go_xp_cell.obo?view=log
extra_relations.obo -- http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/extra_relations?view=log
cell.obo -- http://obo.sourceforge.net/cgi-bin/detail.cgi?cell
relationship.obo -- RO -- http://obo.sourceforge.net/relationship/relationship.obo
gene_ontology.obo

You will have to use the 'advanced' option in the oboedit load menu. You may have to check 'allow dangling references'

This is a lot of files to load! It has the advantage of keeping everything modular. If you don't want all the bother, the above ontologies have been combined for your convenience, at:

http://geneontology.cvs.sourceforge.net/geneontology/go-dev/obol/go-ext/ego_full.obo?view=log

The extra_relations.obo file contains some candidates for inclusion in the obo relations ontology; this list is not stable and will most likely be whittled down to a smaller set, then definitions provided. The final set of extra relations will most likely include a minimal set of extensions to has_participant, see the discussion above.

Always use the latest oboedit version

Make sure you turn the reasoner on (may require lots of memory -- the reasoner isn't essential to browse the logical defs, but it does provide advantages, see below)

Here is a screenshot focused on microglial cell activation

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/obol/go-ext/docs/oboedit-microglial.jpg

You can see the genus-differentia definition (in the center, in the box marked "cross-products"). The logical definition accessible to the computer is:

- A cell_activation which has_central_participant microglial_cell

(note: screenshot may be out of date and show a different relation used in the differentia)

On the explorer view (left panel) you should be able to see a blue squiggly line linking "microglial cell activation" to "macrophage activation". This link is not actually asserted in the ontology - a curator never made this call. The reasoner has inferred it from the combination of the logical definition and the is_a link between microglial cell and macrophage in the OBO Cell ontology. This is a sign the curator should either assert this link in GO, work with the Cell curators to fix the corresponding link there, or amend the logical definition.

The oboedit explanation will explain how this is done, although the explanations are not very clear right now.
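The kind of inference described above can be sketched in a few lines. This is an illustrative toy, not the OBO-Edit reasoner's actual algorithm: it assumes each defined term has a single genus and a single differentia, it uses term names in place of IDs, and it assumes "macrophage activation" is defined as a cell activation which has_central_participant macrophage.

```python
# Asserted is_a links (child -> parents). Names stand in for real IDs.
IS_A = {
    "microglial cell": {"macrophage"},
    "macrophage": {"cell"},
}

# Logical definitions: term -> (genus, relation, differentia term).
# The definition of "macrophage activation" is an assumption for this sketch.
DEFS = {
    "microglial cell activation": (
        "cell activation", "has_central_participant", "microglial cell"),
    "macrophage activation": (
        "cell activation", "has_central_participant", "macrophage"),
}


def is_subclass(child, parent):
    """True if child equals parent or reaches it via asserted is_a links."""
    if child == parent:
        return True
    return any(is_subclass(p, parent) for p in IS_A.get(child, ()))


def inferred_links():
    """Pairs (sub, sup) where sub's definition is at least as specific as sup's."""
    links = []
    for sub, (genus1, rel1, diff1) in DEFS.items():
        for sup, (genus2, rel2, diff2) in DEFS.items():
            if sub == sup or rel1 != rel2:
                continue
            if is_subclass(genus1, genus2) and is_subclass(diff1, diff2):
                links.append((sub, sup))
    return links


print(inferred_links())
```

Only the microglial-to-macrophage direction is reported, because the is_a link in the cell hierarchy runs one way.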

Viewing in other ontology browsers

For comparison, here is the same thing in SWOOP:

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/obol/go-ext/docs/swoop-microglial.jpg

SWOOP is aimed more at computer scientists, hence the logical symbols (see below for how to obtain the OWL to use in SWOOP or Protege-OWL)

Viewing raw obo files

So far we have been manually editing the obo files. We have been interspersing comment: lines in amongst the stanzas

Eventually we will move to a pure oboedit approach (Real Soon Now), but the raw approach is useful as a first pass since we can get chatty in the comments - we lack a good way of linking discussion threads in oboedit right now. The comments will eventually be distilled into a summary document and removed from the final version, since they are mostly of a discursive nature (some important comments may remain)

The files contain a subset of the information in the normal GO file, with the addition of the logical definition, as intersections. Here is a section from the underlying obo 1.2 file

 [Term]
 id: GO:0001774
 name: microglial cell activation
 namespace: biological_process
 def: "The change in morphology and behavior of a microglial cell resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor." [GOC:mgi_curators, PMID:10626665, PMID:10695728, PMID:12580336, PMID:9893949]
 is_a: GO:0045321 ! immune cell activation
 intersection_of: GO:0001775     ! cell activation
 intersection_of: has_central_participant CL:0000129     ! microglial cell
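A rough sketch of how a script might pull the logical definition out of such a stanza is shown here. It is not a full obo 1.2 parser: it assumes one stanza of text, treats an intersection_of line with a single value as the genus and one with two values as a relation plus term, and ignores everything else. The function name parse_logical_definition is hypothetical.

```python
STANZA = """\
[Term]
id: GO:0001774
name: microglial cell activation
namespace: biological_process
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775     ! cell activation
intersection_of: has_central_participant CL:0000129     ! microglial cell
"""


def parse_logical_definition(stanza):
    """Return (genus_id, [(relation, term_id), ...]) for one stanza."""
    genus = None
    differentia = []
    for line in stanza.splitlines():
        line = line.strip()
        if not line.startswith("intersection_of:"):
            continue
        # Drop the tag, then drop the trailing "! comment" part.
        value = line[len("intersection_of:"):].split("!", 1)[0].split()
        if len(value) == 1:
            genus = value[0]
        elif len(value) == 2:
            differentia.append((value[0], value[1]))
    return genus, differentia


genus, differentia = parse_logical_definition(STANZA)
print(genus, differentia)
```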

Important note on the obo file

It is possible to strip the intersection_of lines from the .obo file and have an .obo file that is no different in structure from the go.obo file that has been made available to the public for the last few years. In other words, it is possible to ignore the logical definitions and have the DAG remain intact. This means that the inferrable is_a lines must remain in the public ontology, even though sometimes they seem redundant with the intersection_of lines. This is absolutely crucial for tools that depend on the go.obo file. GO will remain committed to computing the full DAG (also known as "classifying") for the consumers of this file indefinitely.
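For tools that want to ignore the logical definitions, the stripping step can be as simple as the following sketch. It assumes the file is read as plain text and that every logical-definition line begins with the intersection_of tag; the function name strip_logical_definitions is hypothetical.

```python
def strip_logical_definitions(obo_text):
    """Return the obo text with every intersection_of line removed."""
    kept = [
        line for line in obo_text.splitlines()
        if not line.lstrip().startswith("intersection_of:")
    ]
    return "\n".join(kept)


STANZA = """\
[Term]
id: GO:0001774
name: microglial cell activation
is_a: GO:0045321 ! immune cell activation
intersection_of: GO:0001775 ! cell activation
intersection_of: has_central_participant CL:0000129 ! microglial cell"""

stripped = strip_logical_definitions(STANZA)
print(stripped)
```

The is_a line survives untouched, which is why the asserted DAG stays intact for consumers that never look at the intersections.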

Viewing the Genus-differentia matrix

The combinations of genus (cell differentiation, development, cell fate commitment, migration) and differentia class (oocyte, neuron, T-cell etc) can be imagined as a 2D matrix (see original Hill paper). Each cell (sensu grid) represents a composite term, with the genus and differentia forming the rows and columns. We have no good way of visualising this yet; we'd like to add this to oboedit and perhaps even AmiGO.

For now you can view an excel file generated from the August version of the logical defs:

http://geneontology.cvs.sourceforge.net/*checkout*/geneontology/go-dev/obol/go-ext/docs/matrix-microglial.xls
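As a sketch of the idea (not how the excel file was actually generated), the matrix can be built from a list of (term, genus, differentia) triples, with genus terms as rows and differentia classes as columns. The function name build_matrix is hypothetical, and the two triples reuse examples from earlier on this page.

```python
def build_matrix(definitions):
    """Return (row_labels, column_labels, grid) for (term, genus, differentia) triples."""
    genera = sorted({genus for _, genus, _ in definitions})
    classes = sorted({diff for _, _, diff in definitions})
    by_cell = {(genus, diff): term for term, genus, diff in definitions}
    # Empty grid cells (no composite term for that combination) are None.
    grid = [[by_cell.get((g, d)) for d in classes] for g in genera]
    return genera, classes, grid


DEFINITIONS = [
    ("germ cell migration", "cell migration", "germ cell"),
    ("microglial cell activation", "cell activation", "microglial cell"),
]

rows, cols, grid = build_matrix(DEFINITIONS)
for label, row in zip(rows, grid):
    print(label, row)
```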

Browsing in Obol

The Obol browser can be found on the main Obol pages

http://www.fruitfly.org/~cjm/obol

From there, find the link to the Obol browser (an AmiGO clone written in Prolog), which may or may not be up and running. Obol may also be loaded with a previous version of the logical definitions.

Obol will allow you to browse existing definitions. In addition, Obol can "Obolify" terms without definitions, i.e. parse the name to get the definition.

Browsing in AmiGO

We are working on allowing the display of genus-differentia definitions to be ready some time after the GO-CL cross-products become part of the live GO. The logical definitions will give us the option of 'disentangling' complex DAGs, showing the CL hierarchy separately, and allow the querying of GO annotations by CL terms.

Screenshots to come - see Obol browser for prototype for now
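Querying GO annotations by CL terms could work roughly as in this sketch. It is not AmiGO code: the gene names are placeholders, term names stand in for IDs, and the mapping from each GO term to a cell type is assumed to come from the differentia of its logical definition.

```python
# is_a links in the cell ontology (child -> parents).
CL_IS_A = {"microglial cell": ["macrophage"]}

# GO term -> cell type taken from its logical definition (assumed mapping).
GO_TO_CL = {
    "microglial cell activation": "microglial cell",
    "macrophage activation": "macrophage",
    "germ cell migration": "germ cell",
}

# Placeholder annotations: (gene, GO term).
ANNOTATIONS = [
    ("gene_A", "microglial cell activation"),
    ("gene_B", "germ cell migration"),
    ("gene_C", "macrophage activation"),
]


def cl_ancestors(cell_type):
    """The cell type itself plus everything reachable via is_a."""
    seen = {cell_type}
    stack = [cell_type]
    while stack:
        for parent in CL_IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_for_cell_type(cell_type):
    """Genes annotated to any GO term whose cell type is cell_type or a subtype."""
    return sorted(
        gene for gene, go_term in ANNOTATIONS
        if cell_type in cl_ancestors(GO_TO_CL[go_term])
    )


print(genes_for_cell_type("macrophage"))
```

A query for "macrophage" picks up the gene annotated to microglial cell activation as well, because the cell hierarchy is consulted rather than the GO DAG.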

Logical Definitions referring to chemical types

Examples:

cysteine biosynthesis -- a biosynthesis which has_specific_output cysteine

This is the next step. Mike Bada from the Hunter lab has done a lot of preliminary work here

Issues

Pluralisation

ChEBI must first conform to OBO Foundry principles before it can be used to create logical definitions for GO terms. ChEBI is doing well in a number of respects, but the use of pluralisation as a sole lexical disambiguator between generic forms and specific forms of chemicals violates univocity and will cause many practical problems for the use of these terms in GO definitions. There may be an underlying deeper problem to do with a failure to distinguish types from instances - ontologies should contain only types, never instances.

This has yet to be resolved.

Logical Definitions referring to anatomical entities

We are tackling cells first, see above. We may tackle cell parts (ie GO Cellular Component).

Issues

Species-specificity

CARO may help

Other Composite Terms

Eventually the entire OBO Foundry will be linked by definitions that span ontological boundaries and levels of granularity

Diseases

The OBO Disease ontology will define terms using FMA and/or CARO.

For example:

Ovarian Cancer
Genus: DO:cancer
Differentia: has_location FMA:Ovary

Obol can be used to retrofit many of these

Issues

species-specificity

Parts of specific cell types

Examples:

  • aster of spermatocyte
  • plasma membrane of sperm

There is a potential explosion here. Even though certain combinations will be excluded (eg nucleus of erythrocyte), the intersection matrix is dense rather than sparse. When we further combine these with processes (eg sperm plasma membrane assembly) things get unmanageable quickly, even with reasoner support.

Our current recommendation here is to post-coordinate. When these are needed for definitions of other terms, we can coordinate anonymous classes as required (see issues in CellO section)

Cell types of specific anatomical entities

Examples:

  • forebrain neuron

The combinatorial explosion may not be so large here. Many cell types are restricted to certain anatomical entities (eg Purkinje cells are always in the Cerebellum - see http://www.bioontology.org/wiki/index.php/CL:Aligning_species-specific_anatomy_ontologies_with_CL)

Anatomical

Examples:

  • thoracic bristle (a bristle which is part_of a thorax)
  • dorsal fin (a fin which is located dorsally)
  • dorsal ectoderm (a region which is located dorsally and a region_of an ectoderm)

Pre-coordination recommended (most AOs do this. The FMA does it extensively. The redundancy can easily be managed with reasoners)

Process and Function

Defining processes by the functionings that are necessarily enacted

Definitions that refer to qualities

This is a more unusual case, but I believe it is useful. PATO does not just have to be used for mutant phenotypes - it can be used for qualities in general.

Example:

  • diploid cell (a cell which has_quality diploid)
  • pluripotent cell (a cell which has_quality pluripotency)

Genomic entities

SO already has logical definitions for composite terms http://www.bioontology.org/wiki/index.php/SO:Composite_Terms

These do not reference any ontology external to SO, but some of the definitions may eventually reference PATO

Phenotypes

Composite terms in ontologies like MP and plant_trait can be defined using PATO and some ontology of bearer qualities; for example

hypertrophy of kidney
Genus: PATO:hypertrophy
Differentia: inheres_in MA:kidney

See the Obol page for results on plant_trait

Issues

  • What is the genus: the quality or the bearer entity
    • e.g. is it "hypertrophy of kidney" or "a kidney that is hypertrophied"? These terms refer to different but related entities. I believe it is the former and plant_trait has it right
  • Species-specificity

Post-coordinating annotations

Not all biological entity types need be pre-composed (aka pre-coordinated) in an ontology. Even with reasoner support, this approach is not scalable, and we end up with ICD-9 (can someone add the monkey on a tricycle example here???)

Implementation

We stress that supporting logical definitions is entirely optional for database administrators, tool implementors, etc. The GOC will continue to provide the full DAG; logical defs can be ignored and tools will work the same. No action is required on the part of databases, organisations and groups that consume the GO, or on the part of their end-users.

However, making tools logical-definition aware can lead to enhancements, and we will provide some support for the technical teams who make use of the GO obo files to populate their databases or extend their tool functionality.

Obo file format implementation

Refer also to obo format 1.2 documentation http://www.geneontology.org/GO.format.obo-1_2.shtml

The oboedit API can also be used to access the logical definitions computationally.

Storing logical definitions in a database

Chado and GODB

Both Chado and the GO Database support storing of logical definitions. For details, see the XSL Transforms (http://www.godatabase.org/dev/xml/xsl)

and the document gmod/schema/chado/modules/cv/doc/cv-advanced-usage.tex, also available from GMOD CVS (http://gmod.cvs.sourceforge.net/gmod/schema/chado/modules/cv/doc/cv-advanced-usage.tex?view=log)

The corresponding APIs need to be extended to fully support this.

GODB and Chado also support post-coordination at the schema level.

Other databases

We can't support other schemas. However, it is worth noting that OWL-compatible databases (e.g. Sesame+OWLIM, Instancestore) should be capable of representing logical definitions; see OWL below.

OWL and Semantic Web Tools

Many OWL aware tools will do the right thing with the logical definitions. Reasoners like Pellet can be used to compute the subsumption path (we use the oboedit reasoner as it is nicely integrated with the oboedit UI, and is fast because it doesn't have to deal with cases we don't care about).

We support the use of all such 3rd party tools by providing OWL transforms of all obo files

FAQ and glossary

Logical Definitions

A logical definition, aka cross-product, aka "Aristotelian definition", aka genus-differentia definition, aka necessary and sufficient conditions, aka complete definitions...

This is a definition that can be used by a computer as well as a human. For this project, the logical definition for a specific term always takes the form of a genus (generic term) and differentia (discriminating characteristics which mark instances of the specific term as being different from is_a sibling terms)

Anonymous Terms

Obol creates anonymous terms if it can't find existing terms. Sometimes this is a term we need to add, eg to the cell ontology. Sometimes it is a type that we would never create a term for in an ontology but we would like to refer to that class of things.