Compositional Term Submission Tool: Difference between revisions

From GO Wiki
(24 intermediate revisions by the same user not shown)
Currently hosted on:
(aka [[TermGenie]])


http://berkeleybop.org/obo/quickterm/GO
Project lead: Chris


== Background ==
Coordinates with: ontology-editors


One of the common bottlenecks in annotation is the generation of new
== Purpose ==
ontology classes. The standard workflow calls for the curator to make
a request for a new class via some kind of tracking system. Ontology
authors monitor the tracking system, and generate new classes on
request. The curator then receives a message with the new class
identifier, which can then be used in annotation. This is inefficient
due to the lag between request and the generation of a new
term. Curators can work more efficiently if classes are generated
instantly on request.


Some commonly used ontologies are the Gene Ontology (GO)[REF] and
An increasing number of the term requests coming in from SourceForge are for automatable compositional terms, particularly regulation terms. Ontology editors waste time on manual tasks that could be automated, and curators experience a bottleneck during annotation.
phenotype ontologies such as the Mammalian Phenotype (MP)[REF]. These
ontologies frequently make use of combinatorial classes that conform
to standard patterns[REF]. We have exploited this characteristic
feature to devise a new compositional class request system that
uses logical reasoning.


== Description ==
The compositional term request system allows annotators to use a web template system to instantaneously get the term they require, provided it conforms to an existing pre-determined template. A reasoner is used to place the term automatically in the hierarchy.


=== User experience ===
Documentation for the system is available here: [[TermGenie]]


On entering the
== Groups ==
system, the user is asked to select one of a series of pre-defined
templates. For example, in GO one of the templates is called
"morphogenesis" and is for generating classes such as "mesonephros
morphogenesis". For the human phenotype ontology (HPO), one of the
templates is called "entity_quality" and is for generating classes such as
"fragmentation of the epiphysis of the thumb".


On selecting a template, the user is then asked to fill in one or more
* [[Ontology_Development]]
slots with classes taken from the appropriate ontology. An AJAX
auto-completion system is used to assist in the selection of the
correct term. Some templates require additional information - for
example, when generating catalytic activity classes or protein complex
classes, the user has the option of selecting cardinality
(stoichiometry).


The user then has the option of adding additional information,
== Project Dependencies ==
including a preferred label (name), definition, comments and
definition database cross-reference. These are typically optional, as
the system can auto-generate these. For some templates, certain pieces
of information can be mandatory - for example, when generating a
protein complex class it is mandatory to provide a reference. In some
cases the user does not have the option of over-riding the
defaults. For example, for the generation of regulation classes, the
name is forced to conform to the GO naming convention.


After filling in this information, the user can submit the request -
* [[Full_Text_Indexing]] (soft dependency)
this is done in "dry run" mode unless the user explicitly selects the
* [[Transition to OWL]] (for [[#v2]])
commit checkbox. First, the system checks that the request
conforms to the constraints encoded in the template; non-conformant
requests are rejected.


The system will provisionally generate the ontology class
according to the template. Names, synonyms and a textual definition
will be generated using naming conventions encoded in the template
(unless over-ridden by the user). A logical definition is generated,
which is then used by a reasoner to calculate the correct placement of
the class in the ontology graph. All of this is reported directly to
the submitter.


The reasoner is capable of detecting equivalent classes - if this
== Status ==
happens, the annotator is informed that a class with the equivalent
logical definition already exists. The annotator can then go ahead and
use that class in their annotation. The system will also check to
ensure there is no exact name match to an existing class.


If the class is valid and not yet created, and if the user elected to
[[#v1]] is available to curators; currently only regulation requests are live
commit the request, it does not immediately go into the main ontology
- instead it goes into a separate "xp submit" ontology. This ontology
is publicly visible, but is typically only inspected by the main
gatekeepers of the ontology. After inspection, the gatekeepers can run
a script that brings the submitted classes into the main ontology. The
gatekeeper can add extra information, or if they disagree with the
request then they can choose to obsolete the class. Typically the
gatekeeper does not have to do much here, as the system takes care of
most of the details.


The curator receives a new GO identifier which they can then
* http://berkeleybop.org/obo/quickterm/GO
immediately use in annotation. This identifier will appear in the main
ontology soon after. In the event the request was rejected and the new
class goes into the ontology as being obsolete, then normal annotation
lifecycle procedures can be used to fix the annotation.


The user then has the option of submitting another similar request.


=== AJAX-based autocompletion ===
== Deliverables ==


SETH TO WRITE
=== v1 ===


* lucene + go-moose
v1 is obo-format dependent, uses a custom reasoner, and relies on various ad-hoc scripts
* javascript library


=== Flexible template system ===
==== v1 report ====


One of the requirements in building this system was that creating and
A report on the v1 implementation is available:
modifying templates would be fast, efficient and configurable. In
addition the templates should be understandable by the ontology
authors, which precludes encoding them in an imperative language such as
Perl or Java.


We use a simplified version of Obol grammars[REF] to specify the
* [[Compositional Term Submission Tool v1 Report]]
templates. Each template has a collection of properties, listed in
Table 1. The display and behavior of the system are driven entirely by
the templates, rather than having to be explicitly programmed.


=== Text generation ===
=== v2 ===


A simplified Obol grammar is used to specify how to generate
v2 will be a port of [[#v1]]
names. This consists of a collection of tokens interspersed with
commands for generation of names, synonyms or definitions. For
example, the text definition template for development terms is:


  ['The process whose specific outcome is the progression of',
* implemented in Java
  refname(Structure),' over time, from its formation to the mature structure.',
* works off of OBO or OWL files
  textdef(Structure)]
* uses OWLAPIv3
* runs in Tomcat/Jetty
* uses OWL reasoners


The entire template takes a variable called "Structure" (i.e. the
Dependencies: [[Transition to OWL]]
anatomical entity). The token refname(Structure) is replaced by the
name of the structure, prefixed by either "a" or "an". The final
clause recapitulates the definition of the structure.


Note that like Obol grammars, these templates can be used for parsing
See: [[TermGenie2]]
as well as generation.
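
The generation direction can be sketched roughly as follows. This is a minimal illustration in Python, not the Obol implementation: the tuple encoding of commands, the <code>generate</code> function and the naive "a"/"an" rule are all assumptions made for the example, and the text definition passed in is a placeholder.

```python
# Illustrative sketch only: the token encoding and helper names are assumptions,
# not the actual Obol grammar machinery.

def refname(name):
    """Return the class name prefixed by "a" or "an" (naive vowel check)."""
    article = "an" if name and name[0].lower() in "aeiou" else "a"
    return f"{article} {name}"


# The development text-definition template from the text, with each command
# encoded as a (command, variable) tuple.
DEVELOPMENT_TEXTDEF = [
    "The process whose specific outcome is the progression of",
    ("refname", "Structure"),
    "over time, from its formation to the mature structure.",
    ("textdef", "Structure"),
]


def generate(template, bindings):
    """Fill a token list; bindings maps a variable to {"name": ..., "textdef": ...}."""
    parts = []
    for token in template:
        if isinstance(token, str):
            # Literal text is copied through (whitespace-trimmed).
            parts.append(token.strip())
            continue
        command, variable = token
        cls = bindings[variable]
        if command == "refname":
            parts.append(refname(cls["name"]))
        elif command == "textdef":
            parts.append(cls["textdef"])
        else:
            raise ValueError(f"unknown command: {command}")
    return " ".join(parts)


if __name__ == "__main__":
    # Placeholder definition text; a real request would use the ontology's own.
    structure = {"name": "eye", "textdef": "<text definition of the structure>"}
    print(generate(DEVELOPMENT_TEXTDEF, {"Structure": structure}))
```

Keeping the template as plain data rather than code is what lets ontology authors edit it without programming; the interpreter above is the only imperative part.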


=== Reasoning strategy ===
==== Reasoner Benchmarks ====


One of the main requirements of the system was for all newly generated
Need to test various OWL reasoners, simulate evolution of GO
classes to be automatically placed in the ontology. It is important
for the submitter to be able to see this placement, in order to
confirm that no mistakes were made. The submitter also needs to
receive immediate warning if an equivalent class already exists.


These tasks can all be done by standard automated reasoners, so long
==== Feasibility Study ====
as logical definitions are supplied in the ontology.


We evaluated several reasoners, including OWL reasoners such as
Determine feasibility of Java version
Pellet, FaCT++ and HermiT, as well as the OBO-Edit reasoner. We found
in all cases that reasoning was either too slow or did not complete
the reasoning task at all.


In order to overcome this obstacle we implemented our own simple
[[Category:SWUG Projects]]
reasoner on top of SWI-Prolog. This reasoner is not as comprehensive
as existing OWL reasoners, but is sufficient for the subset of OWL
used by many existing ontologies such as GO. The only OWL2 constructs
used by the reasoner are: EquivalentTo (=), SubClassOf (<),
SubObjectPropertyOf, intersectionOf, someValuesFrom,
TransitiveProperty and PropertyChain.
 
Rules:
 
<pre>
X < X
X < Y if EquivalentTo(X DX) and DX < Y
X < Y if EquivalentTo(Y DY) and X < DY
X < intersectionOf(Y1....Yn) if X < Y1 and ... X < Yn
intersectionOf(X1....Xn) < Y if X in X1...Xn and X < Y
someValuesFrom(PX X) < someValuesFrom(PY Y) if PX < PY and X < Y
someValuesFrom(P X) < someValuesFrom(P Y) if Transitive(P) and
  someValuesFrom(P X) < someValuesFrom(PY Z) and someValuesFrom(P Z) < someValuesFrom(P Y)
someValuesFrom(PX X) < someValuesFrom(PY Y) if PY < PropertyChain(PX PZ) and
  someValuesFrom(PX X) < someValuesFrom(PY Z) and someValuesFrom(PZ Z) < someValuesFrom(PY Y)
</pre>
 
These rules are implemented using a backward-chaining rule engine.
 
The performance is generally robust with respect to the size of the input
ontologies, because axioms that are not relevant to the classification
of the input submission term are never used.
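
As a rough illustration of the goal-directed style, the following Python sketch proves a single subsumption goal by working backwards from it. It is not the SWI-Prolog reasoner: it covers only the reflexivity, EquivalentTo, intersectionOf and simple someValuesFrom rules (the transitivity and property-chain rules are omitted), it adds asserted is_a links as an extra rule, and the class names and axioms in <code>toy_kb</code> are toy data invented for the example rather than real GO axioms.

```python
# Toy backward-chaining subsumption check; a sketch under stated assumptions.
# Named classes are strings; class expressions are tuples:
#   ("some", property, filler)   for someValuesFrom
#   ("and", expr1, ..., exprN)   for intersectionOf

class KB:
    def __init__(self, is_a=None, equiv=None, sub_prop=None):
        self.is_a = is_a or {}          # class -> asserted superclasses
        self.equiv = equiv or {}        # class -> defining expression
        self.sub_prop = sub_prop or {}  # property -> asserted super-properties


def sub_property(kb, p, q, seen=frozenset()):
    """True if property p is q or an asserted (indirect) sub-property of q."""
    if p == q:
        return True
    if p in seen:
        return False
    return any(sub_property(kb, s, q, seen | {p}) for s in kb.sub_prop.get(p, ()))


def subsumed(kb, x, y, seen=frozenset()):
    """True if x < y can be proved; seen holds the goals on the current path."""
    if x == y:                                    # X < X
        return True
    if (x, y) in seen:                            # avoid looping on a goal
        return False
    seen = seen | {(x, y)}
    x_kind = x[0] if isinstance(x, tuple) else None
    y_kind = y[0] if isinstance(y, tuple) else None
    if y_kind == "and":                           # X < and(Y1..Yn)
        return all(subsumed(kb, x, part, seen) for part in y[1:])
    if x_kind == "and" and any(subsumed(kb, part, y, seen) for part in x[1:]):
        return True                               # and(X1..Xn) < Y
    if x_kind == "some" and y_kind == "some":     # some(PX X) < some(PY Y)
        if sub_property(kb, x[1], y[1]) and subsumed(kb, x[2], y[2], seen):
            return True
    if any(subsumed(kb, parent, y, seen) for parent in kb.is_a.get(x, ())):
        return True                               # asserted is_a (extra rule)
    if x in kb.equiv and subsumed(kb, kb.equiv[x], y, seen):
        return True                               # EquivalentTo(X DX), DX < Y
    if y in kb.equiv and subsumed(kb, x, kb.equiv[y], seen):
        return True                               # EquivalentTo(Y DY), X < DY
    return False


def equivalent(kb, a, b):
    """Two classes are equivalent if each subsumes the other."""
    return subsumed(kb, a, b) and subsumed(kb, b, a)


def toy_kb():
    """Invented example axioms, for illustration only."""
    rel = "results_in_morphogenesis_of"
    base = "anatomical structure morphogenesis"
    return KB(
        is_a={"mesonephros": ["kidney"]},
        equiv={
            "kidney morphogenesis": ("and", base, ("some", rel, "kidney")),
            "mesonephros morphogenesis": ("and", base, ("some", rel, "mesonephros")),
        },
    )
```

Only axioms reachable from the goal are ever consulted, which is the property the paragraph above relies on; <code>equivalent</code> shows how the duplicate-class warning could be derived from two subsumption goals.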
 
=== Workflow ===
 
Most existing ontologies use version control systems such as CVS or
SVN. The submission system is intended to work alongside these - all
newly generated classes are placed in a version control managed file
alongside the main ontology file (in GO, this goes in a directory
called xp_submit).
 
The system appends to 3 files:
 
* An obo format file consisting of the newly submitted class, together
  with full axioms for the class, including the reasoner-calculated
  superclasses (is_a parents)
 
* A file of new subclass links. This is necessary when new classes are
  inferred to be "sandwiched" between two classes that previously had
  a direct subclass link.
 
* A file of subclass links to be deleted. When a new "sandwich" class
  is created, the previous link becomes redundant. Although these are
  essentially harmless, redundant links can confuse users and it is
  good policy to remove these.
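
The bookkeeping for the second and third files can be sketched as below. This is a hypothetical helper written in Python for illustration; the function name, the argument shapes and the placeholder class names are assumptions, and the actual file formats and scripts are not shown here.

```python
# Hypothetical helper: computes the contents of the "new links" and
# "links to delete" files for one newly classified class.

def sandwich_links(new_class, parents, children, asserted_links):
    """Return (links_to_add, links_to_delete) as lists of (child, parent) pairs.

    parents and children are the reasoner-inferred direct superclasses and
    subclasses of new_class; asserted_links is the set of direct (child,
    parent) subclass links already present in the main ontology.
    """
    # Every inferred child gains a link to the new class.
    to_add = sorted((child, new_class) for child in children)
    # A pre-existing direct link from a child to one of the new class's
    # parents is now redundant, because it is implied via the new class.
    to_delete = sorted(
        (child, parent)
        for child in children
        for parent in parents
        if (child, parent) in asserted_links
    )
    return to_add, to_delete


if __name__ == "__main__":
    # Placeholder names: D was a direct subclass of P; C is inferred in between.
    add, delete = sandwich_links("C", ["P"], ["D"], {("D", "P")})
    print(add)
    print(delete)
```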
 
These files are visible through the normal mechanisms used by the
version control system. This means that a "bleeding edge" version of
the ontology can be viewed by dynamically combining the 3 files above
plus the main ontology. However, this is typically not required, as
the gatekeeper can swiftly deal with new requests.
 
The gatekeeper can choose to edit the submission file, but this should
not be necessary in the majority of cases. Usually it is sufficient to
quickly inspect the files and to run a merge script to pull in the new
information from the 3 files above (after this happens, the files
are reset). If desired, even this one manual step can be automated
(for example, for experienced submitters it may be desirable to
directly bring in the new submission).
 
=== Implementation ===
 
== DISCUSSION ==
 
=== Similar systems ===
 
OBI quickterm
 
Text definitions: Rabbit. Robert Stevens' system.
 
https://www.ebi.ac.uk/chebi/submissions/login
 
=== Future development ===
 
One of the current limitations of the existing system is that all
ontologies must be in OBO format, and logical definitions must be
expressible in the same format. In theory this need not be a problem
for OWL ontologies that use a restricted set of OWL constructs, but in
practice the need to convert files imposes an additional administrative
burden.
 
It should be relatively easy to convert the system to use OWL
ontologies rather than OBO ones, and we may do this in future,
depending on which ontologies use the system.
 
The simplified reasoning strategy may be problematic for some
ontologies. For example, the cell ontology uses logical definitions
that require additional constructs including negation that pose
problems for our backward-chaining reasoning strategy. We expect that
before long we will be able to use standard OWL reasoners within our
system. For example, the latest version of the Pellet reasoner has the
ability to do incremental reasoning with caching of results, which
eliminates some of the wait time currently associated with OWL
reasoning. In addition, segmentation strategies such as MIREOT[REF]
can be used to extract a tractable subset of an ontology.
 
Java conversion.
 
=== Non-template classes ===
 
The system is designed specifically for immediate granting of requests
that follow some compositional template. In principle there is nothing
preventing the extension of the system to be used for more free-form
class generation. The submitter would have to manually specify all
necessary information, rather than have this auto-generated according
to a template. In practice there is less of a need for this system
within the GO, as curators can use an ordinary term request system
such as SourceForge and enter the terms directly using OBO-Edit.
 
 
=== Current uses ===
 
GO, regulation
 
HPO?
 
== CONCLUSIONS ==
 
The class request bottleneck is a frequent cause of curator
inefficiency. In addition, the manual construction and placement of
compositional ontology classes is time-consuming and error-prone. We
have developed a system that simultaneously deals with both of these
issues.
 
== AVAILABILITY ==
 
http://berkeleybop.org/obo/quickterm/GO
 
 
 
== TABLES ==
 
{| class="wikitable"
|+ Table 1: template properties
! Property !! Description
|-
| ontology || the home for the newly generated class
|-
| description || textual summary of what the template is for
|-
| externals || external ontologies required to define the class
|-
| arguments || a list of arguments that must be supplied to the template
|-
| logical definition || a template for the generation of the logical definition
|-
| name || a template for generation of the name (preferred label)
|-
| synonym || a template for generation of synonyms
|-
| textdef || a template for generation of the textual definition
|-
| wraps || some templates can optionally wrap other templates
|}
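
To make the shape of a template record concrete, here is a minimal Python sketch that mirrors the properties in Table 1. The class, the field types and the <code>missing_arguments</code> check are assumptions for illustration; in the system itself templates are written as simplified Obol grammars, not Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """One template record; field names follow Table 1, types are assumed."""
    ontology: str                 # home for the newly generated class
    description: str              # what the template is for
    arguments: list               # arguments the submitter must supply
    logical_definition: str       # template for the logical definition
    name: str                     # template for the preferred label
    textdef: str                  # template for the textual definition
    synonym: list = field(default_factory=list)    # synonym templates
    externals: list = field(default_factory=list)  # required external ontologies
    wraps: list = field(default_factory=list)      # templates this one wraps


def missing_arguments(template, request):
    """Return the template arguments that the request leaves unfilled."""
    return [arg for arg in template.arguments if not request.get(arg)]


if __name__ == "__main__":
    # Placeholder values; the template strings are opaque in this sketch.
    morphogenesis = Template(
        ontology="GO",
        description="morphogenesis of an anatomical structure",
        arguments=["Structure"],
        logical_definition="<logical definition template>",
        name="<name template>",
        textdef="<text definition template>",
    )
    print(missing_arguments(morphogenesis, {}))
```

A request with a non-empty result from <code>missing_arguments</code> would be one example of a non-conformant request that the system rejects before any class is generated.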
 
== FIGURES ==
 
Screenshot.
 
Example of submitted class stanza:
 
== REFERENCES ==
 
todo
 

Latest revision as of 17:27, 14 June 2011

(aka TermGenie)

Project lead: Chris

Coordinates with: ontology-editors

Purpose

An increasing number of the term requests coming in from SourceForge are for automatable compositional terms, particularly regulation terms. Ontology editors waste time on manual tasks that could be automated, and curators experience a bottleneck during annotation.

The compositional term request system allows annotators to use a web template system to instantaneously get the term they require, provided it conforms to an existing pre-determined template. A reasoner is used to place the term automatically in the hierarchy.

Documentation for the system is available here: TermGenie

Groups

Project Dependencies


Status

v1 is available to curators; currently only regulation requests are live


Deliverables

v1

v1 is obo-format dependent, uses a custom reasoner, and relies on various ad-hoc scripts

v1 report

A report on the v1 implementation is available: Compositional Term Submission Tool v1 Report

v2

v2 will be a port of v1

  • implemented in Java
  • works off of OBO or OWL files
  • uses OWLAPIv3
  • runs in Tomcat/Jetty
  • uses OWL reasoners

Dependencies: Transition to OWL

See: TermGenie2

Reasoner Benchmarks

Need to test various OWL reasoners, simulate evolution of GO

Feasibility Study

Determine feasibility of Java version