|
|
| Currently hosted on:
| | (aka [[TermGenie]]) |
|
| |
|
| http://berkeleybop.org/obo/quickterm/GO
| | Project lead: Chris |
|
| |
|
| == Background ==
| | Coordinates with: ontology-editors |
|
| |
|
| One of the common bottlenecks in annotation is the generation of new
| | == Purpose == |
| ontology classes. The standard workflow calls for the curator to make
| |
| a request for a new class via some kind of tracking system. Ontology
| |
| authors monitor the tracking system, and generate new classes on
| |
| request. The curator then receives a message with the new class
| |
| identifier, which can then be used in annotation. This is inefficient
| |
| due to the lag between request and the generation of a new
| |
| term. Curators can work more efficiently if classes are generated
| |
| instantly on request.
| |
|
| |
|
| Some commonly used ontologies are the Gene Ontology (GO)[REF] and
| | An increasing number of terms coming in from sourceforge are automatable compositional terms - particularly regulation terms. Ontology editors waste time performing manual tasks that could be automated, and curators experience a bottleneck during annotation. |
| phenotype ontologies such as the Mammalian Phenotype (MP)[REF]. These
| |
| ontologies frequently make use of combinatorial classes that conform
| |
| to standard patterns[REF]. We have exploited this characteristic
| |
| feature to devise a new compositional class request system that
| |
| exploits logical reasoning.
| |
|
| |
|
| == Description ==
| | The compositional term request system allows annotators to use a web template system to instantaneously get the term they require, provided it conforms to an existing pre-determined template. A reasoner is used to place the term automatically in the hierarchy. |
|
| |
|
| === User experience ===
| | Documentation for the system is available here: [[TermGenie]] |
|
| |
|
| On entering the
| | == Groups == |
| system, the user is asked to select one of a series of pre-defined
| |
| templates. For example, in GO one of the templates is called
| |
| "morphogenesis" and is for generating classes such as "mesonephros
| |
| morphogenesis". For the human phenotype ontology (HPO), one of the
| |
| templates is called "entity_quality" and is for generating classes such as
| |
| "fragmentation of the epiphysis of the thumb".
| |
|
| |
|
| On selecting a template, the user is then asked to fill in one or more
| | * [[Ontology_Development]] |
| slots with classes taken from the appropriate ontology. An AJAX
| |
| auto-completion system is used to assist in the selection of the
| |
| correct term. Some templates require additional information - for
| |
| example, when generating catalytic activity classes or protein complex
| |
| classes, the user has the option of selecting cardinality
| |
| (stoichiometry).
| |
|
| |
|
| The user then has the option of adding additional information,
| | == Project Dependencies == |
| including a preferred label (name), definition, comments and
| |
| definition database cross-reference. These are typically optional, as
| |
| the system can auto-generate these. For some templates, certain pieces
| |
| of information can be mandatory - for example, when generating a
| |
| protein complex class it is mandatory to provide a reference. In some
| |
| cases the user does not have the option of over-riding the
| |
| defaults. For example, for the generation of regulation classes, the
| |
| name is forced to conform to the GO naming convention.
| |
|
| |
|
| After filling in this information, the user can submit the request -
| | * [[Full_Text_Indexing]] (soft dependency) |
| this is done in "dry run" mode unless the user explicitly selects the
| | * [[Transition to OWL]] (for [[#v2]]) |
| commit checkbox. First, the system checks that the request
| |
| conforms to the constraints encoded in the template - non-conformant
| |
| requests are rejected.
| |
|
| |
|
| The system will provisionally generate the ontology class
| |
| according to the template. Names, synonyms and a textual definition
| |
| will be generated using naming conventions encoded in the template
| |
| (unless over-ridden by the user). A logical definition is generated,
| |
| which is then used by a reasoner to calculate the correct placement of
| |
| the class in the ontology graph. All of this is directly reported to
| |
| the user/submitter.
| |
|
| |
|
| The reasoner is capable of detecting equivalent classes - if this
| | == Status == |
| happens, the annotator is informed that a class with the equivalent
| |
| logical definition already exists. The annotator can then go ahead and
| |
| use that class in their annotation. The system will also check to see
| |
| that there is no exact name match to an existing class.
| |
|
| |
|
| If the class is valid and not yet created, and if the user elected to
| | [[#v1]] is available to curators; currently only regulation requests are live |
| commit the request, it does not immediately go into the main ontology
| |
| - instead it goes into a separate "xp submit" ontology. This ontology
| |
| is publicly visible, but is typically only inspected by the main
| |
| gatekeepers of the ontology. After inspection, the gatekeepers can run
| |
| a script that brings the submitted classes into the main ontology. The
| |
| gatekeeper can add extra information, or if they disagree with the
| |
| request then they can choose to obsolete the class. Typically the
| |
| gatekeeper does not have to do much here, as the system takes care of
| |
| most of the details.
| |
|
| |
|
| The curator receives a new GO identifier which they can then
| | * http://berkeleybop.org/obo/quickterm/GO |
| immediately use in annotation. This identifier will appear in the main
| |
| ontology soon after. In the event the request was rejected and the new
| |
| class goes into the ontology as being obsolete, then normal annotation
| |
| lifecycle procedures can be used to fix the annotation.
| |
|
| |
|
| The user then has the option of submitting another similar request.
| |
|
| |
|
| === AJAX-based autocompletion === | | == Deliverables == |
|
| |
|
| SETH TO WRITE
| | === v1 === |
|
| |
|
| * lucene + go-moose
| | v1 is obo-format dependent, uses a custom reasoner, and relies on various ad-hoc scripts |
| * javascript library
| |
|
| |
|
| === Flexible template system === | | ==== v1 report ==== |
|
| |
|
| One of the requirements in building this system was that creating and
| | A report on the v1 implementation is available: |
| modifying templates would be fast, efficient and configurable. In
| |
| addition the templates should be understandable by the ontology
| |
| authors, which precludes encoding them in an imperative language such as Perl
| |
| or Java.
| |
|
| |
|
| We use a simplified version of Obol grammars[REF] to specify the
| | * [[Compositional Term Submission Tool v1 Report]] |
| templates. Each template has a collection of properties, listed in
| |
| Table 1. The display and behavior of the system is driven entirely by
| |
| the templates, rather than having to be explicitly programmed.
| |
|
| |
|
| === Text generation === | | === v2 === |
|
| |
|
| A simplified Obol grammar is used to specify how to generate
| | v2 will be a port of [[#v1]] |
| names. This consists of a collection of tokens interspersed with
| |
| commands for generation of names, synonyms or definitions. For
| |
| example, the text definition template for development terms is:
| |
|
| |
|
| ['The process whose specific outcome is the progression of',
| | * implemented in Java |
| refname(Structure),' over time, from its formation to the mature structure.',
| | * works off of OBO or OWL files |
| textdef(Structure)]
| | * uses OWLAPIv3 |
| | * runs in TomCat/Jetty |
| | * uses OWL reasoners |
|
| |
|
| The entire template takes a variable called "Structure" (i.e. the
| | Dependencies: [[Transition to OWL]] |
| anatomical entity). The token refname(Structure) is replaced by the
| |
| name of the structure, prefixed by either "a" or "an". The final
| |
| clause recapitulates the definition of the structure.
| |
|
| |
|
| Note that like Obol grammars, these templates can be used for parsing
| | See: [[TermGenie2]] |
| as well as generation.
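
As a rough illustration of how such a template might be expanded, the Python sketch below renders the development text definition shown above. The function names `refname`, `textdef` and `render`, and the sample structure record, are hypothetical stand-ins and not the system's actual Obol implementation. The article choice is a simple vowel heuristic, and explicit spaces are added to the literal tokens on the assumption that the real grammar joins tokens with whitespace.

```python
# Hypothetical sketch: expand a text-definition template for one structure.
# The structure record at the bottom is placeholder data for illustration only.

def refname(structure):
    """Return the structure name prefixed by 'a' or 'an' (vowel heuristic)."""
    name = structure["name"]
    article = "an" if name[0].lower() in "aeiou" else "a"
    return f"{article} {name}"

def textdef(structure):
    """Return the structure's own textual definition."""
    return structure["definition"]

# A template is a list of literal tokens interspersed with commands
# (callables) that are applied to the template argument.
DEVELOPMENT_TEXTDEF = [
    "The process whose specific outcome is the progression of ",
    refname,
    " over time, from its formation to the mature structure. ",
    textdef,
]

def render(template, structure):
    """Concatenate literals and the results of commands, in order."""
    return "".join(tok(structure) if callable(tok) else tok for tok in template)

mesonephros = {
    "name": "mesonephros",
    "definition": "PLACEHOLDER definition of the mesonephros.",
}
rendered = render(DEVELOPMENT_TEXTDEF, mesonephros)
print(rendered)
```

Keeping the template as plain data means a new naming convention needs only a new list of tokens, not new program logic.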
| |
|
| |
|
| === Reasoning strategy === | | ==== Reasoner Benchmarks ==== |
|
| |
|
| One of the main requirements of the system was for all newly generated
| | Need to test various OWL reasoners, simulate evolution of GO |
| classes to be automatically placed in the ontology. It is important
| |
| for the submitter to be able to see this placement, in order to
| |
| confirm that no mistakes were made. The submitter also needs to
| |
| receive immediate warning if an equivalent class already exists.
| |
|
| |
|
| These tasks can all be done by standard automated reasoners, so long
| | ==== Feasibility Study ==== |
| as logical definitions are supplied in the ontology.
| |
|
| |
|
| We evaluated several reasoners, including OWL reasoners such as
| | Determine feasibility of java version |
| Pellet, FaCT++ and HermiT, as well as the OBO-Edit reasoner. We found
| |
| in all cases that reasoning was either too slow or did not complete
| |
| the reasoning task at all.
| |
|
| |
|
| In order to overcome this obstacle we implemented our own simple
| | [[Category:SWUG Projects]] |
| reasoner on top of SWI-Prolog. This reasoner is not as comprehensive
| |
| as existing OWL reasoners, but is sufficient for the subset of OWL
| |
| used by many existing ontologies such as GO. The only OWL2 constructs
| |
| used by the reasoner are: EquivalentTo (=), SubClassOf (<),
| |
| SubObjectPropertyOf, intersectionOf, someValuesFrom,
| |
| TransitiveProperty and PropertyChain.
| |
| | |
| Rules:
| |
| | |
| <pre>
| |
| X < X
| |
| X < Y if EquivalentTo(X DX) and DX < Y
| |
| X < Y if EquivalentTo(Y DY) and X < DY
| |
| X < intersectionOf(Y1....Yn) if X < Y1 and ... X < Yn
| |
| intersectionOf(X1....Xn) < Y if X in X1...Xn and X < Y
| |
| someValuesFrom(PX X) < someValuesFrom(PY Y) if PX < PY and X < Y
| |
| someValuesFrom(P X) < someValuesFrom(P Y) if Transitive(P) and
| |
| someValuesFrom(P X) < someValuesFrom(P Z) and someValuesFrom(P Z) < someValuesFrom(P Y)
| |
| someValuesFrom(PX X) < someValuesFrom(PY Y) if PY < PropertyChain(PX PZ) and
| |
| someValuesFrom(PX X) < someValuesFrom(PY Z) and someValuesFrom(PZ Z) < someValuesFrom(PY Y)
| |
| </pre>
| |
| | |
| These rules are implemented using a backward-chaining rule engine.
| |
| | |
| The performance is generally robust with respect to the size of the input
| |
| ontologies, because axioms that are not relevant to the classification
| |
| of the input submission term are never used.
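
The goal-directed style of these rules can be sketched in Python as below. This is only an illustrative approximation, not the SWI-Prolog reasoner itself: it covers the reflexivity, EquivalentTo, intersectionOf and basic someValuesFrom rules, omits the TransitiveProperty and PropertyChain rules, and adds an explicit step for following asserted SubClassOf links. The class `Ontology`, its method names, the relation name `results_in_morphogenesis_of` and the sample axioms are all assumptions made for the example.

```python
# Hypothetical sketch of goal-directed (backward-chaining) subsumption.
# Expressions: a str is a named class; ("and", e1, ..., en) is intersectionOf;
# ("some", prop, filler) is someValuesFrom.

class Ontology:
    def __init__(self):
        self.sub = {}      # named class -> asserted superclass expressions
        self.equiv = {}    # named class -> logical definition (EquivalentTo)
        self.subprop = {}  # property -> asserted super-properties

    def prop_le(self, p, q):
        """p < q over asserted property links (assumes an acyclic hierarchy)."""
        if p == q:
            return True
        return any(self.prop_le(r, q) for r in self.subprop.get(p, ()))

    def le(self, x, y, active=None):
        """True if x < y is provable; `active` holds goals in progress."""
        active = set() if active is None else active
        if x == y:                      # X < X
            return True
        goal = (x, y)
        if goal in active:              # already trying this goal: cut the cycle
            return False
        active.add(goal)
        try:
            return self._prove(x, y, active)
        finally:
            active.discard(goal)

    def _prove(self, x, y, active):
        # X < intersectionOf(Y1..Yn) if X < Y1 and ... and X < Yn
        if isinstance(y, tuple) and y[0] == "and":
            return all(self.le(x, yi, active) for yi in y[1:])
        # X < Y if EquivalentTo(Y DY) and X < DY
        if isinstance(y, str) and y in self.equiv:
            if self.le(x, self.equiv[y], active):
                return True
        if isinstance(x, str):
            # X < Y if EquivalentTo(X DX) and DX < Y
            if x in self.equiv and self.le(self.equiv[x], y, active):
                return True
            # Added step: follow asserted links, X < Z and Z < Y
            return any(self.le(z, y, active) for z in self.sub.get(x, ()))
        if x[0] == "and":
            # intersectionOf(X1..Xn) < Y if some Xi < Y
            return any(self.le(xi, y, active) for xi in x[1:])
        if x[0] == "some" and isinstance(y, tuple) and y[0] == "some":
            # someValuesFrom(PX X) < someValuesFrom(PY Y) if PX < PY and X < Y
            return self.prop_le(x[1], y[1]) and self.le(x[2], y[2], active)
        return False

    def equivalent(self, x, y):
        """Two classes are equivalent if each subsumes the other."""
        return self.le(x, y) and self.le(y, x)

# Illustrative axioms only; not taken from any released ontology file.
R = "results_in_morphogenesis_of"
onto = Ontology()
onto.sub["mesonephros"] = {"kidney"}
onto.equiv["kidney morphogenesis"] = ("and", "morphogenesis", ("some", R, "kidney"))
onto.equiv["mesonephros morphogenesis"] = ("and", "morphogenesis", ("some", R, "mesonephros"))
onto.equiv["duplicate request"] = ("and", "morphogenesis", ("some", R, "mesonephros"))

print(onto.le("mesonephros morphogenesis", "kidney morphogenesis"))
print(onto.equivalent("duplicate request", "mesonephros morphogenesis"))
```

Because the proof search starts from the submitted class and only follows axioms reachable from it, unrelated parts of the ontology are never visited, which is the property the paragraph above relies on.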
| |
| | |
| === Workflow ===
| |
| | |
| Most existing ontologies use version control systems such as CVS or
| |
| SVN. The submission system is intended to work alongside these - all
| |
| newly generated classes are placed in a version control managed file
| |
| alongside the main ontology file (in GO, this goes in a directory
| |
| called xp_submit).
| |
| | |
| The system actually appends to three files:
| |
| | |
| * An obo format file consisting of the newly submitted class, together
| |
| with full axioms for the class, including the reasoner-calculated
| |
| superclasses (is_a parents)
| |
| | |
| * A file of new subclass links. This is necessary when new classes are
| |
| inferred to be "sandwiched" between two classes that previously had
| |
| a direct subclass link.
| |
| | |
| * A file of subclass links to be deleted. When a new "sandwich" class
| |
| is created, the previous link becomes redundant. Although these are
| |
| essentially harmless, redundant links can confuse users and it is
| |
| good policy to remove these.
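
The second and third files can be thought of as a small set computation over the existing direct links. The Python sketch below is a minimal illustration under assumed inputs: the function name `sandwich_updates`, the representation of links as (child, parent) pairs and the single-letter class names are all hypothetical, and the inferred parents and children are taken as already computed by the reasoner.

```python
# Hypothetical sketch: derive the "new links" and "links to delete" entries
# for a newly placed class. Links are (child, parent) pairs.

def sandwich_updates(direct_links, new_class, parents, children):
    """Return (links_to_add, links_to_delete) for a newly placed class.

    parents and children are the reasoner-inferred direct superclasses and
    subclasses of new_class; direct_links is the set of existing is_a links.
    """
    # Every inferred child gains a direct link to the new class.
    links_to_add = {(child, new_class) for child in children}
    # An old child -> parent link is redundant once the new class sits between.
    links_to_delete = {
        (child, parent)
        for child in children
        for parent in parents
        if (child, parent) in direct_links
    }
    return links_to_add, links_to_delete

# Placeholder classes: C and D were direct subclasses of A; the new class B
# is inferred to sit between A and C.
existing = {("C", "A"), ("D", "A")}
to_add, to_delete = sandwich_updates(existing, "B", parents={"A"}, children={"C"})
print(to_add, to_delete)
```

The link from D to A is left untouched because D is not an inferred child of the new class.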
| |
| | |
| These files are visible through the normal mechanisms used by the
| |
| version control system. This means that a "bleeding edge" version of
| |
| the ontology can be viewed by dynamically combining the 3 files above
| |
| plus the main ontology. However, this is typically not required, as
| |
| the gatekeeper can swiftly deal with new requests.
| |
| | |
| The gatekeeper can choose to edit the submission file, but this should
| |
| not be necessary in the majority of cases. Usually it is sufficient to
| |
| quickly inspect the files and to run a merge script to pull in the new
| |
| information from the 3 files above (after this happens, the files
| |
| reset). If desired, even this one minimal manual step can be automated
| |
| (for example, for experienced submitters it may be desirable to
| |
| directly bring in the new submission).
| |
| | |
| === Implementation ===
| |
| | |
| == DISCUSSION ==
| |
| | |
| === Similar systems ===
| |
| | |
| OBI quickterm
| |
| | |
| Text definitions: Rabbit. Robert Stevens' system.
| |
| | |
| https://www.ebi.ac.uk/chebi/submissions/login
| |
| | |
| === Future development ===
| |
| | |
| One of the current limitations of the existing system is that all
| |
| ontologies must be in OBO format, and logical definitions must be
| |
| expressible in the same format. In theory this need not be a problem
| |
| for OWL ontologies that use a restricted set of OWL constructs, but in
| |
| practice the need to convert files places additional administrative
| |
| burdens.
| |
| | |
| It should be relatively easy to convert the system to use OWL
| |
| ontologies rather than OBO ones, and we may do this in future,
| |
| depending on which ontologies use the system.
| |
| | |
| The simplified reasoning strategy may be problematic for some
| |
| ontologies. For example, the cell ontology uses logical definitions
| |
| that require additional constructs including negation that pose
| |
| problems for our backward-chaining reasoning strategy. We expect that
| |
| before long we will be able to use standard OWL reasoners within our
| |
| system. For example, the latest version of the Pellet reasoner has the
| |
| ability to do incremental reasoning with caching of results, which
| |
| eliminates some of the wait time currently associated with OWL
| |
| reasoning. In addition, segmentation strategies such as MIREOT[REF]
| |
| can be used to extract a tractable subset of an ontology.
| |
| | |
| Java conversion.
| |
| | |
| === Non-template classes ===
| |
| | |
| The system is designed specifically for immediate granting of requests
| |
| that follow some compositional template. In principle there is nothing
| |
| preventing the extension of the system to be used for more free-form
| |
| class generation. The submitter would have to manually specify all
| |
| necessary information, rather than have this auto-generated according
| |
| to a template. In practice there is less of a need for this system
| |
| within the GO, as curators can use an ordinary term request system
| |
| such as sourceforge and enter the terms directly using OBO-Edit.
| |
| | |
| | |
| === Current uses ===
| |
| | |
| GO, regulation
| |
| | |
| HPO?
| |
| | |
| == CONCLUSIONS ==
| |
| | |
| The class request bottleneck is a frequent cause of curator
| |
| inefficiency. In addition, the manual construction and placement of
| |
| compositional ontology classes is time-consuming and error-prone. We
| |
| have developed a system that simultaneously deals with both of these
| |
| issues.
| |
| | |
| == AVAILABILITY ==
| |
| | |
| http://berkeleybop.org/obo/quickterm/GO
| |
| | |
| | |
| | |
| == TABLES ==
| |
| | |
| Table 1
| |
| | |
| ontology - the home for the newly generated class
| |
| | |
| description - textual summary of what the template is for
| |
| | |
| externals - external ontologies required to define the class
| |
| | |
| arguments - a list of arguments that must be supplied to the template
| |
| | |
| logical definition - a template for the generation of the logical
| |
| definition.
| |
| | |
| name - a template for generation of the name (preferred label)
| |
| | |
| synonym - a template for generation of synonyms
| |
| | |
| textdef - a template for generation of the textual definition
| |
| | |
| wraps - some templates can optionally wrap other templates.
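
One way to picture the properties in Table 1 is as a simple record type. The Python sketch below is purely illustrative: the class `Template`, the method `missing_arguments`, the `{Structure}` placeholder syntax and all field values marked as placeholders are assumptions, not the format the system actually uses.

```python
# Hypothetical sketch: the Table 1 properties as a record, plus a check that
# a request supplies every argument the template requires.
from dataclasses import dataclass, field

@dataclass
class Template:
    ontology: str                 # home for the newly generated class
    description: str              # textual summary of what the template is for
    arguments: list               # arguments that must be supplied
    logical_definition: str       # pattern for the logical definition
    name: str                     # pattern for the preferred label
    externals: list = field(default_factory=list)  # external ontologies needed
    synonym: list = field(default_factory=list)    # patterns for synonyms
    textdef: str = ""             # pattern for the textual definition
    wraps: list = field(default_factory=list)      # templates optionally wrapped

    def missing_arguments(self, request):
        """Return the required arguments that the request leaves empty."""
        return [arg for arg in self.arguments if not request.get(arg)]

morphogenesis = Template(
    ontology="GO",
    description="PLACEHOLDER description",
    arguments=["Structure"],
    logical_definition="PLACEHOLDER logical definition pattern",
    name="{Structure} morphogenesis",
)

print(morphogenesis.missing_arguments({}))
print(morphogenesis.name.format(Structure="mesonephros"))
```

A request with a non-empty list of missing arguments would be rejected before any class is generated, in line with the conformance check described under User experience.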
| |
| | |
| == FIGURES ==
| |
| | |
| Screenshot.
| |
| | |
| Example of submitted class stanza:
| |
| | |
| == REFERENCES ==
| |
| | |
| todo
| |
| | |
| |