OBO-Edit: Reasoner Overview

From GO Wiki
Jump to navigation Jump to search

Here's my more detailed breakdown of the proposed reasoner improvements, and a discussion of what OBO-Edit's reasoner does now...

Part I - What it does now...

In the abstract, the reasoner is really simple. All it does is examine the ontology and use simple rules to generate new links that are implied by the existing links in the ontology. Once the reasoner generates a bunch of links, it iterates again, taking into account the new links generated by the previous runs of the reasoner. Eventually, the reasoner will iterate and discover no new links, at which point the reasoner stops running.

Whenever a new link is generated, OBO-Edit records the rule that was used to generate the new link, and which existing links constituted evidence for the generation of the new link. If you select a link and then look at the Explanation Plugin, the Explanation Plugin will give you this information.

The database of generated links is HUGE, and full of things you don't want to see. For example, if A -is_a-> B and B -is_a-> C, the reasoner will go ahead and create the link A -is_a-> C. Users don't want to see millions of links like that. So the database of generated links is trimmed by an algorithm that throws out links users don't want to see (don't worry; it never throws away anything remotely interesting to a person).

Right now, OBO-Edit's reasoner knows 4 rules that it uses to generate implied links:

  1. Genus Implications - This is a very simple rule. If a term has a genus listed in the Cross Products tab of the text editor, create an is_a link between the term and its genus.
  2. Differentia implications - Another very simple rule. If a term has a differentia listed in the Cross Products tab of the text editor, create a link of the appropriate type between the term and its genus.
  3. Transitivity implications - If p is a transitive relationship type, and A -p-> B, and B -p-> C, generate the implied link A -p-> C . This rule is actually a lot more complex, because it also encompasses the fact that all relationship types are transitive over is_a (if B -p-> C and A -is_a-> B, A -p-> C). Because the reasoner does multiple passes over the ontology, this will generate a LOT of new links.
  4. Cross product implications - This is the really interesting one. It uses cross product definitions to generate new is_a links. The rule is fairly incomprehensible if I write it out formally, but it's says something like: "If A has genus B, and differentia 'part_of C', then anything that has an is_a relationship to B and a part_of relationship

to C has an implied is_a relationship to A." Of course, the general rule is not specific to the part_of relationship; it works for any differentia. The cross product rule goes through the whole ontology, parsing the cross product definitions, looking for matching terms, and generating new is_a relationships.

Part IA - The Reasoner and the Verification System

The reasoner and the new verification system have a very special relationship. Some checks in the verification system can use the reasoner to find problems in the ontology (the reasoner itself doesn't identify any problems, it just generates new links). There are 2 kinds of problems that the verification system can check right now:

  1. Disjointedness - You know the "disjoint_from" relationship that shows up in every ontology? That relationship can be used to indicate that two classes are "disjoint". If two classes are disjoint, they cannot have ANY subclasses in common. The disjointedness check reports an error of two disjoint classes share any subclasses in common.

    This could be very useful for the GO. For example, we could mark the ontology roots as disjoint from each other. If any term turned out to have an is_a relationship to more than one root, it would be very bad, and the disjointedness check would give us a stern warning.

    (Note: disjoint_with is a symmetric relationship. If A -disjoint_with-> B, it's obviously also true that B -disjoint_with-> A. Keep this in mind as we discuss symmetric relationships later).

  2. Cyclicity - When you're editing a relationship type, you may have noticed the "is_cyclic" checkbox. If this checkbox is clicked, it means that it is legal to create a cycle out of this relationship type. The cyclicity check makes sure that no cycles have been created out of non-cyclic relationship types.

    Note that this isn't a trivial check. It's possible to create an implied cycle without creating an explicit cycle. For example:

    A -part_of->B
    B -part_of -> C
    C -is_a-> A

    This doesn't look like a cycle on trivial inspection, but once the reasoner uses the transitivity rule to add an implied link:

    C -part_of-> B (because C -is_a-> A and A -part_of-> B)

    it becomes clear that we have a cycle:
    B -part_of-> C
    C -part_of-> B

    With lots of transitive relationships, it can become almost impossible to figure these out by visual inspection alone. The reasoner & the verifier come to our rescue.

    The cyclicity check also can help debug cross products, because a lot of common cross-product errors manifest themselves as implied is_a cycles.

    Finally, note that not all cycles are bad. Symmetric relationships, for example, are always cyclic.

Part II - What I Want It to Do

The reasoner could be much more powerful with a few improvements. The first few things are just new reasoner rules:

  1. Symmetry - Right now it's possible to mark a relationship type as symmetrical, but OBO-Edit doesn't do anything with that information. A symmetric relationship is one where the inverse relationship is automatically implied. If p is a symmetric relationship type, and A -p-> B, it implies that B -p-> A. OBO-Edit's reasoner should be extended so that it can create these implied relationships.
  2. Domain & Range - Right now it's possible to provide a domain and range for a relationship type, but OBO-Edit ignores that information. The domain and range rule would make OBO-Edit use that info.

    Domain and range are notoriously tricky to explain, because there's the common understanding of domain and range, and then there's the real definition, which is quite different. Here's the colloquial definition of domain and range:

    Domain restricts the kind of children that a relationship type can have, and range restricts the kind of parent a relationship type can have. So let's say we're working on an anatomy ontology, and we have the relationship type "exists_during_stage". The domain of "exists_during_stage" is anatomical_entity (the root of our anatomy ontology), and the range of "exists_during_stage" is developmental_stage (the root of our developmental stage ontology). That means that if we create a relationship with the type "exists_during_stage", the child of that relationship has to be some term from the anatomy ontology, and the parent has to be some developmental stage term. The domain and range restrictions prevent us from saying something nonsensical, like "heart -exists_during_stage-> lung".

    If you're feeling overwhelmed, just assume that the above is the real meaning of domain and range, appreciate why we'd want this, and skip down to the next item. If you're feeling fiesty, continue...

    The colloquial definition is not actually how domain and range work in a description logic language (like OWL or OBO). In a DL language, if relationship type p has domain D, and A -p-> X, it implies that A -is_a-> D. To use our "exists_during_stage" example above, it means that if heart -exists_during_stage-> adult, it automatically implies that heart -is_a-> anatomical_entity (which is true, so we're in good shape).

    Similarly, if relationship type p has range R and X -p-> A, it implies that A -is_a-> R. It's like the definition of domain, but it applies to parents rather than children.

    Note that if we have the right disjoint_with relationships defined, the real definition of domain and range are equivalent to the colloquial definition. In our example above, let's say we marked the anatomy and developmental stage ontologies as disjoint. Then let's say we accidentally added the bad "heart -exists_during_stage-> lung" relationship. The reasoner would complain, because:

    1. Since "exists_during_stage" has the range "developmental_stage", the bad relationship implies lung -is_a-> developmental_stage
    2. Now lung -is_a-> anatomical_entity AND lung -is_a-> developmental_stage. Since developmental_stage and anatomical_entity are disjoint, the disjointedness check will spot the error.

      This'll be a little weird, because OBO-Edit will report a disjointedness violation. But we can make things clearer by adding another check to the verification plugin that checks relationships against the colloquial definitions of domain and range, and reports a warning if it finds any problems.

  3. necessity - In OBO-Edit, all new relationships are automatically marked "necessary", but it is possible to designate a relationship as "not necessary" (you can unclick the checkbox in the Parent Plugin, for example). But OBO-Edit doesn't use this information; it treats all relationships as necessary.

    Necessary relationships are always true. But sometimes you might want to note a relationship that is almost always true, just for your users' information. In that case, you would mark the relationship as "not necessarily true".

    When the reasoner sees a not necessarily true relationship, it should either ignore it, or mark any implied relationships as being not necessarily true either.

  4. inverse necessity and inverse_of - Most of the time, the inverse of a relationship is not necessarily true. Consider rose -part_of-> rosebush. If you have a rose, it's certainly part of a rosebush, but the inverse is not necessarily true. If you have a rosebush, you don't necessarily have a rose. The roses may not have budded yet, because it's winter or something.

    But sometimes the inverse IS necessarily true. I may be mistaken, but I think every eukaryotic cell has mitochondria, and mitochondria never occur outside of eukaryotic cells (if I'm wrong, just go with me). That means that the inverse of "mitochondria -part_of-> eukaryotic cell" is necessarily true.

    You can mark relationships as being "inverse necessarily true" in OBO-Edit, but the information is ignored. The OBO-Edit reasoner should be able to use that information to explicitly create the inverse relationship. The reasoner should be able to create "cell has_part mitochondria" from "mitochondria part_of cell".

    To do this, the reasoner needs to use the inverse_of relationship as well, in order to figure out that has_part is the inverse of part_of.

  5. Transitive Over - Right now, transitive relationships only propegate over is_a and the transitive relationship itself. But some transitive relationships may propegate over other relationship types. For example, if John -drinks_with-> Janelle, and Janelle -is_dating-> Brian, John -drinks_with-> Brian (let's say). We could make a general rule that says the drinks_with relationship propegates over the is_dating relationship. OBO 1.3 format will support this kind of relationship, and the OBO-Edit reasoner needs to support it too.
  6. Incremental reasoning - Right now, every time you do an edit, OBO-Edit has to run the reasoner all over again, from scratch. This takes forever.

    With incremental reasoning, the reasoner could incorporate new facts (or remove old, defunct facts) without having to re-execute the whole process from scratch. Adding a new term with the reasoner activated would take < 1 second, as opposed to 20 - 120 seconds as it is now.

Part III - Longer Term Plans

In the long run, the OBO-Edit reasoner framework will allow pluggable "reasoner adapters". These adapters will allow OBO-Edit to call third party reasoners (like Racer, etc) via the standard reasoner interfaces.