Working with XPs: Difference between revisions

From GO Wiki
Latest revision as of 18:18, 16 July 2014

Notes on Working with XPs in OE, illustrated using the Drosophila anatomy ontology

This page assumes a basic understanding of how to make and use necessary and sufficient definitions (also known as XP, genus-and-differentia, or intersection definitions). For more information on these, please see Logical_Definitions.

Background

Until recently, most terms in the Drosophila anatomy ontology had no is_a parent, while some had many; that is, the ontology contained a lot of very incomplete multiple inheritance, making it hard to maintain and poor at grouping annotations. In the current version, in some systems at least, multiple asserted inheritance has been reduced and replaced by inheritance inferred from XP definitions using a reasoner. For many differentia (e.g. sensory function), this task requires using foreign terms in both regular relationships and intersections to define Drosophila anatomy terms. This page outlines how I've used OE2 to change the ontology in this way and the problems I've encountered while doing so (with links to OE tickets).

Overall Strategy

In most cases, terms are defined using either intersections or regular relationships (although see below for a case for combining them); no attempt is made to keep parallel, redundant regular relationships for terms with intersections. This means that, once many XP definitions are in place, the ontology is quite flat and difficult to work with unless a reasoner is running. Foreign terms in this ontology are selectively imported using filters that save is_a parents and children of key terms. They remain in the ontology indefinitely, but will need to be updated periodically. Right now, the only way to keep track of the versions used for importation is via comments saved in the header.

Rather than use repair mode - where each inferred is_a is assessed prior to assertion - two versions are released. In the edited copy no links are asserted. A pre-reasoned version is produced from the edited copy in which all inferred links are asserted. The aim here is to produce a version that can be used in the same way as versions of the ontology prior to the introduction of XP definitions. Most importantly, it needs to be usable for grouping annotations using is_a and part_of children of any given term. This version also lacks relationships to foreign terms, as I suspect these will cause problems for some end users.

  • To make this second version:
    • All implied relationships are instantiated (non-redundantly).
    • XP defs are auto-converted to textual definitions.
    • Some XP differentia are converted to regular relationships (so that information is not lost - see this ticket).
    • XP genus lines are stripped.
    • Foreign terms are stripped out.
    • Redundancy is removed.

For more details see Making_a_release_version.

Note: this approach ensures that the resulting ontology can be used efficiently with a reasoner, and so will be usable for searching/querying in OE and Protege4.
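The steps listed above for making the pre-reasoned version can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual release tooling: the dictionary term model, the `make_release` function and the placeholder ids such as `FBbt:eo_neuron` are invented for the example, and the inferred is_a links are assumed to have come from a reasoner already. It converts every differentia to a regular relationship (the real process converts only some) and leaves out redundancy removal and the conversion of XP defs to text.

```python
# Hypothetical term model (not OBO-Edit's own): id -> dict with optional keys
#   "is_a": set of parent ids
#   "relationship": set of (relation, target id) pairs
#   "intersection_of": list of genus ids (str) and differentia (relation, filler id)
INTERNAL_PREFIX = "FBbt:"  # assumed ID prefix for terms internal to the ontology


def make_release(terms, inferred_is_a, internal_prefix=INTERNAL_PREFIX):
    """Build a pre-reasoned copy: inferred links asserted, foreign terms removed."""
    release = {}
    for term_id, term in terms.items():
        if not term_id.startswith(internal_prefix):
            continue  # foreign terms are stripped out
        # Assert the reasoner's is_a links alongside the hand-asserted ones.
        parents = set(term.get("is_a", ())) | set(inferred_is_a.get(term_id, ()))
        relations = set(term.get("relationship", ()))
        for part in term.get("intersection_of", ()):
            if isinstance(part, tuple):
                relations.add(part)  # differentia kept as a regular relationship
            # plain ids are genus lines and are dropped here
        release[term_id] = {
            "is_a": {p for p in parents if p.startswith(internal_prefix)},
            "relationship": {
                (rel, target) for rel, target in relations
                if target.startswith(internal_prefix)  # drop links to foreign terms
            },
        }
    return release


# Placeholder ids, for illustration only.
terms = {
    "FBbt:eo_neuron": {
        "intersection_of": ["FBbt:neuron", ("part_of", "FBbt:eo_sensillum")],
    },
    "FBbt:desB": {
        "is_a": {"FBbt:neuron"},
        "relationship": {("has_function", "GO:0050906")},
    },
    "FBbt:neuron": {},
    "FBbt:eo_sensillum": {},
    "GO:0050906": {},
}
inferred = {"FBbt:desB": {"FBbt:eo_neuron"}, "FBbt:eo_neuron": {"FBbt:neuron"}}
release = make_release(terms, inferred)
```

In this sketch the release copy of `FBbt:desB` has two asserted is_a parents and no link to the GO term, which is the shape an end user without a reasoner would need.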

Plan of attack

Working from an ontology with multiple inheritance, it is best to attack one convenient subdivision of the ontology at a time (for anatomy, organ systems work well) and to start by making a list of all the major differentia used. From this list, one can devise a strategy for expressing at least some of these differentia using existing relations or plausible new ones, plus internal or external terms.

1. Differentia using existing relations and existing terms:

name: larval head sensillum
intersection_of: sensillum
intersection_of: part_of larval head

2. Differentia using existing terms, but requiring new relations

name: anterior fascicle sensory neuron
intersection_of: sensory neuron
intersection_of: fasciculates_with anterior fascicle

3. Differentia using foreign terms

name: sensory neuron 
intersection_of: neuron
intersection_of: has_function GO:0050906 ! detection of stimulus involved in sensory perception

(Note: the example in 2 would be better with two differentia, one being intersection_of: has_function GO:0050906 ! detection of stimulus involved in sensory perception. It can then be autoclassified as a sensory neuron.)
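The point of that note can be checked with a small sketch. It assumes a simplified rule: one XP class subsumes another if they share a genus and every differentia of the more general class also appears on the more specific one. A real reasoner also follows is_a links between genus terms and fillers, which this ignores. The `xp_subsumes` function and the dictionary layout are invented for illustration, and rewriting example 2 with `neuron` as its genus is my reading of the note rather than something stated outright.

```python
def xp_subsumes(general, specific):
    """True if every condition of `general` is also stated by `specific`.

    Simplification: genus terms must match exactly and differentia are
    compared as literal (relation, filler) pairs.
    """
    return (
        general["genus"] == specific["genus"]
        and general["differentia"] <= specific["differentia"]
    )


sensory_neuron = {
    "genus": "neuron",
    "differentia": frozenset({("has_function", "GO:0050906")}),
}

# Example 2 rewritten with two differentia (assumed genus: neuron).
anterior_fascicle_sensory_neuron = {
    "genus": "neuron",
    "differentia": frozenset({
        ("fasciculates_with", "anterior fascicle"),
        ("has_function", "GO:0050906"),
    }),
}

autoclassified = xp_subsumes(sensory_neuron, anterior_fascicle_sensory_neuron)
```

With the has_function differentia in place, the is_a link to sensory neuron no longer has to be asserted by hand; it falls out of the definitions.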

In every case, many regular relationships, including relationships to foreign terms, will also need to be made in order to classify completely. In the case of partonomy, this often means moving many terms from an asserted is_a parent to a suitable part_of parent.

Practicalities

1. Useful term renderers to use while doing this:

  • "terms that have is_intersection"
  • "terms that don't have is_isa complete"
  • "terms that have isa_parent count >=2"
  • "terms that don't have id contains <ID prefix for terms internal to this ontology>"

2. New terms with XP defs: Add a new root term. Then add the XP def to the new term using the XP editor tab in the text editor.

3. Working with existing terms in the tree: Add XP defs by hand using the XP editor tab in the text editor. Keep the parent editor visible and delete relationships as necessary. A more straightforward way would be to convert all regular relationships to intersections (the option is there but doesn't work), or to choose individual relationships to convert (which also doesn't seem to work).

4. Don't be fooled by the redundancy flags that show up everywhere when using the "link pile reasoner"; they're just wrong (see this ticket: https://sourceforge.net/tracker/index.php?func=detail&aid=2263348&group_id=36855&atid=418257). Unfortunately, that means you can't filter out redundant relations, or tell which relations are really redundant, without investigating each one carefully.

5. If you are sure that two siblings are disjoint (ask yourself if any instance could ever be both), then it is worth adding a disjoint_from 'relation'. With these in place, the reasoner can flag inconsistencies in your ontology. Note that you only have to make the relationship in one direction; it doesn't matter which. This can get confusing when many siblings are involved. Better tools for this would be good. Related bug: the way the OTE displays disjoint root terms currently makes it impractical to assert disjointness between roots (https://sourceforge.net/tracker/index.php?func=detail&aid=2412063&group_id=36855&atid=418257).

6. Incremental reasoning only partly works and tends to get confused. It is therefore important to re-reason periodically. However, this gets slower each time during a session where many XP terms are made, so it is also necessary to restart OE every so often (see this ticket: https://sourceforge.net/tracker/index.php?func=detail&aid=2225310&group_id=36855&atid=418257). Particularly frustrating is a tendency to flag disjoint violations where there are none, as this makes saving difficult.
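To make point 5 concrete, here is a rough sketch of the check that disjoint_from makes possible: a class is inconsistent if it sits, via is_a, under both members of a disjoint pair. The function names and the `motor neuron` and `odd neuron` terms are invented for the example, and this is not how the OE reasoner is implemented. Because each pair is stored as an unordered set, it does not matter in which direction the disjointness was asserted.

```python
def ancestors(term, is_a):
    """Return `term` plus everything reachable from it via is_a."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def disjoint_violations(is_a, disjoint_from):
    """List (term, pair) for every term classified under both members of a pair."""
    # Unordered pairs: asserting disjointness in one direction is enough.
    pairs = {frozenset(pair) for pair in disjoint_from if pair[0] != pair[1]}
    violations = []
    for term in is_a:
        term_ancestors = ancestors(term, is_a)
        for pair in pairs:
            if pair <= term_ancestors:
                violations.append((term, tuple(sorted(pair))))
    return sorted(violations)


# Hypothetical fragment: "odd neuron" has been placed under two disjoint siblings.
is_a = {
    "sensory neuron": ["neuron"],
    "motor neuron": ["neuron"],
    "odd neuron": ["sensory neuron", "motor neuron"],
}
found = disjoint_violations(is_a, [("sensory neuron", "motor neuron")])
```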

Using foreign terms:

Importing a whole foreign ontology is likely to be impractical. Certainly it's unlikely the reasoner will function when multiple large ontologies are loaded. Instead, one can import terms as needed. Often, one may want a whole set of related terms - for example, those for various sensory functions. In such cases, there are often key terms which can be specified in a filtered save along with all parent and child classes and the is_a relations between them:

SAVE FILTER:
term has name equal to X
OR
term has name equal to X in descendant that can be reached via is_a *
OR
term has name equal to X in ancestor that can be reached via is_a

(*Note: the descendant clause here does not currently work in filtered saves.)
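The intent of this save filter (a key term, plus everything above and below it via is_a) can be written out as a short sketch. This is just the set the filter is meant to select, computed over a hypothetical child-to-parents map; `select_for_import` is an invented name and nothing here calls OBO-Edit.

```python
def select_for_import(key_terms, is_a):
    """Return the key terms plus all their is_a ancestors and descendants."""
    # Invert the child -> parents map so that descendants can be walked too.
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    def reachable(start, edges):
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(edges.get(current, ()))
        return seen

    selected = set()
    for key in key_terms:
        selected |= reachable(key, is_a) | reachable(key, children)
    return selected


# Placeholder hierarchy: b is the key term; e is a sibling that should be left out.
is_a = {"b": ["a"], "c": ["b"], "d": ["b"], "e": ["a"]}
imported = select_for_import(["b"], is_a)
```

Siblings of the key term are deliberately excluded, which keeps the imported fragment small enough for the reasoner to cope with.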

Balancing efficiency with the dangers of hidden inheritance:

Multiple asserted inheritance is hard to manage. If it is not kept under control, one can all too easily get lost in the resulting tangle, producing an ontology riddled with true path violations (TPV) and with many inheritance groups incomplete.

XPs provide a way to reduce multiple asserted inheritance. They also have the advantage of pushing assertions down to a much lower level of granularity. The biology literature is full of generalizing assertions about classes based on quite limited knowledge of the properties of subclasses. Capturing such assertions directly in high-level class terms can be dangerous. As details are added, conflicts with these generalizations begin to show up. Using XPs forces us to attach properties to individual classes. These can be recorded along with references providing evidence for the assertion. Classification on the basis of these assertions is safer than working from very general statements.

Generalizing assertions may be buried in the text of a definition, but may also be made using regular relationships. This has the advantage of being efficient: I can record that everything of class X is part_of the head, or has a function in sensory perception, and this will be inherited by all subclasses. This is less work than recording the property for each individual subclass (assuming you can keep track of the asserted is_a relations required).

It is possible, although rather dangerous, to combine the two strategies. A class having an XP definition can also have regular relationships. These record properties that will be inherited by all autoclassified subclasses. The danger of this strategy is obvious - simply by recording properties which fulfill the conditions of the XP class, one is asserting that the other properties of that class apply.

However, this strategy can be useful.

Here's an example:

I can record that all eo-neurons are part_of some eo-type sensillum:

eo-type sensillum
	part_of eo-neuron
	.	is_a prothoracic desB neuron
	.	is_a prothoracic desA neuron

From this, a reasoner can conclude that desB is part_of some eo-type sensillum.

Or I can use an XP def:

name: eo neuron
intersection_of: neuron
intersection_of: part_of eo-type sensillum

name: prothoracic desB neuron
is_a: neuron
relationship: part_of prothoracic dorsal sensillum trichodeum dh1

name: prothoracic dorsal sensillum trichodeum dh1
is_a: eo-type sensillum

=> implied classification

prothoracic desB neuron is_a eo-neuron
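The inference above can be reproduced with a toy classifier. It is only a sketch of the reasoning step, not the OE reasoner: `classify` and the data layout are invented for illustration, and it assumes the is_a graph has no cycles. A term is placed under an XP class when it is a subclass of the genus and, for each differentia, has a relationship of the same type whose target is a subclass of the differentia's filler.

```python
def is_subclass(term, ancestor, is_a):
    """True if `ancestor` is `term` itself or reachable from it via is_a."""
    if term == ancestor:
        return True
    return any(is_subclass(parent, ancestor, is_a) for parent in is_a.get(term, ()))


def classify(term, is_a, relationships, xp_defs):
    """Return the XP classes whose genus and differentia `term` satisfies."""
    inferred = set()
    for xp_name, (genus, differentia) in xp_defs.items():
        if xp_name == term or not is_subclass(term, genus, is_a):
            continue
        # Every differentia must be matched by one of the term's relationships.
        if all(
            any(
                rel == wanted_rel and is_subclass(target, filler, is_a)
                for rel, target in relationships.get(term, ())
            )
            for wanted_rel, filler in differentia
        ):
            inferred.add(xp_name)
    return inferred


is_a = {
    "prothoracic desB neuron": ["neuron"],
    "prothoracic dorsal sensillum trichodeum dh1": ["eo-type sensillum"],
}
relationships = {
    "prothoracic desB neuron": [
        ("part_of", "prothoracic dorsal sensillum trichodeum dh1"),
    ],
}
xp_defs = {"eo neuron": ("neuron", [("part_of", "eo-type sensillum")])}

inferred_parents = classify("prothoracic desB neuron", is_a, relationships, xp_defs)
```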

I can then expand the definition of eo neuron with regular relationships:

name: eo neuron
intersection_of: neuron
intersection_of: part_of eo-type sensillum
relationship: develops_from FBbt:00006022 ! external sensory organ precursor cell IIIb
relationship: has_function GO:0050906 ! detection of stimulus involved in sensory perception

name: sensory neuron
intersection_of: neuron
intersection_of: has_function GO:0050906 ! detection of stimulus involved in sensory perception

Now the reasoner can infer:

prothoracic desB neuron develops_from external sensory organ precursor cell IIIb AND prothoracic desB neuron is_a sensory neuron

(In fact this will be represented in a tree as

sensory neuron
. is_a eo neuron
. . is_a prothoracic desB neuron

The other advantage of computing the tree is that it avoids the addition of redundant terms.)
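The knock-on effect of putting regular relationships on an XP class can be sketched as follows. Once desB has been classified under eo neuron, it inherits eo neuron's relationships, and the inherited has_function link is what then satisfies the definition of sensory neuron. The helper names and the dictionary layout are invented for illustration, and the differentia match is a literal comparison rather than real reasoning.

```python
def ancestors(term, is_a):
    """Return `term` plus everything reachable from it via is_a."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return seen


def inherited_relationships(term, is_a, relationships):
    """Collect the relationships asserted on `term` and on all its is_a ancestors."""
    collected = set()
    for cls in ancestors(term, is_a):
        collected |= set(relationships.get(cls, ()))
    return collected


# is_a links, including the inferred link from desB to eo neuron.
is_a = {
    "prothoracic desB neuron": ["neuron", "eo neuron"],
    "eo neuron": ["neuron"],
}
relationships = {
    "eo neuron": [
        ("develops_from", "FBbt:00006022"),
        ("has_function", "GO:0050906"),
    ],
    "prothoracic desB neuron": [
        ("part_of", "prothoracic dorsal sensillum trichodeum dh1"),
    ],
}
# sensory neuron = neuron AND has_function GO:0050906
genus, differentia = "neuron", {("has_function", "GO:0050906")}

desB_rels = inherited_relationships("prothoracic desB neuron", is_a, relationships)
desB_is_sensory = (
    genus in ancestors("prothoracic desB neuron", is_a) and differentia <= desB_rels
)
```

The same mechanism is the source of the danger described earlier: anything that satisfies the conditions of eo neuron picks up develops_from and has_function, whether or not anyone has checked them for that particular class.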

This is the strategy I have used to classify all eo-neurons as sensory neurons. It is extremely efficient: we currently have 146 eo-neuron classes. Further, there are 429 eo-type sensillar subclasses in the ontology, and all of these have at least one neuron as a part. As these missing neuronal classes are named and added to the ontology, this approach ensures they will automatically be classified as eo neurons and sensory neurons.

However, given the dangers of this, I think that it should be used sparingly and be well documented. A filter for these classes would be useful. It could be used within the verification manager to provide warnings. Feature request here: https://sourceforge.net/tracker/index.php?func=detail&aid=2338989&group_id=36855&atid=418260