Working with XPs: Difference between revisions

From GO Wiki
Revision as of 10:22, 8 December 2008

Notes on working with XPs in OE, illustrated using the Drosophila anatomy ontology

This page assumes a basic understanding of how to make and use necessary and sufficient definitions (AKA XP, genus and differentia, or intersection definitions). For more information on these, please see: Logical_Definitions.

AIMS

Until recently, most terms in the Drosophila anatomy ontology had no is_a parent, but some had many, i.e. the ontology contained lots of very incomplete multiple inheritance, making it hard to maintain and poor at grouping annotations. In the current version, in some systems at least, multiple asserted inheritance has been reduced and replaced by inheritance inferred from XP definitions using a reasoner. For many differentia (e.g. sensory function) this task requires using foreign terms in both regular relationships and intersections to define Drosophila anatomy terms. Note, this is not a dogmatic insistence on single asserted inheritance - it is simply an effort to reduce multiple inheritance wherever there is a clear strategy for expressing differentia formally. Formalising some axes of classification is extremely challenging and will certainly require expressivity we don't yet have in OBO (at least not via the current OE2).

Overall Strategy

In most cases, terms are defined using either intersections or regular relationships (although see below for a case for combining them) - no attempt is made to keep parallel, redundant regular relationships for terms with intersections. This means that, once many XP definitions are in place, the ontology is quite flat and difficult to work with without a reasoner on. Foreign terms in this ontology are selectively imported using filters that save is_a parents and children of key terms. They remain in the ontology indefinitely - but will need to be periodically updated. Right now, the only way to keep track of versions used for importation is via comments saved in the header.

Prior to release, a second version is produced. The aim is to produce a version that can be used in the same way as versions of the ontology prior to the introduction of XP definitions. Most importantly, it needs to be usable for grouping annotations using is_a and part_of children of any given term. This version also lacks relationships to foreign terms, as I suspect these will cause problems for some end users.

  • To make this second version:
    • all implied relationships are instantiated (non-redundantly);
    • XP defs are auto-converted to textual definitions;
    • Some XP differentia are converted to regular relationships (in order that information is not lost - see this ticket);
    • XP genus lines are stripped;
    • Foreign terms are stripped out;
    • Redundancy is removed;

For more details see Making_a_release_version.

Note - this approach ensures that the resulting ontology can be used efficiently with a reasoner and so will be usable for searching/querying in OE and Protege4.

Plan of attack

Working from an ontology with multiple inheritance, it is best to attack one convenient subdivision of the ontology at a time (for anatomy, organ systems work well) and to start by making a list of all the major differentia used. From this list, one can devise a strategy for expressing at least some of these differentia using existing relations or plausible new ones, plus internal or external terms.

1. Differentia using existing relations and existing terms:

name: larval head sensillum
intersection_of: sensillum
intersection_of: part_of larval head

2. Differentia using existing terms, but requiring new relations

name: anterior fascicle sensory neuron
intersection_of: sensory neuron
intersection_of: fasciculates_with anterior fascicle

3. Differentia using foreign terms

name: sensory neuron 
intersection_of: neuron
intersection_of: has_function GO:0050906 ! detection of stimulus involved in sensory perception

(Note the example in 2 would be better with 2 differentia, one being intersection_of: has_function GO:0050906 ! detection of stimulus involved in sensory perception. It can then be autoclassified as a sensory neuron.)

In every case, many regular relationships will also need to be made in order to completely classify - including to foreign terms. In the case of partonomy, this often means moving many terms from an asserted is_a parent to a suitable part_of parent.

Practicalities

1. Useful term renders to use while doing this:

  • "terms that have is_intersection"
  • "terms that don't have is_isa complete"
  • "terms that have isa_parent count >=2"
  • "terms that don't have id contains <ID prefix for terms internal to this ontology>"

2. New terms with XP defs: Add a new root term. Then add the XP def to the new term using the XP editor tab in the text editor.

3. Working with existing terms in the tree: Add XP defs by hand using XP editor tab in text editor. Keep the parent editor visible and delete relationships as necessary. A more straightforward way would be to convert all regular relationships to intersections (the option is there but doesn't work, see https://sourceforge.net/tracker/index.php?func=detail&aid=2221607&group_id=36855&atid=418257) - or to be able to choose individual elements (also doesn't work?).

4. Don't be fooled by the redundancy flags that show up everywhere when using the "link pile reasoner": they're just wrong (see this ticket). Unfortunately that means you can't filter out redundant relations - or tell what relations are really redundant without investigating each one carefully.

5. If you are sure that two siblings are disjoint (ask yourself if any instance could ever be both), then it is worth adding a disjoint_from 'relation'. With these in place, the reasoner can flag inconsistencies in your ontology. Note, you only have to make the relationship in one direction - it doesn't matter which. This can get confusing when many siblings are involved. Better tools for this would be good. There is a further problem with displaying disjoint root terms in the OTE (see ticket ***).

6. Incremental reasoning only partly works and tends to get confused. It is therefore important to re-reason periodically. However, this gets slower each time during a session where many XP terms are made, so it is also necessary to re-start OE every so often (see this ticket). Particularly frustrating is a tendency to flag disjoint violations where there are none, as this makes saving difficult.
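To make the disjoint_from point in item 5 concrete, here is a minimal Python sketch of the kind of check a reasoner performs: a term is inconsistent if its is_a ancestors include both members of a disjoint pair. It is not how OE implements the check. The function names and the example terms ("odd term", "glial cell") are made up for illustration, and the is_a graph is assumed to be acyclic. Storing each pair as a frozenset is what makes the direction of the disjoint_from link irrelevant.

```python
from itertools import combinations


def ancestors(term, is_a):
    """All is_a ancestors of a term, including the term itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(is_a.get(t, ()))
    return seen


def disjoint_violations(is_a, disjoint_from):
    """Return (term, a, b) wherever a term falls under two disjoint classes."""
    # Unordered pairs: a link asserted in one direction covers both.
    pairs = {frozenset(p) for p in disjoint_from}
    found = []
    for term in sorted(is_a):
        for a, b in combinations(sorted(ancestors(term, is_a)), 2):
            if frozenset((a, b)) in pairs:
                found.append((term, a, b))
    return found


# Hypothetical example data, not taken from the real ontology.
is_a = {
    "neuron": ["cell"],
    "glial cell": ["cell"],
    "larval neuron": ["neuron"],
    "odd term": ["neuron", "glial cell"],
}
violations = disjoint_violations(is_a, [("neuron", "glial cell")])
# violations == [("odd term", "glial cell", "neuron")]
```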

Using foreign terms:

Importing a whole foreign ontology is likely to be impractical. Certainly it's unlikely the reasoner will function when multiple large ontologies are loaded. Instead, one can import terms as needed. Often, one may want a whole set of related terms - for example, those for various sensory functions. In such cases, there are often key terms which can be specified in a filtered save along with all parent and child classes and the is_a relations between them:

SAVE FILTER:
term has name equal to X
OR
term has name equal to X in descendant that can be reach via is_a *
OR
term has name equal to X in ancestor that can be reach via is_a

(*Note - the descendant clause here does not currently work in filtered saves.)
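The intent of the save filter above (a key term plus everything above and below it via is_a) can be sketched outside OE as a simple graph traversal. This is only an illustration of the selection logic: the function name filtered_terms and the placeholder term names are assumptions, and this is not the OE filter implementation.

```python
def filtered_terms(key, is_a):
    """Key term plus all its is_a ancestors and is_a descendants.

    is_a maps each term to a list of its asserted is_a parents.
    """
    # Invert the parent map so descendants can be walked the same way.
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    def closure(start, edges):
        seen, stack = set(), [start]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(edges.get(t, ()))
        return seen

    return closure(key, is_a) | closure(key, children)


# Placeholder names standing in for a key term X and its neighbours.
is_a = {
    "X": ["parent of X"],
    "child of X": ["X"],
    "grandchild of X": ["child of X"],
    "unrelated": ["parent of X"],
}
selected = filtered_terms("X", is_a)
# "unrelated" shares a parent with X but is neither ancestor nor descendant,
# so it is not selected.
```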

Efficiency vs the dangers of hidden inheritance:

Multiple asserted inheritance is hard to manage - if it is not kept under control, one can all too easily get lost in the resulting tangle, producing an ontology riddled with TPV and with many inheritance groups incomplete.

XPs provide a way to reduce multiple asserted inheritance. They also have the advantage of pushing assertions down to a much lower level of granularity. The biology literature is full of generalizing assertions about classes based on quite limited knowledge of the properties of subclasses. Capturing such assertions directly in high level class terms can be dangerous. As details are added, conflicts with these generalizations begin to show up. Using XPs forces us to attach properties to individual classes. These can be recorded along with references providing evidence for the assertion. Classification on the basis of these assertions is safer than working from very general statements.

In the old system, generalizing assertions may be buried in the text of a definition, but may also be made using regular relationships. Now, this has the advantage of being efficient. I can record that everything of class X is part_of the head, or has a function in sensory perception, and this will be inherited by all subclasses. This is less work than recording the property for each of the individual subclasses (assuming you can keep track of the asserted is_a relations required).

It is possible, although rather dangerous, to combine the two strategies. A class having an XP definition can also have regular relationships. These record properties that will be inherited by all autoclassified subclasses. The danger of this strategy is obvious - simply by recording properties which fulfill the conditions of the XP class, one is asserting that the other properties of that class apply.

However, this strategy can be useful.

Here's an example

I can record that all eo-neurons are part_of some eo-type sensillum:

eo-type sensillum
	part_of eo-neuron
	.	is_a prothoracic desB neuron
	.	is_a prothoracic desA neuron

From this, a reasoner can conclude that desB is part_of some eo-type sensillum.

or I can use an XP def:

name: eo neuron
intersection_of: neuron
intersection_of: part_of eo-type sensillum

name: prothoracic desB neuron
is_a: neuron
relationship: part_of prothoracic dorsal sensillum trichodeum dh1

name: prothoracic dorsal sensillum trichodeum dh1
is_a: eo-type sensillum

=> implied classification

prothoracic desB neuron is_a eo-neuron
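The inference step in this example can be sketched as a toy classifier in Python. It is a deliberately minimal illustration and not the OE reasoner: it only looks at relationships asserted directly on a term, it ignores transitivity of part_of, and the function names and data layout are assumptions made for the sketch.

```python
def superclasses(term, is_a):
    """The term plus everything reachable from it via asserted is_a links."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(is_a.get(t, ()))
    return seen


def satisfies(term, genus, differentia, is_a, relationships):
    """True if term falls under the genus and meets every differentia."""
    if genus not in superclasses(term, is_a):
        return False
    for rel, filler in differentia:
        # Some asserted link with this relation must point at the filler
        # class or at one of its subclasses.
        if not any(r == rel and filler in superclasses(target, is_a)
                   for r, target in relationships.get(term, ())):
            return False
    return True


def classify(xp_defs, is_a, relationships):
    """Return the set of inferred (subclass, XP-defined class) pairs."""
    terms = set(is_a) | set(relationships)
    return {(term, name)
            for name, (genus, differentia) in xp_defs.items()
            for term in terms
            if term != name
            and satisfies(term, genus, differentia, is_a, relationships)}


dh1 = "prothoracic dorsal sensillum trichodeum dh1"
is_a = {"prothoracic desB neuron": {"neuron"}, dh1: {"eo-type sensillum"}}
relationships = {"prothoracic desB neuron": {("part_of", dh1)}}
xp_defs = {"eo neuron": ("neuron", {("part_of", "eo-type sensillum")})}

inferred = classify(xp_defs, is_a, relationships)
# inferred == {("prothoracic desB neuron", "eo neuron")}
```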

I can then expand the definition of eo-neuron with regular relationships:

name: eo neuron
intersection_of: neuron
intersection_of: part_of eo-type sensillum
relationship: develops_from FBbt:00006022 ! external sensory organ precursor cell IIIb
relationship: has_function GO:0050906 ! detection of stimulus involved in sensory perception

name: sensory neuron
intersection_of: neuron
intersection_of: has_function GO:0050906 ! detection of stimulus involved in sensory perception

Now the reasoner can infer:

prothoracic desB neuron develops_from external sensory organ precursor cell IIIb AND prothoracic desB neuron is_a sensory neuron

(In fact this will be represented in a tree as

sensory neuron
. is_a eo neuron
. . is_a prothoracic desB neuron

The other advantage of computing the tree is that it avoids the addition of redundant terms.)
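The tree shown above contains only the non-redundant is_a links. A minimal Python sketch of that pruning step follows, assuming the full set of asserted and inferred is_a links is already available and the graph is acyclic. The function name remove_redundant is hypothetical, and this is not how OE computes its tree.

```python
def remove_redundant(is_a_links):
    """Drop every is_a link that is implied by a longer is_a path."""
    parents = {}
    for child, parent in is_a_links:
        parents.setdefault(child, set()).add(parent)

    def reachable_without(child, skipped_parent):
        """Ancestors of child found without using the direct link to skipped_parent."""
        seen = set()
        stack = [p for p in parents.get(child, ()) if p != skipped_parent]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, ()))
        return seen

    return {(c, p) for c, p in is_a_links if p not in reachable_without(c, p)}


desB = "prothoracic desB neuron"
links = {
    (desB, "eo neuron"),
    (desB, "sensory neuron"),       # implied via eo neuron
    (desB, "neuron"),               # implied via eo neuron
    ("eo neuron", "sensory neuron"),
    ("eo neuron", "neuron"),        # implied via sensory neuron
    ("sensory neuron", "neuron"),
}
tree = remove_redundant(links)
# tree == {(desB, "eo neuron"), ("eo neuron", "sensory neuron"),
#          ("sensory neuron", "neuron")}
```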

This is the strategy I have used to classify all eo-neurons as sensory neurons. It is extremely efficient: we currently have 146 eo-neuron classes. Further, there are 429 eo-type sensillar subclasses in the ontology and all of these have at least one neuron as a part. As these missing neuronal classes are named and added to the ontology, this approach ensures they will automatically be classified as eo neurons and sensory neurons.

However, given the dangers of this, I think that it should be used sparingly and well documented. A filter for these classes would be useful. It could be used within the verification manager to provide warnings.