https://wiki.geneontology.org/index.php?title=SO:Composite_Terms&feed=atom&action=historySO:Composite Terms - Revision history2024-03-28T12:23:14ZRevision history for this page on the wikiMediaWiki 1.40.0https://wiki.geneontology.org/index.php?title=SO:Composite_Terms&diff=53656&oldid=prevGail at 18:17, 14 July 20142014-07-14T18:17:30Z<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 14:17, 14 July 2014</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div> </div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[Category:Cross Products]]</ins></div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>SO contains cross-product definitions (aka genus-differentia</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>SO contains cross-product definitions (aka genus-differentia</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>definitions, aka intersection definitions) for many composite</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>definitions, aka intersection definitions) for many composite</div></td></tr>
</table>Gailhttps://wiki.geneontology.org/index.php?title=SO:Composite_Terms&diff=17165&oldid=prevCjm: New page: SO contains cross-product definitions (aka genus-differentia definitions, aka intersection definitions) for many composite terms. This document describes the methodology. Some familiarity...2008-12-14T01:16:15Z<p>New page: SO contains cross-product definitions (aka genus-differentia definitions, aka intersection definitions) for many composite terms. This document describes the methodology. Some familiarity...</p>
<p><b>New page</b></p><div><br />
SO contains cross-product definitions (aka genus-differentia<br />
definitions, aka intersection definitions) for many composite<br />
terms. This document describes the methodology. Some familiarity with<br />
the obo file format is assumed.<br />
<br />
This document is aimed primarily at ontology editors and<br />
technical/software/database people who consume the ontologies. It<br />
isn't intended for end-users of ontologies; much of this will be<br />
invisible to them.<br />
<br />
=Pre-crossproducts=<br />
<br />
Here is an example of a term done using the pre-crossproduct<br />
methodology:<br />
<br />
[Term]<br />
id: SO:0000283<br />
name: engineered_foreign_transposable_element_gene<br />
is_a: SO:0000111 ! transposable_element_gene<br />
is_a: SO:0000281 ! engineered_foreign_gene<br />
is_a: SO:0000805 ! engineered_foreign_region<br />
<br />
This is problematic. We have multiple is_a parents, due to the lack of a<br />
consistent axis of classification. This leads to tangled DAGs and<br />
problems of ontology maintenance, visualisation and reasoning.<br />
<br />
Note that the editor has to manually check for other possible is_a parents<br />
of engineered_foreign_transposable_element_gene (EFTEG), such as<br />
"engineered_transposable_element_gene" (ETEG). Furthermore,<br />
if ETEG is added, the is_a parentage of EFTEG must be changed. This is<br />
tedious, time-consuming and error-prone.<br />
<br />
The problems continue further up the DAG:<br />
<br />
[Term]<br />
id: SO:0000281<br />
name: engineered_foreign_gene<br />
is_a: SO:0000280 ! engineered_gene<br />
is_a: SO:0000285 ! foreign_gene<br />
is_a: SO:0000804 ! engineered_region<br />
<br />
If we were to examine the whole DAG we would see a lot of redundancy,<br />
and no modularisation.<br />
<br />
Here is an example (showing *is_a* only):<br />
<br />
[[Image:Efteg.png]]<br />
<br />
=The cross-products solution=<br />
<br />
The first aspect of the solution is '''modularity'''. We recognise the<br />
separation between the core feature types (such as gene, region) and<br />
the qualities (properties, attributes) of those<br />
features. Examples of feature qualities are "being engineered" and<br />
"being foreign". These live in a separate part of the ontology, and<br />
trace their is_a parentage solely to "feature_attribute", not to<br />
"located_sequence_feature".<br />
<br />
We also introduce a new relation "has_quality", which obtains between<br />
some kind of quality-bearing entity (such as a gene) and a quality.<br />
<br />
Using these ingredients we can provide 'genus-differentia' definitions<br />
of terms in a form that is computationally visible. In a definition of<br />
this form, a term is defined using a broader category (the genus), and<br />
a collection of characteristics that distinguish it from other members of<br />
the same category (the differentia).<br />
<br />
http://en.wikipedia.org/wiki/Definition_by_genus_and_difference<br />
<br />
Genus-differentia definitions form one of the core best practices in<br />
the OBO Foundry (http://www.obofoundry.org). These definitions can be<br />
written as "A <G> 'which' <D>". For example, we can define an<br />
engineered foreign transposable element gene as "A transposable<br />
element gene *which* is engineered and is foreign". The genus is<br />
"transposable element gene" and the differentia are "is engineered" and<br />
"is foreign".<br />
<br />
We can also expose these definitions in a way that is computationally<br />
visible. [add picture of editing in oboedit here].<br />
<br />
==obo file representation==<br />
<br />
The underlying representation in oboedit is as follows:<br />
<br />
[Term]<br />
id: SO:0000283<br />
name: engineered_foreign_transposable_element_gene<br />
intersection_of: SO:0000111 ! transposable_element_gene<br />
intersection_of: has_quality SO:0000783 ! engineered<br />
intersection_of: has_quality SO:0000784 ! foreign<br />
<br />
The "intersection_of" lines list the necessary and sufficient<br />
conditions for inclusion in a class (term). For this to be a G-D<br />
definition, there should be one intersection_of line without a<br />
relation (the genus) and at least one line with a relation (the<br />
differentia).<br />
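<br />
As a rough illustration of this rule, the following Python sketch splits the<br />
intersection_of lines of the stanza above into a genus and a list of<br />
differentia. The helper name parse_logical_definition is hypothetical (it is<br />
not part of oboedit), and the parsing is deliberately minimal rather than a<br />
full obo parser.<br />
<br />
```python
# Minimal sketch: split the intersection_of lines of one [Term] stanza into
# a genus and its differentia. parse_logical_definition is a hypothetical
# helper written for this page, not an oboedit function.

STANZA = """\
[Term]
id: SO:0000283
name: engineered_foreign_transposable_element_gene
intersection_of: SO:0000111 ! transposable_element_gene
intersection_of: has_quality SO:0000783 ! engineered
intersection_of: has_quality SO:0000784 ! foreign
"""


def parse_logical_definition(stanza):
    """Return (genus_id, [(relation, filler_id), ...]) for one stanza."""
    genus = None
    differentia = []
    for line in stanza.splitlines():
        if not line.startswith("intersection_of:"):
            continue
        # Drop the tag and the trailing "! name" comment, keep the ids.
        value = line[len("intersection_of:"):].split("!")[0].split()
        if len(value) == 1:
            # No relation: this is the genus line.
            if genus is not None:
                raise ValueError("more than one genus line")
            genus = value[0]
        elif len(value) == 2:
            # Relation plus filler: this is a differentia line.
            differentia.append((value[0], value[1]))
        else:
            raise ValueError("unexpected intersection_of line: " + line)
    if genus is None or not differentia:
        raise ValueError("not a genus-differentia definition")
    return genus, differentia


genus, differentia = parse_logical_definition(STANZA)
print(genus)        # SO:0000111
print(differentia)  # [('has_quality', 'SO:0000783'), ('has_quality', 'SO:0000784')]
```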
<br />
Of course, most people will not be looking at obo files. Oboedit provides a plugin for editing these genus-differentia definitions (see below for a screenshot).<br />
<br />
Using these definitions, a computer can calculate where EFTEG should<br />
be placed in a DAG (provided similar definitions are provided for<br />
other terms). The computer can also calculate that EFTEGs should be<br />
returned in queries for ETEGs or EFRs ('''engineered_foreign_region'''s).<br />
<br />
These calculations are typically done with a 'reasoner'. oboedit has a reasoner built-in.<br />
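<br />
To give a feel for what such a calculation involves, here is a toy Python<br />
sketch of structural classification. It is not the oboedit reasoner. It<br />
assumes that every genus is an undefined (primitive) term, that has_quality<br />
is the only relation, and that no two terms have identical definitions. Apart<br />
from the EFTEG definition, the definitions and the asserted links between<br />
gene, region and transposable_element_gene are assumptions made for<br />
illustration, and term names are used in place of ids.<br />
<br />
```python
# Toy structural classifier. Only the EFTEG definition comes from this page;
# the other definitions and the asserted links are illustrative assumptions.

# Asserted is_a links between primitive (undefined) terms.
ASSERTED_IS_A = {
    "transposable_element_gene": {"gene"},
    "gene": {"region"},
    "region": set(),
}

# Genus-differentia definitions: term -> (genus, set of qualities).
DEFINITIONS = {
    "engineered_foreign_transposable_element_gene":
        ("transposable_element_gene", {"engineered", "foreign"}),
    "engineered_foreign_gene": ("gene", {"engineered", "foreign"}),
    "engineered_gene": ("gene", {"engineered"}),
    "foreign_gene": ("gene", {"foreign"}),
    "engineered_foreign_region": ("region", {"engineered", "foreign"}),
    "engineered_region": ("region", {"engineered"}),
}


def primitive_ancestors(term):
    """Return term plus everything reachable over asserted is_a links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in ASSERTED_IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumers(term):
    """Defined terms whose conditions are all met by term's definition."""
    genus, qualities = DEFINITIONS[term]
    genus_line = primitive_ancestors(genus)
    return {
        other
        for other, (other_genus, other_qualities) in DEFINITIONS.items()
        if other != term
        and other_genus in genus_line
        and other_qualities <= qualities
    }


def direct_parents(term):
    """Most specific subsumers of term, plus its genus."""
    candidates = subsumers(term)
    redundant = set()
    for candidate in candidates:
        # Anything above another candidate is implied, so not a direct parent.
        redundant |= subsumers(candidate)
    genus, _ = DEFINITIONS[term]
    return (candidates - redundant) | {genus}


print(sorted(direct_parents("engineered_foreign_transposable_element_gene")))
# ['engineered_foreign_gene', 'transposable_element_gene']
```
<br />
Under these assumptions the sketch places EFTEG directly under<br />
transposable_element_gene and engineered_foreign_gene, and reaches<br />
engineered_foreign_region only indirectly.<br />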
<br />
[[Image:so-xp.jpg]]<br />
<br />
The blue squiggly lines are 'is_a's that have been inferred by oboedit using the genus-differentia definitions. They have 'not' been asserted by the person editing the ontology.<br />
<br />
This is all well and good for oboedit users, but not everyone uses this tool. Whilst there are many other reasoners available, we should still provide the DAG fully classified so that there are no additional dependencies required by consumers of the ontology.<br />
<br />
We can configure oboedit to save all inferred 'is_a' links (see issues, below). The saved file will have entries like this:<br />
<br />
[Term]<br />
id: SO:0000283<br />
name: engineered_foreign_transposable_element_gene<br />
intersection_of: SO:0000111 ! transposable_element_gene<br />
intersection_of: has_quality SO:0000783 ! engineered<br />
intersection_of: has_quality SO:0000784 ! foreign<br />
is_a: SO:0000111 ! transposable_element_gene<br />
is_a: SO:0000281 ! engineered_foreign_gene<br />
<br />
We call the is_a links above 'asserted', because they are explicitly stated in the file, rather than implicitly inferred by the oboedit reasoner.<br />
<br />
This means that software can ignore the intersection_of lines safely;<br />
the old tangled DAG can still be displayed as normal.<br />
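<br />
A consumer of this kind might look like the following Python sketch, which<br />
reads only the id and is_a tags of each [Term] stanza and skips everything<br />
else. The helper name read_is_a_links is hypothetical, and the parsing is a<br />
minimal illustration rather than a complete obo reader.<br />
<br />
```python
# Sketch of a reasoner-unaware consumer: only id and is_a tags are read.
# read_is_a_links is a hypothetical helper name.

OBO_TEXT = """\
[Term]
id: SO:0000283
name: engineered_foreign_transposable_element_gene
intersection_of: SO:0000111 ! transposable_element_gene
intersection_of: has_quality SO:0000783 ! engineered
intersection_of: has_quality SO:0000784 ! foreign
is_a: SO:0000111 ! transposable_element_gene
is_a: SO:0000281 ! engineered_foreign_gene
"""


def read_is_a_links(obo_text):
    """Map each term id to the list of its asserted is_a parent ids."""
    links = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[len("id:"):].strip()
            links.setdefault(current, [])
        elif in_term and current and line.startswith("is_a:"):
            # Keep the parent id; drop the "! name" comment.
            parent = line[len("is_a:"):].split("!")[0].split()[0]
            links[current].append(parent)
        # Every other tag, intersection_of included, is ignored.
    return links


print(read_is_a_links(OBO_TEXT))
# {'SO:0000283': ['SO:0000111', 'SO:0000281']}
```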
<br />
When the ontology with asserted 'is_a' links is viewed in oboedit, it will look like this:<br />
<br />
[[Image:so-xp-with-is_as.jpg]]<br />
<br />
The red arrows indicate asserted 'is_a' links that could have been inferred had they not been there.<br />
<br />
==Obtaining==<br />
<br />
The public version of the ontology contains the logical definitions.<br />
<br />
The genus-differentia matrix can be manipulated as an Excel file:<br />
<br />
[[Media:so-xp.xls]] -- generated 2006/08/25<br />
<br />
==Benefits==<br />
<br />
The management of the tangled is_a DAG is<br />
handled automatically by software, so the ontology editor does not need<br />
to worry about it. Downstream tools should not be affected.<br />
<br />
However, second-generation tools can choose to use the intersection_of<br />
lines; they can be used to present the ontology DAG to the user in a<br />
more tractable, modular fashion. The genus in the definition can be<br />
used as the "core" is_a parent. The differentia could be presented in<br />
a separate display.<br />
<br />
=open issues=<br />
<br />
==saving inferences==<br />
<br />
oboedit does not allow you to save all inferred 'is_a's. Currently<br />
so-xp is saved without the inferred is_a parents, which limits its<br />
applicability to first-generation obo tools (i.e. those without reasoning capabilities).<br />
<br />
Until oboedit can do this, it may be necessary to semi-manually add<br />
the is_as (oboedit shows you these visually but it doesn't provide a<br />
way to materialize them in the resulting saved obo file).<br />
<br />
Another option is to convert to owl and use a third-party open source<br />
reasoner such as pellet to do the classification, then convert back to<br />
obo. This could all be automated in a script. The curator version<br />
(so-xp.obo) would not have the is_as, but the so.obo file that is for<br />
public consumption and use by first-generation tools would have the<br />
is_as materialised.<br />
<br />
UPDATE: we used Pellet to do the initial classification. Results are still being checked.<br />
Once John is back we can discuss ways of making it easier to save the oboedit classification results, or using obo2obo to fill these in, but Pellet seemed to work as a one-off.<br />
<br />
http://www.mindswap.org/2003/pellet/<br />
<br />
===what happens on changes?===<br />
<br />
One advantage of never asserting the inferrable 'is_a' links is never having to worry about recreating 'is_a' links when the core parts of the ontology change.<br />
<br />
For example, if we were to create an intermediate type between "gene" and "region" (for example, "functional region") and also wanted to create terms like "engineered functional region", we would simply go ahead and do that, provide genus-differentia definitions, and let the reasoner compute the is_a DAG on-the-fly.<br />
<br />
However, as we stated earlier, we want to save the obo file with the DAG fully classified, since most tools that consume the obo file will not be reasoner-aware. We can still use oboedit to create the is_a links automatically, and configure it so that these are saved. The problem here is that a change in one part of the ontology can percolate to large sections of the DAG: how do we know which links to replace and which to preserve?<br />
<br />
One way is to keep track of which links were asserted directly by a curator, '''not''' as a result of reasoning, and which were originally added by the reasoner. For example, we could use trailing qualifiers:<br />
<br />
[Term]<br />
id: SO:0000283<br />
name: engineered_foreign_transposable_element_gene<br />
intersection_of: SO:0000111 ! transposable_element_gene<br />
intersection_of: has_quality SO:0000783 ! engineered<br />
intersection_of: has_quality SO:0000784 ! foreign<br />
is_a: SO:0000111 ! transposable_element_gene {inferred=true}<br />
is_a: SO:0000281 ! engineered_foreign_gene {inferred=true}<br />
<br />
The reasoner would know that these could be discarded if they can no longer be inferred.<br />
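<br />
The bookkeeping this implies could be sketched in Python as follows. The<br />
helper names split_is_a and refresh_links are hypothetical, the qualifier<br />
syntax is the one proposed above (still under discussion), and the set of<br />
parents that can still be inferred is supplied by hand here, standing in for<br />
the output of a reasoner. The unqualified third link is included only to show<br />
a curator-asserted link being preserved.<br />
<br />
```python
import re

# Sketch only: split_is_a and refresh_links are hypothetical helpers, and the
# {inferred=true} qualifier is the proposal from this page, not a fixed rule.

QUALIFIER_RE = re.compile(r"\{([^}]*)\}")


def split_is_a(line):
    """Return (parent_id, inferred) for one is_a line."""
    if not line.startswith("is_a:"):
        raise ValueError("not an is_a line: " + line)
    body = line[len("is_a:"):]
    qualifiers = {}
    match = QUALIFIER_RE.search(body)
    if match:
        # Read key=value pairs from inside the braces.
        for pair in match.group(1).split(","):
            key, _, value = pair.partition("=")
            qualifiers[key.strip()] = value.strip().strip('"')
        # Remove the qualifier block before extracting the parent id.
        body = body[:match.start()] + body[match.end():]
    parent = body.split("!")[0].split()[0]
    return parent, qualifiers.get("inferred") == "true"


def refresh_links(is_a_lines, still_inferable):
    """Keep curator-asserted links; keep inferred ones only if still valid."""
    kept = []
    for line in is_a_lines:
        parent, inferred = split_is_a(line)
        if not inferred or parent in still_inferable:
            kept.append(line)
    return kept


LINES = [
    "is_a: SO:0000111 ! transposable_element_gene {inferred=true}",
    "is_a: SO:0000281 ! engineered_foreign_gene {inferred=true}",
    "is_a: SO:0000805 ! engineered_foreign_region",
]

# Suppose the reasoner can now only re-derive the link to SO:0000111.
for line in refresh_links(LINES, still_inferable={"SO:0000111"}):
    print(line)
# is_a: SO:0000111 ! transposable_element_gene {inferred=true}
# is_a: SO:0000805 ! engineered_foreign_region
```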
<br />
This is still under discussion. For now, these links may have to be removed manually, which is no worse than the pre-reasoner situation when everything was done manually.<br />
<br />
==Re-Use==<br />
<br />
Currently SO has its own ontology of feature attributes; eventually we<br />
may want to merge this with PATO [[PATO:Main_Page]]<br />
<br />
SO also uses its own has_quality relation. Eventually it should use<br />
the version that will be in RO [[RO:Main_Page]].<br />
<br />
=applicability of methodology to other ontologies=<br />
<br />
This work was carried out as part of a larger project within the Gene Ontology and the [http://www.obofoundry.org OBO-Foundry] to create logical and computable genus-differentia definitions for terms, linking across ontologies where appropriate. See [[XP:Main_Page]].<br />
<br />
We are applying the same methodology to GO, although the xps are not<br />
yet part of the public release. We are focused on xps for GO terms<br />
that refer to CL terms right now.<br />
<br />
=other resources=<br />
<br />
==mail lists==<br />
<br />
https://lists.sourceforge.net/lists/listinfo/obo-crossproducts<br />
<br />
==oboedit guide==<br />
<br />
Link to appropriate section of oboedit guide here...<br />
<br />
==background reading==<br />
<br />
===definitions in the OBO Foundry===<br />
<br />
http://www.obofoundry.org<br />
<br />
Forthcoming paper<br />
<br />
Obol paper; see link on:<br />
http://www.fruitfly.org/~cjm/obol<br />
<br />
===Modularity in ontologies===<br />
<br />
These tutorials are very OWL- and Protege-centric, but much of the material also applies to obo1.2 and oboedit:<br />
<br />
http://www.co-ode.org/resources/tutorials/intro/</div>Cjm