Transitive closure
Edit by Glenjarvis: spelling/typo correction
Revision as of 13:55, 2 June 2010
Calculating the transitive closure: the old way
See Schema notes
The old way was to ignore the relation in the GO graph and calculate a blind transitive closure. This works fine so long as GO consists of only a few relations like is_a and part_of. However, it results in false positives when used with regulates. These are not very serious, as we had previously been treating regulates as part_of anyway. However, the relation does need to be taken into account for other relations, especially has_part.
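To make the false-positive problem concrete, here is a minimal Python sketch on a hypothetical graph (X, Y, Z and W are placeholders, not GO terms). A blind closure walks every edge regardless of its label; the relation-aware variant, under the simplifying assumption that only is_a and part_of may be chained, drops has_part before walking. Neither function labels the inferred relation; they only show which pairs end up linked.

```python
from collections import defaultdict

# Hypothetical toy graph of (child, relation, parent) triples.
EDGES = [
    ("X", "has_part", "Y"),
    ("Y", "part_of", "Z"),
    ("W", "is_a", "Y"),
]


def blind_closure(edges):
    """All (descendant, ancestor) pairs reachable by any path, ignoring relations."""
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating empty entries for terms with no parents
            stack.extend(parents.get(node, ()))
    return closure


def relation_aware_closure(edges, chainable=("is_a", "part_of")):
    """Same walk, restricted to relations assumed safe to chain."""
    return blind_closure([e for e in edges if e[1] in chainable])


print(("X", "Z") in blind_closure(EDGES))           # True: spurious link via has_part
print(("X", "Z") in relation_aware_closure(EDGES))  # False
print(("W", "Z") in relation_aware_closure(EDGES))  # True: W is_a Y part_of Z
```

X has_part Y and Y part_of Z say nothing about how X relates to Z, yet the blind closure still links them.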
Calculating the transitive closure: the new way
The new way will take relations and the semantics of those relations into account (see Category:Reasoning). This may more accurately be called the deductive closure.
Third-party consumers of GO will have the option of calculating the closure themselves or using a pre-computed closure. The pre-computed closure will be available:
- as a simple tab-delimited file
- in the graph_path table in the database
There are many advantages to the new system:
- will not make erroneous calculations for relations such as has_part
- treats regulates correctly
- scales with new relations
Tab-delimited file
The columns are:
- subject GO ID (i.e. child)
- target GO ID (i.e. parent)
- relation ID (e.g. part_of)
- implied or asserted
An example of this file can be found here.
This table can be used to determine the relation between any two nodes in GO (if a relation holds at all).
The table is generated using obo2linkfile (part of the core OBOEdit2 distribution).
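As a rough illustration of reading this file, the Python sketch below assumes the four columns appear in the order listed above, tab-separated with no header row, and keeps the implied/asserted column verbatim because its exact values are not specified here. The two sample rows reuse the IDs from the Basic Usage example but are otherwise hypothetical; `read_links` and `index_by_pair` are illustrative names, not part of any GO tool.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Link(NamedTuple):
    subject: str   # child GO ID
    target: str    # parent GO ID
    relation: str  # relation ID, e.g. part_of
    status: str    # implied or asserted, kept verbatim


def read_links(handle):
    """Yield Link records from tab-separated rows, skipping short rows."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 4:
            yield Link(*row[:4])


def index_by_pair(links):
    """Map (subject, target) to the set of relations holding between them."""
    index = defaultdict(set)
    for link in links:
        index[(link.subject, link.target)].add(link.relation)
    return index


SAMPLE = (
    "GO:0045768\tGO:0006915\tindirectly_regulates\timplied\n"
    "GO:0045768\tGO:0006915\tindirectly_negatively_regulates\timplied\n"
)
links = index_by_pair(read_links(io.StringIO(SAMPLE)))
print(sorted(links[("GO:0045768", "GO:0006915")]))
# ['indirectly_negatively_regulates', 'indirectly_regulates']
```

An empty set for a pair means no relation was found between those two nodes in the file.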
graph_path table
The graph_path table has been extended to include the relation:
    --- @@ graph_path.relationship_type_id
    --- References an entry in the term table corresponding
    --- to the INFERRED relation that holds between term2 and term1.
    --- At this time the value is always NULL - a blind transitive closure
    --- is calculated, ignoring the relationship_type_id in term2term.
    --- However, in future we want to calculate different closures for
    --- different relations. [See
    --- ]
    relationship_type_id integer,
    foreign key (relationship_type_id) references term(id),
This brings the graph_path table more in line with cvtermpath in Chado.
A new column, relation_distance, has been added. The existing distance column will remain and keep the same semantics (i.e. the number of hops from a node (term2) to its descendant (term1), regardless of relation). The new relation_distance column counts only the hops over the specified relation.
From the DDL docs:
    --- @@graph_path.distance
    --- The distance in terms of the number of "hops" between
    --- nodes in the asserted graph (term2term).
    --- The relationship_type_id is ignored here.
    --- Example: if A part_of B is_a C part_of D, then
    --- distance=3 for A part_of D
    distance integer,
    --- @@graph_path.relation_distance
    --- (added 2008-10-27)
    --- The distance in terms of the number of "hops" over
    --- relationship_type_id in the asserted graph (term2term).
    --- Example: if A part_of B is_a C part_of D, then
    --- relation_distance=2 for A part_of D
    relation_distance integer
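The two columns can be checked against the DDL example with a short Python sketch. It works on a single asserted path given as (subject, relation, object) hops; how the loader chooses among several paths between the same pair of terms is not described here, so `path_distances` is a hypothetical helper rather than the database's own logic.

```python
def path_distances(path, relation):
    """Return (distance, relation_distance) for one asserted path.

    `path` lists (subject, relation, object) hops from the descendant upward.
    distance counts every hop; relation_distance counts hops over `relation`.
    """
    # Check that consecutive hops actually join up into a single chain.
    for (_, _, obj), (subj, _, _) in zip(path, path[1:]):
        if obj != subj:
            raise ValueError(f"hops do not connect: {obj!r} vs {subj!r}")
    distance = len(path)
    relation_distance = sum(1 for _, rel, _ in path if rel == relation)
    return distance, relation_distance


# A part_of B is_a C part_of D, closed as A part_of D.
example = [("A", "part_of", "B"), ("B", "is_a", "C"), ("C", "part_of", "D")]
print(path_distances(example, "part_of"))  # (3, 2)
```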
Using the closure
TBD with the ontology/annotation group.
There is an implicit relation between a gene product and a GO term. This has yet to be formalized; an initial sketch is below:
- has_function : for MF
- has_function_in_process : for BP
- has_function_in_location : for CC
Whilst this has yet to be finalized, what is clear is that the BP and CC relations are transitive_over part_of. This means it is valid to propagate the link up both is_a and part_of links.
e.g.
    G has_function_in_process A
    A is_a B
    B part_of C
    C is_a D
    D regulates E
    E is_a F
    =>
    A part_of C
    A part_of D
    =>
    G has_function_in_process D
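The propagation in this example amounts to a graph walk from the annotated term over the relations that the annotation relation is transitive over. The Python sketch below assumes, following the text, that the BP and CC relations propagate over is_a and part_of; it further assumes has_function propagates over is_a only, since the text does not say it is transitive over part_of. The rule table and function names are hypothetical while the relations remain unfinalized.

```python
from collections import deque

# Assumed propagation rules: every annotation relation propagates over is_a;
# the BP and CC relations also propagate over part_of (per the text).
PROPAGATES_OVER = {
    "has_function": {"is_a"},
    "has_function_in_process": {"is_a", "part_of"},
    "has_function_in_location": {"is_a", "part_of"},
}


def reachable(term, edges, allowed):
    """Terms reachable from `term` (inclusive) via `allowed` relations."""
    seen = {term}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for child, relation, parent in edges:
            if child == node and relation in allowed and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def propagate(annotations, edges):
    """Expand (gene_product, relation, term) annotations up the ontology."""
    implied = set()
    for gene, relation, term in annotations:
        for ancestor in reachable(term, edges, PROPAGATES_OVER[relation]):
            implied.add((gene, relation, ancestor))
    return implied


edges = [
    ("A", "is_a", "B"), ("B", "part_of", "C"), ("C", "is_a", "D"),
    ("D", "regulates", "E"), ("E", "is_a", "F"),
]
result = propagate([("G", "has_function_in_process", "A")], edges)
print(sorted(term for _, _, term in result))  # ['A', 'B', 'C', 'D']
```

The walk stops at D because regulates is not in the allowed set, which is why nothing is concluded about E or F.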
When the exact relations are determined, we will provide a relation composition table: given two relations R1 and R2, what do we know about the composition R1 o R2?
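Such a table could be as simple as a lookup keyed on relation pairs. In the hypothetical Python sketch below, an entry (R1, R2) gives the relation implied by X R1 Y and Y R2 Z. Only the is_a/part_of combinations and the has_function_in_process rules stated above are filled in; any pair not listed returns None, meaning this sketch draws no conclusion (not that none exists). Folding a path through the table reproduces the inferences in the example.

```python
from functools import reduce

# Hypothetical composition table: (R1, R2) -> relation implied by
# "X R1 Y" and "Y R2 Z". Pairs not listed are unknown to this sketch.
COMPOSITION = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
    ("has_function_in_process", "is_a"): "has_function_in_process",
    ("has_function_in_process", "part_of"): "has_function_in_process",
}


def compose_path(relations):
    """Fold a chain of relations left to right; None if any step is unknown."""
    if not relations:
        return None

    def step(so_far, relation):
        return None if so_far is None else COMPOSITION.get((so_far, relation))

    return reduce(step, relations[1:], relations[0])


# Relations along the example, starting from A (or from G on the last line).
print(compose_path(["is_a", "part_of"]))                       # part_of: A part_of C
print(compose_path(["is_a", "part_of", "is_a"]))               # part_of: A part_of D
print(compose_path(["is_a", "part_of", "is_a", "regulates"]))  # None
print(compose_path(["has_function_in_process", "is_a", "part_of", "is_a"]))
# has_function_in_process: G has_function_in_process D
```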
How it's calculated
Currently the blind transitive closure is calculated using Perl code in go-db-perl.
This code will be retired. There are two options for replacement:
- Use the OBO Edit Reasoner
- Use custom perl/SQL code
Using the OE reasoner has various advantages: we can leverage existing code, and the OE reasoner can wrap standard third-party reasoners such as Pellet. See Category:Reasoning for more details.
The database would be populated by first running obo2linkfile on the main .obo file to generate the tab-delimited file described above. A new loader script (load-linkfile-into-graph_path.pl) has been written to pull this into the database (alternatively, OE can write directly to the database).
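The loading step can be sketched in Python with sqlite3. This is not load-linkfile-into-graph_path.pl: the `link_closure` table is a hypothetical simplification that stores GO IDs directly (the real graph_path table references term.id integers), and the input rows are assumed to follow the tab-delimited format described earlier (subject, target, relation, implied/asserted, no header row).

```python
import csv
import io
import sqlite3

# Hypothetical, simplified stand-in for graph_path: GO IDs are stored
# directly instead of term.id foreign keys, and no distances are kept.
SCHEMA = """
CREATE TABLE link_closure (
    subject_acc TEXT NOT NULL,
    object_acc  TEXT NOT NULL,
    relation    TEXT NOT NULL,
    status      TEXT NOT NULL,
    PRIMARY KEY (subject_acc, object_acc, relation)
)
"""


def load_linkfile(conn, handle):
    """Load subject/target/relation/status rows, skipping short and duplicate rows."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = (row[:4] for row in reader if len(row) >= 4)
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO link_closure VALUES (?, ?, ?, ?)", rows
        )
    return conn.execute("SELECT COUNT(*) FROM link_closure").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Placeholder IDs; the second line repeats the first and is ignored.
sample = "GO:X\tGO:Y\tpart_of\tasserted\nGO:X\tGO:Y\tpart_of\tasserted\n"
count = load_linkfile(conn, io.StringIO(sample))
print(count)  # 1
```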
However, on balance it is likely we will use a lightweight Perl/SQL approach. See the script go-db-reasoner.pl in go-db-perl/scripts.
This has certain advantages:
- no need for OE configuration in production pipeline
- reasoner can easily be run on existing state of database
- scales with disk space, not memory
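The flavour of a lightweight SQL approach can be shown with a naive fixpoint: join the link table to itself through a table of composition rules, insert whatever is new, and repeat until a round adds nothing. Because each round is a set-oriented SQL statement, the working set lives in the database rather than in memory. This is a hypothetical Python/sqlite3 sketch, not the logic of go-db-reasoner.pl; the `link` and `compose` tables and their contents are assumptions, and only is_a/part_of rules are included.

```python
import sqlite3

# Hypothetical schema: asserted links plus a table of composition rules.
SETUP = """
CREATE TABLE link (subject TEXT, relation TEXT, object TEXT,
                   PRIMARY KEY (subject, relation, object));
CREATE TABLE compose (r1 TEXT, r2 TEXT, result TEXT);
INSERT INTO compose VALUES
  ('is_a', 'is_a', 'is_a'), ('is_a', 'part_of', 'part_of'),
  ('part_of', 'is_a', 'part_of'), ('part_of', 'part_of', 'part_of');
"""

# One round: chain X r1 Y with Y r2 Z wherever a rule (r1, r2) exists.
STEP = """
INSERT OR IGNORE INTO link (subject, relation, object)
SELECT a.subject, c.result, b.object
FROM link a
JOIN link b ON b.subject = a.object
JOIN compose c ON c.r1 = a.relation AND c.r2 = b.relation
"""


def close(conn):
    """Apply composition rules until a round inserts no new rows."""
    while conn.execute(STEP).rowcount > 0:
        pass


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    ("A", "is_a", "B"), ("B", "part_of", "C"), ("C", "is_a", "D"),
    ("D", "regulates", "E"),
])
close(conn)
rows = conn.execute(
    "SELECT subject, relation, object FROM link WHERE subject = 'A' ORDER BY object"
).fetchall()
print(rows)  # [('A', 'is_a', 'B'), ('A', 'part_of', 'C'), ('A', 'part_of', 'D')]
```

No rule mentions regulates, so A regulates E is never derived.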
Basic Usage
Bcl2 is annotated to positive regulation of anti-apoptosis (GO:0045768). What is the relation between this term and apoptosis (GO:0006915)?
If we grep the table here (or query the graph_path table in the database) we see:
    GO:0045768 indirectly_regulates GO:0006915 implied link
    GO:0045768 indirectly_negatively_regulates GO:0006915 implied link
Advanced Usage
See the Relation composition page