Transitive closure

From GO Wiki

Revision as of 13:55, 2 June 2010

Calculating the transitive closure: the old way

See Schema notes

The old way was to ignore the relation in the GO graph and calculate a blind transitive closure. This works fine so long as GO consists of only a few relations like is_a and part_of. However, it results in false positives when used with regulates. These are not so serious, as we had previously been treating regulates as part_of anyway. However, the relation does need to be taken into account for other relations, especially has_part.
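
As a rough illustration of why a blind closure can go wrong, the sketch below computes ancestor pairs while ignoring the relation on every link. The term names A to D and the function name blind_closure are hypothetical, and this is not the go-db-perl implementation, only a minimal sketch of the idea.

```python
from collections import defaultdict


def blind_closure(links):
    """Return all (descendant, ancestor) pairs, ignoring the relation on each link."""
    parents = defaultdict(set)
    for child, _relation, parent in links:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        while stack:
            node = stack.pop()
            if (start, node) not in closure:
                closure.add((start, node))
                # .get() avoids creating empty entries for terms without parents
                stack.extend(parents.get(node, ()))
    return closure


# Hypothetical links, written as (subject, relation, target).
links = [
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "has_part", "D"),
]
pairs = blind_closure(links)

# The blind closure reports A as a descendant of D, although nothing
# follows about A and D from "A part_of C" and "C has_part D".
print(("A", "D") in pairs)
```

The pair (A, D) is the kind of false positive described above: the path exists in the graph, but no relation between A and D can be inferred from it.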

Calculating the transitive closure: the new way

The new way will take the relations, and the semantics of those relations, into account. See Category:Reasoning. This may more accurately be called the deductive closure.

3rd party consumers of the GO will have the option of calculating the closure themselves, or using a pre-computed closure. The pre-computed closure will be available as:

  1. A simple tab-delimited file
  2. in the graph_path table in the database

There are many advantages to the new system:

  • will not make erroneous calculations for relations such as has_part
  • treats regulates correctly
  • scales with new relations

Tab delimited file

The columns are:

  1. subject GO ID (i.e. child)
  2. target GO ID (i.e. parent)
  3. relation ID (e.g. part_of)
  4. implied or asserted

An example of this file can be found here.

This table can be used to determine the relation between any two nodes in the GO (if a relation holds at all).

The table is generated using obo2linkfile (part of the core OBOEdit2 distribution).

graph_path table

The graph_path table has been extended to include the relation:

       --- @@ graph_path.relationship_type_id
       --- References an entry in the term table corresponding
       --- to the INFERRED relation that holds between term2 and term1.
       --- At this time the value is always NULL - a blind transitive closure
       --- is calculated, ignoring the relationship_type_id in term2term.
       --- However, in future we want to calculate different closures for
       --- different relations. [See
       --- ]
       relationship_type_id integer,
       foreign key (relationship_type_id) references term(id),

This brings the graph_path table more in line with cvtermpath in Chado.

A new column relation_distance has been added. The current distance column will remain, and have the same semantics (i.e. number of hops to get from a node (term2) to its ancestor (term1), regardless of relation). The new relation_distance column measures the number of hops over the specified relation only.

From the DDL docs:

       --- @@graph_path.distance
       --- The distance in terms of the number of "hops" between
       --- nodes in the asserted graph (term2term).
       --- The relationship_type_id is ignored here.
       --- Example: if A part_of B is_a C part_of D, then
       --- distance=3 for A part_of D
       distance        integer,
       --- @@graph_path.relation_distance
       --- (added 2008-10-27)
       --- The distance in terms of the number of "hops" over
       --- relationship_type_id in the asserted graph (term2term).
       --- Example: if A part_of B is_a C part_of D, then
       --- relation_distance=2 for A part_of D
       relation_distance        integer
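
The difference between the two columns can be checked against the DDL example with a few lines of Python. This is only a sketch: the function name path_distances is hypothetical, and it works on a single asserted path given as a list of relations rather than on the database.

```python
def path_distances(path_relations, relation):
    """Return (distance, relation_distance) for one asserted path.

    path_relations lists the relation on each hop, in order.
    distance counts every hop; relation_distance counts only the hops
    made over the given relation.
    """
    distance = len(path_relations)
    relation_distance = sum(1 for r in path_relations if r == relation)
    return distance, relation_distance


# A part_of B is_a C part_of D, evaluated for the inferred link A part_of D
result = path_distances(["part_of", "is_a", "part_of"], "part_of")
print(result)  # (3, 2)
```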

Using the closure

TBD with the ontology/annotation group

There is an implicit relation between a gene product and a GO term. This has yet to be formalized; an initial sketch is below:

  • has_function : for MF
  • has_function_in_process : for BP
  • has_function_in_location : for CC

Whilst this has yet to be finalized, what is clear is that the BP and CC relations are transitive_over part_of. This means it is valid to propagate the link up both is_a and part_of links.

e.g.

 G has_function_in_process A
 A is_a B
 B part_of C
 C is_a D
 D regulates E
 E is_a F
 =>
 A part_of C
 A part_of D
 =>
 G has_function_in_process D

When the exact relations are determined we will provide a relation composition table - given two relations, R1 and R2, what do we know about the composition R1 o R2?
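
Until that table exists, the example above can be reproduced with a small stand-in. The COMPOSE table below is an assumption covering only is_a and part_of (it is not the official relation composition table), and the function name deductive_closure is hypothetical.

```python
# Assumed composition rules for illustration only: R1 followed by R2 gives a result.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
}


def deductive_closure(links, compose):
    """Add inferred (subject, relation, target) links until nothing new appears."""
    closed = set(links)
    changed = True
    while changed:
        changed = False
        for s1, r1, t1 in list(closed):
            for s2, r2, t2 in list(closed):
                if t1 == s2 and (r1, r2) in compose:
                    new_link = (s1, compose[(r1, r2)], t2)
                    if new_link not in closed:
                        closed.add(new_link)
                        changed = True
    return closed


links = {
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "is_a", "D"),
    ("D", "regulates", "E"),
    ("E", "is_a", "F"),
}
closed = deductive_closure(links, COMPOSE)

# G has_function_in_process A: propagate the annotation up is_a and part_of only.
annotated = {"A"} | {
    t for s, r, t in closed if s == "A" and r in ("is_a", "part_of")
}
print(sorted(annotated))  # ['A', 'B', 'C', 'D']
```

Because the assumed table has no entry for regulates, the annotation stops at D and is not propagated to E or F, matching the example.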

How it's calculated

Currently the blind transitive closure is calculated using perl code in go-db-perl.

This code will be retired. There are 2 options for replacement:

  • Use the OBO Edit Reasoner
  • Use custom perl/SQL code

Using the OE reasoner has various advantages: we can leverage existing code, and the OE reasoner can wrap standard 3rd party reasoners such as Pellet. See Category:Reasoning for more details.

The database would be populated by first running obo2linkfile on the main .obo file to generate the tab-del file above. A new loader script (load-linkfile-into-graph_path.pl) has been written to pull this into the database.

(alternatively, OE can write directly to the database)

However, on balance it is likely we will use a lightweight perl/SQL approach. See the script go-db-reasoner.pl in go-db-perl/scripts.

This has certain advantages:

  • no need for OE configuration in production pipeline
  • reasoner can easily be run on existing state of database
  • scales with disk space, not memory
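
To give a feel for what a SQL-driven approach can look like, here is a minimal sketch using Python's sqlite3 module. It is not go-db-reasoner.pl: the tables link, compose and path are hypothetical simplifications (not the GO schema), and the composition rules are assumed for illustration. The closure is built by repeating a join until no new rows appear, so the working set lives in the database rather than in program memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (subject TEXT, relation TEXT, target TEXT);
    CREATE TABLE compose (r1 TEXT, r2 TEXT, result TEXT);
    CREATE TABLE path (subject TEXT, relation TEXT, target TEXT,
                       PRIMARY KEY (subject, relation, target));
""")
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "is_a", "D"),
    ("D", "regulates", "E"),
    ("E", "is_a", "F"),
])
# Assumed composition rules, for illustration only.
conn.executemany("INSERT INTO compose VALUES (?, ?, ?)", [
    ("is_a", "is_a", "is_a"),
    ("is_a", "part_of", "part_of"),
    ("part_of", "is_a", "part_of"),
    ("part_of", "part_of", "part_of"),
])


def close_in_sql(conn):
    """Seed path with the asserted links, then extend it until a fixpoint."""
    conn.execute(
        "INSERT OR IGNORE INTO path SELECT subject, relation, target FROM link"
    )
    while True:
        before = conn.execute("SELECT COUNT(*) FROM path").fetchone()[0]
        conn.execute("""
            INSERT OR IGNORE INTO path (subject, relation, target)
            SELECT p.subject, c.result, l.target
            FROM path AS p
            JOIN link AS l ON l.subject = p.target
            JOIN compose AS c ON c.r1 = p.relation AND c.r2 = l.relation
        """)
        after = conn.execute("SELECT COUNT(*) FROM path").fetchone()[0]
        if after == before:
            break


close_in_sql(conn)
rows = set(conn.execute("SELECT subject, relation, target FROM path"))
print(("A", "part_of", "D") in rows)
```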

Basic Usage

Bcl2 is annotated to positive regulation of anti-apoptosis (GO:0045768). What is the relation between this term and apoptosis (GO:0006915)?

If we grep the table here (or query the graph_path table in the database) we see:

GO:0045768      indirectly_regulates    GO:0006915      implied link
GO:0045768      indirectly_negatively_regulates GO:0006915      implied link
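
The same lookup can be scripted. The sketch below assumes the file is tab-delimited and that the columns appear in the order of the two lines above (subject, relation, target, status); if the actual file uses a different column order, the indices need adjusting. The function name relations_between is hypothetical.

```python
import csv

# The two sample lines from above, written with explicit tabs.
SAMPLE = (
    "GO:0045768\tindirectly_regulates\tGO:0006915\timplied link\n"
    "GO:0045768\tindirectly_negatively_regulates\tGO:0006915\timplied link\n"
)


def relations_between(lines, subject, target):
    """Return (relation, status) for every row linking subject to target."""
    reader = csv.reader(lines, delimiter="\t")
    return [
        (row[1], row[3])
        for row in reader
        if len(row) >= 4 and row[0] == subject and row[2] == target
    ]


found = relations_between(SAMPLE.splitlines(), "GO:0045768", "GO:0006915")
for relation, status in found:
    print(relation, status)
```

For a real file, passing an open file handle instead of SAMPLE.splitlines() works the same way, since csv.reader accepts any iterable of lines.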

Advanced Usage

See the Relation composition page