Transitive closure

From GO Wiki
Jump to: navigation, search

Calculating the transitive closure: the old way

See Schema notes

The old way was to ignore the relation in the GO graph and calculate a blind transitive closure. This works fine so long as GO consists of only a few relations like is_a and part_of. However, it results in false positives when used with regulates. These are not so serious as we had previously been treating regulates as part_of anyway. However, the relation does need to be taken into account for other relations, especially has_part

Calculating the transitive closure: the new way

The new way will take relations and semantics of those relations into account. See Category:Reasoning. This may more accurately be called the deductive closure

3rd party consumers of the GO will have the option of calculating the closure themselves, or using a pre-computed closure. The pre-computed closure will be available as:

  1. A simple tab-delimited file
  2. in the graph_path table in the database

There are many advantages to the new system

  • will not make erroneous calculations for relations such as has_part
  • treats regulates correctly
  • scales with new relations

Tab delimited file

The columns are

  1. subject GO ID (i.e. child)
  2. target GO ID (i.e. parent)
  3. relation ID (e.g. part_of)
  4. implied or asserted

An example of this file can be found here.

This table can be used to determine the relation between any two nodes in the GO (if a relation holds at all)

The table is generated using obo2linkfile (part of the core OBOEdit2 distribution)

graph_path table

The graph_path table has been extended to include the relation:

       --- @@ graph_path.relationship_type_id
       --- References an entry in the term table corresponding
       --- to the INFERRED relation that holds between term2 and term1.
       --- At this time the value is always NULL - a blind transitive closure
       --- is calculated, ignoring the relationship_type_id in term2term.
       --- However, in future we want to calculate different closures for
       --- different relations. [See
       --- ]
       relationship_type_id integer,
       foreign key (relationship_type_id) references term(id),

This brings the graph_path table more in line with cvtermpath in Chado

A new column relation_distance has been added. The current distance column will remain, and have the same semantics (i.e. number of hops to get from a node (term2) to its descendant (term1), regardless of relation. The new relation_distance column measures the number of hops over the specified relation only.

From the DDL docs:

      --- @@graph_path.distance
       --- The distance in terms of the number of "hops" between
       --- nodes in the asserted graph (term2term).
       --- The relationship_type_id is ignored here.
       --- Example: if A part_of B is_a C part_of D, then
       --- distance=3 for A part_of D
       distance        integer,
       --- @@graph_path.relation_distance
       --- (added 2008-10-27)
       --- The distance in terms of the number of "hops" over
       --- relationship_type_id in the asserted graph (term2term).
       --- Example: if A part_of B is_a C part_of D, then
       --- relation_distance=2 for A part_of D
       relation_distance        integer

Using the closure

TDB with ontology/annotation group

There is an implicit relation between a gene product and a GO term. This has yet to be formalized, an initial sketch is below:

  • has_function : for MF
  • has_function_in_process : for BP
  • has_function_in_location : for CC

Whilst this has yet to be finalized, what is clear is that the BP and CC relations are transitive_over part_of. This means it is valid to propagate the link up both is_a and part_of links.

e.g.

 G has_function_in_process A
 A is_a B
 B part_of C
 C is_a D
 D regulates E
 E is_a F
 =>
 A part_of C
 A part_of D
 =>
 G has_function_in_process D

When the exact relations are determined we will provide a relation composition table - given two relations, R1 and R2, what do we know about the composition R1 o R2?

How it's calculated

Currently the blind transitive closure is calculated using perl code in go-db-perl

This code will be retired. There are 2 options for replacement:

Using the OE reasoner has various advantages - we can leverage existing code. Also, the the OE reasoner can wrap standard 3rd party reasoners such as Pellet. See Category:Reasoning for more details.

The database would be populated by first running obo2linkfile on the main .obo file to generate the tab-del file above. A new loader script (load-linkfile-into-graph_path.pl) has been written to pull this into the database

(alternatively, OE can write directly to the database)

However, on balance it is likely we will use a lightweight perl/SQL approach. See the script go-db-reasoner.pl in go-db-perl/scripts

This has certain advantages:

  • no need for OE configuration in production pipeline
  • reasoner can easily be run on existing state of database
  • scales with disk space, not memory

Basic Usage

Bcl2 is annotated to positive regulation of anti-apoptosis (GO:0045768). What is the relation between this term and apoptosis (GO:0006915)?

If we grep the table here (or query the graph_path table in the database) we see:

GO:0045768      indirectly_regulates    GO:0006915      implied link
GO:0045768      indirectly_negatively_regulates GO:0006915      implied link

Advanced Usage

See the Relation composition page