Transitive closure

From GO Wiki

Calculating the transitive closure: the old way

See Schema notes

The old way was to ignore the relations in the GO graph and calculate a blind transitive closure. This works fine so long as GO consists of only a few relations like is_a and part_of. However, it results in false positives when used with regulates. These are not so serious, as we had previously been treating regulates as part_of anyway. However, the relation does need to be taken into account for other relations, especially has_part.
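To make the false-positive problem concrete, here is a minimal Python sketch of a blind transitive closure. It is not the go-db-perl code; the function name blind_closure and the toy term names A to D are made up for illustration.

```python
from collections import defaultdict

def blind_closure(edges):
    """Return every (subject, ancestor) pair reachable by any chain of
    edges, ignoring the relation attached to each edge."""
    parents = defaultdict(set)
    for subject, _relation, target in edges:
        parents[subject].add(target)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            # .get avoids creating empty entries for leaf-less nodes
            stack.extend(parents.get(node, ()))
    return closure

# Toy graph with hypothetical term names, not real GO IDs.
edges = [
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "regulates", "D"),
]
pairs = blind_closure(edges)
```

Here pairs contains ("A", "D") even though the only path from A to D passes through a regulates link. The blind closure cannot tell this pair apart from a genuine part_of ancestor, which is the kind of false positive described above.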

Calculating the transitive closure: the new way

The new way will take relations and the semantics of those relations into account. See Category:Reasoning. The result may more accurately be called the deductive closure.

3rd party consumers of the GO will have the option of calculating the closure themselves, or using a pre-computed closure. The pre-computed closure will be available as:

  1. A simple tab-delimited file
  2. The graph_path table in the database

Tab delimited file

The columns are

  1. subject GO ID (i.e. child)
  2. target GO ID (i.e. parent)
  3. relation ID (e.g. part_of)
  4. implied or asserted

An example of this file can be found here.

This table can be used to determine the relation between any two nodes in the GO (if a relation holds at all).
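As a rough illustration of such a lookup, the sketch below indexes the four columns by (subject, target) pair. The helper names, the "EX:" placeholder IDs, and the literal strings "asserted" and "implied" in the fourth column are assumptions; check them against the real file before relying on them.

```python
import csv
import io

def load_closure(handle):
    """Index rows as {(subject, target): [(relation, status), ...]}."""
    index = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        subject, target, relation, status = row[:4]
        index.setdefault((subject, target), []).append((relation, status))
    return index

def relations_between(index, subject, target):
    """Return the relations holding from subject to target, if any."""
    return [relation for relation, _status in index.get((subject, target), [])]

# Made-up rows with placeholder IDs; a real file would contain GO IDs.
sample = io.StringIO(
    "EX:0000001\tEX:0000002\tis_a\tasserted\n"
    "EX:0000002\tEX:0000003\tpart_of\tasserted\n"
    "EX:0000001\tEX:0000003\tpart_of\timplied\n"
)
index = load_closure(sample)
```

Calling relations_between(index, "EX:0000001", "EX:0000003") returns the relations recorded for that pair, and an empty list means no relation holds in that direction.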

The table is generated using obo2linkfile (part of the core OBOEdit2 distribution).

graph_path table

The graph_path table is being extended to include the relation:

       --- @@ graph_path.relationship_type_id
       --- References an entry in the term table corresponding
       --- to the INFERRED relation that holds between term2 and term1.
       --- At this time the value is always NULL - a blind transitive closure
       --- is calculated, ignoring the relationship_type_id in term2term.
       --- However, in future we want to calculate different closures for
       --- different relations. [See
       --- ]
       relationship_type_id integer,
       foreign key (relationship_type_id) references term(id),
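
Once relationship_type_id is populated, a consumer could restrict a closure query to one relation. The sketch below uses an in-memory SQLite database as a stand-in for the real one. Only relationship_type_id and its reference to term(id) come from the schema fragment above; the term1_id and term2_id columns, the acc column, the assumption that term1 is the ancestor and term2 the descendant, and all row values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-in tables; column names other than
    -- relationship_type_id are assumptions.
    CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
    CREATE TABLE graph_path (
        term1_id INTEGER REFERENCES term(id),
        term2_id INTEGER REFERENCES term(id),
        relationship_type_id INTEGER REFERENCES term(id)
    );
    INSERT INTO term VALUES
        (1, 'part_of', 'part_of'), (2, 'is_a', 'is_a'),
        (10, 'EX:A', 'a'), (11, 'EX:B', 'b'), (12, 'EX:C', 'c');
    -- EX:A is_a EX:B, EX:B part_of EX:C, and the inferred EX:A part_of EX:C
    INSERT INTO graph_path VALUES (11, 10, 2), (12, 10, 1), (12, 11, 1);
""")

def ancestors(conn, acc, relation):
    """Return accessions of ancestors linked to acc by the given relation."""
    rows = conn.execute(
        """
        SELECT anc.acc
        FROM graph_path AS gp
        JOIN term AS child ON child.id = gp.term2_id
        JOIN term AS anc   ON anc.id = gp.term1_id
        JOIN term AS rel   ON rel.id = gp.relationship_type_id
        WHERE child.acc = ? AND rel.acc = ?
        ORDER BY anc.acc
        """,
        (acc, relation),
    ).fetchall()
    return [row[0] for row in rows]
```

Because of the inner join on relationship_type_id, rows where that column is still NULL are skipped, so a query like this returns nothing against a database that only holds the blind closure.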

Using the closure

TBD with ontology/annotation group

There is an implicit relation between a gene product and a GO term. This has yet to be formalized; an initial sketch is below:

  • has_function : for MF
  • has_function_in_process : for BP
  • has_function_in_location : for CC

Whilst this has yet to be finalized, what is clear is that the BP and CC relations are transitive_over part_of. This means it is valid to propagate the link up both is_a and part_of links.

e.g.

 G has_function_in_process A
 A is_a B
 B part_of C
 C is_a D
 D regulates E
 E is_a F
 =>
 A part_of C
 A part_of D
 =>
 G has_function_in_process D

When the exact relations are determined we will provide a relation composition table - given two relations, R1 and R2, what do we know about the composition R1 o R2?
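
Until that table exists, the idea can be sketched with a hypothetical one. The COMPOSE entries below are assumptions that cover only is_a, part_of and has_function_in_process. Compositions involving regulates are deliberately left out because they are not yet determined, so nothing is propagated across a regulates link. The naive fixed-point loop is meant for toy inputs, not for the full GO.

```python
# Hypothetical composition table: (R1, R2) -> R, meaning that
# "x R1 y" and "y R2 z" together imply "x R z".
# Pairs that are not listed yield no inference.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
    ("has_function_in_process", "is_a"): "has_function_in_process",
    ("has_function_in_process", "part_of"): "has_function_in_process",
}

def deductive_closure(asserted):
    """Apply COMPOSE repeatedly until no new (x, relation, z) link appears."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        snapshot = list(closure)
        for x, r1, y in snapshot:
            for y2, r2, z in snapshot:
                if y != y2:
                    continue
                r = COMPOSE.get((r1, r2))
                if r is not None and (x, r, z) not in closure:
                    closure.add((x, r, z))
                    changed = True
    return closure

# The asserted links from the example above.
asserted = {
    ("G", "has_function_in_process", "A"),
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "is_a", "D"),
    ("D", "regulates", "E"),
    ("E", "is_a", "F"),
}
closure = deductive_closure(asserted)
```

With this table the closure contains A part_of C, A part_of D and G has_function_in_process D, matching the example, while G is not linked to E or F because the regulates link is never composed.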

How it's calculated

Currently the blind transitive closure is calculated using perl code in go-db-perl.

This code will be retired. Instead we will use a reasoner, most likely the OboEdit reasoner. See the OE reasoner paper (Category:Reasoning). The OE reasoner has the option of wrapping other standard 3rd party reasoners.

The database will be populated by first running obo2obo on the main .obo file to generate the tab-delimited file above. A new loader script will be written to pull this into the database.

(Alternatively, OE can write directly to the database; we will probably go with the more loosely coupled approach at first.)