Transitive closure


Calculating the transitive closure: the old way

See Schema notes

The old way was to ignore the relations in the GO graph and calculate a blind transitive closure. This works fine so long as GO consists of only a few relations like is_a and part_of. However, it results in false positives when used with regulates. These are not so serious, as we had previously been treating regulates as part_of anyway. However, the relation does need to be taken into account for other relations, especially has_part.
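
As an illustration of the problem, here is a minimal sketch of a blind closure in Python. It is not the go-db-perl implementation; the function name `blind_closure` and the toy terms A to D are hypothetical. Because the relation on each edge is thrown away, the chain A is_a B, B part_of C, C has_part D yields a path from A to D even though nothing follows about how A relates to D.

```python
from collections import defaultdict

def blind_closure(edges):
    """Return every (descendant, ancestor) pair reachable through any edges.

    `edges` holds (child, relation, parent) triples. The relation is
    deliberately ignored, which is what makes the closure "blind".
    """
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure

# Hypothetical toy graph, not real GO terms.
edges = [
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "has_part", "D"),
]
pairs = blind_closure(edges)
# ("A", "D") is in `pairs`: a false positive caused by ignoring has_part.
```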

Calculating the transitive closure: the new way

The new way will take the relations, and the semantics of those relations, into account. See Category:Reasoning. The result may more accurately be called the deductive closure.
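
A relation-aware closure can be sketched as a fixpoint over composition rules. The Python below is only an illustration under stated assumptions, not the reasoner's actual rule set: it assumes that is_a composes with any relation, that part_of and has_part are each transitive, and that no other combination yields an inference. The names `compose`, `deductive_closure`, and `TRANSITIVE` are hypothetical.

```python
# Assumed for illustration: relations that are transitive on their own.
TRANSITIVE = {"part_of", "has_part"}

def compose(r1, r2):
    """Relation inferred from x r1 y and y r2 z, or None if nothing follows."""
    if r1 == "is_a":
        return r2
    if r2 == "is_a":
        return r1
    if r1 == r2 and r1 in TRANSITIVE:
        return r1
    return None

def deductive_closure(edges):
    """Apply `compose` repeatedly until no new (subject, relation, target) appears."""
    inferred = set(edges)
    changed = True
    while changed:
        changed = False
        snapshot = list(inferred)
        for x, r1, y in snapshot:
            for y2, r2, z in snapshot:
                if y != y2:
                    continue
                r = compose(r1, r2)
                if r is not None and (x, r, z) not in inferred:
                    inferred.add((x, r, z))
                    changed = True
    return inferred

# Hypothetical toy graph, not real GO terms.
triples = deductive_closure({
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "has_part", "D"),
})
# A part_of C is inferred; no link from A or B to D is produced.
```

The nested loop is quadratic per pass, which is fine for a demonstration but not for the full ontology.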

3rd party consumers of the GO will have the option of calculating the closure themselves, or using a pre-computed closure. The pre-computed closure will be available as:

  1. A simple tab-delimited file
  2. The graph_path table in the database

Tab delimited file

This will most likely contain:

  1. subject GO ID (i.e. child)
  2. target GO ID (i.e. parent)
  3. relation ID (e.g. part_of)
  4. distance
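
Assuming the file ends up with exactly these four columns in this order and no header line (the format is not final), a consumer might read it as sketched below. The `ClosureRow` type, the `read_closure` function, and the GO IDs in the sample are made up for illustration.

```python
import csv
import io
from typing import NamedTuple

class ClosureRow(NamedTuple):
    subject: str   # child GO ID
    target: str    # parent GO ID
    relation: str  # e.g. part_of
    distance: int

def read_closure(handle):
    """Parse tab-delimited closure rows, skipping blank lines."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        subject, target, relation, distance = fields[:4]
        rows.append(ClosureRow(subject, target, relation, int(distance)))
    return rows

# Made-up sample content; the IDs are placeholders, not real GO terms.
sample = (
    "GO:9900001\tGO:9900002\tis_a\t1\n"
    "GO:9900001\tGO:9900003\tpart_of\t2\n"
)
rows = read_closure(io.StringIO(sample))
wholes = [r.target for r in rows
          if r.subject == "GO:9900001" and r.relation == "part_of"]
```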

graph_path table

The graph_path table is being extended to include the relation:

       --- @@ graph_path.relationship_type_id
       --- References an entry in the term table corresponding
       --- to the INFERRED relation that holds between term2 and term1.
       --- At this time the value is always NULL - a blind transitive closure
       --- is calculated, ignoring the relationship_type_id in term2term.
       --- However, in future we want to calculate different closures for
       --- different relations. [See
       --- ]
       relationship_type_id integer,
       foreign key (relationship_type_id) references term(id),
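
Once relationship_type_id is populated, a consumer could filter the closure by relation. The sketch below uses Python's sqlite3 module with a heavily simplified stand-in for the term and graph_path tables. The column subset, the reading of term1 as ancestor and term2 as descendant, and all the rows are assumptions made for illustration, not the production schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT);
CREATE TABLE graph_path (
    id INTEGER PRIMARY KEY,
    term1_id INTEGER NOT NULL REFERENCES term(id),  -- ancestor (assumed)
    term2_id INTEGER NOT NULL REFERENCES term(id),  -- descendant (assumed)
    relationship_type_id INTEGER REFERENCES term(id),
    distance INTEGER
);
""")
# Placeholder terms and relation terms; none of these are real GO entries.
conn.executemany(
    "INSERT INTO term (id, acc, name) VALUES (?, ?, ?)",
    [(1, "GO:9900001", "child"), (2, "GO:9900002", "parent"),
     (3, "GO:9900003", "whole"), (10, "is_a", "is_a"),
     (11, "part_of", "part_of")],
)
conn.executemany(
    "INSERT INTO graph_path (term1_id, term2_id, relationship_type_id, distance)"
    " VALUES (?, ?, ?, ?)",
    [(2, 1, 10, 1), (3, 2, 11, 1), (3, 1, 11, 2)],
)

def ancestors(conn, acc, relation):
    """Ancestors of `acc` whose inferred relation to it is `relation`."""
    sql = """
        SELECT parent.acc, gp.distance
        FROM graph_path AS gp
        JOIN term AS child  ON child.id  = gp.term2_id
        JOIN term AS parent ON parent.id = gp.term1_id
        JOIN term AS rel    ON rel.id    = gp.relationship_type_id
        WHERE child.acc = ? AND rel.acc = ?
        ORDER BY gp.distance, parent.acc
    """
    return conn.execute(sql, (acc, relation)).fetchall()

part_of_ancestors = ancestors(conn, "GO:9900001", "part_of")
```

Rows whose relationship_type_id is still NULL are dropped by the inner join on `rel`, so this query only returns paths from a relation-aware closure.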

Using the closure

TBD with the ontology/annotation group

There is an implicit relation between a gene product and a GO term. This has yet to be formalized; an initial sketch is below:

  • has_function : for MF
  • has_function_in_process : for BP
  • has_function_in_location : for CC

Whilst this has yet to be finalized, what is clear is that the BP and CC relations are transitive_over part_of. This means it is valid to propagate the link up both is_a and part_of links.

e.g.

 G has_function_in_process A
 A is_a B
 B part_of C
 C is_a D
 D regulates E
 E is_a F
 =>
 A part_of C
 A part_of D
 =>
 G has_function_in_process D
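
The example above can be reproduced with a short Python sketch. It assumes, as the example suggests, that an annotation propagates up is_a and part_of links only and stops at any other relation such as regulates. The `propagate` function and the `PROPAGATING` set are hypothetical names, and the gene product relation itself is left implicit since it is not yet finalized.

```python
# Assumed: only these relations carry an annotation upward.
PROPAGATING = {"is_a", "part_of"}

def propagate(annotations, edges, allowed=PROPAGATING):
    """Return (gene, term) pairs after propagating up the allowed relations."""
    parents = {}
    for child, relation, parent in edges:
        if relation in allowed:
            parents.setdefault(child, set()).add(parent)

    result = set()
    for gene, term in annotations:
        stack = [term]
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            result.add((gene, node))
            stack.extend(parents.get(node, ()))
    return result

edges = [
    ("A", "is_a", "B"),
    ("B", "part_of", "C"),
    ("C", "is_a", "D"),
    ("D", "regulates", "E"),
    ("E", "is_a", "F"),
]
annotated = propagate({("G", "A")}, edges)
# G reaches A, B, C and D; the regulates link keeps it from reaching E and F.
```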


How it's calculated

Currently the blind transitive closure is calculated using Perl code in go-db-perl.

This code will be retired. Instead we will use a reasoner, most likely the OboEdit (OE) reasoner. See the OE reasoner paper (Category:Reasoning). The OE reasoner has the option of wrapping other standard 3rd party reasoners.

The database will be populated by first running obo2obo on the main .obo file to generate the tab-delimited file described above. A new loader script will be written to pull this file into the database.

(Alternatively, OE can write directly to the database; we will probably go with the more loosely coupled approach at first.)