GAF Taxonomy Reasoning


Using the GO-taxon links, we can infer when an annotation is incorrect.

Report

Rules

To determine whether an annotation violates an only_in constraint, annotations are propagated up the GO DAG over is_a and part_of, across the only_in link, and down the taxonomy hierarchy. If a term is subject to an only_in constraint and no such path leads from the term to a taxon, then the term cannot be applied to that taxon.

Formally:

  • an annotation <G,C> is invalid if:
    • G has-taxon T AND
    • (C is_a C' OR C part_of C') AND
    • C' only_in T' AND
    • NOT(T is_a T')

In RO, is_a and part_of are both transitive and reflexive (reflexivity covers the case where T and T' are identical).

We also use the obo-format rules for union_of

Note that we currently do not propagate over regulates, because inter-species regulation may be valid.
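
The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the go-gaf-inference.pl implementation: the function names and the toy dictionaries are assumptions, the taxonomy is a heavily slimmed hierarchy that skips intermediate ranks, EX:0000001 is a made-up child term, and union_of handling is left out.

```python
def ancestors(node, parents):
    """Reflexive-transitive closure: the node itself plus everything reachable upward."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def violations(term, taxon, go_parents, only_in, tax_parents):
    """Return the (C', T') pairs for which annotating `term` in `taxon` is invalid."""
    taxon_closure = ancestors(taxon, tax_parents)   # every T' with T is_a T'
    found = []
    for go_ancestor in sorted(ancestors(term, go_parents)):   # C is_a/part_of C'
        for required in sorted(only_in.get(go_ancestor, ())):  # C' only_in T'
            if required not in taxon_closure:                  # NOT(T is_a T')
                found.append((go_ancestor, required))
    return found


# Toy inputs (assumptions for illustration; real data comes from the obo files).
GO_PARENTS = {"EX:0000001": {"GO:0007595"}}            # hypothetical child of lactation
ONLY_IN = {"GO:0007595": {"NCBITaxon:40674"}}          # lactation only_in Mammalia
TAX_PARENTS = {
    "NCBITaxon:9606": {"NCBITaxon:40674"},             # human -> Mammalia (slimmed)
    "NCBITaxon:9031": {"NCBITaxon:8782"},              # chicken -> Aves (slimmed)
    "NCBITaxon:40674": {"NCBITaxon:7742"},             # Mammalia -> Vertebrata
    "NCBITaxon:8782": {"NCBITaxon:7742"},              # Aves -> Vertebrata
}

chicken = violations("GO:0007595", "NCBITaxon:9031", GO_PARENTS, ONLY_IN, TAX_PARENTS)
human = violations("GO:0007595", "NCBITaxon:9606", GO_PARENTS, ONLY_IN, TAX_PARENTS)
print(chicken)
print(human)
```

Because the closure is reflexive, a constraint attached directly to the annotated term is checked in the same pass as constraints inherited from its ancestors.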

Report Format

Each line is in two parts. The part before the :: separator is the error report. The part after is the GAF line, copied verbatim.

The report part before the delimiter is

 <GO ID> "<Term name>" only_in <TAX ID> "<Taxon name>"

This is not ideal, as the GO ID is redundant with column 5 of the GAF line, which is repeated after the :: separator.

Ideally, the report part would show the ID of the term that is directly linked to the taxon, but it does not do this yet.

Example line:

 GO:0007595 "lactation" only_in NCBITaxon:40674 "Mammalia" :: Ensembl    ENSGALP00000029396      ENSGALP00000029396              GO:0007595      GO_REF:0000002  IEA     InterPro:IPR003626      P                       protein NCBITaxon:9031 20091214 UniProtKB

This IEA annotation from chicken erroneously associates a protein with "lactation", which is restricted to mammals.
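
A report line in this format can be split back into its two parts with a short Python helper. This is a sketch only: parse_report_line and the dictionary keys are hypothetical names, the sample GAF portion is truncated to its first five columns for brevity, and the helper assumes the GAF part is tab-delimited and that " :: " does not occur inside the report part.

```python
import re

# <GO ID> "<Term name>" only_in <TAX ID> "<Taxon name>"
REPORT_RE = re.compile(r'^(\S+) "([^"]*)" only_in (\S+) "([^"]*)"$')


def parse_report_line(line):
    """Split a report line into the error report fields and the verbatim GAF line."""
    report, separator, gaf_line = line.partition(" :: ")
    if not separator:
        raise ValueError("missing ' :: ' separator")
    match = REPORT_RE.match(report.strip())
    if match is None:
        raise ValueError("unrecognised report part: " + report)
    go_id, term_name, tax_id, taxon_name = match.groups()
    return {
        "go_id": go_id,
        "term_name": term_name,
        "tax_id": tax_id,
        "taxon_name": taxon_name,
        "gaf_line": gaf_line,  # kept verbatim, as in the report
    }


# Sample built from the example above; GAF columns after column 5 are omitted.
SAMPLE = (
    'GO:0007595 "lactation" only_in NCBITaxon:40674 "Mammalia" :: '
    "Ensembl\tENSGALP00000029396\tENSGALP00000029396\t\tGO:0007595"
)

parsed = parse_report_line(SAMPLE)
gaf_columns = parsed["gaf_line"].split("\t")
# Column 5 of the GAF repeats the GO ID given in the report part.
print(parsed["go_id"], gaf_columns[4])
```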

TODO

viruses

Implementation

The script go-gaf-inference.pl is distributed as part of GO Moose.

It requires 4 inputs:

  • a GO obo file
  • a taxon links obo file
  • a taxonomy obo file (may be slim)
  • a GAF