GAF Taxonomy Reasoning

From GO Wiki

Latest revision as of 13:03, 22 March 2019

Using the GO-taxon links, we can infer when an annotation is incorrect.

Report

Rules

To check whether an annotation satisfies an only_in constraint, the annotation is propagated up the GO DAG over is_a and part_of, across the only_in link, and down the taxonomy hierarchy. If a term is subject to an only_in constraint and no such path leads from the term to the annotated taxon, then the term cannot be applied to that taxon.

Formally:

  • an annotation <G,C> is invalid if:
    • G has-taxon T AND
    • (C is_a C' OR C part_of C') AND
    • C' only_in T' AND
    • NOT(T is_a T')

In RO, is_a and part_of are both transitive and reflexive (reflexivity covers the case where T and T' are identical).

We also use the obo-format rules for union_of.

Note that we currently do not propagate over regulates, because inter-species regulation may be valid.
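The formal rule above can be sketched in a few lines of Python. This is only an illustration, not the logic of go-gaf-inference.pl: the function names, the dictionary-based graph representation, the term EX:0000001, and the toy taxonomy slim are all assumptions made for the example. The is_a and part_of edges are merged into one parent map, union_of handling is omitted, and regulates is not followed.

```python
from collections import deque

def ancestors(node, parents):
    """Reflexive-transitive closure: the node itself plus everything reachable via parents."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def only_in_violations(go_term, taxon, go_parents, only_in, taxon_parents):
    """Return (C', T') pairs where C is_a/part_of C', C' only_in T', and NOT(T is_a T')."""
    taxon_closure = ancestors(taxon, taxon_parents)
    violations = []
    for ancestor in sorted(ancestors(go_term, go_parents)):
        for required_taxon in only_in.get(ancestor, ()):
            if required_taxon not in taxon_closure:
                violations.append((ancestor, required_taxon))
    return violations

# Toy data (hypothetical): EX:0000001 is an invented child term of "lactation".
GO_PARENTS = {"EX:0000001": ["GO:0007595"]}          # is_a and part_of edges combined
ONLY_IN = {"GO:0007595": ["NCBITaxon:40674"]}        # lactation only_in Mammalia
TAXON_PARENTS = {
    "NCBITaxon:9031": ["NCBITaxon:8782"],            # chicken -> Aves (toy slim)
    "NCBITaxon:9606": ["NCBITaxon:40674"],           # human -> Mammalia (toy slim)
}

chicken = only_in_violations("GO:0007595", "NCBITaxon:9031", GO_PARENTS, ONLY_IN, TAXON_PARENTS)
human = only_in_violations("GO:0007595", "NCBITaxon:9606", GO_PARENTS, ONLY_IN, TAXON_PARENTS)
print(chicken)
print(human)
```

An empty result means no only_in constraint is violated. Because the closure is reflexive, an annotation made directly to the constrained taxon (T identical to T') also passes.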

Report Format

Each line is in two parts. The part before the :: separator is the error report. The part after is the GAF line, copied verbatim.

The report part before the separator has the form:

 <GO ID> "<Term name>" only_in <TAX ID> "<Taxon name>"

This is not ideal, as the GO ID is redundant with column 5 of the GAF line, which is repeated after the :: separator.

Ideally, the report part would show the ID of the term that is directly linked to the taxon, but it does not do this yet.

Example line:

 GO:0007595 "lactation" only_in NCBITaxon:40674 "Mammalia" :: Ensembl    ENSGALP00000029396      ENSGALP00000029396              GO:0007595      GO_REF:0000002  IEA     InterPro:IPR003626      P                       protein NCBITaxon:9031 20091214 UniProtKB

This IEA annotation from chicken erroneously associates a protein with "lactation", which is restricted to mammals.
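A report line in this format could be split back into its parts roughly as follows. This is a sketch under assumptions: the function name parse_report_line and the regular expression are invented for illustration, and the GAF part is assumed to be tab-delimited in the real output (the example above shows the tabs rendered as spaces).

```python
import re

# Matches: <GO ID> "<Term name>" only_in <TAX ID> "<Taxon name>"
REPORT_RE = re.compile(
    r'^(?P<go_id>\S+) "(?P<term_name>[^"]*)" only_in '
    r'(?P<taxon_id>\S+) "(?P<taxon_name>[^"]*)"$'
)

def parse_report_line(line):
    """Split a report line into the error report fields and the GAF columns."""
    report, separator, gaf = line.rstrip("\n").partition("::")
    if not separator:
        raise ValueError("missing '::' separator")
    match = REPORT_RE.match(report.strip())
    if match is None:
        raise ValueError("unrecognised report part: " + report.strip())
    fields = match.groupdict()
    # Assumption: the verbatim GAF line keeps its tabs, so empty columns survive.
    fields["gaf_columns"] = gaf.lstrip(" ").split("\t")
    return fields

# Rebuild the example line with explicit tabs, including the empty columns.
gaf_columns = [
    "Ensembl", "ENSGALP00000029396", "ENSGALP00000029396", "", "GO:0007595",
    "GO_REF:0000002", "IEA", "InterPro:IPR003626", "P", "", "", "protein",
    "NCBITaxon:9031", "20091214", "UniProtKB",
]
example = 'GO:0007595 "lactation" only_in NCBITaxon:40674 "Mammalia" :: ' + "\t".join(gaf_columns)

parsed = parse_report_line(example)
print(parsed["go_id"], parsed["taxon_name"], parsed["gaf_columns"][4])
```

With tabs preserved, the GO ID in the report part can be compared directly with column 5 of the GAF line (index 4 in the list), which makes the redundancy noted above easy to see.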

TODO

viruses

Implementation

The script go-gaf-inference.pl is distributed as part of GO Moose.

It requires 4 inputs:

  • a GO obo file
  • a taxon links obo file
  • a taxonomy obo file (may be slim)
  • a GAF