GAF Taxonomy Reasoning
Latest revision as of 13:03, 22 March 2019
Using the GO-taxon links, we can infer when an annotation is incorrect.
Report
Rules
To determine whether an annotation violates an only_in constraint, annotations are propagated up the GO DAG over is_a and part_of, across the only_in link, and down the taxonomy hierarchy. If no path exists from a term to a taxon, the term cannot be applied to that taxon.
Formally:
- an annotation <G,C> is invalid if:
- G has-taxon T AND
- (C is_a C' OR C part_of C') AND
- C' only_in T' AND
- NOT(T is_a T')
In RO, is_a and part_of are both transitive and reflexive (reflexivity accounts for the case where T and T' are identical).
We also use the obo-format rules for union_of
Note that we currently do not propagate over regulates, because inter-species regulation may be valid.
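The formal rule above can be sketched in a few lines of Python. This is a minimal illustration and not the go-gaf-inference.pl implementation: it treats is_a and part_of as a single parent relation, ignores union_of, and uses a toy ontology. The function names (`ancestors`, `violations`) and every identifier starting with `EX:` are hypothetical; only GO:0007595, NCBITaxon:40674 and NCBITaxon:9031 come from the example on this page.

```python
from typing import Dict, List, Set, Tuple

# Toy data (assumption, for illustration only). Each map goes child -> parents.
GO_PARENTS: Dict[str, Set[str]] = {
    "EX:child_of_lactation": {"GO:0007595"},  # hypothetical descendant term
}
TAXON_PARENTS: Dict[str, Set[str]] = {
    "NCBITaxon:9031": {"EX:Amniota"},    # chicken, simplified lineage
    "NCBITaxon:40674": {"EX:Amniota"},   # Mammalia, simplified lineage
    "EX:mouse": {"NCBITaxon:40674"},     # hypothetical mammal taxon
}
ONLY_IN: Dict[str, Set[str]] = {
    "GO:0007595": {"NCBITaxon:40674"},   # lactation only_in Mammalia
}


def ancestors(node: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Reflexive, transitive closure of `node` over the parent links."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def violations(
    go_term: str,
    taxon: str,
    go_parents: Dict[str, Set[str]] = GO_PARENTS,
    taxon_parents: Dict[str, Set[str]] = TAXON_PARENTS,
    only_in: Dict[str, Set[str]] = ONLY_IN,
) -> List[Tuple[str, str]]:
    """Return the (C', T') pairs that make an annotation of `go_term`
    to a gene product of `taxon` invalid under the rule above."""
    taxon_closure = ancestors(taxon, taxon_parents)  # every T' with T is_a T'
    found = []
    for c_prime in sorted(ancestors(go_term, go_parents)):
        for t_prime in sorted(only_in.get(c_prime, ())):
            if t_prime not in taxon_closure:  # NOT(T is_a T')
                found.append((c_prime, t_prime))
    return found
```

Because the closure is reflexive, a gene product annotated directly at taxon NCBITaxon:40674 does not trigger a violation, while a chicken annotation to lactation or to any descendant of it does.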
Report Format
Each line is in two parts. The part before the :: separator is the error report. The part after is the GAF line, copied verbatim.
The report part before the delimiter is
<GO ID> "<Term name>" only_in <TAX ID> "<Taxon name>"
This is not ideal: the GO ID is redundant with column 5 of the GAF line, which is repeated after the ::
Ideally, the report part would show the ID that is directly linked to the taxon, but it does not do this yet.
Example line:
GO:0007595 "lactation" only_in NCBITaxon:40674 "Mammalia" :: Ensembl ENSGALP00000029396 ENSGALP00000029396 GO:0007595 GO_REF:0000002 IEA InterPro:IPR003626 P protein NCBITaxon:9031 20091214 UniProtKB
This IEA from chicken erroneously associates a protein with "lactation", which is restricted to mammals.
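For downstream processing, a report line in this format could be split back into its two parts. The following is a sketch under stated assumptions: the GAF part is taken to be tab-separated, the first :: in the line is taken to be the delimiter, and the names `ReportLine`, `parse_report_line` and `EXAMPLE` are hypothetical. The GAF portion of `EXAMPLE` is abbreviated to its first five columns.

```python
import re
from typing import NamedTuple

# <GO ID> "<Term name>" only_in <TAX ID> "<Taxon name>"
REPORT_RE = re.compile(r'^(\S+) "(.*)" only_in (\S+) "(.*)"$')


class ReportLine(NamedTuple):
    go_id: str
    term_name: str
    taxon_id: str
    taxon_name: str
    gaf_line: str


def parse_report_line(line: str) -> ReportLine:
    """Split a report line at the first '::' and parse the error report."""
    report, sep, gaf = line.rstrip("\n").partition("::")
    if not sep:
        raise ValueError("no '::' separator found")
    match = REPORT_RE.match(report.strip())
    if match is None:
        raise ValueError("unrecognised report part: %r" % report)
    # Remove only the space that follows '::' so that tabs in the
    # GAF line, including empty columns, are preserved.
    return ReportLine(*match.groups(), gaf.lstrip(" "))


# Abbreviated example: only the first five GAF columns are included.
EXAMPLE = (
    'GO:0007595 "lactation" only_in NCBITaxon:40674 "Mammalia" :: '
    "Ensembl\tENSGALP00000029396\tENSGALP00000029396\t\tGO:0007595"
)

parsed = parse_report_line(EXAMPLE)
```

Here `parsed.gaf_line.split("\t")[4]` is the GO ID from column 5 of the GAF, which is the same value as `parsed.go_id`; this is the redundancy noted above.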
TODO
viruses
Implementation
The script go-gaf-inference.pl is distributed as part of GO Moose.
It requires 4 inputs:
- a GO obo file
- a taxon links obo file
- a taxonomy obo file (may be slim)
- a GAF