Relevant for page

From GO Wiki


First Prototype

This first attempt shows the relevant-to links embedded in the ontology file as subsets.

To give an idea of how this system might work I have mined some of the annotations to provide relevant_for links.

I checked out the gene_ontology_edit.obo file and the FlyBase and TAIR gene_association files on 6th August 2007.


1) I extracted the list of GO terms used for annotation from the Drosophila and Arabidopsis gene_association files.
2) I inserted lines in the GO file to mark the terms that had been used in Drosophila or Arabidopsis annotation.
(We do not currently have a setup for me to do this as a separate file, so I just made a subset for each. They are called 'relevant_for Arabidopsis' and 'relevant_for Drosophila'.)
3) I loaded the file into OBO-Edit and set up global renders so that the terms used for Arabidopsis annotation would show in brown bold text, the terms used for Drosophila annotation would show in blue bold text, and the terms used for both would show in black bold text. There are images below of these renders.

Bluerender.png

Brownrender.png

Blackrender.png

4) I then examined the graph to see how the terms appear. I have taken a screenshot below to show some of the commonly used Drosophila and plant terms, to give an idea of how much easier it is to spot the terms that the curators want. This also shows that, even with the render on, the non-plant and non-Drosophila terms are always visible too.

Generalgraph.png
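
Steps 1 and 2 above could be scripted along the following lines. This is only a minimal sketch, not the procedure that was actually used: it assumes the gene_association files are tab-separated with the GO ID in the fifth column and comment lines starting with '!', the gene identifiers in the sample lines are made up, and the underscore forms of the subset names are an assumption (the subsets on this page are named with a space).

```python
def go_terms_used(gaf_lines):
    """Collect the GO IDs found in column 5 of a gene_association file."""
    terms = set()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comment lines and blank lines
        columns = line.rstrip("\n").split("\t")
        # NOT-qualified annotations are not treated specially in this sketch.
        if len(columns) > 4 and columns[4].startswith("GO:"):
            terms.add(columns[4])
    return terms


def subset_lines(term_id, fly_terms, plant_terms):
    """Return the subset tags to add to the stanza of one GO term."""
    tags = []
    if term_id in plant_terms:
        tags.append("subset: relevant_for_Arabidopsis")  # assumed subset name
    if term_id in fly_terms:
        tags.append("subset: relevant_for_Drosophila")  # assumed subset name
    return tags


# Hypothetical sample lines; only the GO ID in column 5 matters here.
fly_gaf = [
    "!example comment line\n",
    "FB\tFBgn_example\tgeneA\t\tGO:0051864\n",
]
plant_gaf = [
    "TAIR\tlocus_example\tgeneB\t\tGO:0051791\n",
]

fly_terms = go_terms_used(fly_gaf)
plant_terms = go_terms_used(plant_gaf)
print(subset_lines("GO:0051791", fly_terms, plant_terms))
```

A term used in both organisms would receive both subset lines, which corresponds to the black bold rendering described in step 3.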


This image below shows one of the first errors that I was able to spot using this system. Here a term used for fruitfly annotation has been accidentally moved to be a child of a plant term covering plant-specific development. Note that such error checking is only possible with manual insertion of the specific_to links. In this case 'organ boundary specification between lateral organs and the meristem' is specific to Viridiplantae.

Error.png
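
The check that caught this error by eye could in principle be automated. The sketch below is an assumption about how that might look, not an existing tool: it flags any term annotated for a taxon when the term itself or one of its ancestors is specific_to a group that does not contain that taxon. The term names 'fly_term' and 'plant_term' are placeholders, and the tiny graphs are invented for illustration.

```python
def ancestors(node, parents):
    """Return every ancestor of node in a {child: [parents]} map."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def find_conflicts(go_parents, taxon_parents, relevant_for, specific_to):
    """List (term, taxon, restricted_term, required_group) conflicts."""
    conflicts = []
    for term, annotated_taxa in relevant_for.items():
        # A specific_to link on a term also constrains all of its descendants.
        scope = {term} | ancestors(term, go_parents)
        for restricted in scope:
            required = specific_to.get(restricted)
            if required is None:
                continue
            for taxon in annotated_taxa:
                lineage = {taxon} | ancestors(taxon, taxon_parents)
                if required not in lineage:
                    conflicts.append((term, taxon, restricted, required))
    return sorted(conflicts)


# Placeholder data: a fly-annotated term misplaced under a plant-only term.
go_parents = {"fly_term": ["plant_term"]}
taxon_parents = {
    "Drosophila melanogaster": ["Insecta"],
    "Arabidopsis thaliana": ["Viridiplantae"],
}
relevant_for = {"fly_term": ["Drosophila melanogaster"]}  # from annotations
specific_to = {"plant_term": "Viridiplantae"}  # added manually by curators

conflicts = find_conflicts(go_parents, taxon_parents, relevant_for, specific_to)
print(conflicts)
```

The relevant_for map comes from annotation mining and the specific_to map from manual curation, so the check only works where curators have added specific_to links.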


The GO file used here is checked into the scratch directory of cvs: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/scratch/TaxonSubsetFile.obo

Discussion: http://gocwiki.geneontology.org/index.php/Talk:Relevant_for_page


Second Prototype

This attempt shows how we can have the relevant_to links in a separate file as a mapping with a further separate taxonomy slim.

Requirements:

NCBI taxonomy slim file in OBO format

A copy of the NCBI taxonomy hierarchy, slimmed, kept in sync with the original, and in OBO format. I have asked Chris Mungall if the OBO Foundry could provide this. The graph would just need the cellular organisms and viruses, as far down as each of our currently annotated species.
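
As a rough illustration of what such a slim would contain, the sketch below keeps only the taxa that lie on a path from an annotated species up to a root. It assumes each taxon has a single parent, and the toy hierarchy is heavily simplified rather than taken from the real NCBI taxonomy.

```python
def slim_taxonomy(parent_of, annotated_species):
    """Keep only the nodes on a path from an annotated species to a root.

    parent_of maps each taxon to its single parent; roots are absent
    from the map. The result maps each kept taxon to its parent (or None).
    """
    kept = {}
    for species in annotated_species:
        node = species
        while node is not None and node not in kept:
            kept[node] = parent_of.get(node)
            node = parent_of.get(node)
    return kept


# Toy hierarchy for illustration only; the real lineages are much deeper.
parent_of = {
    "Eukaryota": "cellular organisms",
    "Viridiplantae": "Eukaryota",
    "Arabidopsis thaliana": "Viridiplantae",
    "Metazoa": "Eukaryota",
    "Insecta": "Metazoa",
    "Drosophila melanogaster": "Insecta",
    "Mus musculus": "Metazoa",
}

slim = slim_taxonomy(parent_of, ["Arabidopsis thaliana", "Drosophila melanogaster"])
print(sorted(slim))
```

In this toy run Mus musculus is dropped because nothing marks it as annotated; adding a species to the input list pulls in its whole lineage.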

For this prototype I am using a taxon graph that I made long-hand. It is in cvs as go/scratch/Taxonomy.obo. It is in OBO format as follows:

[Term]
id: taxon:3702
name: Arabidopsis thaliana
namespace: Taxonomy
is_a: BR:0000181 ! Viridiplantae
[Term]


A copy of the live ontology file

I have taken a branch copy and it is in cvs as go/scratch/TaxonSubsetFile.obo

This file still contains the subset lines that I added in attempt 1 as I am using the same file for both.

[Term]
id: GO:0051791
name: medium-chain fatty acid metabolic process
namespace: biological_process
def: "The chemical reactions and pathways involving medium-chain fatty acids, aliphatic compounds having a terminal carboxyl group and with a chain length of C8-12." [GOC:go_curators]
synonym: "medium chain fatty acid metabolic process" EXACT []
synonym: "medium chain fatty acid metabolism" EXACT []
synonym: "medium-chain fatty acid metabolism" EXACT []
is_a: GO:0006631 ! fatty acid metabolic process
[Term]

A copy of the cross product file

This is a file that records only the relationships between taxa and GO terms. The file is in OBO format, as suggested by John Day-Richter, and it is as follows:


[Term]
id: GO:0051791
namespace: Taxon-GO
relevant_to: taxon:3702 ! Arabidopsis thaliana
[Term]
id: GO:0051864
namespace: Taxon-GO
relevant_to: taxon:7227 ! Drosophila melanogaster
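
Stanzas like the ones above can be read into a simple lookup from GO term to taxa. The sketch below is a minimal hand-rolled reader rather than a full OBO parser. It assumes one tag per line and treats relevant_to as a plain tag, as it is written in this prototype file.

```python
def parse_relevant_to(obo_text):
    """Map each term id to the list of taxon ids in its relevant_to tags."""
    mapping = {}
    current_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif in_term and current_id and line.startswith("relevant_to:"):
            # Drop the trailing "! name" comment and keep the taxon id.
            value = line[len("relevant_to:"):].split("!")[0].strip()
            mapping.setdefault(current_id, []).append(value)
    return mapping


cross_product = """\
[Term]
id: GO:0051791
namespace: Taxon-GO
relevant_to: taxon:3702 ! Arabidopsis thaliana

[Term]
id: GO:0051864
namespace: Taxon-GO
relevant_to: taxon:7227 ! Drosophila melanogaster
"""

links = parse_relevant_to(cross_product)
print(links)
```

Keeping the links in their own file means the mapping can be regenerated from the annotation files without touching the live ontology.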


Tools

I can load these into OBO-Edit but I have not worked out how to visualise the links.

I tried using the flower button to show only relevant_to links, but I got a lot of orphans that did not have relevant_to links. When I turned the flower button off, many orphans stayed, even when I restarted the application and deleted the .oboedit directory.

These are the orphans after the flower button is off:

Roots.png


These are the settings on the relevant_to relationship:

Relevant.png


Questions on Logic

I sent (more or less) this message to Wacek:

Hi Waclaw,
I have just been working on some real examples of connecting taxon and GO with your relationships and I have a question.
We have two complementary sets of terms: compound eye development and its descendants for the flies, and camera-type eye development and its descendants for the mammals and birds and lizards and so on. It's easy to label the fly term, as I just make compound eye development and its descendants 'specific to Insecta (or whatever the broadest compound-eyed organism grouping is)'. However, it's much harder with the camera-type eye development term, as it's specific to everything except Insecta (or whatever the broadest compound-eyed organism grouping is).
It would be the same with gametogenesis. We have the kind of gametogenesis in plants and the kind of gametogenesis in pretty much everything else. I understand how to label the small divergent grouping with the specific_to tag, but how do I label the other complementary group? Should it be 'specific to not-plants'? I'm guessing not.
Thanks,
Jen

The reason why this is important is the one that Chris stated above. For annotation and ontology structure checking we need to compare the relevant_to links derived from annotations with the specific_to links manually added by curators. It won't work so well if we can't label both the specific_to grouping and its complement.

Jen