Taxon-GO Implementation April 2008 onwards

Revision as of 10:53, 29 April 2008

At the Consortium meetings in Princeton in 2007 and Salt Lake City in 2008, Jennifer presented a proposal and a pilot for a system of implementing taxon information. At the Salt Lake City meeting it was decided to implement the proposal. This page records progress on that implementation.


The original proposal is not currently archived.
The pilot data is at
/go/meeting/minutes/20080420_Additional_Material/GO-Taxon_Links_Report.ppt


29th April 2008

To start implementing the links, I am using the custom taxon slim that Chris Mungall made from the NCBI taxonomy hierarchy.

This is what he did to make the slim:


I grabbed all species with an annotation in the database, then did a simple filter on the results:
http://wiki.geneontology.org/index.php/Example_Queries#Total_annotations.2C_grouped_by_species.2C_broken_down_by_evidence

grep -v IEA z | cut -f1 | sort -u | perl -npe 's/^/NCBITaxon:/' > ~/tmp/tax-ids.txt

(there were almost 1000!)
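
For readers less familiar with shell pipelines, the filter above can be sketched in Python. This is only an illustration, not the command that was actually run. It assumes the query results are tab-separated with the taxon ID in the first column, and the function name `non_iea_taxon_ids` and the sample rows are hypothetical.

```python
def non_iea_taxon_ids(lines):
    """Return sorted, de-duplicated NCBITaxon IDs taken from non-IEA rows.

    Assumes tab-separated rows with the taxon ID in the first column.
    """
    ids = set()
    for line in lines:
        # Like `grep -v IEA`: drop any row that mentions IEA anywhere.
        if "IEA" in line:
            continue
        # Like `cut -f1`: keep only the first tab-separated field.
        first = line.rstrip("\n").split("\t")[0]
        # Unlike the shell pipeline, empty rows are skipped here.
        if first:
            # Like the perl step: prefix the bare ID with "NCBITaxon:".
            ids.add("NCBITaxon:" + first)
    # Like `sort -u`: unique values in lexical order.
    return sorted(ids)


# Hypothetical sample rows (taxon ID, evidence code, count).
sample_rows = [
    "9606\tIDA\t120\n",
    "9606\tIEA\t5000\n",
    "10090\tIMP\t80\n",
]
taxon_ids = non_iea_taxon_ids(sample_rows)
print("\n".join(taxon_ids))
```

As with `grep -v IEA`, the substring test drops any row containing "IEA" anywhere, not only in an evidence-code column.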

I then used my segmentation tool (part of obol) to slice these IDs and their descendants from the NCBI taxonomy file I publish on the OBO download page. The results are in:

http://www.berkeleybop.org/obol/tmp/ncbitax-slim.obo

There's a bug in my segmenter: the ranks (genus, order, family) were not included. But this may work to your advantage, because ranks are stored using generic term properties, which people aren't used to yet. It seems like you don't need them anyway.

I aim to reproduce my segmenter functionality in OE. In fact it may be possible to do this right now with filter scripts. In this particular case the segmenter is doing something pretty basic: following all input terms up to the root and writing the result as .obo.
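
The basic operation described here, following each input term up to the root and writing out what was visited, can be sketched as below. This is not the obol segmenter itself, just a minimal illustration. It assumes parent links are given as `is_a:` lines in `[Term]` stanzas separated by blank lines. All function names and the toy hierarchy are hypothetical, and the toy hierarchy is far flatter than the real NCBI taxonomy.

```python
def parse_terms(obo_text):
    """Map term ID -> list of stanza lines, for [Term] stanzas only."""
    terms = {}
    for block in obo_text.strip().split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines or lines[0] != "[Term]":
            continue
        for ln in lines:
            if ln.startswith("id:"):
                terms[ln[3:].strip()] = lines
                break
    return terms


def parents(stanza_lines):
    """Return the is_a parent IDs of one stanza, ignoring '!' comments."""
    return [ln[5:].split("!")[0].strip()
            for ln in stanza_lines if ln.startswith("is_a:")]


def slice_to_root(terms, seed_ids):
    """Return the seed terms plus every ancestor reachable via is_a."""
    keep = set()
    stack = [tid for tid in seed_ids if tid in terms]
    while stack:
        tid = stack.pop()
        if tid in keep:
            continue
        keep.add(tid)
        stack.extend(p for p in parents(terms[tid]) if p in terms)
    return keep


def write_obo(terms, keep):
    """Serialise the kept stanzas as a small .obo document."""
    stanzas = ["\n".join(terms[tid]) for tid in sorted(keep)]
    return "format-version: 1.2\n\n" + "\n\n".join(stanzas) + "\n"


# Toy input: a simplified, hypothetical slice of the taxonomy.
TOY_OBO = """
[Term]
id: NCBITaxon:1
name: root

[Term]
id: NCBITaxon:9605
name: Homo
is_a: NCBITaxon:1 ! root

[Term]
id: NCBITaxon:9606
name: Homo sapiens
is_a: NCBITaxon:9605 ! Homo

[Term]
id: NCBITaxon:10090
name: Mus musculus
is_a: NCBITaxon:1 ! root
"""

terms = parse_terms(TOY_OBO)
kept = slice_to_root(terms, ["NCBITaxon:9606"])
slim_text = write_obo(terms, kept)
print(slim_text)
```

Because only `id:` and `is_a:` lines are interpreted, other tags in a stanza are copied through unchanged. A real segmenter would also need to handle other relationship types and the file header.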