Taxon-GO Implementation April 2008 onwards
Revision as of 10:53, 29 April 2008
At the Consortium meetings in Princeton in 2007 and Salt Lake City in 2008, Jennifer presented a proposal and a pilot for a system of implementing taxon information. At the Salt Lake City meeting it was decided to implement the proposal. This page records progress on that implementation.
The original proposal is not currently archived.
The pilot data is at
/go/meeting/minutes/20080420_Additional_Material/GO-Taxon_Links_Report.ppt
29th April 2008
In starting to implement the links I am using the custom taxon slim that Chris Mungall made from the NCBI taxonomy hierarchy.
This is what he did to make the slim:
I grabbed all species with an annotation in the database, then did a simple filter on the results:
http://wiki.geneontology.org/index.php/Example_Queries#Total_annotations.2C_grouped_by_species.2C_broken_down_by_evidence
grep -v IEA z | cut -f1 | sort -u | perl -npe 's//NCBITaxon:/' > ~/tmp/tax-ids.txt
(there were almost 1000!)
I then used my segmentation tool (part of obol) to slice these IDs and their descendants from the NCBI tax file I publish on the obo download page. The results are in:
http://www.berkeleybop.org/obol/tmp/ncbitax-slim.obo
There's a bug in my segmenter in that the ranks (genus, order, family) were not included. But this may work to your advantage, in that these are stored using generic term properties, which people aren't used to yet. It seems like you don't need these anyway.
I aim to reproduce my segmenter functionality in OE. In fact it may be possible to do this right now with filter scripts. In this particular case the segmenter is doing something pretty basic - following all input terms up to the root and writing as .obo
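The basic operation described in the last sentence above (follow every input term up to the root, then write the result as .obo) could be sketched roughly as follows. This is not the obol segmenter itself, only a minimal Python illustration that assumes the input is plain OBO text with [Term] stanzas separated by blank lines and parents given by is_a lines. The function names parse_obo_parents, ancestors_closure and write_slim are hypothetical, and like the segmenter described above the sketch does nothing special with ranks.

```python
from collections import defaultdict


def parse_obo_parents(obo_text):
    """Return ({term_id: [parent_ids]}, {term_id: stanza_text}) from OBO text.

    Assumption: stanzas are separated by blank lines and the id: line
    comes before any is_a: line within a [Term] stanza.
    """
    parents = defaultdict(list)
    stanzas = {}
    for block in obo_text.split("\n\n"):
        lines = [ln.strip() for ln in block.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue  # skip the header and non-term stanzas
        term_id = None
        for ln in lines[1:]:
            if ln.startswith("id:"):
                term_id = ln[3:].strip()
            elif ln.startswith("is_a:") and term_id is not None:
                # Drop any trailing "! name" comment after the parent ID
                parents[term_id].append(ln[5:].split("!")[0].strip())
        if term_id is not None:
            stanzas[term_id] = "\n".join(lines)
    return parents, stanzas


def ancestors_closure(seed_ids, parents):
    """Return the seed terms plus every term reachable by following is_a upward."""
    keep = set()
    stack = list(seed_ids)
    while stack:
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        stack.extend(parents.get(term, []))
    return keep


def write_slim(seed_ids, obo_text):
    """Return OBO text holding only the seed terms and their ancestors."""
    parents, stanzas = parse_obo_parents(obo_text)
    keep = ancestors_closure(seed_ids, parents)
    return "\n\n".join(stanzas[t] for t in sorted(keep) if t in stanzas) + "\n"
```

In use, the seed IDs would be the lines of a file such as tax-ids.txt and obo_text the contents of the NCBI taxonomy .obo file; the returned string can then be written out as the slim.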