Taxon-GO Implementation April 2008 onwards
At the Consortium meetings in Princeton in 2007 and Salt Lake City in 2008, Jennifer presented a proposal and a pilot for a system of implementing taxon information. At the Salt Lake City meeting it was decided to implement the proposal. This page records progress on that implementation.
The original proposal is not currently archived.
The pilot data is at
/go/meeting/minutes/20080420_Additional_Material/GO-Taxon_Links_Report.ppt
29th April 2008
In starting to implement the links I am using the custom taxon slim that Chris Mungall made from the NCBI taxonomy hierarchy.
Taxonomy Slim
This is what he did to make the slim:
I grabbed all species with an annotation in the database, then did a simple filter on the results:
http://wiki.geneontology.org/index.php/Example_Queries#Total_annotations.2C_grouped_by_species.2C_broken_down_by_evidence
grep -v IEA z | cut -f1 | sort -u | perl -npe 's//NCBITaxon:/' > ~/tmp/tax-ids.txt
(there were almost 1000!)
I then used my segmentation tool (part of obol) to slice these IDs and their descendants from the ncbi tax file I publish on the obo download page. The results are in:
http://www.berkeleybop.org/obol/tmp/ncbitax-slim.obo
There's a bug in my segmenter in that the ranks (genus, order, family) were not included. But this may work to your advantage in that these are stored using generic term properties which people aren't used to yet. It seems like you don't need these anyway.
I aim to reproduce my segmenter functionality in OE. In fact it may be possible to do this right now with filter scripts. In this particular case the segmenter is doing something pretty basic: following all input terms up to the root and writing as .obo
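The "follow all input terms up to the root and write as .obo" step described above can be sketched as below. This is only an illustration, not Chris's segmenter: the function names `slice_to_root` and `to_obo` are made up here, the hierarchy is a heavily truncated toy (intermediate ranks are left out), and only `id`, `name` and `is_a` lines are written.

```python
from typing import Dict, Iterable, List, Set

# Toy, heavily truncated hierarchy (child -> parents). Intermediate ranks
# are omitted on purpose; this is NOT the real NCBI taxonomy structure.
TOY_PARENTS: Dict[str, List[str]] = {
    "NCBITaxon:9606": ["NCBITaxon:40674"],   # Homo sapiens -> Mammalia
    "NCBITaxon:7227": ["NCBITaxon:6656"],    # Drosophila melanogaster -> Arthropoda
    "NCBITaxon:40674": ["NCBITaxon:2759"],   # Mammalia -> Eukaryota
    "NCBITaxon:6656": ["NCBITaxon:2759"],    # Arthropoda -> Eukaryota
    "NCBITaxon:2759": ["NCBITaxon:1"],       # Eukaryota -> root
}


def slice_to_root(seed_ids: Iterable[str],
                  parents: Dict[str, List[str]]) -> Set[str]:
    """Return the seed terms plus every ancestor reachable through `parents`."""
    kept: Set[str] = set()
    stack = list(seed_ids)
    while stack:
        term = stack.pop()
        if term in kept:
            continue  # already visited via another path
        kept.add(term)
        stack.extend(parents.get(term, []))
    return kept


def to_obo(term_ids: Iterable[str],
           names: Dict[str, str],
           parents: Dict[str, List[str]]) -> str:
    """Write minimal [Term] stanzas (id, optional name, is_a) for the given terms."""
    stanzas = []
    for term in sorted(term_ids):
        lines = ["[Term]", f"id: {term}"]
        if term in names:
            lines.append(f"name: {names[term]}")
        lines.extend(f"is_a: {parent}" for parent in parents.get(term, []))
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


# Slice the toy hierarchy using one annotated species as the seed.
slim_ids = slice_to_root(["NCBITaxon:9606"], TOY_PARENTS)
print(to_obo(slim_ids, {"NCBITaxon:9606": "Homo sapiens"}, TOY_PARENTS))
```

In practice the seed list would be the contents of tax-ids.txt and the parent map would be read from the full ncbi tax file.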
Q/ Should this slim now be checked into cvs in a non-scratch directory?
Cross Product files
Chris has made files in scratch that show cross products between the GO ontology file and the various other ontologies. He has suggested that I should look at the cell type file, categorise the cell types by taxon, and then transfer those categories to the GO terms that are defined using those cell types. This will cover far more terms with less work.
The cross product files are at /go/scratch/xps/
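The idea of labelling cell types once and carrying the labels over to GO terms can be sketched roughly as follows. Everything here is an assumption made for illustration: the real layout of the cross product files is not described on this page, the GO IDs are placeholders, and the taxon assigned to the cell type is an example rather than a curated decision.

```python
from typing import Dict, List, Tuple


def transfer_links(go_to_cell: Dict[str, str],
                   cell_links: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Copy each cell type's taxon link onto the GO terms defined with that cell type.

    go_to_cell: GO ID -> cell type ID (as might be read from a cross product file)
    cell_links: cell type ID -> (relationship, taxon ID), assigned by a curator
    """
    rows = []
    for go_id in sorted(go_to_cell):
        link = cell_links.get(go_to_cell[go_id])
        if link is None:
            continue  # this cell type has not been categorised yet
        relationship, taxon_id = link
        rows.append((go_id, relationship, taxon_id))
    return rows


# Placeholder GO IDs; the cell type IDs come from the list on this page.
GO_TO_CELL = {
    "GO:EXAMPLE_A": "CL:0000735",  # lymph gland hemocyte
    "GO:EXAMPLE_B": "CL:0000735",
    "GO:EXAMPLE_C": "CL:0000034",  # stem cell (left uncategorised here)
}
# Illustrative categorisation only.
CELL_LINKS = {"CL:0000735": ("never_outside_taxon", "NCBITaxon:6656")}  # Arthropoda

for row in transfer_links(GO_TO_CELL, CELL_LINKS):
    print("\t".join(row))
```

One categorised cell type yields a link for every GO term that uses it, which is where the saving in effort comes from.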
I have pulled out the list of cell types to be categorized and it is here:
CL:0000017 ! spermatocyte
CL:0000018 ! spermatid
CL:0000019 ! sperm
CL:0000023 ! oocyte
CL:0000025 ! egg
CL:0000026 ! nurse cell
CL:0000030 ! glioblast
CL:0000031 ! neuroblast
CL:0000034 ! stem cell
CL:0000037 ! hematopoietic stem cell
CL:0000056 ! myoblast
CL:0000057 ! fibroblast
CL:0000062 ! osteoblast
CL:0000066 ! epithelial cell
CL:0000071 ! blood vessel endothelial cell
CL:0000075 ! columnar/cuboidal epithelial cell
CL:0000081 ! blood cell
CL:0000084 ! T cell
CL:0000092 ! osteoclast
CL:0000094 ! granulocyte
CL:0000097 ! mast cell
CL:0000115 ! endothelial cell
CL:0000125 ! glial cell
CL:0000127 ! astrocyte
CL:0000128 ! oligodendrocyte
CL:0000129 ! microglial cell
CL:0000134 ! mesenchymal cell
CL:0000136 ! fat cell
CL:0000138 ! chondrocyte
CL:0000147 ! pigment cell
CL:0000148 ! melanocyte
CL:0000150 ! glandular epithelial cell
CL:0000178 ! Leydig cell
CL:0000187 ! muscle cell
CL:0000188 ! skeletal muscle cell
CL:0000192 ! smooth muscle cell
CL:0000201 ! auditory receptor cell
CL:0000202 ! auditory hair cell
CL:0000210 ! photoreceptor cell
CL:0000216 ! Sertoli cell
CL:0000218 ! Schwann cell
CL:0000221 ! ectodermal cell
CL:0000222 ! mesodermal cell
CL:0000223 ! endodermal cell
CL:0000228 ! multinucleate cell
CL:0000232 ! erythrocyte
CL:0000233 ! platelet
CL:0000235 ! macrophage
CL:0000236 ! B cell
CL:0000248 ! microsporocyte
CL:0000250 ! megaspore
CL:0000252 ! microspore
CL:0000253 ! eurydendroid cell
CL:0000254 ! egg cell
CL:0000262 ! guard mother cell
CL:0000276 ! sclerenchyma cell
CL:0000280 ! generative cell
CL:0000282 ! trichome
CL:0000284 ! companion cell
CL:0000287 ! eye photoreceptor cell
CL:0000288 ! synergid
CL:0000292 ! guard cell
CL:0000294 ! sieve cell
CL:0000295 ! somatotropin secreting cell
CL:0000296 ! vegetative cell
CL:0000299 ! trichoblast
CL:0000300 ! gamete
CL:0000301 ! pole cell
CL:0000312 ! keratinocyte
CL:0000332 ! atrichoblast
CL:0000333 ! neural crest cell
CL:0000362 ! epidermal cell
CL:0000365 ! zygote
CL:0000373 ! histoblast
CL:0000392 ! crystal cell
CL:0000394 ! plasmatocyte
CL:0000396 ! lamellocyte
CL:0000408 ! male gamete
CL:0000430 ! xanthophore
CL:0000431 ! iridophore
CL:0000439 ! prolactin secreting cell
CL:0000442 ! follicular dendritic cell
CL:0000448 ! white fat cell
CL:0000449 ! brown fat cell
CL:0000451 ! dendritic cell
CL:0000453 ! Langerhans cell
CL:0000467 ! adrenocorticotropic hormone secreting cell
CL:0000469 ! ganglion mother cell
CL:0000474 ! pericardial cell
CL:0000476 ! thyroid stimulating hormone secreting cell
CL:0000477 ! follicle cell
CL:0000486 ! garland cell
CL:0000487 ! oenocyte
CL:0000492 ! T-helper cell
CL:0000501 ! granulosa cell
CL:0000522 ! spore
CL:0000537 ! antipodal cell
CL:0000540 ! neuron
CL:0000542 ! lymphocyte
CL:0000545 ! T-helper 1 cell
CL:0000546 ! T-helper 2 cell
CL:0000556 ! megakaryocyte
CL:0000562 ! nucleate erythrocyte
CL:0000563 ! endospore
CL:0000571 ! leucophore
CL:0000573 ! retinal cone cell
CL:0000574 ! erythrophore
CL:0000576 ! monocyte
CL:0000579 ! border follicle cell
CL:0000586 ! germ cell
CL:0000595 ! enucleate erythrocyte
CL:0000598 ! pyramidal cell
CL:0000599 ! conidium
CL:0000604 ! retinal rod cell
CL:0000607 ! ascospore
CL:0000608 ! zygospore
CL:0000609 ! vestibular hair cell
CL:0000615 ! basidiospore
CL:0000616 ! sporangiospore
CL:0000623 ! natural killer cell
CL:0000624 ! CD4-positive, alpha-beta T cell
CL:0000625 ! CD8-positive, alpha-beta T cell
CL:0000644 ! Bergmann glial cell
CL:0000656 ! primary spermatocyte
CL:0000668 ! parenchymal cell
CL:0000674 ! interfollicle cell
CL:0000675 ! female gamete
CL:0000681 ! radial glial cell
CL:0000695 ! Cajal-Retzius cell
CL:0000711 ! cumulus cell
CL:0000716 ! lymph gland crystal cell
CL:0000722 ! cystoblast
CL:0000723 ! somatic stem cell
CL:0000724 ! heterocyst
CL:0000726 ! chlamydospore
CL:0000730 ! leading edge cell
CL:0000731 ! urothelial cell
CL:0000732 ! amoeboid cell
CL:0000733 ! lymph gland plasmatocyte
CL:0000735 ! lymph gland hemocyte
CL:0000737 ! striated muscle cell
CL:0000738 ! leukocyte
CL:0000740 ! retinal ganglion cell
CL:0000746 ! cardiac muscle cell
CL:0000747 ! cyanophore
CL:0000748 ! retinal bipolar neuron
CL:0000762 ! thrombocyte
CL:0000763 ! myeloid cell
CL:0000766 ! myeloid leukocyte
CL:0000767 ! basophil
CL:0000771 ! eosinophil
CL:0000775 ! neutrophil
CL:0000782 ! myeloid dendritic cell
CL:0000784 ! plasmacytoid dendritic cell
CL:0000785 ! mature B cell
CL:0000786 ! plasma cell
CL:0000787 ! memory B cell
CL:0000789 ! alpha-beta T cell
CL:0000792 ! CD4-positive, CD25-positive, alpha-beta regulatory T cell
CL:0000793 ! CD4-positive, alpha-beta intraepithelial T cell
CL:0000794 ! CD8-positive, alpha-beta cytotoxic T cell
CL:0000795 ! CD8-positive, alpha-beta regulatory T cell
CL:0000796 ! CD8 positive, alpha-beta intraepithelial T cell
CL:0000797 ! alpha-beta intraepithelial T cell
CL:0000798 ! gamma-delta T cell
CL:0000801 ! gamma-delta intraepithelial T cell
CL:0000802 ! CD8-positive, gamma-delta intraepithelial T cell
CL:0000803 ! CD4-positive, gamma-delta intraepithelial T cell
CL:0000804 ! immature T cell
CL:0000813 ! memory T cell
CL:0000814 ! NK T cell
CL:0000815 ! regulatory T cell
CL:0000816 ! immature B cell
CL:0000817 ! pre-B cell
CL:0000818 ! transitional stage B cell
CL:0000819 ! B-1 B cell
CL:0000820 ! B-1a B cell
CL:0000821 ! B-1b B cell
CL:0000825 ! natural killer cell progenitor
CL:0000826 ! pro-B cell
CL:0000827 ! pro-T cell
CL:0000837 ! hematopoietic progenitor cell
CL:0000838 ! lymphoid progenitor cell
CL:0000839 ! myeloid progenitor cell
CL:0000842 ! mononuclear cell
CL:0000843 ! follicular B cell
CL:0000844 ! germinal center B cell
CL:0000845 ! marginal zone B cell
CL:0000851 ! neuromast mantle cell
CL:0000852 ! neuromast support cell
CL:0000855 ! hair cell
CL:0000856 ! neuromast hair cell
CL:1000274 ! trophectodermal cell
Taxon-GO file format
This is the proposed file format for the taxon-go links:
GO term | GO:id | relationship | taxon name | taxon id |
---|---|---|---|---|
photosynthesis | GO:0015979 | never_in_taxon | Mammalia | Taxonomy ID: 40674 |
male germ-line cyst formation | GO:0048136 | never_in_taxon | Mammalia | Taxonomy ID: 40674 |
hemocyte differentiation | GO:0042386 | never_outside_taxon | Arthropoda | Taxonomy ID: 6656 |
multicellular organismal process | GO:0032501 | never_outside_taxon | Eukaryota | Taxonomy ID: 2759 |
nucleus | GO:0005634 | never_outside_taxon | Eukaryota | Taxonomy ID: 2759 |
gametophyte development | GO:0048229 | never_in_taxon | Dictyostelium | Taxonomy ID: 5782 |
viral reproduction | GO:0016032 | never_outside_taxon | Viruses | Taxonomy ID: 10239 |
compound eye development | GO:0048749 | never_in_taxon | Mammalia | Taxonomy ID: 40674 |
lactation | GO:0007595 | never_outside_taxon | Mammalia | Taxonomy ID: 40674 |
fat body development | GO:0007503 | never_in_taxon | Mammalia | Taxonomy ID: 40674 |
I am not yet sure how to save a file like this from OBO-Edit after having added links. I will have a go at that.
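Independently of how OBO-Edit ends up exporting the links, the five-column layout proposed above is simple enough to read and write with a short script. The sketch below assumes a tab-delimited file with no header row and the columns in the order shown in the table; the names `TaxonLink`, `write_links` and `read_links` are invented for this example.

```python
import csv
import io
from typing import Iterable, List, NamedTuple, TextIO


class TaxonLink(NamedTuple):
    go_name: str
    go_id: str
    relationship: str
    taxon_name: str
    taxon_id: str


# The two relationship types used in the proposed format.
ALLOWED_RELATIONSHIPS = {"never_in_taxon", "never_outside_taxon"}


def write_links(links: Iterable[TaxonLink], handle: TextIO) -> None:
    """Write one tab-delimited row per link."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for link in links:
        writer.writerow(link)


def read_links(handle: TextIO) -> List[TaxonLink]:
    """Read links back, rejecting malformed rows and unknown relationships."""
    links = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}: {row!r}")
        link = TaxonLink(*row)
        if link.relationship not in ALLOWED_RELATIONSHIPS:
            raise ValueError(f"unknown relationship: {link.relationship!r}")
        links.append(link)
    return links


# Two rows taken from the table on this page.
EXAMPLE_LINKS = [
    TaxonLink("photosynthesis", "GO:0015979", "never_in_taxon",
              "Mammalia", "Taxonomy ID: 40674"),
    TaxonLink("lactation", "GO:0007595", "never_outside_taxon",
              "Mammalia", "Taxonomy ID: 40674"),
]

buffer = io.StringIO()
write_links(EXAMPLE_LINKS, buffer)
print(buffer.getvalue(), end="")
```

Checking the relationship column on read catches typos early, which matters if the file is edited by hand.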
1st May 2008
I have arranged a meeting with Susan Tweedie and Rebecca Foulger to start labeling the cell type and GO terms by taxon.
6th May 2008
I have cleared away all the old taxon-go-related files from the scratch directory and made a new folder in there called 'taxon-go'. This folder contains a copy of the taxon slim.
I have still not worked out how to save the tab-delimited file of relationships out of OBO-Edit and this is the major obstacle to starting work just now.
Chris has pointed out that I don't need to propagate the links down the graph and have those links actually instantiated, as it is easy to infer them. For working in OBO-Edit I just need to set a renderer that shows whether a term already has a taxon link applied to one of its ancestors.
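Inferring rather than instantiating the links amounts to walking up from a term and collecting any link asserted on an ancestor. A minimal sketch, assuming the graph is available as a child-to-parents map and that links apply to every descendant of the term they are asserted on; `inherited_links` and the two descendant IDs are hypothetical, while the nucleus link is the one from the table above.

```python
from typing import Dict, List, Tuple


def inherited_links(term: str,
                    parents: Dict[str, List[str]],
                    direct_links: Dict[str, List[Tuple[str, str]]]
                    ) -> List[Tuple[str, str, str]]:
    """Return (ancestor, relationship, taxon) for links asserted on ancestors of `term`.

    Links asserted directly on `term` itself are not included.
    """
    found = []
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        ancestor = stack.pop()
        if ancestor in seen:
            continue  # reached before via another path
        seen.add(ancestor)
        for relationship, taxon in direct_links.get(ancestor, []):
            found.append((ancestor, relationship, taxon))
        stack.extend(parents.get(ancestor, []))
    return sorted(found)


# Hypothetical descendants of nucleus (GO:0005634); the IDs are placeholders.
PARENTS = {
    "GO:EXAMPLE_CHILD": ["GO:0005634"],
    "GO:EXAMPLE_GRANDCHILD": ["GO:EXAMPLE_CHILD"],
}
DIRECT_LINKS = {
    "GO:0005634": [("never_outside_taxon", "NCBITaxon:2759")],  # Eukaryota
}

print(inherited_links("GO:EXAMPLE_GRANDCHILD", PARENTS, DIRECT_LINKS))
```

A renderer would only need the yes/no version of this check: whether the returned list is empty.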
I have made a tab-delimited file to contain the links between the ontology file and the taxon slim. It is at go/scratch/taxon-go/go-taxon-links.txt