Taxon-GO Implementation April 2008 onwards

From GO Wiki
Revision as of 11:02, 29 April 2008 by Jdeegan (talk | contribs)

At the Consortium meetings in Princeton in 2007 and Salt Lake City in 2008, Jennifer presented a proposal and a pilot for a system of implementing taxon information in GO. At the Salt Lake City meeting it was decided to implement the proposal. This page records progress on that implementation.


The original proposal is not currently archived.
The pilot data is at /go/meeting/minutes/20080420_Additional_Material/GO-Taxon_Links_Report.ppt


29th April 2008

In starting to implement the links I am using the custom taxon slim that Chris Mungall made from the NCBI taxonomy hierarchy.

This is what he did to make the slim:

I grabbed all species with an annotation in the database, then did a simple filter on the results:
http://wiki.geneontology.org/index.php/Example_Queries#Total_annotations.2C_grouped_by_species.2C_broken_down_by_evidence

grep -v IEA z | cut -f1 | sort -u | perl -pe 's/^/NCBITaxon:/' > ~/tmp/tax-ids.txt

(there were almost 1000!)

I then used my segmentation tool (part of obol) to slice these IDs and their descendants from the NCBI tax file I publish on the OBO download page. The results are in:

http://www.berkeleybop.org/obol/tmp/ncbitax-slim.obo

There's a bug in my segmenter in that the ranks (genus, order, family) were not included. But this may work to your advantage in that these are stored using generic term properties which people aren't used to yet. It seems like you don't need these anyway.

I am to reproduce my segmenter functionality in OE. In fact it may be possible to do this right now with filter scripts. In this particular case the segmenter is doing something pretty basic: following all input terms up to the root and writing the result as .obo.
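
The last step described above (follow each input term up to the root, then write the result as .obo) can be sketched roughly as follows. This is only an illustration and not the obol segmenter itself. It assumes that only is_a links in [Term] stanzas matter, and the EX: identifiers and the function names are made up for the example.

```python
# Toy OBO text; the EX: identifiers are hypothetical, not real NCBI taxon IDs.
TOY_OBO = """format-version: 1.2

[Term]
id: EX:root
name: root

[Term]
id: EX:mid
is_a: EX:root ! root

[Term]
id: EX:leaf
is_a: EX:mid ! mid

[Term]
id: EX:other
is_a: EX:root ! root
"""

def parse_is_a(obo_text):
    """Return {term_id: set of is_a parent ids} for every [Term] stanza."""
    parents = {}
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            current = None
        elif in_term and line.startswith("id:"):
            current = line[3:].strip()
            parents.setdefault(current, set())
        elif in_term and current and line.startswith("is_a:"):
            # Drop the trailing "! label" comment, keep the parent id.
            parents[current].add(line[5:].split("!")[0].strip())
    return parents

def ancestors_closure(seeds, parents):
    """Return the seed terms plus everything reachable by following is_a upward."""
    keep = set()
    stack = list(seeds)
    while stack:
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        stack.extend(parents.get(term, ()))
    return keep

def write_slim(keep, parents):
    """Serialise the kept terms as minimal [Term] stanzas (ids and is_a only)."""
    lines = []
    for term in sorted(keep):
        lines.append("[Term]")
        lines.append(f"id: {term}")
        for parent in sorted(parents.get(term, ())):
            lines.append(f"is_a: {parent}")
        lines.append("")
    return "\n".join(lines)

parents = parse_is_a(TOY_OBO)
slim_ids = ancestors_closure({"EX:leaf"}, parents)
slim_text = write_slim(slim_ids, parents)
print(slim_text)
```

In practice the seed set would be the contents of tax-ids.txt, and names and other tags would need to be carried over as well.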


Chris has made files in scratch that show cross products between the GO ontology file and various other ontologies. He has suggested that I look at the cell type file, categorise the cell types by taxon, and then transfer those categorisations to the GO file. This should cover far more terms with less work than assigning taxa to GO terms one at a time.

The cross product files are at /go/scratch/xps/

I have pulled out the list of cell types to be categorized and it is here:

CL:0000017 ! spermatocyte
CL:0000018 ! spermatid
CL:0000019 ! sperm
CL:0000023 ! oocyte
CL:0000025 ! egg
CL:0000026 ! nurse cell
CL:0000030 ! glioblast
CL:0000031 ! neuroblast
CL:0000034 ! stem cell
CL:0000037 ! hematopoietic stem cell
CL:0000056 ! myoblast
CL:0000057 ! fibroblast
CL:0000062 ! osteoblast
CL:0000066 ! epithelial cell
CL:0000071 ! blood vessel endothelial cell
CL:0000075 ! columnar/cuboidal epithelial cell
CL:0000081 ! blood cell
CL:0000084 ! T cell
CL:0000092 ! osteoclast
CL:0000094 ! granulocyte
CL:0000097 ! mast cell
CL:0000115 ! endothelial cell
CL:0000125 ! glial cell
CL:0000127 ! astrocyte
CL:0000128 ! oligodendrocyte
CL:0000129 ! microglial cell
CL:0000134 ! mesenchymal cell
CL:0000136 ! fat cell
CL:0000138 ! chondrocyte
CL:0000147 ! pigment cell
CL:0000148 ! melanocyte
CL:0000150 ! glandular epithelial cell
CL:0000178 ! Leydig cell
CL:0000187 ! muscle cell
CL:0000188 ! skeletal muscle cell
CL:0000192 ! smooth muscle cell
CL:0000201 ! auditory receptor cell
CL:0000202 ! auditory hair cell
CL:0000210 ! photoreceptor cell
CL:0000216 ! Sertoli cell
CL:0000218 ! Schwann cell
CL:0000221 ! ectodermal cell
CL:0000222 ! mesodermal cell
CL:0000223 ! endodermal cell
CL:0000228 ! multinucleate cell
CL:0000232 ! erythrocyte
CL:0000233 ! platelet
CL:0000235 ! macrophage
CL:0000236 ! B cell
CL:0000248 ! microsporocyte
CL:0000250 ! megaspore
CL:0000252 ! microspore
CL:0000253 ! eurydendroid cell
CL:0000254 ! egg cell
CL:0000262 ! guard mother cell
CL:0000276 ! sclerenchyma cell
CL:0000280 ! generative cell
CL:0000282 ! trichome
CL:0000284 ! companion cell
CL:0000287 ! eye photoreceptor cell
CL:0000288 ! synergid
CL:0000292 ! guard cell
CL:0000294 ! sieve cell
CL:0000295 ! somatotropin secreting cell
CL:0000296 ! vegetative cell
CL:0000299 ! trichoblast
CL:0000300 ! gamete
CL:0000301 ! pole cell
CL:0000312 ! keratinocyte
CL:0000332 ! atrichoblast
CL:0000333 ! neural crest cell
CL:0000362 ! epidermal cell
CL:0000365 ! zygote
CL:0000373 ! histoblast
CL:0000392 ! crystal cell
CL:0000394 ! plasmatocyte
CL:0000396 ! lamellocyte
CL:0000408 ! male gamete
CL:0000430 ! xanthophore
CL:0000431 ! iridophore
CL:0000439 ! prolactin secreting cell
CL:0000442 ! follicular dendritic cell
CL:0000448 ! white fat cell
CL:0000449 ! brown fat cell
CL:0000451 ! dendritic cell
CL:0000453 ! Langerhans cell
CL:0000467 ! adrenocorticotropic hormone secreting cell
CL:0000469 ! ganglion mother cell
CL:0000474 ! pericardial cell
CL:0000476 ! thyroid stimulating hormone secreting cell
CL:0000477 ! follicle cell
CL:0000486 ! garland cell
CL:0000487 ! oenocyte
CL:0000492 ! T-helper cell
CL:0000501 ! granulosa cell
CL:0000522 ! spore
CL:0000537 ! antipodal cell
CL:0000540 ! neuron
CL:0000542 ! lymphocyte
CL:0000545 ! T-helper 1 cell
CL:0000546 ! T-helper 2 cell
CL:0000556 ! megakaryocyte
CL:0000562 ! nucleate erythrocyte
CL:0000563 ! endospore
CL:0000571 ! leucophore
CL:0000573 ! retinal cone cell
CL:0000574 ! erythrophore
CL:0000576 ! monocyte
CL:0000579 ! border follicle cell
CL:0000586 ! germ cell
CL:0000595 ! enucleate erythrocyte
CL:0000598 ! pyramidal cell
CL:0000599 ! conidium
CL:0000604 ! retinal rod cell
CL:0000607 ! ascospore
CL:0000608 ! zygospore
CL:0000609 ! vestibular hair cell
CL:0000615 ! basidiospore
CL:0000616 ! sporangiospore
CL:0000623 ! natural killer cell
CL:0000624 ! CD4-positive, alpha-beta T cell
CL:0000625 ! CD8-positive, alpha-beta T cell
CL:0000644 ! Bergmann glial cell
CL:0000656 ! primary spermatocyte
CL:0000668 ! parenchymal cell
CL:0000674 ! interfollicle cell
CL:0000675 ! female gamete
CL:0000681 ! radial glial cell
CL:0000695 ! Cajal-Retzius cell
CL:0000711 ! cumulus cell
CL:0000716 ! lymph gland crystal cell
CL:0000722 ! cystoblast
CL:0000723 ! somatic stem cell
CL:0000724 ! heterocyst
CL:0000726 ! chlamydospore
CL:0000730 ! leading edge cell
CL:0000731 ! urothelial cell
CL:0000732 ! amoeboid cell
CL:0000733 ! lymph gland plasmatocyte
CL:0000735 ! lymph gland hemocyte
CL:0000737 ! striated muscle cell
CL:0000738 ! leukocyte
CL:0000740 ! retinal ganglion cell
CL:0000746 ! cardiac muscle cell
CL:0000747 ! cyanophore
CL:0000748 ! retinal bipolar neuron
CL:0000762 ! thrombocyte
CL:0000763 ! myeloid cell
CL:0000766 ! myeloid leukocyte
CL:0000767 ! basophil
CL:0000771 ! eosinophil
CL:0000775 ! neutrophil
CL:0000782 ! myeloid dendritic cell
CL:0000784 ! plasmacytoid dendritic cell
CL:0000785 ! mature B cell
CL:0000786 ! plasma cell
CL:0000787 ! memory B cell
CL:0000789 ! alpha-beta T cell
CL:0000792 ! CD4-positive, CD25-positive, alpha-beta regulatory T cell
CL:0000793 ! CD4-positive, alpha-beta intraepithelial T cell
CL:0000794 ! CD8-positive, alpha-beta cytotoxic T cell
CL:0000795 ! CD8-positive, alpha-beta regulatory T cell
CL:0000796 ! CD8 positive, alpha-beta intraepithelial T cell
CL:0000797 ! alpha-beta intraepithelial T cell
CL:0000798 ! gamma-delta T cell
CL:0000801 ! gamma-delta intraepithelial T cell
CL:0000802 ! CD8-positive, gamma-delta intraepithelial T cell
CL:0000803 ! CD4-positive, gamma-delta intraepithelial T cell
CL:0000804 ! immature T cell
CL:0000813 ! memory T cell
CL:0000814 ! NK T cell
CL:0000815 ! regulatory T cell
CL:0000816 ! immature B cell
CL:0000817 ! pre-B cell
CL:0000818 ! transitional stage B cell
CL:0000819 ! B-1 B cell
CL:0000820 ! B-1a B cell
CL:0000821 ! B-1b B cell
CL:0000825 ! natural killer cell progenitor
CL:0000826 ! pro-B cell
CL:0000827 ! pro-T cell
CL:0000837 ! hematopoietic progenitor cell
CL:0000838 ! lymphoid progenitor cell
CL:0000839 ! myeloid progenitor cell
CL:0000842 ! mononuclear cell
CL:0000843 ! follicular B cell
CL:0000844 ! germinal center B cell
CL:0000845 ! marginal zone B cell
CL:0000851 ! neuromast mantle cell
CL:0000852 ! neuromast support cell
CL:0000855 ! hair cell
CL:0000856 ! neuromast hair cell
CL:1000274 ! trophectodermal cell
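
Once the cell types in this list have been categorised by taxon, the transfer to GO terms through the cross products might look something like the sketch below. The layout of the files in /go/scratch/xps/ is not described on this page, so the cross products are represented here as a plain dictionary. The GO identifiers are placeholders, the single taxon assignment (guard cell, never in Mammalia) is an illustrative assumption, and the function names are hypothetical.

```python
def parse_id_label(lines):
    """Parse 'CL:0000292 ! guard cell' style lines into {id: label}."""
    out = {}
    for line in lines:
        if "!" not in line:
            continue
        ident, label = line.split("!", 1)
        out[ident.strip()] = label.strip()
    return out

def transfer_constraints(cell_constraints, go_to_cell):
    """Copy each cell type's (relationship, taxon) pairs to the GO terms defined with it."""
    go_constraints = {}
    for go_id, cl_id in go_to_cell.items():
        if cl_id in cell_constraints:
            go_constraints[go_id] = list(cell_constraints[cl_id])
    return go_constraints

# Two entries copied from the list above.
cell_types = parse_id_label([
    "CL:0000084 ! T cell",
    "CL:0000292 ! guard cell",
])

# Illustrative curator decision (an assumption, not taken from the pilot data).
cell_constraints = {
    "CL:0000292": [("never_in_taxon", "NCBITaxon:40674")],  # Mammalia
}

# Placeholder cross products: GO term -> cell type used in its definition.
go_to_cell = {
    "GO:EXAMPLE1": "CL:0000292",
    "GO:EXAMPLE2": "CL:0000084",  # no taxon assigned yet, so nothing is transferred
}

go_constraints = transfer_constraints(cell_constraints, go_to_cell)
for go_id, pairs in go_constraints.items():
    for relationship, taxon in pairs:
        print(go_id, relationship, taxon, "via", cell_types[go_to_cell[go_id]])
```

Cell types that have not been categorised are simply skipped, so the transfer can be rerun as the categorisation progresses.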


This is the proposed file format for the taxon-go links:

GO term                           GO:id       relationship         taxon name     taxon id
photosynthesis                    GO:0015979  never_in_taxon       Mammalia       Taxonomy ID: 40674
male germ-line cyst formation     GO:0048136  never_in_taxon       Mammalia       Taxonomy ID: 40674
hemocyte differentiation          GO:0042386  never_outside_taxon  Arthropoda     Taxonomy ID: 6656
multicellular organismal process  GO:0032501  never_outside_taxon  Eukaryota      Taxonomy ID: 2759
nucleus                           GO:0005634  never_outside_taxon  Eukaryota      Taxonomy ID: 2759
gametophyte development           GO:0048229  never_in_taxon       Dictyostelium  Taxonomy ID: 5782
viral reproduction                GO:0016032  never_outside_taxon  Viruses        Taxonomy ID: 10239
compound eye development          GO:0048749  never_in_taxon       Mammalia       Taxonomy ID: 40674
lactation                         GO:0007595  never_outside_taxon  Mammalia       Taxonomy ID: 40674
fat body development              GO:0007503  never_in_taxon       Mammalia       Taxonomy ID: 40674
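
As a rough sketch of how links in this format could be used, the code below loads a few of the rows and checks an annotation against them. It assumes the five columns are tab-delimited (the page does not specify a delimiter) and that the caller supplies the annotated species' lineage as a set of NCBI taxon IDs. The lineages shown are abbreviated to the IDs needed here, the function names are hypothetical, and only links attached directly to the annotated GO term are checked.

```python
import csv
import io

# Three rows from the proposed format, written out with tab delimiters (an assumption).
SAMPLE = (
    "GO term\tGO:id\trelationship\ttaxon name\ttaxon id\n"
    "photosynthesis\tGO:0015979\tnever_in_taxon\tMammalia\tTaxonomy ID: 40674\n"
    "lactation\tGO:0007595\tnever_outside_taxon\tMammalia\tTaxonomy ID: 40674\n"
    "nucleus\tGO:0005634\tnever_outside_taxon\tEukaryota\tTaxonomy ID: 2759\n"
)

def load_links(text):
    """Return a list of (go_id, relationship, taxon_id) tuples from the table text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    links = []
    for row in reader:
        if len(row) != 5:
            continue  # ignore blank or malformed lines
        _name, go_id, relationship, _taxon_name, taxon_field = row
        # "Taxonomy ID: 40674" -> 40674
        taxon_id = int(taxon_field.rsplit(":", 1)[1])
        links.append((go_id, relationship, taxon_id))
    return links

def violations(go_id, lineage, links):
    """Return the links for go_id that an annotation in this lineage would break."""
    problems = []
    for link_go, relationship, taxon_id in links:
        if link_go != go_id:
            continue
        if relationship == "never_in_taxon" and taxon_id in lineage:
            problems.append((link_go, relationship, taxon_id))
        elif relationship == "never_outside_taxon" and taxon_id not in lineage:
            problems.append((link_go, relationship, taxon_id))
    return problems

links = load_links(SAMPLE)

# Abbreviated lineages: the species itself plus only the ancestors used above.
HUMAN = {9606, 40674, 2759}   # Homo sapiens, Mammalia, Eukaryota
ECOLI = {562, 2}              # Escherichia coli, Bacteria

print(violations("GO:0015979", HUMAN, links))  # photosynthesis annotated to human
print(violations("GO:0007595", HUMAN, links))  # lactation annotated to human
print(violations("GO:0005634", ECOLI, links))  # nucleus annotated to E. coli
```

A fuller check would also apply each link to the descendants of the GO term it is attached to, which needs the GO graph and is left out here.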