Taxon-GO Implementation April 2008 onwards

From GO Wiki
Revision as of 09:15, 15 May 2008 by Jdeegan (talk | contribs)

At the Consortium meetings in Princeton in 2007 and Salt Lake City in 2008, Jennifer presented a proposal and a pilot for a system of implementing taxon information. At the Salt Lake City meeting it was decided to implement the proposal. This page records progress on that implementation.

The original proposal is not currently archived.
The pilot data is at

29th April 2008

In starting to implement the links I am using the custom taxon slim that Chris Mungall made from the NCBI taxonomy hierarchy.

Taxonomy Slim

This is what he did to make the slim:

I grabbed all species with an annotation in the database, then did a simple filter on the results: Example_Queries#Total_annotations.2C_grouped_by_species.2C_broken_down_by_evidence

grep -v IEA z | cut -f1 | sort -u | perl -pe 's/^/NCBITaxon:/' > ~/tmp/tax-ids.txt

(there were almost 1000!)
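
For readers who prefer a script to the shell pipeline, the same filter can be sketched in Python. This is only an illustration. It assumes, as the pipeline implies, that each line of the query output is tab-delimited with the numeric NCBI taxon ID in the first column, and that lines for electronically inferred annotations contain the string IEA. The function name and the sample rows are hypothetical.

```python
def non_iea_taxon_ids(lines):
    """Return sorted, unique NCBITaxon IDs from lines that do not mention IEA.

    Roughly mirrors: grep -v IEA | cut -f1 | sort -u | prefix 'NCBITaxon:'.
    Unlike grep, blank lines are skipped as well.
    """
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or "IEA" in line:
            continue
        ids.add(line.split("\t")[0])  # first tab-delimited column
    return ["NCBITaxon:" + taxon for taxon in sorted(ids)]


# Made-up rows for illustration only (taxon ID, evidence code, count).
sample = [
    "7227\tIDA\t120",
    "7227\tIEA\t900",
    "9606\tIMP\t45",
    "7227\tIMP\t30",
]
result = non_iea_taxon_ids(sample)
print(result)
```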

I then used my segmentation tool (part of obol) to slice these IDs and their descendants from the NCBI tax file I publish on the OBO download page. The results are in:

There's a bug in my segmenter in that the ranks (genus, order, family) were not included. But this may work to your advantage, in that these are stored using generic term properties, which people aren't used to yet. It seems like you don't need these anyway.

I aim to reproduce my segmenter functionality in OE. In fact it may be possible to do this right now with filter scripts. In this particular case the segmenter is doing something pretty basic: following all input terms up to the root and writing the result as .obo
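
The "follow all input terms up to the root" step can be sketched as a simple graph walk. This is not the obol segmenter itself. The parent map and IDs below are toy placeholders rather than real NCBI taxonomy data, and the function name is hypothetical.

```python
def slice_to_root(seed_ids, parents):
    """Return the seed terms plus every ancestor reachable via the parent map."""
    keep = set()
    stack = list(seed_ids)
    while stack:
        term = stack.pop()
        if term in keep:
            continue  # already visited through another path
        keep.add(term)
        stack.extend(parents.get(term, []))  # the root has no parents
    return keep


# Toy hierarchy (child -> list of parents); IDs are placeholders.
parents = {
    "T:species_a": ["T:genus_x"],
    "T:species_b": ["T:genus_x"],
    "T:species_c": ["T:genus_y"],
    "T:genus_x": ["T:root"],
    "T:genus_y": ["T:root"],
}
slim = slice_to_root(["T:species_a", "T:species_b"], parents)
print(sorted(slim))
```

Terms that are not on a path from a seed to the root (here T:species_c and T:genus_y) are left out, which is what keeps the slim small.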

Q/ Should this slim now be checked into CVS in a non-scratch directory?

Cross Product files

Chris has made files in scratch that show cross products between the GO ontology file and the various other ontologies. He has suggested that I look at the cell type file, categorise the cell types by taxon, and then transfer those categories to the GO file. This will cover far more terms with less work.

The cross product files are at /go/scratch/xps/
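
As a rough illustration of why categorising cell types first saves work: once a cell type has a taxon restriction, every GO term whose cross product refers to that cell type can pick up the same restriction. The sketch below assumes the cross products have already been reduced to (GO ID, CL ID) pairs. The GO IDs are placeholders, and the restriction on crystal cell is an assumed example rather than a curated decision; nothing here is read from the actual cross product files.

```python
def transfer_restrictions(go_to_cell, cell_restrictions):
    """Copy each cell type's taxon restriction onto the GO terms that refer to it."""
    go_restrictions = {}
    for go_id, cl_id in go_to_cell:
        if cl_id in cell_restrictions:
            go_restrictions.setdefault(go_id, set()).add(cell_restrictions[cl_id])
    return go_restrictions


# Illustrative input only (see the assumptions stated above).
cell_restrictions = {
    # crystal cell, restricted to Arthropoda for the sake of the example
    "CL:0000392": ("never_outside_taxon", "NCBITaxon:6656"),
}
go_to_cell = [
    ("GO:placeholder1", "CL:0000392"),
    ("GO:placeholder2", "CL:0000392"),
    ("GO:placeholder3", "CL:0000540"),  # neuron: no restriction recorded
]
links = transfer_restrictions(go_to_cell, cell_restrictions)
print(links)
```

One categorised cell type yields links for two GO terms here; terms whose cell type has no restriction are simply skipped.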

I have pulled out the list of cell types to be categorized and it is here:

CL:0000017 ! spermatocyte
CL:0000018 ! spermatid
CL:0000019 ! sperm
CL:0000023 ! oocyte
CL:0000025 ! egg
CL:0000026 ! nurse cell
CL:0000030 ! glioblast
CL:0000031 ! neuroblast
CL:0000034 ! stem cell
CL:0000037 ! hematopoietic stem cell
CL:0000056 ! myoblast
CL:0000057 ! fibroblast
CL:0000062 ! osteoblast
CL:0000066 ! epithelial cell
CL:0000071 ! blood vessel endothelial cell
CL:0000075 ! columnar/cuboidal epithelial cell
CL:0000081 ! blood cell
CL:0000084 ! T cell
CL:0000092 ! osteoclast
CL:0000094 ! granulocyte
CL:0000097 ! mast cell
CL:0000115 ! endothelial cell
CL:0000125 ! glial cell
CL:0000127 ! astrocyte
CL:0000128 ! oligodendrocyte
CL:0000129 ! microglial cell
CL:0000134 ! mesenchymal cell
CL:0000136 ! fat cell
CL:0000138 ! chondrocyte
CL:0000147 ! pigment cell
CL:0000148 ! melanocyte
CL:0000150 ! glandular epithelial cell
CL:0000178 ! Leydig cell
CL:0000187 ! muscle cell
CL:0000188 ! skeletal muscle cell
CL:0000192 ! smooth muscle cell
CL:0000201 ! auditory receptor cell
CL:0000202 ! auditory hair cell
CL:0000210 ! photoreceptor cell
CL:0000216 ! Sertoli cell
CL:0000218 ! Schwann cell
CL:0000221 ! ectodermal cell
CL:0000222 ! mesodermal cell
CL:0000223 ! endodermal cell
CL:0000228 ! multinucleate cell
CL:0000232 ! erythrocyte
CL:0000233 ! platelet
CL:0000235 ! macrophage
CL:0000236 ! B cell
CL:0000248 ! microsporocyte
CL:0000250 ! megaspore
CL:0000252 ! microspore
CL:0000253 ! eurydendroid cell
CL:0000254 ! egg cell
CL:0000262 ! guard mother cell
CL:0000276 ! sclerenchyma cell
CL:0000280 ! generative cell
CL:0000282 ! trichome
CL:0000284 ! companion cell
CL:0000287 ! eye photoreceptor cell
CL:0000288 ! synergid
CL:0000292 ! guard cell
CL:0000294 ! sieve cell
CL:0000295 ! somatotropin secreting cell
CL:0000296 ! vegetative cell
CL:0000299 ! trichoblast
CL:0000300 ! gamete
CL:0000301 ! pole cell
CL:0000312 ! keratinocyte
CL:0000332 ! atrichoblast
CL:0000333 ! neural crest cell
CL:0000362 ! epidermal cell
CL:0000365 ! zygote
CL:0000373 ! histoblast
CL:0000392 ! crystal cell
CL:0000394 ! plasmatocyte
CL:0000396 ! lamellocyte
CL:0000408 ! male gamete
CL:0000430 ! xanthophore
CL:0000431 ! iridophore
CL:0000439 ! prolactin secreting cell
CL:0000442 ! follicular dendritic cell
CL:0000448 ! white fat cell
CL:0000449 ! brown fat cell
CL:0000451 ! dendritic cell
CL:0000453 ! Langerhans cell
CL:0000467 ! adrenocorticotropic hormone secreting cell
CL:0000469 ! ganglion mother cell
CL:0000474 ! pericardial cell
CL:0000476 ! thyroid stimulating hormone secreting cell
CL:0000477 ! follicle cell
CL:0000486 ! garland cell
CL:0000487 ! oenocyte
CL:0000492 ! T-helper cell
CL:0000501 ! granulosa cell
CL:0000522 ! spore
CL:0000537 ! antipodal cell
CL:0000540 ! neuron
CL:0000542 ! lymphocyte
CL:0000545 ! T-helper 1 cell
CL:0000546 ! T-helper 2 cell
CL:0000556 ! megakaryocyte
CL:0000562 ! nucleate erythrocyte
CL:0000563 ! endospore
CL:0000571 ! leucophore
CL:0000573 ! retinal cone cell
CL:0000574 ! erythrophore
CL:0000576 ! monocyte
CL:0000579 ! border follicle cell
CL:0000586 ! germ cell
CL:0000595 ! enucleate erythrocyte
CL:0000598 ! pyramidal cell
CL:0000599 ! conidium
CL:0000604 ! retinal rod cell
CL:0000607 ! ascospore
CL:0000608 ! zygospore
CL:0000609 ! vestibular hair cell
CL:0000615 ! basidiospore
CL:0000616 ! sporangiospore
CL:0000623 ! natural killer cell
CL:0000624 ! CD4-positive, alpha-beta T cell
CL:0000625 ! CD8-positive, alpha-beta T cell
CL:0000644 ! Bergmann glial cell
CL:0000656 ! primary spermatocyte
CL:0000668 ! parenchymal cell
CL:0000674 ! interfollicle cell
CL:0000675 ! female gamete
CL:0000681 ! radial glial cell
CL:0000695 ! Cajal-Retzius cell
CL:0000711 ! cumulus cell
CL:0000716 ! lymph gland crystal cell
CL:0000722 ! cystoblast
CL:0000723 ! somatic stem cell
CL:0000724 ! heterocyst
CL:0000726 ! chlamydospore
CL:0000730 ! leading edge cell
CL:0000731 ! urothelial cell
CL:0000732 ! amoeboid cell
CL:0000733 ! lymph gland plasmatocyte
CL:0000735 ! lymph gland hemocyte
CL:0000737 ! striated muscle cell
CL:0000738 ! leukocyte
CL:0000740 ! retinal ganglion cell
CL:0000746 ! cardiac muscle cell
CL:0000747 ! cyanophore
CL:0000748 ! retinal bipolar neuron
CL:0000762 ! thrombocyte
CL:0000763 ! myeloid cell
CL:0000766 ! myeloid leukocyte
CL:0000767 ! basophil
CL:0000771 ! eosinophil
CL:0000775 ! neutrophil
CL:0000782 ! myeloid dendritic cell
CL:0000784 ! plasmacytoid dendritic cell
CL:0000785 ! mature B cell
CL:0000786 ! plasma cell
CL:0000787 ! memory B cell
CL:0000789 ! alpha-beta T cell
CL:0000792 ! CD4-positive, CD25-positive, alpha-beta regulatory T cell
CL:0000793 ! CD4-positive, alpha-beta intraepithelial T cell
CL:0000794 ! CD8-positive, alpha-beta cytotoxic T cell
CL:0000795 ! CD8-positive, alpha-beta regulatory T cell
CL:0000796 ! CD8 positive, alpha-beta intraepithelial T cell
CL:0000797 ! alpha-beta intraepithelial T cell
CL:0000798 ! gamma-delta T cell
CL:0000801 ! gamma-delta intraepithelial T cell
CL:0000802 ! CD8-positive, gamma-delta intraepithelial T cell
CL:0000803 ! CD4-positive, gamma-delta intraepithelial T cell
CL:0000804 ! immature T cell
CL:0000813 ! memory T cell
CL:0000814 ! NK T cell
CL:0000815 ! regulatory T cell
CL:0000816 ! immature B cell
CL:0000817 ! pre-B cell
CL:0000818 ! transitional stage B cell
CL:0000819 ! B-1 B cell
CL:0000820 ! B-1a B cell
CL:0000821 ! B-1b B cell
CL:0000825 ! natural killer cell progenitor
CL:0000826 ! pro-B cell
CL:0000827 ! pro-T cell
CL:0000837 ! hematopoietic progenitor cell
CL:0000838 ! lymphoid progenitor cell
CL:0000839 ! myeloid progenitor cell
CL:0000842 ! mononuclear cell
CL:0000843 ! follicular B cell
CL:0000844 ! germinal center B cell
CL:0000845 ! marginal zone B cell
CL:0000851 ! neuromast mantle cell
CL:0000852 ! neuromast support cell
CL:0000855 ! hair cell
CL:0000856 ! neuromast hair cell
CL:1000274 ! trophectodermal cell

Taxon-GO file format

This is the proposed file format for the taxon-go links:

GO term                           GO:id       relationship         taxon name     taxon id
photosynthesis                    GO:0015979  never_in_taxon       Mammalia       Taxonomy ID: 40674
male germ-line cyst formation     GO:0048136  never_in_taxon       Mammalia       Taxonomy ID: 40674
hemocyte differentiation          GO:0042386  never_outside_taxon  Arthropoda     Taxonomy ID: 6656
multicellular organismal process  GO:0032501  never_outside_taxon  Eukaryota      Taxonomy ID: 2759
nucleus                           GO:0005634  never_outside_taxon  Eukaryota      Taxonomy ID: 2759
gametophyte development           GO:0048229  never_in_taxon       Dictyostelium  Taxonomy ID: 5782
viral reproduction                GO:0016032  never_outside_taxon  Viruses        Taxonomy ID: 10239
compound eye development          GO:0048749  never_in_taxon       Mammalia       Taxonomy ID: 40674
lactation                         GO:0007595  never_outside_taxon  Mammalia       Taxonomy ID: 40674
fat body development              GO:0007503  never_in_taxon       Mammalia       Taxonomy ID: 40674
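
A minimal sketch of reading this format in Python, assuming the saved file uses a tab between each of the five columns and writes the taxon as "Taxonomy ID: <number>" as in the rows above. The class and function names are hypothetical, and rewriting the ID into the NCBITaxon:<number> form is a choice made here for convenience.

```python
import csv
import io
from typing import NamedTuple


class TaxonLink(NamedTuple):
    go_name: str
    go_id: str
    relationship: str
    taxon_name: str
    taxon_id: str  # normalised to the form NCBITaxon:<number>


VALID_RELATIONSHIPS = {"never_in_taxon", "never_outside_taxon"}


def parse_links(handle):
    """Parse five-column tab-delimited rows; skip header, blank or malformed rows."""
    links = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 5 or row[2] not in VALID_RELATIONSHIPS:
            continue
        number = row[4].replace("Taxonomy ID:", "").strip()
        links.append(TaxonLink(row[0], row[1], row[2], row[3], "NCBITaxon:" + number))
    return links


text = (
    "GO term\tGO:id\trelationship\ttaxon name\ttaxon id\n"
    "photosynthesis\tGO:0015979\tnever_in_taxon\tMammalia\tTaxonomy ID: 40674\n"
    "lactation\tGO:0007595\tnever_outside_taxon\tMammalia\tTaxonomy ID: 40674\n"
)
links = parse_links(io.StringIO(text))
for link in links:
    print(link.go_id, link.relationship, link.taxon_id)
```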

I am not yet sure how to save a file like this from OBO-Edit after having added links. I will have a go at that.

1st May 2008

I have arranged a meeting with Susan Tweedie and Rebecca Foulger to start labeling the cell type and GO terms by taxon.

6th May 2008

I have cleared away all the old taxon-go-related files from the scratch directory and made a new folder in there called 'taxon-go'. This folder contains a copy of the taxon slim.

I have still not worked out how to save the tab-delimited file of relationships out of OBO-Edit and this is the major obstacle to starting work just now.

Chris has pointed out that I don't need to propagate the links down the graph and have those links actually instantiated, as it is easy to infer them. For working in OBO-Edit I just need to set up a renderer that shows whether a term has a taxon link already applied to one of its ancestors.
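
The inference Chris describes amounts to walking up from a term and collecting any taxon links found along the way. The following is a sketch under stated assumptions, not OBO-Edit code. The parent map uses placeholder IDs for the descendant terms, the function name is hypothetical, and the one asserted link is the nucleus row from the proposed file format.

```python
def inherited_links(term, parents, direct_links):
    """Collect taxon links asserted on the term itself or on any of its ancestors."""
    found = {}
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in direct_links:
            found[current] = direct_links[current]
        stack.extend(parents.get(current, []))
    return found


# Placeholder descendants of nucleus (GO:0005634); only nucleus has a direct link.
parents = {
    "GO:child": ["GO:0005634"],
    "GO:grandchild": ["GO:child"],
}
direct_links = {"GO:0005634": ("never_outside_taxon", "NCBITaxon:2759")}
result = inherited_links("GO:grandchild", parents, direct_links)
print(result)
```

A renderer would only need to know whether the returned dictionary is non-empty to mark a term as already covered.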

I have made a tab-delimited file to contain the links between the ontology file and the taxon slim; it is at go/scratch/taxon-go/go-taxon-links.txt

6th May PM

Further progress as described in a mail to Chris:

I made the file of go-taxon relationships in obo and tab-delimited format to test both.

When I load the tab-delimited version in OBO-Edit, the relationships between the GO terms and the taxon terms don't show up at all, and I'm not sure what to do to persuade them. I suppose this format is just not one that OBO-Edit is prepared for.

When I load the obo format go-taxon file, the taxon links show in the graph viewer and the normal links show in the graphviz component. However, when I click on a GO term with a taxon link, the graphviz component goes on strike and does not update at all. No idea why it is being picky about that.

The other weird thing is that the application treats the go-taxon file as a separate ontology and does not seem to realise that it is a relationship between the two other loaded ontologies. I will attach a picture so you can see.

With either format I'm not sure that OBO-Edit knows how to save the three files out separately.

10th May

The obo version of the relationship file is now working. There was a formatting problem with the taxon ids.
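
The entry does not say what the formatting problem was. Purely as an illustration of the general shape such a file might take, the sketch below writes one taxon link as an OBO-style [Term] stanza with a relationship tag. It assumes taxon IDs are written as NCBITaxon:<number>, the prefix used when the slim IDs were generated. The stanza layout and the function name are assumptions, not a copy of the actual file.

```python
def to_obo_stanza(go_id, go_name, relationship, taxon_id, taxon_name):
    """Format one taxon link as an OBO [Term] stanza with a relationship tag."""
    if not taxon_id.startswith("NCBITaxon:"):
        # Accept the "Taxonomy ID: 40674" style used in the tab-delimited draft.
        taxon_id = "NCBITaxon:" + taxon_id.replace("Taxonomy ID:", "").strip()
    return (
        "[Term]\n"
        f"id: {go_id}\n"
        f"name: {go_name}\n"
        f"relationship: {relationship} {taxon_id} ! {taxon_name}\n"
    )


stanza = to_obo_stanza(
    "GO:0007595", "lactation", "never_outside_taxon", "Taxonomy ID: 40674", "Mammalia"
)
print(stanza)
```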

15th May

I have now written to Chris to ask how I should represent links where the GO term should not be used outside of a combination of two taxa. An example is photosynthesis, which should not be used outside of the combination of the Bacteria and Viridiplantae taxa.
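
How to represent this is still an open question. One possible reading, sketched here only to make the question concrete, is to treat the restriction as a union: a term is acceptable for an organism if the organism's lineage passes through at least one of the listed taxa. The NCBI IDs for Bacteria (2) and Viridiplantae (33090) and the abbreviated lineages are supplied for this example and are not taken from the links file; the function name is hypothetical.

```python
def allowed_by_union(lineage, allowed_taxa):
    """True if the organism's lineage passes through at least one allowed taxon."""
    return any(taxon in allowed_taxa for taxon in lineage)


# Assumed union restriction for photosynthesis: Bacteria or Viridiplantae.
PHOTOSYNTHESIS_TAXA = {"NCBITaxon:2", "NCBITaxon:33090"}

# Abbreviated lineages (species first), for illustration only.
plant_lineage = ["NCBITaxon:3702", "NCBITaxon:33090", "NCBITaxon:2759"]
human_lineage = ["NCBITaxon:9606", "NCBITaxon:40674", "NCBITaxon:2759"]

plant_ok = allowed_by_union(plant_lineage, PHOTOSYNTHESIS_TAXA)
human_ok = allowed_by_union(human_lineage, PHOTOSYNTHESIS_TAXA)
print(plant_ok, human_ok)
```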

I am not currently able to save out the taxon links. The major barrier is that the save ontology panel in OBO-Edit is too big to open out fully on my laptop. I have submitted a bug report, but the code in that part looks quite complicated. The setup that lets different panels appear or disappear when boxes are checked is quite hard to fix.

The graphviz plugin would also be much easier to use for this if the disjoint relationships were not shown, and I have gone some way to figuring out how to do that. The text file listing the relationships in the graph is set up with a small piece of code in the graphviz component, but I do not yet understand the relationship management methods enough to be able to configure which relationships are included.