Running the P-POD orthology tool on the reference genomes gene set (Retired)
The current plan is to start with the gp2protein files, which we at Princeton will use to retrieve the actual protein sequences and generate FASTA files. We will use the following files from the GO site:
* Arabidopsis thaliana: gp2protein.tair.gz
* Caenorhabditis elegans: gp2protein.wb.gz
* Danio rerio: gp2protein.zfin.gz
* Dictyostelium discoideum: gp2protein.dictyBase.gz
* Drosophila melanogaster: gp2protein.fb.gz
* Homo sapiens: gp2protein.human.gz
* Mus musculus: gp2protein.mgi.gz
* Saccharomyces cerevisiae: gp2protein.sgd.gz
* Schizosaccharomyces pombe: gp2protein.genedb_spombe.gz
* Rattus norvegicus: gp2protein file from RGD (pending)
* Escherichia coli: UniProt file
* Gallus gallus: UniProt file
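
As a rough illustration of the first step, the sketch below reads a gp2protein file into a mapping from gene product ID to protein sequence IDs. This is not the Princeton pipeline itself. It assumes the usual gp2protein layout: two tab-separated columns, with multiple protein IDs in the second column separated by semicolons and comment lines starting with "!". The function names `parse_gp2protein` and `read_gp2protein` and the sample identifiers are hypothetical.

```python
import gzip
from typing import Dict, Iterable, List


def parse_gp2protein(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each gene product ID to its list of protein sequence IDs.

    Assumed layout (not confirmed by this page): column 1 is the gene
    product ID, column 2 holds protein IDs separated by ';', and lines
    starting with '!' are comments.
    """
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no protein column on this line
        gene_id = fields[0].strip()
        protein_ids = [p.strip() for p in fields[1].split(";") if p.strip()]
        if protein_ids:
            # Entries with an empty protein column are left out entirely.
            mapping.setdefault(gene_id, []).extend(protein_ids)
    return mapping


def read_gp2protein(path: str) -> Dict[str, List[str]]:
    """Open a gzip-compressed gp2protein file and parse it."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_gp2protein(handle)


# Made-up sample lines, used only to show the expected shape of the input.
sample = [
    "!version: example\n",
    "SGD:S000000001\tUniProtKB:P00001;NCBI_NP:NP_000001\n",
    "SGD:S000000002\t\n",
]
result = parse_gp2protein(sample)
```

The protein IDs collected this way could then be passed to whatever sequence retrieval step is used to build the FASTA files. Gene products with no protein ID are dropped here, which may or may not be the desired behaviour for the orthology run.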