Difference between revisions of "Reference proteomes files"

From GO Wiki
(Created page with '==Background== * Following the Quest for Orthologs meeting in Hinxton in July 2009, a representative group from the orthology algorithm community as well as consumers of ortholog…')
 
(Currently available pre-alpha files: not ready for release)
Line 5:
 
* [http://www.ebi.ac.uk/~dbarrell/qfo/ documentation and progress]
 
* [ftp://ftp.ebi.ac.uk/pub/contrib/qfo/ ftp directory of files]
 
 +
* NOTE that for now, these files should contain one entry per gene.  We will discuss separately whether to follow up with another file that includes alternative splice forms, etc.
 +
 
==Issues and bugs==
 
* This wiki page should be used to enter issues with the current files that need to be addressed before release, together with an email contact for getting more information about the issue
 
** For MODs, there seems to be a duplication/triplication of database source in the gene ID field, e.g.
 
*** in the [ftp://ftp.ebi.ac.uk/pub/contrib/qfo/10090_mus_musculus.fasta mouse FASTA], the first gene ID field is MGI:MGI:MGI:1918932 (paul.thomas@sri.com)
 

Revision as of 12:22, 22 January 2010

Background

  • Following the Quest for Orthologs meeting in Hinxton in July 2009, a representative group from the orthology algorithm community and from consumers of ortholog prediction data, particularly the GO, agreed to select a set of phylogenetically representative genomes. For each of these genomes, a standard "reference" set of all protein-coding genes would be compiled, and a "canonical" protein sequence would be selected for each gene. Rolf Apweiler at UniProt offered to have his group create and maintain these files; Dan Barrell is kindly doing this work.
  • For model organisms in the Reference Genome Project, these gene sets will be determined with feedback from each MOD.

Currently available pre-alpha files: not ready for release

Issues and bugs

  • Use this wiki page to record issues with the current files that must be addressed before release, along with an email contact for more information about each issue.
    • For MODs, there seems to be a duplication/triplication of database source in the gene ID field, e.g.
      • in the mouse FASTA, the first gene ID field is MGI:MGI:MGI:1918932 (paul.thomas@sri.com)
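
The repeated database source reported above can be checked mechanically. The following is a minimal sketch, not part of the file-generation pipeline: the function name `leading_prefix_repeats` is hypothetical, and it assumes only that the gene ID is a colon-separated string like the one quoted in the report. It counts how many times the first token is repeated at the start of the ID; it does not decide which form is correct.

```python
from typing import Tuple


def leading_prefix_repeats(gene_id: str, sep: str = ":") -> Tuple[str, int]:
    """Return the first token of gene_id and how many times it repeats in a row.

    Hypothetical helper: assumes IDs are separated by `sep`, as in the
    example quoted in the issue report.
    """
    parts = gene_id.split(sep)
    first = parts[0]
    count = 0
    for part in parts:
        if part != first:
            break
        count += 1
    return first, count


# The ID quoted in the mouse FASTA issue report
prefix, repeats = leading_prefix_repeats("MGI:MGI:MGI:1918932")
print(prefix, repeats)
```

An ID whose count is greater than one could then be flagged for review by the relevant MOD, since whether a single repeat is intended depends on that database's own ID convention.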