Princeton Progress Report December 2009

From GO Wiki

A. Specific Aims for Princeton supplement

The aims (listed below) for the supplemental project undertaken at Princeton have not been modified:

1. Generate new protein family clusters as required for the Reference Genome effort.

2. Integrate data from multiple homolog/ortholog detection methods to enable efficient leverage of existing homology resources.

3. Provide the coordination and expert review necessary to enable reliable transfer of GO annotations to newly sequenced genomes.

B. Studies and Results for Princeton supplement

1. Generate new protein family clusters as required for the Reference Genome effort.

In 2008, we generated the initial set of protein family clusters for the Reference Genome project. We continue to provide these families and update them whenever new protein sets are released. Our latest release was in December 2009.

2. Integrate data from multiple homolog/ortholog detection methods to enable efficient leverage of existing homology resources.

In 2009, we added to P-POD orthologous groups of proteins from the 12 Reference Genomes, predicted with the InParanoid algorithm. We have also implemented an algorithm that generates consensus clusters by combining results from OrthoMCL and InParanoid, and we plan to release these results in 2010 after additional testing.
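One simple way to form consensus clusters from two methods is sketched below. This is an illustrative assumption, not the algorithm implemented in P-POD, which this report does not describe. The sketch keeps only protein pairs that both methods place in the same cluster, then groups proteins connected by those agreed pairs. The function names and the protein identifiers in the usage note are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered protein pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs


def consensus_clusters(clusters_a, clusters_b):
    """Group proteins linked by pairs that BOTH clusterings agree on.

    Assumption: agreement is defined pairwise, and consensus clusters are
    the connected components of the agreed pairs. Proteins with no agreed
    partner are left out of the result.
    """
    agreed = co_clustered_pairs(clusters_a) & co_clustered_pairs(clusters_b)

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in agreed:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for protein in list(parent):
        groups.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(group) for group in groups.values())
```

For example, with hypothetical inputs `[["yeast_A", "human_A", "fly_A"], ["yeast_B", "human_B"]]` from one method and `[["yeast_A", "human_A"], ["fly_A", "human_B", "yeast_B"]]` from the other, the sketch returns two consensus clusters and omits `fly_A`, whose placement the two methods disagree on. A stricter or looser definition of agreement (for example, cluster overlap thresholds) would give different results.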

Currently, we are running our P-POD analyses on the set of 48 genomes used in the PANTHER protein families, to better integrate the ortholog prediction results from OrthoMCL and InParanoid with the broader PANTHER families. This work required some modifications to the P-POD backend to handle the increased data and memory load. In addition to the direct benefit to the Reference Genome project, this work has two positive side effects for the community: 1) we have worked with Chris Stoeckert’s group at Penn to improve the OrthoMCL code, which is used by many other groups in the research community, and 2) the P-POD pipeline will be easier to use as the ortholog prediction resource for modENCODE and other projects.

3. Provide the coordination and expert review necessary to enable reliable transfer of GO annotations to newly sequenced genomes.

We have continued to work closely with Pascale Gaudet, Suzanna Lewis, and Paul Thomas to test and define specifications for PAINT, the PANTHER annotation tool. During testing, we have also used the beta version of PAINT to annotate gene products with GO terms based on phylogenetic relationships. Our annotation work thus far has yielded hundreds of new annotations for the Reference Genome project. In addition, we developed a protocol with the model organism database groups so that they can easily incorporate the annotations we are generating. The protocol will go into production in January, after which the annotations will be available through the model organism databases and the Gene Ontology web site on a continually updated basis.