Princeton Progress Report

From GO Wiki
Revision as of 11:41, 17 December 2010 by Hdrabkin

A. Specific Aims for the Princeton supplement: The aims for the supplemental project undertaken at Princeton (listed below) have not been modified.

B. Studies and Results for the Princeton supplement

1. Generate new protein family clusters as required for the Reference Genome effort: In 2008, we generated the initial set of protein family clusters for the Reference Genome project. We continue to provide these families and update them when new protein sets are released, in synchronization with PANTHER.

2. Integrate data from multiple homolog/ortholog detection methods to enable efficient leverage of existing homology resources: We released our implementation of an algorithm that generates consensus clusters by combining results from OrthoMCL and InParanoid (the "Naive Ensemble" sets, available via the P-POD web site).
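
The P-POD site is the authority on how the Naive Ensemble sets are actually built. As a hedged illustration only, one simple way to combine two clusterings is to merge any clusters that share a protein, i.e. take connected components over the union of both methods' clusters, computed here with union-find. The function name `merge_clusterings` and all protein IDs are hypothetical, and this rule is an assumption, not necessarily P-POD's.

```python
from collections import defaultdict


def merge_clusterings(*clusterings):
    """Combine clusterings by merging any clusters that share a protein.

    Assumed rule (hypothetical): connected components over all input clusters.
    Each clustering is an iterable of clusters (sets of protein IDs).
    Returns a sorted list of sorted ID lists, one per merged group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clustering in clusterings:
        for cluster in clustering:
            members = list(cluster)
            for member in members:
                find(member)  # register every protein, including singletons
            for other in members[1:]:
                union(members[0], other)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical protein IDs standing in for OrthoMCL and InParanoid output.
orthomcl = [{"yeast_A", "worm_A"}, {"yeast_B", "fly_B"}]
inparanoid = [{"worm_A", "human_A"}, {"mouse_C", "human_C"}]
print(merge_clusterings(orthomcl, inparanoid))
# [['fly_B', 'yeast_B'], ['human_A', 'worm_A', 'yeast_A'], ['human_C', 'mouse_C']]
```

Merging on any overlap is the most permissive choice; a stricter consensus could instead keep only protein pairs that both methods place together.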

3. Provide the coordination and expert review necessary to enable reliable transfer of GO annotations to newly sequenced genomes: In 2010, we continued to coordinate efforts to accurately transfer annotations based on phylogenetic trees. We made progress in software tool development and annotation quality control, and began in earnest to transfer experimentally based annotations to uncharacterized genes on the basis of their evolutionary relationships. Working with the GO software team, we built a pipeline through which these annotations are incorporated back into the model organism databases for distribution to the scientific community. These annotations will be available, and continually updated, through the model organism databases and the Gene Ontology web site.
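
The core idea of tree-based transfer is that a GO term a curator attaches to an ancestral node is inherited by every gene descending from it. The following is a minimal Python sketch of that idea, not PAINT's implementation: the `Node` class, the node and gene names, and the placeholder terms `term_A`/`term_B` are hypothetical, and the sketch ignores any exceptions a curator might record on individual branches. `count_inferred` counts (gene, term) pairs a gene receives only from its ancestors, which is one plausible reading of a "new annotation" count.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical gene tree; leaves are genes."""
    name: str
    go_terms: set[str] = field(default_factory=set)  # terms curated directly on this node
    children: list[Node] = field(default_factory=list)


def propagate(node: Node, inherited: frozenset[str] = frozenset()) -> dict[str, set[str]]:
    """Map each leaf gene to its own terms plus every term on its ancestors."""
    terms = set(inherited) | node.go_terms
    if not node.children:
        return {node.name: terms}
    result: dict[str, set[str]] = {}
    for child in node.children:
        result.update(propagate(child, frozenset(terms)))
    return result


def count_inferred(node: Node, inherited: frozenset[str] = frozenset()) -> int:
    """Count (gene, term) pairs a leaf receives only from its ancestors."""
    if not node.children:
        return len(inherited - node.go_terms)
    terms = inherited | node.go_terms
    return sum(count_inferred(child, frozenset(terms)) for child in node.children)


# Hypothetical tree: gene_X already carries term_A directly.
tree = Node("ancestor_1", {"term_A"}, [
    Node("ancestor_2", {"term_B"}, [Node("gene_X", {"term_A"}), Node("gene_Y")]),
    Node("gene_Z"),
])
print(propagate(tree))
print(count_inferred(tree))  # 4
```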

Software tool development and testing: 1) PAINT: We have been involved in developing the PANTHER annotation tool, PAINT, by beta testing it and suggesting fixes and enhancements over 16 rounds of testing (betas 16 through 32, to date), working closely with the software developers led by Suzanna Lewis and Paul Thomas. 2) GO utilities: Sven Heinicke has contributed several utility scripts to the GO code base, including code to load phylogenetic tree data into the GO database and code to match and disambiguate gene and protein IDs. 3) Annotation Tracker: Sven Heinicke has created the Annotation Tracker, a tool that displays phylogenetic tree information along with metadata about the associated annotations, so that we can easily monitor our curation progress and efficiently prioritize further curation.

Quality control: Mike Livstone and Kara Dolinski, along with the rest of the PAINT team, developed a standard operating procedure (SOP) for PAINT-based curation. We established regular, standardized procedures for communicating with other GO curators about annotation issues and questions. We also wrote an initial user manual for PAINT and are training additional curators to use it as we scale up phylogeny-based annotation over the next year.

Phylogeny-based annotation: The following protein families have been annotated at Princeton:

Protein family | Unique ancestors annotated | Terms annotated to ancestors | Unique proteins annotated | New protein annotations
PHOSPHOSERINE PHOSPHATASE | 3 | 6 | 44 | 124
CU/ZN SUPEROXIDE DISMUTASE | 6 | 11 | 69 | 326
LON PROTEASE | 5 | 20 | 92 | 879
DNA MISMATCH REPAIR PROTEIN (MLH, PMS, MUTL) | 7 | 15 | 76 | 447
DNA REPAIR ENDONUCLEASE XP-F / MEI-9 / RAD1 | 2 | 8 | 25 | 164
PRESENILIN | 3 | 26 | 26 | 480
AXIN | 4 | 36 | 270 | 1,101
FRIZZLED | 33 | 167 | 180 | 3,456
DNA MISMATCH REPAIR MUTS RELATED PROTEINS | 14 | 56 | 110 | 1,246
CELLULAR TUMOR ANTIGEN P53-RELATED | 4 | 28 | 29 | 578
FORKHEAD PROTEIN/ FORKHEAD PROTEIN DOMAIN | 84 | 350 | 524 | 10,865
WNT | 49 | 229 | 226 | 3,735
BETA-CATENIN INTERACTING PROTEIN (CTNNBIP1) | 1 | 7 | 12 | 90
UNCHARACTERIZED | 1 | 1 | 16 | 16
PHOSPHOHEXOMUTASE FAMILY MEMBER | 13 | 46 | 196 | 1,708
BETA CATENIN | 6 | 87 | 28 | 1,261
ABC TRANSPORTERS | 19 | 42 | 251 | 1,379
Totals | 254 | 1,135 | 2,174 | 27,855
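
The Totals row can be checked by summing each numeric column. A short Python check using the values from the table (the names `ROWS` and `totals` are illustrative only):

```python
# (family, unique ancestors annotated, terms annotated to ancestors,
#  unique proteins annotated, new protein annotations), copied from the table
ROWS = [
    ("PHOSPHOSERINE PHOSPHATASE", 3, 6, 44, 124),
    ("CU/ZN SUPEROXIDE DISMUTASE", 6, 11, 69, 326),
    ("LON PROTEASE", 5, 20, 92, 879),
    ("DNA MISMATCH REPAIR PROTEIN (MLH, PMS, MUTL)", 7, 15, 76, 447),
    ("DNA REPAIR ENDONUCLEASE XP-F / MEI-9 / RAD1", 2, 8, 25, 164),
    ("PRESENILIN", 3, 26, 26, 480),
    ("AXIN", 4, 36, 270, 1101),
    ("FRIZZLED", 33, 167, 180, 3456),
    ("DNA MISMATCH REPAIR MUTS RELATED PROTEINS", 14, 56, 110, 1246),
    ("CELLULAR TUMOR ANTIGEN P53-RELATED", 4, 28, 29, 578),
    ("FORKHEAD PROTEIN/ FORKHEAD PROTEIN DOMAIN", 84, 350, 524, 10865),
    ("WNT", 49, 229, 226, 3735),
    ("BETA-CATENIN INTERACTING PROTEIN (CTNNBIP1)", 1, 7, 12, 90),
    ("UNCHARACTERIZED", 1, 1, 16, 16),
    ("PHOSPHOHEXOMUTASE FAMILY MEMBER", 13, 46, 196, 1708),
    ("BETA CATENIN", 6, 87, 28, 1261),
    ("ABC TRANSPORTERS", 19, 42, 251, 1379),
]

# Sum each numeric column (indices 1-4) across all families.
totals = tuple(sum(row[i] for row in ROWS) for i in range(1, 5))
print(totals)  # (254, 1135, 2174, 27855)
```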