Reference Genome progress report for 2013
Aim 3. We will perform phylogenetically-based propagation of annotations. [This effort cut by 1 FTE as a result of final funding level].
During the past year, work on this aim has progressed more slowly than planned in the original grant, for two reasons: 1) a decrease in the resources devoted to the aim, and 2) an unanticipated increase in its scope. The decrease in resources was due to budget cuts, as well as to the reallocation of some GO-funded curators to other aims that were deemed more critical for the current year. The original scope of the aim was primarily to generate annotations for a large number of genes in non-model organisms, and to increase annotation coverage for model organism genes.
- Scope increased to include review and improvement of experimental GO annotations. Phylogenetic annotation is complementary to the paper-by-paper approach to GO curation: it is an integrated approach that gives curators much more biological context in which to interpret experimental results. As this work has progressed, it has become clear that phylogenetic annotation propagation serves as an additional step of review and quality assurance for the experimentally supported GO annotations. Because the experimental annotations are considered a “gold standard,” the phylogenetic annotation project has become an important way to review, assess, and, when necessary, improve them. This is a completely novel process in the workflow of the GO Consortium, and it is highly valuable because it increases both annotation consistency and coverage of the most important biological roles of the annotated genes. This extra level of review, however, requires phylogenetic curators to spend time returning to the original literature articles that were used to support questionable annotations. Thus, the project has progressed more slowly in terms of the number of families curated, but annotations have been carefully propagated to other genomes and the experimental annotations themselves have been improved.
- Increased automation of phylogenetic propagation. The decreased rate of accumulation of propagated annotations will be at least partly compensated by further automation of the curation process, which we have started to develop. We have made enhancements to the PAINT software that accelerate some time-consuming and repetitive tasks, such as documenting each tree annotation. We are now working on pre-calculating, in the context of the phylogenetic tree, a number of protein features that curators routinely use to guide their tree curation but that are currently time-consuming for them to infer, such as the evolution of active site residues and signal peptides, the gain and loss of protein domains, and the evolution of protein binding partners. We expect that, in the coming year, this additional automation will accelerate the phylogenetic propagation of annotations.
- Software and data infrastructure improvements. We have made good progress on infrastructure to support more accurate and maintainable phylogenetic inference. The phylogenetic tree building algorithm has been modified to infer horizontal transfer events, leading to more accurate trees for annotation. We remain active participants in and contributors to the Quest for Orthologs (QfO) collaboration. We supplied updated PANTHER trees, which compared favorably with other approaches when evaluated; we suggested additional species to be included in the reference proteomes; and we ensured that the mappings between MOD identifiers and UniProtKB identifiers were kept up to date (e.g. TAIR). We acted as a representative of the QfO collaboration at the annual TDWG meeting and initiated a dialog with the systematics community. We have made significant enhancements to the PAINT desktop software for phylogenetic annotation (http://www.ncbi.nlm.nih.gov/pubmed/21873635), which further increase the efficiency of the phylogenetic curation process. The current release of the software is PAINT1.0_beta69, and there have been 14 releases since the beginning of this year’s grant cycle. Hundreds of code changes have been made in response to new feature requests and bug reports from curators. The software provides an integrated view of the collective GO annotations and enables curators to select the most representative characterization of the gene family to annotate the ancestral nodes, and thereby propagate these gene functions to entire branches of the tree (or not) using a simple click-and-drag mechanism. Because the trees now have permanent identifiers, these annotations will persist over subsequent releases of the reference proteomes and builds of the PANTHER trees. We have implemented the first prototype of a Web-based version of PAINT, to make phylogenetic annotation software more easily accessible and to integrate it into the GO Common Annotation Framework (Aim 1).
Functionally, the Web-based prototype replicates the phylogenetic tree and matrix components of the Java desktop version.
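The idea of annotating an ancestral node and propagating that function to the branches beneath it (or not) can be illustrated with a minimal sketch. This is not the PAINT implementation: the `Node` class, the `gained` and `lost` fields, the `propagate` function, the clade and gene names, and the choice of GO term are all hypothetical and serve only to show how a curator's decision at one node could determine the annotations inherited by the leaves.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class Node:
    """One node of a gene family tree (hypothetical structure, not PAINT's)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    gained: Set[str] = field(default_factory=set)  # terms a curator asserts here
    lost: Set[str] = field(default_factory=set)    # terms a curator blocks here


def propagate(node: Node, inherited: FrozenSet[str] = frozenset()) -> Dict[str, Set[str]]:
    """Return {leaf name: GO terms} inherited from annotated ancestors."""
    # A node carries what its ancestors passed down, plus its own gains,
    # minus anything the curator marked as lost on this branch.
    terms = (set(inherited) | node.gained) - node.lost
    if not node.children:
        return {node.name: terms}
    result: Dict[str, Set[str]] = {}
    for child in node.children:
        result.update(propagate(child, frozenset(terms)))
    return result


# Illustrative tree: the root is annotated, one subclade is marked as a loss.
TERM = "GO:0016301"  # example term only; any GO identifier would do
root = Node(
    "root",
    gained={TERM},
    children=[
        Node("cladeX", children=[Node("geneA"), Node("geneB")]),
        Node("cladeY", lost={TERM}, children=[Node("geneC")]),
    ],
)

annotations = propagate(root)
for gene, terms in sorted(annotations.items()):
    print(gene, sorted(terms))
```

In this sketch a single annotation at the root reaches geneA and geneB, while the loss recorded at cladeY keeps it from reaching geneC.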