DictyBase Dec 5-March 3
dictyBase March, 2014
PI: Rex Chisholm
Annotators: Petra Fey, Robert Dodson
Developers: Siddhartha Basu, Yogesh Pandit (until October 2013), David Jimenez-Morales (since February 2014)
All dictyBase staff contribute to GO activities. This is currently a total of 3 FTE positions. Of these, funding from the GO grant supports 0.6 FTE.
Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, strains, phenotypes, and general nomenclature. Both curators work to annotate gene products of the Dictyostelium genome.
dictyBase has used the Protein2GO tool from the EBI to annotate Dictyostelium proteins with GO terms since June 2012. Since February 2013, we have re-imported annotations, including electronic annotations, from GOA back into dictyBase monthly. We then append our GO annotations to RNAs and send the GAF file to the GO consortium.
The use of Protein2GO continues to be of great advantage: it enables us to react to changing annotation practices without a direct impact on our database. For example, dictyBase curators regularly add annotation extensions in Protein2GO although they cannot currently be stored or displayed at dictyBase. We expect to be able to provide annotation extensions to our users later this year. Curators continue to interact regularly with Rachael Huntley and Tony Sawford to resolve any annotation or technical issues.
Semi-automated annotation: We have worked for two years now towards using Textpresso to suggest cellular component terms for GO annotation (Van Auken et al., BMC Bioinformatics 2009, 10:228). We tested the original WormBase annotation tool and participated in BioCreative Workshop Track III in 2012, for which we compared purely manually curated cellular component annotations with Textpresso-assisted annotations. However, implementation at dictyBase was further delayed because of the GO consortium's wise decision to connect Textpresso annotations directly to Protein2GO. These latest developments save developer time that would otherwise have been needed to import and append Textpresso annotations separately. Recently we tested the updated Textpresso annotation tool and provided feedback to Kimberly Van Auken. In addition, we must provide a GPI file containing all dictyBase protein names and identifiers for the text mining to succeed. Because dictyBase curators name genes and gene products on an ongoing basis, starting when papers are first published, we plan to supply an updated GPI file to the Textpresso team weekly. We expect to start using Textpresso regularly in our GO curation process in spring 2014.
Petra coordinates with the EBI on all dictyBase GO annotation issues concerning Protein2GO, including identifier issues resulting from the expansion of the tool to all GO consortium curators.
Petra, Robert and Siddhartha are working with Kimberly Van Auken on the semi-automatic CC annotations using Textpresso.
Siddhartha is working on implementing the GPAD and GPI files for the pipelines between the EBI, dictyBase and the GO consortium.
Other dictyBase contributions to GO:
Both dictyBase curators work with GO editors and other curators in the field to improve the GO, and contribute to discussions on the GO email list, in the bi-weekly annotation calls, and on SourceForge. One new GO term, needed to curate Dictyostelium papers, was requested in the last quarter.
dictyBase is currently working on a complete database overhaul. Once that is completed, 3 additional genomes (D. purpureum, D. fasciculatum and P. pallidum) will be annotated semi-automatically by transferring experimental GO annotations from D. discoideum to 1:1 orthologs, using the ISO evidence code.
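The planned ortholog-based transfer could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the dictyBase pipeline: the `Annotation` record, the function names and all gene identifiers are hypothetical, and the source does not say how orthologs will be determined or how the transferred annotations will be stored. The sketch keeps only gene pairs that are strictly 1:1, copies annotations that carry an experimental evidence code, and labels each copy with ISO (Inferred from Sequence Orthology) plus a pointer back to the D. discoideum gene.

```python
from collections import Counter
from dataclasses import dataclass

# Experimental evidence codes; only annotations with these codes are transferred.
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


@dataclass(frozen=True)
class Annotation:
    """Hypothetical minimal annotation record (not a dictyBase schema)."""
    gene_id: str
    go_id: str
    evidence: str
    with_from: str = ""  # source gene for a transferred annotation


def one_to_one(pairs):
    """Return {source: target} for genes that occur in exactly one ortholog pair."""
    unique_pairs = set(pairs)
    source_counts = Counter(source for source, _ in unique_pairs)
    target_counts = Counter(target for _, target in unique_pairs)
    return {
        source: target
        for source, target in unique_pairs
        if source_counts[source] == 1 and target_counts[target] == 1
    }


def transfer_iso(annotations, ortholog_pairs):
    """Copy experimental annotations onto 1:1 orthologs with evidence code ISO."""
    mapping = one_to_one(ortholog_pairs)
    return [
        Annotation(mapping[a.gene_id], a.go_id, "ISO", with_from=a.gene_id)
        for a in annotations
        if a.evidence in EXPERIMENTAL and a.gene_id in mapping
    ]


# Invented identifiers, for illustration only.
example_pairs = [
    ("dd_geneA", "dp_geneA"),   # 1:1 pair, eligible for transfer
    ("dd_geneB", "dp_geneB1"),  # dd_geneB has two orthologs, so it is skipped
    ("dd_geneB", "dp_geneB2"),
]
example_annotations = [
    Annotation("dd_geneA", "GO:0005737", "IDA"),  # experimental: transferred
    Annotation("dd_geneA", "GO:0005634", "IEA"),  # electronic: not transferred
    Annotation("dd_geneB", "GO:0005737", "IMP"),  # not 1:1: not transferred
]
for transferred in transfer_iso(example_annotations, example_pairs):
    print(transferred)
```

Restricting the transfer to strictly 1:1 pairs is a conservative choice: genes with several candidate orthologs are left for manual review rather than receiving a possibly wrong annotation.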
The total and non-IEA annotation numbers include annotations that are not from dictyBase, but from the PAINT project and, in a few cases, from IntAct and Swiss-Prot. At this point we cannot easily query them by source. However, the experimental annotations are nearly all from dictyBase, and the indicated annotation extensions (column 16) have been annotated exclusively by dictyBase curators (Table 1).
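For readers who want to break such totals down themselves, the following Python sketch tallies a GAF 2.0 file by source. It is a minimal illustration, not a dictyBase tool. It assumes the standard tab-separated GAF 2.0 layout, in which column 7 is the evidence code, column 15 is "Assigned By" and column 16 holds annotation extensions. The helper `make_row`, the function `summarize_gaf` and every identifier in the sample rows are invented for the example.

```python
from collections import Counter

# Experimental evidence codes counted as "experimental" in this sketch.
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def summarize_gaf(lines):
    """Tally annotations from an iterable of GAF 2.0 lines."""
    summary = {
        "total": 0,
        "non_iea": 0,
        "experimental": 0,
        "with_extension": 0,
        "by_source": Counter(),
    }
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment lines and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # too short to be a valid annotation row
        evidence = cols[6]                               # column 7
        assigned_by = cols[14]                           # column 15
        extension = cols[15] if len(cols) > 15 else ""   # column 16
        summary["total"] += 1
        summary["by_source"][assigned_by] += 1
        if evidence != "IEA":
            summary["non_iea"] += 1
        if evidence in EXPERIMENTAL:
            summary["experimental"] += 1
        if extension:
            summary["with_extension"] += 1
    return summary


def make_row(evidence, assigned_by, extension=""):
    """Build one 17-column GAF row from placeholder values (illustration only)."""
    cols = [
        "dictyBase", "DDB_G0000001", "geneA", "", "GO:0005737",
        "PMID:0000000", evidence, "", "C", "", "", "protein",
        "taxon:44689", "20140301", assigned_by, extension, "",
    ]
    return "\t".join(cols)


sample = [
    "!gaf-version: 2.0",
    make_row("IDA", "dictyBase", "part_of(GO:0005634)"),
    make_row("IEA", "UniProt"),
    make_row("IBA", "GO_Central"),
]
report = summarize_gaf(sample)
print(report["total"], report["non_iea"], report["experimental"])
print(dict(report["by_source"]))
```

With a real file, the same function can be called on an open file handle, for example `summarize_gaf(open(path, encoding="utf-8"))`, where `path` points to a local copy of the GAF file.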
Table 1: Number of Annotations
|Total number of annotations||57925||60149||+ 3.8%|
Table 2: Number of non-IEA Annotations
|Total number of annotations||20367||21137||+ 3.8%|
Table 3: Additional Numbers
|Total EXP annotations|
Methods and strategies for annotation
(please note % effort on literature curation vs. computational annotation methods)
Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.
In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications. GO annotations are added using the Protein2GO tool provided by the EBI. Annotations are currently imported into dictyBase and supplied to the GO consortium monthly.
Use of Textpresso to annotate cellular components is imminent. The updated tool that transfers Textpresso annotations directly to Protein2GO has been made available recently. Once implemented, this should increase efficiency of GO curation by reducing time curators spend on literature.
IEAs are being imported from GOA and assigned to the respective gene products on a monthly schedule.
Quality control measures
dictyBase curators work closely together to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures), to which curators adhere. Curators discuss consistency issues as they arise, and decisions are recorded in the Standard Operating Procedures.