DictyBase Progress Report December 2013


dictyBase, December 2013

Staff:

PI: Rex Chisholm

Annotators: Petra Fey, Robert Dodson; Pascale Gaudet (consultant, until June 2013)

Developers: Siddhartha Basu, Yogesh Pandit (until October 2013)


All dictyBase staff contribute to GO activities. This currently amounts to a total of 3 FTE positions. Of these, the GO grant provides sufficient funding to support 0.6 FTE.

Annotation:

Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, strains, phenotypes, and general nomenclature. Both curators work to annotate gene products of the Dictyostelium genome.


In 2011 we began storing GO annotations in Chado and implemented the ability to import files in OBO and GAF 2.0 format. This made our own GO annotation tool obsolete, and we started to use the Protein2GO tool from the EBI to annotate Dictyostelium proteins with GO terms. In February 2013, we completed the pipeline for the regular re-import of annotations, including electronic annotations, from GOA back into dictyBase. We then append our GO annotations to RNAs and send the GAF file to the GO consortium.

The use of Protein2GO continues to be of great advantage: it enables us to react to changing annotation practices without a direct impact on our database. For example, dictyBase curators regularly add annotation extensions in Protein2GO although they cannot currently be stored or displayed at dictyBase. We expect to be able to provide annotation extensions to our users in the coming year. Curators also interact regularly with Rachael Huntley and Tony Sawford to solve annotation or technical issues. Recently, several high-level GO terms were made unavailable for manual annotation, and existing annotations to them now appear as GAF errors in the Jenkins report. While our curators can usually take care of these, we had a large number of manual annotations to the high-level term ‘cytokinesis’. Since these annotations could all be upgraded to a single term, ‘mitotic cytokinesis’, Petra asked Tony to batch-update them in Protein2GO, which he kindly did, saving us time-consuming and repetitive manual work.

Semi-automated annotation: We have worked for nearly two years towards using Textpresso to suggest GO terms for cellular component annotation (Van Auken et al., BMC Bioinformatics 2009, 10:228). We tested the original WormBase annotation tool and participated in the BioCreative Workshop Track III in April 2012, for which we compared purely manually curated cellular component annotations with Textpresso-assisted annotations. Implementation at dictyBase was further delayed by the GO consortium’s wise decision to connect Textpresso annotations directly to Protein2GO. These latest developments save time for our developer, who would otherwise have needed to import and append Textpresso annotations separately. Recently we tested the updated Textpresso annotation tool and provided feedback to Kimberly Van Auken. In addition, we must provide a GPI file containing all dictyBase protein names and identifiers for the text mining to succeed. Since dictyBase curators name genes and gene products when papers are first published and on an ongoing basis, we plan to supply an updated GPI file to the Textpresso team weekly. We expect to start using Textpresso regularly in our GO curation process in early 2014.

Petra is coordinating all dictyBase GO annotation issues relating to Protein2GO with the EBI, including identifier issues resulting from the expansion of the tool to all GO consortium curators.

Petra, Robert and Siddhartha are working with Kimberly Van Auken on the semi-automatic CC annotations using Textpresso.

Siddhartha and Yogesh worked on the GAF and OBO uploader for Chado to complete the annotation pipeline between Protein2GO, dictyBase, and the GO consortium.

Other dictyBase contributions to GO:

Both dictyBase curators work with GO editors and other curators in the field to improve the GO, and contribute to discussions on the GO email list, in the bi-weekly annotation calls, and on SourceForge. A couple of new GO terms were requested in 2013 to curate Dictyostelium papers.

dictyBase is currently working on a complete database overhaul. Once that is completed, three additional genomes (D. purpureum, D. fasciculatum and P. pallidum) will be annotated semi-automatically by transferring experimental GO annotations from D. discoideum to 1:1 orthologs with the ISS evidence code.
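The planned transfer to 1:1 orthologs can be sketched roughly as follows. This is only an illustration under stated assumptions, not the dictyBase implementation: the `Annotation` class, the function names, the gene identifiers, and the form of the ortholog list are all hypothetical, and the set of evidence codes treated as experimental is an assumption. The sketch keeps only ortholog pairs that are unique in both directions and records the source gene in the with/from field of each new ISS annotation.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: evidence codes treated as "experimental" for this sketch.
EXPERIMENTAL = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


@dataclass(frozen=True)
class Annotation:
    """Hypothetical, minimal annotation record."""
    gene_id: str
    go_id: str
    evidence: str
    with_from: str = ""


def one_to_one(pairs):
    """Keep (source, target) pairs in which each gene occurs exactly once."""
    pairs = set(pairs)  # ignore duplicate rows
    src = Counter(s for s, _ in pairs)
    tgt = Counter(t for _, t in pairs)
    return {s: t for s, t in pairs if src[s] == 1 and tgt[t] == 1}


def transfer_by_iss(annotations, ortholog_pairs):
    """Copy experimental annotations to 1:1 orthologs as ISS annotations."""
    mapping = one_to_one(ortholog_pairs)
    transferred = []
    for ann in annotations:
        target = mapping.get(ann.gene_id)
        if target is not None and ann.evidence in EXPERIMENTAL:
            # The source gene goes into with/from to document the inference.
            transferred.append(
                Annotation(target, ann.go_id, "ISS", with_from=ann.gene_id)
            )
    return transferred


# Placeholder identifiers, invented for illustration only.
pairs = [("ddis_A", "dpur_A"), ("ddis_B", "dpur_B1"), ("ddis_B", "dpur_B2")]
annotations = [
    Annotation("ddis_A", "GO:0000001", "IDA"),  # transferred
    Annotation("ddis_A", "GO:0000002", "IEA"),  # skipped: not experimental
    Annotation("ddis_B", "GO:0000003", "IMP"),  # skipped: not a 1:1 ortholog
]
result = transfer_by_iss(annotations, pairs)
print(result)
```

Restricting the source set to experimental annotations avoids chaining one inferred annotation onto another.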


Annotation Progress

The total and non-IEA annotation numbers (Tables 1 and 2) include annotations that are not from dictyBase but from the PAINT project and, in a very small number of cases, from SwissProt. At this point we cannot easily query them by source. However, the experimental annotations are nearly all from dictyBase, and the annotation extensions (GAF column 16) have been added exclusively by dictyBase curators (Table 3).
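As an illustration of how such counts relate to the GAF 2.0 columns, the sketch below tallies annotations by aspect (column 9), evidence code (column 7), source (column 15, Assigned By) and the presence of an annotation extension (column 16). It is not the query dictyBase uses. It assumes an exported GAF file with a reliable Assigned By value, and it assumes that "experimental" covers the evidence codes listed in the code. The helper `make_row` and all row values are placeholders invented for the example.

```python
from collections import Counter

# 0-based positions of the GAF 2.0 columns used here
# (columns 7, 9, 15 and 16 in the format's 1-based numbering).
EVIDENCE, ASPECT, ASSIGNED_BY, EXTENSION = 6, 8, 14, 15

# Assumption: evidence codes counted as "experimental" in this sketch.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def tally_gaf(lines):
    """Count annotations by aspect, source, and evidence class."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):  # blank lines and header comments
            continue
        cols = line.split("\t")
        if len(cols) < 15:  # skip malformed rows
            continue
        counts["total"] += 1
        counts["aspect:" + cols[ASPECT]] += 1
        counts["source:" + cols[ASSIGNED_BY]] += 1
        if cols[EVIDENCE] != "IEA":
            counts["non_iea"] += 1
        if cols[EVIDENCE] in EXPERIMENTAL:
            counts["experimental"] += 1
        if len(cols) > EXTENSION and cols[EXTENSION].strip():
            counts["with_extension"] += 1
    return counts


def make_row(evidence, aspect, assigned_by, extension=""):
    """Build one 17-column GAF row from placeholder values (illustration only)."""
    cols = ["dictyBase", "PLACEHOLDER_ID", "geneX", "", "GO:0000000", "PMID:0",
            evidence, "", aspect, "", "", "protein", "taxon:44689", "20131201",
            assigned_by, extension, ""]
    return "\t".join(cols)


SAMPLE = [
    "!gaf-version: 2.0",
    make_row("IMP", "P", "dictyBase", "part_of(PLACEHOLDER:1)"),
    make_row("IEA", "F", "UniProt"),
    make_row("IBA", "C", "GO_Central"),
]

counts = tally_gaf(SAMPLE)
print(counts["total"], counts["non_iea"], counts["experimental"],
      counts["with_extension"])
```

On a real file, the lines would come from an open file handle instead of `SAMPLE`, and the `source:` keys would give the per-source breakdown.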


Table 1: Number of Annotations

Category                      2012     2013     % Change
Total number of annotations   56475    57925    + 2.6%
Function                      22166    21968    - 0.9%
Process                       18600    19581    + 5.3%
Component                     15709    16376    + 4.2%


Table 2: Number of non-IEA Annotations

Category                      2012     2013     % Change
Total number of annotations   18604    20367    + 9.5%
Function                      4566     4961     + 8.7%
Process                       8471     9248     + 9.2%
Component                     5567     6158     + 10.6%


Table 3: Additional Numbers

Category                      2012     2013     % Change
Total EXP annotations         4756     5598     + 18%
Annotation extensions         66       187      + 183%
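
For reference, the % Change column in Tables 1 to 3 is the year-over-year change relative to the 2012 count. A minimal sketch of that arithmetic, using figures from the tables (the function name `percent_change` is only illustrative):

```python
def percent_change(old, new):
    """Change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# (2012 count, 2013 count) taken from Tables 1 and 3.
figures = {
    "Total number of annotations": (56475, 57925),
    "Total EXP annotations": (4756, 5598),
    "Annotation extensions": (66, 187),
}

for label, (count_2012, count_2013) in figures.items():
    print(f"{label}: {percent_change(count_2012, count_2013):+.1f}%")
```

The values in the tables are these results rounded to one decimal place (Tables 1 and 2) or to whole percent (Table 3).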


Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.


Literature curation

In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications. GO annotations are added using the Protein2GO tool provided by the EBI. Annotations are currently imported into dictyBase and supplied to the GO consortium monthly.

Use of Textpresso to annotate cellular components is imminent. The updated tool, which transfers Textpresso annotations directly to Protein2GO, was recently made available. Once implemented, this should increase the efficiency of GO curation by reducing the time curators spend reading the literature.


Automated methods

IEA annotations are imported from GOA and assigned to the respective gene products on a monthly schedule.


Quality control measures

dictyBase curators work closely together to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines, recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures), to which curators adhere. Curators discuss consistency issues as they arise, and decisions are recorded in the Standard Operating Procedures.