DictyBase, March 2010

From GO Wiki
Revision as of 12:53, 3 November 2011 by Pfey03 (talk | contribs)

Staff

PI: Rex Chisholm
Annotators: Petra Fey, Pascale Gaudet, Robert Dodson
Developers: Siddhartha Basu, Yulia Bushmanova, Eric Just (consultant)

All dictyBase staff contribute to GO activities. The group comprises a total of 5 FTE positions, of which funding from the GO grant supports about 1.05 FTEs.

Annotation. Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, phenotypes, and sequences. All curators work to annotate gene products of the Dictyostelium genome. We import GOA annotations into dictyBase and incorporate them into our monthly gene association file.

Reference Genome Project. Pascale is the manager of the Reference Genome project. All dictyBase curators annotate reference genome genes and are up to date with the selected orthologs. A paper describing the project was published earlier this year: Mi et al., 2010. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. NAR Database issue.

Other dictyBase contributions to GO:

All dictyBase curators work with GO editors and other curators in the field to improve the GO, and contribute to discussions on the GO email list and SourceForge. Pascale is a member of the Reference Genome, AmiGO/web presence, GO Evidence Code and Ontology development working groups. As part of the Reference Genome Project, Pascale works with Suzi Lewis, Kara Dolinski and Paul Thomas (PANTHER) to develop a tree-based orthology inference system. Considerable effort is currently going into the development of PAINT, the software tool that will be used to make the tree-based annotations.

Petra is a member of the Newsletter group, the OBO-Edit working group, and the Reference Genome annotation group.

Siddhartha is part of the Software group.

dictyBase is moving to the Chado schema to store GO annotations. As part of this move we will design a new GO annotation tool. Sequencing of the genomes of several species related to Dictyostelium discoideum, including D. purpureum and D. citrinum, is underway. dictyBase will facilitate the annotation of these genomes as their sequences become publicly available.


Annotation Progress

Table 1: Number of Annotations

                              09/2009   03/2010   % Change
Total number of annotations     30762     31109   +1%
Function                        13199     13240   +0.3%
Process                          9848      9980   +1%
Component                        7715      7889   +2%
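
The % Change column in Table 1 can be rechecked from the two counts. The following is a minimal sketch, assuming the column is the relative change from 09/2009 to 03/2010; the function name `percent_change` is ours and is not part of any dictyBase tool.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Counts copied from Table 1 as (09/2009, 03/2010).
table1 = {
    "Total": (30762, 31109),
    "Function": (13199, 13240),
    "Process": (9848, 9980),
    "Component": (7715, 7889),
}

for name, (before, after) in table1.items():
    # One decimal place; the table itself rounds most rows to whole percents.
    print(f"{name:<10} {percent_change(before, after):+.1f}%")

# Sanity check: the three aspects add up to the reported totals.
for column in (0, 1):
    parts = sum(table1[a][column] for a in ("Function", "Process", "Component"))
    assert parts == table1["Total"][column]
```

The same function applies unchanged to the non-IEA counts in Table 2.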


Table 2: Number of non-IEA Annotations

                              09/2009   03/2010   % Change
Total number of annotations     19998     20410   +2%
Function                         6813      6878   +1%
Process                          7141      7304   +2%
Component                        6044      6228   +3%
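
Taken together, Table 1 and Table 2 also give the number of IEA annotations per aspect. The sketch below assumes that every annotation is either IEA or non-IEA, so that the IEA count is simply the difference between the two tables; the names used are illustrative only.

```python
# Counts for 03/2010, copied from Table 1 (all annotations)
# and Table 2 (non-IEA annotations).
all_annotations = {"Function": 13240, "Process": 9980, "Component": 7889}
non_iea = {"Function": 6878, "Process": 7304, "Component": 6228}


def iea_breakdown(total, manual):
    """Return {aspect: (iea_count, iea_share_percent)}.

    Assumption: IEA count = all annotations - non-IEA annotations.
    """
    result = {}
    for aspect, n_total in total.items():
        n_iea = n_total - manual[aspect]
        result[aspect] = (n_iea, n_iea / n_total * 100.0)
    return result


for aspect, (n_iea, share) in iea_breakdown(all_annotations, non_iea).items():
    print(f"{aspect:<10} IEA: {n_iea:>5}  ({share:.0f}% of this aspect)")
```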

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.

Literature curation

In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications.

We have begun collaborating with WormBase on using Textpresso to suggest GO cellular component terms for annotation (Van Auken et al., 2009, BMC Bioinformatics 10:228). The tool has been trained to evaluate the semantic context of GO terms and to enrich for terms in the sections of papers that describe methods or results, which gives a high recall rate for experimentally supported GO annotations. We are establishing a pipeline that will run full-text searches for GO cellular component terms and provide curators with suggestions for annotation. The curation interface will show the gene name, the suggested GO term, and the sentence from the paper that led to the suggestion, allowing the curator to accept or reject it. WormBase is currently extending the tool to capture GO molecular functions. This pipeline should increase the efficiency of GO curation by reducing the time curators spend on literature mining.
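
The accept/reject step described above could be modelled roughly as follows. This is only a sketch of the idea, not the Textpresso or dictyBase implementation: the class and function names, the gene names, and the example sentences are all hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    """One suggested annotation as a curator would see it (hypothetical model)."""
    gene_name: str   # gene shown to the curator
    go_term: str     # suggested GO cellular component term
    sentence: str    # sentence from the paper that triggered the suggestion
    decision: Decision = Decision.PENDING


def review(suggestion: Suggestion, accept: bool) -> Suggestion:
    """Record the curator's decision on a single suggestion."""
    suggestion.decision = Decision.ACCEPTED if accept else Decision.REJECTED
    return suggestion


def accepted(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Keep only the suggestions that a curator has accepted."""
    return [s for s in suggestions if s.decision is Decision.ACCEPTED]


# Made-up queue; gene names and sentences are placeholders.
queue = [
    Suggestion("geneA", "GO:0005634", "geneA-GFP localized to the nucleus."),
    Suggestion("geneB", "GO:0005739", "Mitochondria were stained as a control."),
]
review(queue[0], accept=True)    # the sentence supports the localization
review(queue[1], accept=False)   # the sentence is not about geneB's location
print([s.gene_name for s in accepted(queue)])
```

Storing the triggering sentence with each suggestion keeps the evidence next to the decision, so the curator does not have to reopen the paper to judge it.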

Curation of previously unidentified genes and gene products

In addition to genes that have been characterized in the literature, dictyBase curators are annotating gene products that have EST coverage and/or contain conserved functional domains. Gene products of this type are annotated with the ISS and ND evidence codes as there is no published data available.


Automated methods

IEA annotations imported from GOA include InterPro2GO and SPKW2GO mappings and are assigned to the respective gene products.
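
To illustrate how imported IEA annotations can be told apart from manual ones in a gene association file, here is a minimal sketch. It relies only on the standard tab-delimited GAF layout, in which column 7 holds the evidence code, column 9 holds the aspect (F, P or C), and comment lines start with "!". The helper names and the sample rows are made up for illustration and are not dictyBase data or code.

```python
from collections import Counter

ASPECT_NAMES = {"F": "Function", "P": "Process", "C": "Component"}


def tally(lines):
    """Count annotations per (aspect, IEA / non-IEA) in GAF-formatted lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        evidence, aspect = cols[6], cols[8]  # GAF columns 7 and 9
        kind = "IEA" if evidence == "IEA" else "non-IEA"
        counts[(ASPECT_NAMES[aspect], kind)] += 1
    return counts


def make_row(symbol, go_id, evidence, aspect):
    """Build a placeholder GAF row; columns not needed here are left empty."""
    cols = [""] * 15
    cols[0], cols[2], cols[4] = "dictyBase", symbol, go_id
    cols[6], cols[8] = evidence, aspect
    return "\t".join(cols)


# Made-up sample: one electronic and two manual annotations.
sample = [
    "!gaf-version: 2.0",
    make_row("geneA", "GO:0005524", "IEA", "F"),
    make_row("geneA", "GO:0005634", "IDA", "C"),
    make_row("geneB", "GO:0006935", "IMP", "P"),
]

for (aspect, kind), n in sorted(tally(sample).items()):
    print(f"{aspect:<10} {kind:<8} {n}")
```

Run over a full gene association file, a tally like this would yield counts of the kind reported in Tables 1 and 2.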

Quality control measures

dictyBase curators work closely together to ensure that annotations are consistent between curators and conform to the guidelines set out in the GO annotation documentation. We also have a set of internal guidelines, recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures), to which curators adhere. The three curators discuss consistency issues as they arise, and decisions are recorded in the Standard Operating Procedures.