DictyBase Progress Report December 2009
- 1 Staff
- 2 Annotation Progress
- 3 Methods and strategies for annotation
- 4 Presentations and Publications
- 5 Other Highlights
PI: Rex Chisholm
Annotators: Petra Fey, Pascale Gaudet, Robert Dodson (started September 2009)
Developers: Siddhartha Basu, Yulia Bushmanova, Eric Just, (consultant)
All dictyBase staff contribute to GO activities, for a total of 5 FTE positions. Of these, funding from the GO grant supports about 1.05 FTEs.
Annotation. Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, phenotypes, and sequences. All curators work to annotate gene products of the Dictyostelium genome. We import GOA annotations into dictyBase and incorporate them into our monthly gene association file.
Reference Genome Project. Pascale is manager of the Reference Genome project. All dictyBase curators annotate reference genome genes and are up to date with the selected orthologs. A paper describing the project was published earlier this year: The Reference Genome Group of the Gene Ontology Consortium (Gaudet, P, corresponding author). The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species (2009) PLoS Computational Biology.
Other dictyBase contributions to GO:
All dictyBase curators work with GO editors and other curators in the field to improve the GO. For example, Petra worked with a group to improve the cell death terms. Pascale is a member of the Reference Genome, AmiGO/web presence, GO Evidence Code and Ontology development working groups. As part of the Reference Genome Project, Pascale works with Suzi Lewis, Kara Dolinski and Paul Thomas (PANTHER) to develop a tree-based orthology inference system. Much of the current effort is going into the development of PAINT, the software tool that will be used to make the tree-based annotations.
Petra is a member of the Newsletter group, the OBO-Edit working group, and the Reference Genome annotation group.
Siddhartha is part of the Software group.
dictyBase is moving to the Chado schema to store GO annotations, and as part of this move we will be designing a new GO annotation tool. Sequencing of the genomes of several species related to Dictyostelium discoideum, including D. purpureum and D. citrinum, is underway. dictyBase will facilitate the annotation of these genomes as their sequences become publicly available.
Table 1: Number of Annotations
|Total number of annotations||29787||30830||3.5%|
Table 2: Number of non-IEA Annotations
|Total number of annotations||18860||20075||6%|
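The percentages in Tables 1 and 2 can be checked with a short calculation. This is a minimal sketch that assumes the first number in each row is the count from the previous report and the second is the count for this report. The table headers do not say this, so the reading is an assumption.

```python
# Annotation counts from Tables 1 and 2. ASSUMPTION: the first number is the
# count at the previous report and the second is the count for this report.
COUNTS = {
    "all annotations (Table 1)": (29787, 30830),
    "non-IEA annotations (Table 2)": (18860, 20075),
}


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


for label, (before, after) in COUNTS.items():
    gained = after - before
    print(f"{label}: +{gained} ({percent_change(before, after):.1f}%)")
```

Run as written, this prints an increase of 1043 annotations (3.5%) for Table 1 and 1215 annotations (6.4%) for Table 2. Table 2 rounds the second figure to 6%.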
Table 3: Number of annotations per evidence code
Collaboration with UniProt/Swiss-Prot to produce a completely annotated proteome
In February 2008, Petra and Pascale spent two weeks in Geneva working with Swiss-Prot to make Dictyostelium one of the Swiss-Prot 'complete proteomes'. This meant ensuring that each protein is represented by a single record, which involved merging several records corresponding to duplicated genes or partial gene sequences. We are continuing this collaboration, with approximately 3 FTEs at UniProtKB/Swiss-Prot working on Dictyostelium entries. We hold monthly phone conferences to discuss annotation priorities and issues and work to align nomenclature guidelines between the two resources. Dictyostelium now ranks 10th among species in the number of manual entries in UniProtKB/Swiss-Prot, with 4200 annotated entries (release 57.11 of 24-Nov-09).
Methods and strategies for annotation
Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.
In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications.
We have begun collaborating with WormBase to use Textpresso to suggest GO cellular component terms for annotation (Van Auken et al., BMC Bioinformatics 2009, 10:228). The tool has been trained to evaluate the semantic context of GO terms so that it favors terms in the sections of papers that describe methods or results. This gives a high recall rate for experimentally supported GO annotations. We are establishing a pipeline that will run full-text searches for GO cellular component terms and give curators suggestions for annotation. The curation interface will show the gene name, the suggested GO term, and the sentence from the paper that led to the suggestion, so that the curator can accept or reject the annotation. WormBase is currently extending the tool to capture GO molecular function terms. This will make GO curation more efficient by reducing the time curators spend on literature mining.
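The accept/reject step of the planned curation interface can be sketched as a small data structure. This sketch is hypothetical: the `Suggestion` fields mirror what the interface is described as showing (gene name, suggested GO term, supporting sentence), but the class, the `review` function and the example genes are illustrations only. They are not part of Textpresso or of the dictyBase pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Suggestion:
    """One suggestion as shown in the curation interface (hypothetical fields)."""
    gene_name: str
    go_term: str                      # suggested GO cellular component term
    sentence: str                     # sentence from the paper behind the suggestion
    accepted: Optional[bool] = None   # None until a curator has decided


def review(suggestions: List[Suggestion],
           decide: Callable[[Suggestion], bool]) -> List[Suggestion]:
    """Record a curator decision for every pending suggestion and
    return the accepted ones, which would become annotations."""
    for s in suggestions:
        if s.accepted is None:
            s.accepted = decide(s)
    return [s for s in suggestions if s.accepted]


# Usage with made-up genes; the lambda stands in for the curator's judgement.
pending = [
    Suggestion("geneA", "nucleus", "GFP-GeneA localized to the nucleus."),
    Suggestion("geneB", "cytosol", "GeneB was detected in all fractions."),
]
accepted = review(pending, lambda s: s.go_term in s.sentence)
```

Rejected suggestions are kept with `accepted=False` rather than dropped. A real pipeline could then avoid showing the same suggestion to a curator twice.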
Curation of previously unidentified genes and gene products
In addition to genes that have been characterized in the literature, dictyBase curators are annotating gene products that have EST coverage and/or contain conserved functional domains. Gene products of this type are annotated with the ISS and ND evidence codes as there is no published data available.
IEAs via the BLAST method: All Dictyostelium protein sequences are compared by BLAST against the GO gene association sequence files (http://www.geneontology.org/index.shtml#downloads). This identifies proteins from the GO database that align with Dictyostelium proteins with an E value ≤ 1e-50. GO annotations that were manually assigned to these proteins in other species are attached to the corresponding gene product in dictyBase. The proteins from which the annotations are derived are displayed in the 'Evidence' column of the Gene Ontology evidence and references page.
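The transfer step can be illustrated as a filter over BLAST hits. This is a minimal sketch that assumes the hits are in BLAST+ tabular format (`-outfmt 6`, where the E value is the 11th column) and that the manual annotations of the GO proteins are available as a dictionary. The function name `transfer_go` and all IDs in the example are hypothetical placeholders, not the actual dictyBase code.

```python
import csv
from collections import defaultdict

E_VALUE_CUTOFF = 1e-50  # threshold given in the report


def transfer_go(blast_rows, manual_go, cutoff=E_VALUE_CUTOFF):
    """Transfer manually assigned GO terms from BLAST subjects to queries.

    blast_rows: lines of BLAST+ tabular output (-outfmt 6): query ID in
        column 1, subject ID in column 2, E value in column 11.
    manual_go: subject protein ID -> set of manually assigned GO IDs.
    Returns query ID -> set of (GO ID, source protein) pairs; the source
    protein is what the 'Evidence' column would display.
    """
    transferred = defaultdict(set)
    for row in csv.reader(blast_rows, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # alignment not strong enough to transfer annotations
        for go_id in manual_go.get(subject, ()):
            transferred[query].add((go_id, subject))
    return dict(transferred)


# Placeholder IDs; only the first hit passes the 1e-50 cutoff.
hits = [
    "DDB_G0000001\tPROT_A\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-120\t500",
    "DDB_G0000001\tPROT_B\t30.0\t120\t80\t2\t5\t120\t3\t118\t2e-10\t60",
]
manual = {"PROT_A": {"GO:0005737"}, "PROT_B": {"GO:0005634"}}
result = transfer_go(hits, manual)
```

Each transferred term is stored together with its source protein, so the "with" information the evidence page displays is kept alongside the annotation.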
IEAs imported from GOA: these include InterPro2GO and SPKW2GO (Swiss-Prot keyword) mappings, which are assigned to the respective gene products.
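dictyBase imports these annotations from GOA rather than computing them. The sketch below only illustrates what an InterPro2GO-style mapping does. It assumes the GO "external2go" line format (`InterPro:ID name > GO:term name ; GO:ID`, with `!` comment lines). The function names and the example InterPro ID are hypothetical.

```python
from collections import defaultdict


def parse_external2go(lines):
    """Parse mapping lines of the form
    'InterPro:IPRnnnnnn Domain name > GO:term name ; GO:nnnnnnn'.
    Returns external ID -> set of GO IDs; '!' lines are comments."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, right = line.split(" > ", 1)
        external_id = left.split()[0]               # e.g. 'InterPro:IPR999999'
        go_id = right.rsplit(" ; ", 1)[1].strip()   # e.g. 'GO:0005634'
        mapping[external_id].add(go_id)
    return dict(mapping)


def assign_iea(protein_domains, mapping):
    """protein_domains: protein ID -> set of InterPro IDs found in the protein.
    Returns protein ID -> set of GO IDs inferred from the mapping."""
    return {
        protein: {go for dom in domains for go in mapping.get(dom, ())}
        for protein, domains in protein_domains.items()
    }


# Hypothetical mapping line and protein, for illustration only.
mapping = parse_external2go([
    "! comment line",
    "InterPro:IPR999999 Example domain > GO:nucleus ; GO:0005634",
])
ieas = assign_iea({"P1": {"InterPro:IPR999999", "InterPro:IPR000000"}}, mapping)
```

Domains with no mapping entry contribute nothing, so a protein whose domains are all unmapped ends up with an empty set.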
Quality control measures
dictyBase curators work closely together to ensure that annotations are consistent between curators and conform to the guidelines set out in the annotation documentation. We also have a set of internal guidelines, recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures), which curators follow. The three curators discuss consistency issues as they arise, and decisions are recorded in the Standard Operating Procedures.
Presentations and Publications
Papers with substantial GO content
• The Reference Genome Group of the Gene Ontology Consortium (Gaudet, P, corresponding author) The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species (2009) PLoS Computational Biology
Presentations including Talks and Tutorials and Teaching
• The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species, The Quest for Ortholog meeting, Hinxton, UK, July 2009 (Pascale Gaudet)
• Annotation in Model Organism Databases, Concordia University, Montreal, June 2009 (Pascale Gaudet)
• The Reference Genome Project, International Biocurator Conference, Berlin, April 2009 (Pascale Gaudet)
• The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species, International Dicty Meeting, Estes Park, CO, September 2009 (Pascale Gaudet)
Ontology Development Contributions
Annotators have requested several additions and changes to the ontologies that are needed to annotate Dictyostelium development. These requests focus on, but are not limited to, process terms describing developmental events such as cell type differentiation and the formation of developmental structures. Curators continue to develop a pre-composed phenotype ontology using qualities from the PATO ontology. Each newly added term combines a GO process or Dicty anatomy term with a PATO quality term from quality.obo.
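The structure of a pre-composed phenotype term can be sketched as an entity plus a quality. This is a minimal, hypothetical model. The naming rule (quality followed by entity) and the example term names are assumptions for illustration. They are not taken from the dictyBase phenotype ontology or verified against PATO.

```python
from dataclasses import dataclass

# The two entity sources named in the report.
ENTITY_SOURCES = {"GO process", "Dicty anatomy"}


@dataclass(frozen=True)
class PhenotypeTerm:
    """Pre-composed phenotype: an entity term qualified by a PATO quality."""
    entity_source: str  # must be one of ENTITY_SOURCES
    entity: str         # term name from GO or the Dicty anatomy ontology
    quality: str        # quality term name from PATO quality.obo

    def __post_init__(self):
        if self.entity_source not in ENTITY_SOURCES:
            raise ValueError(f"unsupported entity source: {self.entity_source}")

    @property
    def name(self) -> str:
        # Hypothetical naming rule: quality first, then entity.
        return f"{self.quality} {self.entity}"


# Illustrative names only; not checked against GO or PATO.
term = PhenotypeTerm("GO process", "chemotaxis", "decreased")
```

Keeping the entity and quality as separate fields preserves how each term was built. This would allow it to be mapped back to its GO or anatomy term and its PATO quality later.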
New dictyBase Gene Page
To accommodate the ever-increasing amount of data we represent, now and in the future, we have made a major update to our Gene Pages. Changes include:
• Grouping all gene-related information, such as protein annotations, Gene Ontology, references and phenotypes, into tabs at the top of the gene page.
• Individual sections on the page can be collapsed. Section settings are 'sticky', meaning the browser 'remembers' which sections were open on the last visit.
• Sections are loaded independently, making the overall page loading faster.
• Each tab and section contains a question mark icon that displays help hints when clicked.
• A new FASTA button allows fast access to sequences.
• Different splice variants are accessible via sub-tabs.
The Dictyostelium genomic sequence was updated in GenBank in August 2009 with several improved gene models and the insertion of a previously unassigned contig into the genome.