DictyBase Progress Report December 2009


PI: Rex Chisholm
Annotators: Petra Fey, Pascale Gaudet, Robert Dodson (started September 2009)
Developers: Siddhartha Basu, Yulia Bushmanova, Eric Just (consultant)

All dictyBase staff contribute to GO activities, for a total of 5 FTE positions. Of these, we receive sufficient funding from the GO grant to support about 1.05 FTEs.

Annotation. Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, phenotypes, and sequences. All curators work to annotate gene products of the Dictyostelium genome. We import GOA annotations into dictyBase and incorporate them into our monthly gene association file.

Reference Genome Project. Pascale is the manager of the Reference Genome project. All dictyBase curators annotate reference genome genes and are up to date with the selected orthologs. A paper describing the project was published earlier this year: The Reference Genome Group of the Gene Ontology Consortium (Gaudet, P, corresponding author). The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species. PLoS Computational Biology (2009).

Other dictyBase contributions to GO:

All dictyBase curators work with GO editors and other curators in the field to improve the GO; for example, Petra worked with a group to improve the cell death terms. Pascale is a member of the Reference Genome, AmiGO/web presence, GO Evidence Code, and Ontology development working groups. As part of the Reference Genome Project, Pascale works with Suzi Lewis, Kara Dolinski, and Paul Thomas (Panther) to develop a tree-based orthology inference system. Much of the current effort goes into developing PAINT, the software tool that will be used to make the tree-based annotations.

Petra is a member of the Newsletter group, the OBO-Edit working group, and the Reference Genome annotation group.

Siddhartha is part of the Software group.

dictyBase is moving to the Chado schema to store GO annotations. As part of this move, we will be redesigning our GO annotation tool. Sequencing of the genomes of several species related to Dictyostelium discoideum, including D. purpureum and D. citrinum, is underway. dictyBase will facilitate the annotation of these genomes as their sequences become publicly available.

Annotation Progress

Table 1: Number of Annotations

Aspect                        12/2008   12/2009   % Change
Total number of annotations     29787     30830       3.5%
Function                        12991     13200       1.6%
Process                          9487      9885       4.2%
Component                        7309      7745         6%
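The "% Change" column can be recomputed from the two count columns. The following is a minimal sketch, assuming the change is measured relative to the 12/2008 count and rounded to one decimal place; the function name and the dictionary layout are illustrative and not part of any dictyBase tooling.

```python
def percent_change(before: int, after: int) -> float:
    """Percent change from `before` to `after`, relative to `before`."""
    if before == 0:
        # A zero baseline has no defined percent change (reported as N/A).
        raise ValueError("percent change is undefined for a zero baseline")
    return (after - before) / before * 100.0


# Counts copied from Table 1: (12/2008, 12/2009).
table1 = {
    "Total": (29787, 30830),
    "Function": (12991, 13200),
    "Process": (9487, 9885),
    "Component": (7309, 7745),
}

# Assumption: one decimal place of rounding.
changes = {
    name: round(percent_change(before, after), 1)
    for name, (before, after) in table1.items()
}

for name, pct in changes.items():
    print(f"{name}: {pct}%")
```

The three aspect counts sum to the total in both columns, which is a quick consistency check worth running before computing the percentages.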

Table 2: Number of non-IEA Annotations

Aspect                        12/2008   12/2009   % Change
Total number of annotations     18860     20075         6%
Function                         6510      6817         5%
Process                          6742      7182         7%
Component                        5608      6076         8%

Table 3: Number of annotations per evidence code

Evidence code   12/2008   12/2009   % Change
IMP                1180      1269         8%
IGI                 173       195        13%
IPI                 214       221         3%
ISS                9420      9491         1%
IDA                1574      1807        15%
IEP                  84       132        57%
TAS                 488       480      -1.6%
NAS                  16        16         0%
NR                    0         0        N/A
IEA               10927     10755      -1.6%
ND                 5471      6212        14%
IC                  235       252         7%
RCA                   0         0        N/A
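Counts of this kind can be tallied directly from a gene association file. The sketch below assumes the tab-delimited GAF layout, in which column 7 holds the evidence code and column 9 the aspect (F, P, or C), and in which comment lines start with "!". The sample rows are invented for illustration, are truncated to the first nine columns, and do not correspond to real dictyBase records.

```python
from collections import Counter


def gaf_row(symbol: str, go_id: str, evidence: str, aspect: str) -> str:
    """Build a hypothetical GAF row, truncated to the first nine columns."""
    return "\t".join(
        ["dictyBase", "DDB_G_EXAMPLE", symbol, "", go_id, "PMID:0", evidence, "", aspect]
    )


# Invented sample data; gene symbols and references are placeholders.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.0",
    gaf_row("geneA", "GO:0005634", "IDA", "C"),
    gaf_row("geneA", "GO:0003677", "IEA", "F"),
    gaf_row("geneB", "GO:0006355", "IMP", "P"),
    gaf_row("geneC", "GO:0005634", "IEA", "C"),
])


def tally(gaf_text: str):
    """Count annotations per evidence code and per aspect."""
    by_evidence = Counter()
    by_aspect = Counter()
    for line in gaf_text.splitlines():
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        by_evidence[cols[6]] += 1  # column 7: evidence code
        by_aspect[cols[8]] += 1    # column 9: aspect
    return by_evidence, by_aspect


by_evidence, by_aspect = tally(SAMPLE_GAF)
# Non-IEA annotations are the total minus the IEA count.
non_iea = sum(n for code, n in by_evidence.items() if code != "IEA")
print(dict(by_evidence), dict(by_aspect), non_iea)
```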

Collaboration with UniProt/Swiss-Prot to produce a completely annotated proteome

In February 2008, Petra and Pascale spent two weeks in Geneva working with Swiss-Prot to make Dictyostelium one of the Swiss-Prot 'complete proteomes'. This meant ensuring that each protein is represented by a single record, which involved merging several records corresponding to duplicated genes or partial gene sequences. We are continuing this collaboration, with approximately 3 FTEs at UniProtKB/Swiss-Prot working on Dictyostelium entries. We hold monthly phone conferences to discuss annotation priorities and issues, and we try to align nomenclature guidelines between the two resources. Dictyostelium now ranks 10th in number of manual entries per species in UniProtKB/Swiss-Prot, with 4200 annotated entries (release 57.11 of 24-Nov-09).

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.

Literature curation

In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications.

We began collaborating with WormBase on using Textpresso to suggest GO cellular component terms for annotation (Van Auken et al., BMC Bioinformatics 2009, 10:228). This tool has been trained to evaluate the semantic context of GO terms to enrich for terms in sections of the papers describing methods or results, providing a high recall rate for experimentally supported GO annotations. We are establishing a pipeline that will run a full-text search on GO cellular component terms and provide curators with suggestions for annotation. The curation interface will show the gene name, the suggested GO term, and the relevant sentence from the paper leading to the annotation, allowing the curator to accept or reject the annotation. Extension of the tool to capture GO molecular functions is currently under development by WormBase. Together, these tools will increase the efficiency of GO curation by reducing the time curators spend on literature mining.
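The accept/reject step can be pictured as a small review loop over suggestion records. This is only an illustrative sketch: the `Suggestion` and `Decision` types, the `review` function, and the sample data are hypothetical and do not reflect the actual Textpresso output format or the dictyBase curation interface. The `accept` callback stands in for the curator's judgment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    gene: str       # gene name shown to the curator
    go_term: str    # suggested GO cellular component term
    sentence: str   # sentence from the paper that triggered the suggestion
    decision: Decision = Decision.PENDING


def review(suggestions: List[Suggestion],
           accept: Callable[[Suggestion], bool]) -> List[Suggestion]:
    """Record a decision for every suggestion and return the accepted ones."""
    for s in suggestions:
        s.decision = Decision.ACCEPTED if accept(s) else Decision.REJECTED
    return [s for s in suggestions if s.decision is Decision.ACCEPTED]


# Invented examples; gene names and sentences are placeholders.
queue = [
    Suggestion("geneA", "nucleus", "GFP-geneA localized to the nucleus."),
    Suggestion("geneB", "nucleus", "Nuclei were stained with DAPI."),
]

# Stand-in for a curator who accepts only sentences that mention the gene.
accepted = review(queue, accept=lambda s: s.gene in s.sentence)
print([s.gene for s in accepted])
```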

Curation of previously unidentified genes and gene products

In addition to genes that have been characterized in the literature, dictyBase curators are annotating gene products that have EST coverage and/or contain conserved functional domains. Gene products of this type are annotated with the ISS and ND evidence codes as there is no published data available.

Automated methods

IEAs via the BLAST method: All Dictyostelium protein sequences are analyzed by BLAST against GO gene association sequence files (http://www.geneontology.org/index.shtml#downloads), identifying proteins from the GO database that align with Dictyostelium proteins with an E-value ≤ 1e-50. GO annotations that have been manually assigned to these proteins from other species are attached to the corresponding gene product in dictyBase. The proteins from which the annotations are derived are displayed in the 'Evidence' column on the Gene Ontology evidence and references page.
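The transfer logic described here can be sketched as a filter followed by a lookup. The example below is a simplified illustration, not the dictyBase pipeline: it assumes the cutoff means an E-value of at most 1e-50, treats any annotation whose evidence code is not IEA as "manually assigned", and uses invented protein identifiers. The `Hit` and `Annotation` types and the `transfer_annotations` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

E_VALUE_CUTOFF = 1e-50  # assumption: "e-50" in the text means 1e-50


@dataclass(frozen=True)
class Hit:
    query: str     # Dictyostelium protein
    subject: str   # protein from the GO sequence set
    evalue: float


@dataclass(frozen=True)
class Annotation:
    protein: str
    go_id: str
    evidence: str
    with_from: str = ""  # source protein, as shown in the 'Evidence' column


def transfer_annotations(hits: Iterable[Hit],
                         source: Iterable[Annotation]) -> List[Annotation]:
    """Attach manual annotations of strong BLAST hits to the query as IEAs."""
    manual: Dict[str, List[Annotation]] = {}
    for ann in source:
        if ann.evidence != "IEA":  # assumption: non-IEA means manual
            manual.setdefault(ann.protein, []).append(ann)

    transferred: List[Annotation] = []
    seen = set()
    for hit in hits:
        if hit.evalue > E_VALUE_CUTOFF:
            continue  # alignment not significant enough
        for ann in manual.get(hit.subject, []):
            key = (hit.query, ann.go_id, hit.subject)
            if key not in seen:  # avoid duplicate transfers
                seen.add(key)
                transferred.append(
                    Annotation(hit.query, ann.go_id, "IEA", hit.subject)
                )
    return transferred


# Invented hits and annotations for illustration.
hits = [
    Hit("dictyP1", "yeastP1", 1e-80),
    Hit("dictyP1", "flyP1", 1e-10),    # fails the cutoff
    Hit("dictyP2", "yeastP2", 1e-60),  # subject has no manual annotation
]
source = [
    Annotation("yeastP1", "GO:0003677", "IDA"),
    Annotation("yeastP1", "GO:0005634", "IEA"),
    Annotation("flyP1", "GO:0006355", "IMP"),
    Annotation("yeastP2", "GO:0005634", "IEA"),
]

result = transfer_annotations(hits, source)
print(result)
```

Recording the source protein on each transferred annotation mirrors the text's note that the originating proteins are displayed alongside the annotation.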

IEAs imported from GOA: These include InterPro2GO and SPKW2GO annotations, which are assigned to the respective gene products.

Quality control measures

dictyBase curators work closely to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures) to which curators adhere. The three curators discuss consistency issues as they arise and decisions are recorded in the Standard Operating Procedures.

Presentations and Publications

GO 2009 Publications, Tutorials & Workshops, Presentations, Posters, and Resources

Other Highlights

Ontology Development Contributions

Annotators have requested several additions and changes to the ontologies that are necessary to annotate Dictyostelium development. These requests focus on, but are not limited to, process terms that describe developmental events such as cell type differentiation and formation of developmental structures. Curators continue to develop a pre-composed phenotype ontology using qualities from the PATO ontology. Every newly added term combines a GO process term or a Dictyostelium anatomy term with a PATO quality term from quality.obo.

New dictyBase Gene Page

In response to the ever-increasing amount of data we represent, and to keep pace with future growth, we have made a major update to our Gene Pages. Changes include:

• All gene-related information, such as protein annotations, gene ontology, references, and phenotypes, is grouped into tabs at the top of the gene page.

• Individual sections on the page can be collapsed. Section settings are 'sticky', meaning the browser 'remembers' which sections were open on the last visit.

• Sections are loaded independently, making the overall page loading faster.

• Each tab and section contains a question mark icon that displays help hints upon click.

• A new FASTA button allows fast access to sequences.

• Different splice variants are accessible via sub-tabs.

GenBank update

The Dictyostelium genomic sequence was updated in GenBank in August 2009 with several improved gene models and the insertion of a previously unassigned contig into the genome.