DictyBase, September 2009



PI: Rex Chisholm

Annotators: Petra Fey, Pascale Gaudet, Robert Dodson (started September 2009)

Developers: Siddhartha Basu, Yulia Bushmanova, Eric Just (consultant)

All dictyBase staff contribute to GO activities. This is a total of 5 FTE positions, of which the GO grant funds about 1.05 FTEs.

Annotation. Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, phenotypes, and sequences. All curators work to annotate gene products of the Dictyostelium genome. We import GOA annotations into dictyBase and incorporate them into our monthly gene association file.

Reference Genome Project. Pascale is the manager of the Reference Genome Project. All dictyBase curators annotate reference genome genes and are up-to-date with the selected orthologs. A paper describing the project was published earlier this year: The Reference Genome Group of the Gene Ontology Consortium (Gaudet, P, corresponding author) The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species (2009) PLoS Computational Biology.

Other dictyBase contributions to GO. All dictyBase curators work with GO editors and other curators in the field to improve the GO. For example, Petra worked with a group to improve the cell death terms. Pascale is a member of the Reference Genome, AmiGO/web presence, GO Evidence Code and Ontology Development working groups. As part of the Reference Genome Project, Pascale works with Suzi Lewis, Kara Dolinski and Paul Thomas (Panther) to develop a tree-based orthology inference system. Considerable effort is currently going into developing PAINT, the software tool that will be used to make the tree-based annotations.

Petra is a member of the OBO-Edit working group, the Newsletter group and the Reference Genome annotation group.

Siddhartha is part of the Software group.

dictyBase is moving to the Chado schema to store GO annotations. As part of this move we will design a new GO annotation tool. Sequencing of the genomes of several species related to Dictyostelium discoideum, including D. purpureum and D. citrinum, is underway. dictyBase will facilitate the annotation of these genomes as their sequences become publicly available.

Annotation Progress

Table 1: Number of Annotations

                               12/2008   12/2009   % Change
Total number of annotations      29787     30762   3%
Function                         12991     13199   1.5%
Process                           9487      9848   4%
Component                         7309      7715   5%

Table 2: Number of non-IEA Annotations

                               12/2008   12/2009   % Change
Total number of annotations      18860     19998   6%
Function                          6510      6813   4%
Process                           6742      7141   5%
Component                         5608      6044   7%

Table 3: Number of annotations per evidence code

Evidence code   12/2008   12/2009   % Change
IMP                1180      1269   7%
IGI                 173       195   11%
IPI                 214       221   3%
ISS                9420      9502   1%
IDA                1574      1805   15%
IEP                  84       132   36%
TAS                 488       480   -1.5%
NAS                  16        16   0%
NR                    0         0   N/A
IEA               10927     10764   -1.5%
ND                 5471      6126   14%
IC                  235       252   7%
RCA                   0         0   N/A
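
Counts like those in Tables 1 to 3 can be tallied directly from a gene association file. The following is a minimal sketch, not the script dictyBase uses: it assumes the standard tab-separated GAF layout, in which the evidence code is column 7 and the aspect (F, P or C) is column 9. The function name and the sample rows are hypothetical and serve only as illustration.

```python
from collections import Counter

# Map the single-letter GAF aspect codes to the labels used in the tables.
ASPECT_NAMES = {"F": "Function", "P": "Process", "C": "Component"}


def count_annotations(gaf_lines):
    """Tally annotations per aspect, per evidence code, and non-IEA per aspect.

    Assumes tab-separated GAF rows: evidence code in column 7 and
    aspect in column 9 (1-based). Lines starting with '!' are headers.
    """
    by_aspect, by_evidence, non_iea = Counter(), Counter(), Counter()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip rows too short to carry an aspect column
        evidence = cols[6]
        aspect = ASPECT_NAMES.get(cols[8], cols[8])
        by_aspect[aspect] += 1
        by_evidence[evidence] += 1
        if evidence != "IEA":
            non_iea[aspect] += 1
    return by_aspect, by_evidence, non_iea


def _row(obj_id, go_id, evidence, aspect):
    # Build a shortened, made-up GAF row (first nine columns only).
    return "\t".join(
        ["dictyBase", obj_id, "geneX", "", go_id, "PMID:0000000", evidence, "", aspect]
    )


# Hypothetical sample data, not real dictyBase annotations.
SAMPLE_GAF = [
    "!gaf-version: 2.0",
    _row("DDB_G0000001", "GO:0003674", "IDA", "F"),
    _row("DDB_G0000001", "GO:0003674", "IEA", "F"),
    _row("DDB_G0000002", "GO:0008150", "IMP", "P"),
    _row("DDB_G0000003", "GO:0005575", "ND", "C"),
]

by_aspect, by_evidence, non_iea = count_annotations(SAMPLE_GAF)
for name in ("Function", "Process", "Component"):
    print(name, by_aspect[name], non_iea[name])
```

Running the same function over the file from two different dates would give the two count columns of each table.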

Collaboration with UniProt/Swiss-Prot to produce a completely annotated proteome

In February 2008, Petra and Pascale spent two weeks in Geneva working with Swiss-Prot to make Dictyostelium one of the Swiss-Prot 'complete proteomes' by ensuring that each protein is represented by a single record, which involved merging several records corresponding to duplicated genes or partial gene sequences. This collaboration continues, with approximately 3 FTEs at UniProtKB/Swiss-Prot working on Dictyostelium entries. We hold monthly phone conferences to discuss annotation priorities and issues, and we try to align nomenclature guidelines between the two resources. Dictyostelium now ranks 10th in number of manual entries per species in UniProtKB/Swiss-Prot, with 3829 annotated entries (release 57.7 of 01-Sep-09).

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.

a. Literature curation.

In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications. We are current with the literature published since January 2004 and are working backwards chronologically through earlier papers while staying up-to-date with new publications.

b. Curation of previously unidentified genes and gene products.

In addition to genes that have been characterized in the literature, dictyBase curators are annotating gene products that have EST coverage and/or contain conserved functional domains. Gene products of this type are annotated with the ISS and ND evidence codes as there is no published data available.

c. Automated methods.

i. IEAs via the BLAST method. All Dictyostelium protein sequences are analyzed by BLAST against GO gene association sequence files (http://www.geneontology.org/index.shtml#downloads) to identify proteins from the GO database that align with Dictyostelium proteins with an E-value ≤ 1e-50. GO annotations that have been manually assigned to these proteins from other species are attached to the corresponding gene product in dictyBase. The proteins from which the annotations are derived are displayed in the 'Evidence' column on the Gene Ontology evidence and references page.

ii. IEAs imported from GOA. These include InterPro2GO and SPKW2GO annotations and are assigned to the respective gene products.
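
The BLAST-based transfer described in item i can be sketched as follows. This is an illustrative outline rather than the dictyBase pipeline: it assumes the BLAST hits are available in the common 12-column tabular output (query ID first, subject ID second, E-value in column 11), and that the manually assigned GO terms of the subject proteins have already been loaded into a dictionary. The function name, the identifiers and the GO terms in the sample are hypothetical.

```python
E_VALUE_CUTOFF = 1e-50  # threshold stated in the text


def transfer_annotations(blast_rows, manual_annotations, cutoff=E_VALUE_CUTOFF):
    """Propose IEA annotations from BLAST hits at or below the E-value cutoff.

    blast_rows: tab-separated lines in 12-column tabular BLAST format (assumed).
    manual_annotations: dict mapping a subject protein ID to the set of GO IDs
        manually assigned to it in another species.
    Returns a sorted list of (query_id, go_id, "IEA", subject_id) tuples; the
    subject ID is kept so the source protein can be shown as evidence.
    """
    transferred = set()
    for row in blast_rows:
        cols = row.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a complete tabular hit line
        query_id, subject_id, e_value = cols[0], cols[1], float(cols[10])
        if e_value > cutoff:
            continue  # alignment not significant enough
        for go_id in manual_annotations.get(subject_id, ()):
            transferred.add((query_id, go_id, "IEA", subject_id))
    return sorted(transferred)


# Hypothetical hits and annotations for illustration only.
SAMPLE_HITS = [
    "DDB0000001\tSGD:S000000001\t62.5\t400\t150\t0\t1\t400\t1\t400\t1e-120\t420",
    "DDB0000001\tMGI:0000001\t35.0\t200\t130\t2\t1\t200\t5\t204\t1e-20\t95",
    "DDB0000002\tSGD:S000000001\t55.0\t380\t171\t1\t3\t382\t1\t380\t1e-50\t210",
]
SAMPLE_MANUAL = {
    "SGD:S000000001": {"GO:0005524"},
    "MGI:0000001": {"GO:0016301"},
}

for annotation in transfer_annotations(SAMPLE_HITS, SAMPLE_MANUAL):
    print(annotation)
```

In this sample the hit with an E-value of 1e-20 is discarded, so only the annotation from the first subject protein is carried over to the two query proteins.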

d. Quality control measures.

dictyBase curators work closely to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures) to which curators adhere. The three curators discuss consistency issues as they arise and decisions are recorded in the Standard Operating Procedures.

Presentations and Publications

a. Papers with substantial GO content

- The Reference Genome Group of the Gene Ontology Consortium (Gaudet, P, corresponding author) The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species (2009) PLoS Computational Biology

b. Presentations including Talks and Tutorials and Teaching

- The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species, Quest for Orthologs meeting, Hinxton, UK, July 2009 (Pascale Gaudet)

- Annotation in Model Organism Databases, Concordia University, Montreal, June 2009 (Pascale Gaudet)

c. Poster presentations

- The Reference Genome Project. International Biocurator Conference, Berlin, April 2009 (Pascale Gaudet)

- The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species, International Dicty Meeting, Estes Park, CO, September 2009 (Pascale Gaudet)

Other Highlights

A. Ontology Development Contributions:

Annotators have requested several additions and changes to the ontologies that are necessary to annotate Dictyostelium development. These requests focus on, but are not limited to, process terms that describe developmental events such as cell type differentiation and formation of developmental structures. Curators continue to develop a pre-composed phenotype ontology using qualities from the PATO ontology. Every newly added term combines a GO process term or a Dicty anatomy term with a PATO term from quality.obo.
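
The composition rule for phenotype terms can be illustrated with a small sketch. It is only a hedged illustration of the entity-plus-quality pattern described above, not the curators' actual tooling: the class and function names are hypothetical, the `DDANAT` prefix for Dicty anatomy terms is an assumption, and the identifiers in the example are placeholders.

```python
from dataclasses import dataclass

# Ontologies allowed for the entity part; "DDANAT" is assumed here as the
# ID prefix of the Dicty anatomy ontology.
ENTITY_PREFIXES = ("GO", "DDANAT")
QUALITY_PREFIX = "PATO"


@dataclass(frozen=True)
class Term:
    term_id: str
    label: str

    @property
    def prefix(self):
        # "GO:0006935" -> "GO"
        return self.term_id.split(":", 1)[0]


def compose_phenotype(entity, quality):
    """Combine an entity term and a PATO quality into a pre-composed phenotype."""
    if entity.prefix not in ENTITY_PREFIXES:
        raise ValueError(f"entity must be a GO or Dicty anatomy term: {entity.term_id}")
    if quality.prefix != QUALITY_PREFIX:
        raise ValueError(f"quality must be a PATO term: {quality.term_id}")
    return {
        "label": f"{quality.label} {entity.label}",
        "entity": entity.term_id,
        "quality": quality.term_id,
    }


# Placeholder identifiers, for illustration only.
phenotype = compose_phenotype(
    Term("GO:0006935", "chemotaxis"),
    Term("PATO:0000000", "abnormal"),
)
print(phenotype["label"])
```

Keeping the entity and quality identifiers alongside the label means each pre-composed term can still be traced back to its source ontologies.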

B. New dictyBase Gene Page

In response to the ever-increasing amount of data we represent, and to keep pace with it in the future, we have made a major update to our Gene Pages. Changes include:

• Grouping all gene related information, such as protein annotations, gene ontology, references and phenotypes into tabs on top of the gene page.
• Individual sections on the page can be collapsed. Section settings are 'sticky', meaning the browser 'remembers' which sections were open on the last visit.
• Sections are loaded independently, making the overall page loading faster.
• Each tab and section contains a question mark icon that displays help hints upon click.
• A new FASTA button allows fast access to sequences.
• Different splice variants are accessible via sub-tabs.
• On the new phenotype tab, strains can be ordered directly by availability by clicking on the shopping cart icon.
• The BLAST interface has been updated; now sequences are being prefilled when accessing from the gene page or by searching for DDB or DDB_G IDs.
• Graphics have been added to the BLAST results.
• We incorporated protein annotations from UniProt on the new protein tab.