DictyBase Progress Report December 2008

PI: Rex Chisholm
Annotators: Petra Fey, Pascale Gaudet
Developers: Eric Just, Siddhartha Basu, Yulia Bushmanova (started June 2008)

All dictyBase staff contribute to GO activities. This is a total of 4.3 FTE positions, of which funding from the GO grant supports about 1.05 FTEs.

Annotation. Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, phenotypes, and sequences. All curators work to annotate gene products of the Dictyostelium genome. We import GOA annotations into dictyBase and incorporate them into our monthly gene association file.

Reference Genome Project. Pascale, together with Rex, is leading the Reference Genome project. All dictyBase curators annotate reference genome genes and are up-to-date with the selected orthologs. Siddhartha is working on the development of a reference genome curation tool, which can be viewed at http://berkeleybop.org/RefG/RefGenome.html

Other dictyBase contributions to GO. Pascale is a member of the Reference Genome, AmiGO/web presence, GO Evidence Code and Ontology development working groups and works with Kara Dolinski and Paul Thomas (Panther) to establish a tree-based orthology set for the Reference Genome project. Pascale organized the October 2008 GOC and SAB meetings in Sainte-Adele, Canada. Petra is a member of the OBO-Edit working group and the Reference Genome annotation group.

dictyBase is moving to the Chado schema to store GO annotations. As part of this move we will be designing a new GO annotation tool. Sequencing of the genomes of several species related to Dictyostelium discoideum, including D. purpureum and D. citrinum, is underway. dictyBase will facilitate the annotation of these genomes as their sequences become publicly available. We are also developing databases for Physarum and Acanthamoeba, to be populated as their genome sequences are completed, and will facilitate GO annotation of these genomes as well.

Annotation Progress

Please note that the decrease in annotations is due to a decision to remove lower confidence IEA annotations. There has been steady and significant progress in manual annotation.

Table 1: Number of Annotations

                              12/2007   12/2008   % Change
Total number of annotations     31196     29787        -5%
Function                        13429     12991        -3%
Process                         10150      9487        -7%
Component                        7617      7309        -4%
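The % Change column can be reproduced from the two counts in each row. The following is a minimal sketch, assuming the percentages were rounded to the nearest whole number; the function name `percent_change` is illustrative and not part of any dictyBase tool.

```python
def percent_change(old: int, new: int) -> int:
    """Return the change from old to new as a percentage, rounded to an integer."""
    return round((new - old) / old * 100)


# Counts copied from Table 1: (12/2007, 12/2008).
table1 = {
    "Total": (31196, 29787),
    "Function": (13429, 12991),
    "Process": (10150, 9487),
    "Component": (7617, 7309),
}

changes = {name: percent_change(old, new) for name, (old, new) in table1.items()}

for name, pct in changes.items():
    print(f"{name}: {pct}%")
```

The same function applies to the other tables in this section, since they use the same column layout.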

Table 2: Number of non-IEA Annotations

                              12/2007   12/2008   % Change
Total number of annotations     16554     18860        14%
Function                         5789      6510        12%
Process                          5848      6742        15%
Component                        4917      5608        14%

Table 3: Number of annotations per evidence code

Evidence code   12/2007   12/2008   % Change
IMP                 931      1180        27%
IGI                 115       173        50%
IPI                 199       214         8%
ISS                8848      8420         6%
IDA                1378      1574        14%
IEP                  78        84         8%
TAS                 486       488         0%
NAS                  16        16         0%
NR                    0         0        N/A
IEA               14642     10927       -25%
ND                 4290      5471        28%
IC                  213       235        10%
RCA                   0         0        N/A
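Per-evidence-code counts of this kind can be tallied directly from a gene association file. The sketch below assumes the tab-delimited GO annotation file layout, in which comment lines start with "!" and the evidence code is the seventh column. The sample rows, gene identifiers and references are hypothetical placeholders, not dictyBase data, and this is not necessarily how the numbers in the table were produced.

```python
from collections import Counter
from typing import Iterable


def count_by_evidence(lines: Iterable[str]) -> Counter:
    """Count annotation rows per evidence code (column 7 of a gene association file)."""
    counts: Counter = Counter()
    for line in lines:
        # Skip blank lines and "!" comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # Too few columns to be an annotation row.
        counts[cols[6]] += 1
    return counts


def row(gene: str, go_id: str, evidence: str, aspect: str) -> str:
    """Build a hypothetical nine-column annotation row for the example."""
    return "\t".join(
        ["dictyBase", gene, gene, "", go_id, "PMID:0000000", evidence, "", aspect]
    )


# Hypothetical sample input; all identifiers are placeholders.
sample = [
    "!gaf-version: 1.0",
    row("DDB_G0000001", "GO:0003674", "ND", "F"),
    row("DDB_G0000002", "GO:0000000", "IMP", "P"),
    row("DDB_G0000003", "GO:0000000", "IEA", "C"),
    row("DDB_G0000004", "GO:0000000", "IEA", "F"),
]

counts = count_by_evidence(sample)
non_iea = sum(n for code, n in counts.items() if code != "IEA")
print(dict(counts), "non-IEA:", non_iea)
```

Summing every code except IEA gives the kind of total reported in Table 2.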

Collaboration with UniProt/Swiss-Prot to produce a completely annotated proteome

In February 2008, Petra and Pascale spent two weeks in Geneva working with Swiss-Prot to make Dictyostelium one of the Swiss-Prot 'complete proteomes' by ensuring that each protein is represented by a single record, which involved merging several records corresponding to duplicated genes or partial gene sequences. During this 'annotation marathon' in March, the Swiss-Prot and dictyBase annotators annotated over 1,000 Dictyostelium entries. In January 2007, there were only 337 curated entries in Swiss-Prot for Dictyostelium. As of October 2008, there are over 3,100 entries, and Dictyostelium is now one of the top ten species when ranked by the number of curated genes in Swiss-Prot (up from rank 337 in January 2007). Swiss-Prot and dictyBase have reciprocal links to each other. This effort greatly increases the visibility and accessibility of Dictyostelium genomic data to any researcher who accesses Swiss-Prot. The close collaboration between UniProtKB and dictyBase will continue until the completion of Dictyostelium discoideum proteome annotation, planned for 2010. This will have a substantial positive impact on the Dictyostelium data for the reference genome project.

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.

Literature curation.

In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications. We are current with the literature published since January 2004 and are working our way backwards chronologically while staying up-to-date with new publications.

Curation of previously unidentified genes and gene products.

In addition to genes that have been characterized in the literature, dictyBase curators are annotating gene products that have EST coverage and/or contain conserved functional domains. Gene products of this type are annotated with the ISS and ND evidence codes as there is no published data available.

Automated methods.

IEAs via the BLAST method.

All Dictyostelium protein sequences are analyzed by BLAST against GO gene association sequence files (http://www.geneontology.org/index.shtml#downloads), identifying proteins from the GO database that align with Dictyostelium proteins with an E value ≤ 1e-50. GO annotations that have been manually assigned to these proteins from other species are attached to the corresponding gene product in dictyBase. The proteins from which the annotations are derived are displayed in the 'Evidence' column on the Gene Ontology evidence and references page.
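The BLAST-based transfer described above can be sketched as follows. This is an illustration under stated assumptions, not the dictyBase pipeline: it assumes BLAST hits are available in the standard 12-column tabular format (E value in the eleventh column), treats "manually assigned" as "any evidence code other than IEA", and uses hypothetical protein identifiers and placeholder GO IDs throughout.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-50  # Threshold stated in the text.


def transfer_annotations(blast_rows, subject_annotations, cutoff=E_VALUE_CUTOFF):
    """Transfer manual GO annotations from BLAST subjects to query proteins.

    blast_rows: iterable of tab-separated lines in 12-column BLAST tabular format.
    subject_annotations: dict mapping subject ID -> list of (go_id, evidence_code).
    Returns a dict mapping query ID -> set of (go_id, "IEA", source_subject_id).
    """
    transferred = defaultdict(set)
    for line in blast_rows:
        cols = line.rstrip("\n").split("\t")
        query_id, subject_id, evalue = cols[0], cols[1], float(cols[10])
        if evalue > cutoff:
            continue  # Alignment is not significant enough.
        for go_id, evidence in subject_annotations.get(subject_id, []):
            if evidence == "IEA":
                continue  # Assumption: only manual annotations are propagated.
            # Keep the source protein so it can be shown as the evidence.
            transferred[query_id].add((go_id, "IEA", subject_id))
    return dict(transferred)


# Hypothetical example data; none of these IDs refer to real records.
blast_rows = [
    "DDB0000001\tSGD:S000000001\t62.5\t400\t150\t0\t1\t400\t1\t400\t1e-120\t430",
    "DDB0000001\tMGI:0000002\t30.1\t200\t140\t3\t5\t204\t9\t208\t3e-12\t65.1",
    "DDB0000002\tSGD:S000000001\t55.0\t380\t171\t0\t1\t380\t1\t380\t2e-75\t280",
]
subject_annotations = {
    "SGD:S000000001": [("GO:0000001", "IDA"), ("GO:0000002", "IEA")],
    "MGI:0000002": [("GO:0000003", "IMP")],
}

result = transfer_annotations(blast_rows, subject_annotations)
for query, annotations in sorted(result.items()):
    print(query, sorted(annotations))
```

Recording the source protein alongside each transferred term mirrors the 'Evidence' column mentioned above, which lists the proteins the annotations were derived from.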

IEAs imported from GOA.

These include InterPro2GO and SPKW2GO annotations, which are assigned to the respective gene products.

Quality control measures.

dictyBase curators work closely to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures) to which curators adhere. The two curators discuss consistency issues as they arise and decisions are recorded in the Standard Operating Procedures.

Presentations and Publications

Papers with substantial GO content

• Gaudet, Williams, Fey & Chisholm (2008) An anatomy ontology to represent biological knowledge in Dictyostelium discoideum. BMC Genomics 9:130

• Howe, Costanzo, Fey, Gojobori, Hannick, Hide, Hill, Kania, Schaeffer, St Pierre, Twigger, White & Rhee (2008) Big data: The future of biocuration. Nature 455: 47

• Gene Ontology Consortium (2008) The Gene Ontology Project in 2008. Nucl. Acids Res. 36: D440-4. PMID: 17984083

• dictyBase - A Dictyostelium bioinformatics resource update, Nucleic Acids Res, Database issue, Jan 2009

• Carbon et al., the AmiGO working group of the Gene Ontology Consortium. AmiGO: comprehensive online access to ontology and annotation data, submitted.

• Gaudet P, The Reference Genome Group of the Gene Ontology Consortium, The GO Reference Genome Annotation Project, in preparation.

Presentations including Talks and Tutorials and Teaching

• Gaudet. Gene Ontology Overview. Swiss-Prot, October 2008

• Gaudet, Williams, Fey, Chisholm. Dictyostelium anatomy and phenotypes. International Dictyostelium meeting, Tsukuba, Japan Sept 2008

• Fey, Gaudet, Basu, Just, Bushmanova, Kibbe, Chisholm. dictyBase Update 2008. International Dictyostelium meeting, Tsukuba, Japan Sept 2008

Poster presentations

• Gaudet, Fey, Just, Merchant, Basu, Kibbe, Chisholm. dictyBase Strain and Phenotype Curation, ISMB July 2008

5. Other Highlights:

A. Ontology Development Contributions:

Annotators have requested several additions and changes to the ontologies necessary to annotate Dictyostelium development. These requests focus on, but are not limited to, process terms to describe developmental events such as cell type differentiation and formation of developmental structures. Non-developmental terms have also been requested, including terms related to metabolism, cytoskeleton, DNA modification, and glycosylation. Additionally, we frequently deal with InterPro2GO mappings and request deletions or additions of terms to InterPro records. Since January 2008, we have submitted 19 items to the Curator requests tracker and 7 items to the Annotation issues tracker at SourceForge.net.

Curators are also developing a phenotype ontology that is increasingly decomposable into PATO. Every newly added term is composed of a GO process term or a Dicty anatomy term, plus a PATO term from quality.obo.
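The composition rule for new phenotype terms can be pictured as a pair of ontology terms: an entity plus a quality. The sketch below is only an illustration of that idea. The class names, the example term names and the placeholder IDs are hypothetical and do not reflect how the phenotype ontology is actually stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyTerm:
    term_id: str  # Placeholder IDs are used in this example.
    name: str


@dataclass(frozen=True)
class PhenotypeTerm:
    entity: OntologyTerm   # A GO process term or a Dicty anatomy term.
    quality: OntologyTerm  # A PATO term from quality.obo.

    def __post_init__(self) -> None:
        # Guard the one rule that is unambiguous: the quality comes from PATO.
        if not self.quality.term_id.startswith("PATO:"):
            raise ValueError("quality must be a PATO term")

    @property
    def label(self) -> str:
        """Human-readable label built from the quality and the entity."""
        return f"{self.quality.name} {self.entity.name}"


# Hypothetical example; the IDs below are placeholders, not real accessions.
aggregation = OntologyTerm("GO:XXXXXXX", "cell aggregation")
delayed = OntologyTerm("PATO:XXXXXXX", "delayed")

term = PhenotypeTerm(entity=aggregation, quality=delayed)
print(term.label)
```

Storing the two parts separately, rather than only the combined label, is what keeps a phenotype term decomposable.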

Pascale Gaudet, with help from Jeff Williams, has developed a Dictyostelium anatomy ontology (published in BMC Genomics).

B. Annotation Outreach and User Advocacy Efforts:

Pascale visited the Swiss-Prot group in Geneva to introduce GO and the GO annotation process (Swiss-Prot, October 2008). Following a discussion that took place at the UniProt tripartite collaborative meeting in Washington DC, Amos Bairoch announced to the UniProt SAB that Swiss-Prot would start using GO in their annotation process. This was enthusiastically approved by the SAB.

During the Swiss-Prot meeting, the following steps were taken:

-- Pascale presented an overview of the GO to the entire Swiss-Prot group, in which she described the ontology (term structure, ontology structure), evidence codes, basic relationships (i.e., the existing ones), the manual and electronic annotation processes, annotating to the root, the gene association file, AmiGO, maintenance of the GO, reference genomes and OBO.

-- dictyBase curators had a number of discussions with the leaders of the annotation groups at Swiss-Prot regarding the actual process of annotation; files; who is responsible for which species; tools (the Protein2GO tool from Swiss-Prot, and GO tools such as OBO-Edit, QuickGO and AmiGO); how consortium meetings take place; the creation of new terms and relationships (given that they already have a number of controlled vocabularies); and making cross-links to their vocabularies versus adding terms and relationships in the GO.

-- dictyBase curators also discussed the requirements for making GO annotations. The Swiss-Prot curators will need additional tutorials and workshops to aid them in using the GO and making annotations (Emily and Rachael have already offered to do that); they also need to modify their software to capture the required information.

-- Finally, dictyBase curators talked about the role of Swiss-Prot in the consortium: when this is all in place, they will provide quite a large number of annotations covering many species that are not covered by the MODs; this will be very useful for the reference genome project. In addition to annotations, as mentioned above, Swiss-Prot controlled vocabularies and the enormous expertise of their annotators can be used to improve the GO itself.

-- The time frame for providing annotations to GO will be 9 to 12 months.

Petra gave an informal tutorial on GO annotations in dictyBase, the AmiGO browser, and the OBO-Edit software to a student from Osaka University in September 2008.