Saccharomyces Genome Database Summary, 2011

Staff

Rama Balakrishnan, Gail Binkley, J. Michael Cherry, Karen Christie, Maria Costanzo, Selina Dwight, Stacia Engel, Dianna Fisk, Jodi Hirschman, Ben Hitz, Eurie Hong, Cindy Krieger, Stuart Miyasato, Rob Nash, Julie Park, Marek Skrzypek, Shuai Weng, Edith Wong (Stanford); Kara Dolinski, Michael Livstone, Rose Oughtred (Princeton)


Annotation Progress

GO Aspect	Number of Annotations Added	Number of Genes Updated	Number of Publications Used
Biological Process	1547	699	791
Molecular Function	989	582	516
Cellular Component	1323	1117	305

Note that these numbers include manually curated and high-throughput annotations only for ORFs classified as Verified or Uncharacterized (Dubious ORFs are excluded), for RNA genes (ncRNA, rRNA, snRNA, snoRNA, and tRNA), and for genes encoded within transposable elements. Note also that these counts may include both new annotations and updated annotations that replaced older ones.

State of GO Annotations Genome-Wide

Type (GP = gene product)	Count as of November 2, 2011
GPs with any annotation	6357
GPs with manual annotation	6357
GPs with experimental and computational evidence	5331
GPs with computational evidence	5331
GPs with curator evidence (TAS, NAS, IC)	1533
GPs with No Data (ND) in MF	1988
GPs with No Data (ND) in BP	1199
GPs with No Data (ND) in CC	714
All annotations	81074
Total annotations with manual curation	42386
Total annotations with computational evidence	38688
Annotations with curator evidence (TAS, NAS, IC)	3325
Annotations with No Data (ND)	3954
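Genome-wide counts like those in the table above can in principle be derived from a GAF 2.0 gene association file. The following is a minimal Python sketch, not SGD's actual reporting code. It assumes that "computational evidence" means the IEA code only and that every other code counts as manual curation. Under that split, manual plus computational equals all annotations, which is consistent with the table (42,386 + 38,688 = 81,074). The names `summarize_gaf` and `gaf_row` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Evidence codes that the table groups as "curator evidence".
CURATOR_CODES = {"TAS", "NAS", "IC"}
# Assumption: only IEA counts as "computational"; all other codes count as manual.
COMPUTATIONAL_CODES = {"IEA"}


def summarize_gaf(lines):
    """Compute genome-wide summary counts from GAF 2.0 lines (hypothetical helper)."""
    genes_any, genes_comp, genes_curator = set(), set(), set()
    nd_genes = defaultdict(set)  # aspect letter (P, F, C) -> gene IDs with an ND annotation
    counts = {"all": 0, "manual": 0, "computational": 0, "curator": 0, "nd": 0}
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and '!' header/comment lines
        cols = line.rstrip("\n").split("\t")
        gene, evidence, aspect = cols[1], cols[6], cols[8]  # GAF columns 2, 7 and 9
        counts["all"] += 1
        genes_any.add(gene)
        if evidence in COMPUTATIONAL_CODES:
            counts["computational"] += 1
            genes_comp.add(gene)
        else:
            counts["manual"] += 1
        if evidence in CURATOR_CODES:
            counts["curator"] += 1
            genes_curator.add(gene)
        if evidence == "ND":
            counts["nd"] += 1
            nd_genes[aspect].add(gene)
    return {
        "gps_any": len(genes_any),
        "gps_computational": len(genes_comp),
        "gps_curator": len(genes_curator),
        "gps_nd_by_aspect": {a: len(g) for a, g in nd_genes.items()},
        **counts,
    }


def gaf_row(gene, go_id, evidence, aspect):
    """Build a minimal 17-column GAF 2.0 line with placeholder values (hypothetical data)."""
    cols = ["SGD", gene, gene, "", go_id, "PMID:0", evidence, "", aspect,
            "", "", "gene", "taxon:559292", "20111102", "SGD", "", ""]
    return "\t".join(cols)


if __name__ == "__main__":
    sample = [
        "!gaf-version: 2.0",
        gaf_row("S000000001", "GO:0005634", "IDA", "C"),
        gaf_row("S000000001", "GO:0003674", "ND", "F"),
        gaf_row("S000000002", "GO:0005737", "IEA", "C"),
        gaf_row("S000000002", "GO:0006412", "TAS", "P"),
    ]
    summary = summarize_gaf(sample)
    # Mirrors the table's relationship: manual + computational == all annotations.
    assert summary["manual"] + summary["computational"] == summary["all"]
    print(summary)
```

Counting gene products with sets, rather than counting rows, keeps a gene with many annotations from being counted more than once in the "GPs with ..." rows.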

Methods and strategies for annotation (please note % effort on literature curation vs. computational annotation methods)

a. Literature curation: 100% of SGD's annotation effort is dedicated to manual curation of budding yeast genes and their products based on the published literature.

b. Computational annotation strategies: SGD does not use automated methods to assign annotations; rather, we incorporate the computationally predicted annotations made by the UniProtKB GOA project for S. cerevisiae. These IEA annotations are loaded into the SGD database from the GOA gene association file after each release. All of these annotations are included in the gene_association.sgd file, which substantially expands the range of evidence codes and data sources that SGD provides. Note: The computationally predicted annotations generated by the integrated bioinformatic analysis of high-throughput data from the Roth and Troyanskaya labs (Tian et al., 2008; Huttenhower and Troyanskaya, 2008) were removed from SGD because those predictions were no longer being refreshed.

c. Priorities for annotation: The highest priority is to capture annotations when new information becomes available for an Uncharacterized gene product; these papers are identified during the literature triage process. We also update older annotations. SGD records the date on which each gene's annotations were last reviewed and uses this date to identify older annotations and check them for consistency with the current literature.

d. Phylogenetic annotation: SGD has incorporated phylogeny-based annotations made with PAINT. These annotations are now part of SGD's gene_association.sgd file.

e. SGD curators recently completed a large study comparing manual and computationally predicted annotations and are using its results to set curation priorities and develop curation strategies.

f. SGD curators routinely create new terms through the new TermGenie interface, which speeds up annotation.
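Item b describes loading the GOA IEA annotations into SGD after each release. The refresh step can be sketched in Python as follows, assuming the GOA file identifies gene products by UniProtKB accession in GAF column 2 and that a mapping from accession to SGDID is available. The function names, the `uniprot_to_sgdid` mapping and the accessions are hypothetical, and this is not SGD's actual loader. Previously loaded IEA rows are discarded and replaced wholesale, so the IEA set reflects only the most recent GOA release.

```python
def is_annotation(line):
    """True for GAF data lines; '!' marks header and comment lines."""
    return bool(line.strip()) and not line.startswith("!")


def evidence_code(line):
    """Return the evidence code (GAF column 7) of a data line."""
    return line.split("\t")[6]


def refresh_iea(sgd_lines, goa_lines, uniprot_to_sgdid):
    """Replace the IEA rows of an SGD GAF with the IEA rows of a new GOA release.

    uniprot_to_sgdid is a hypothetical dict mapping UniProtKB accessions to SGDIDs.
    """
    sgd_lines = [line.rstrip("\n") for line in sgd_lines]
    # Keep headers and all non-IEA annotations; previously loaded IEA rows are dropped.
    kept = [line for line in sgd_lines
            if not is_annotation(line) or evidence_code(line) != "IEA"]
    fresh = []
    for line in goa_lines:
        line = line.rstrip("\n")
        if not is_annotation(line) or evidence_code(line) != "IEA":
            continue  # only IEA rows are taken from the GOA file
        cols = line.split("\t")
        sgdid = uniprot_to_sgdid.get(cols[1])
        if sgdid is None:
            continue  # accession has no SGD feature; skip it
        cols[0], cols[1] = "SGD", sgdid  # re-key the row to the SGD identifier
        fresh.append("\t".join(cols))
    return kept + fresh
```

The "Assigned By" column (GAF column 15) is left untouched, so each loaded row still records which group made the prediction.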
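The priorities in item c can be expressed as a simple work queue. This Python sketch is illustrative only: the `GeneRecord` fields, the `stale_before` cutoff and the two-tier ordering are assumptions drawn from the description, not SGD's actual triage system. Uncharacterized gene products with a triaged paper newer than their last review come first, followed by other genes whose review date falls before the cutoff, oldest review first.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class GeneRecord:
    """Hypothetical per-gene record; field names are illustrative, not SGD's schema."""
    sgdid: str
    orf_status: str                 # e.g. "Verified" or "Uncharacterized"
    date_reviewed: Optional[date]   # when the gene's GO annotations were last reviewed
    newest_paper: Optional[date]    # date of the newest triaged paper for the gene


def curation_queue(genes, stale_before):
    """Order genes: tier 0 = Uncharacterized with new papers, tier 1 = stale reviews."""
    queue = []
    for g in genes:
        reviewed = g.date_reviewed or date.min  # never-reviewed genes sort as oldest
        new_info = g.newest_paper is not None and g.newest_paper > reviewed
        if g.orf_status == "Uncharacterized" and new_info:
            tier = 0
        elif reviewed < stale_before:
            tier = 1
        else:
            continue  # reviewed recently and not an Uncharacterized gene with new papers
        queue.append((tier, reviewed, g.sgdid))
    queue.sort()  # by tier, then oldest review date, then SGDID
    return [sgdid for _, _, sgdid in queue]


genes = [
    GeneRecord("S1", "Uncharacterized", date(2011, 1, 1), date(2011, 6, 1)),
    GeneRecord("S2", "Verified", date(2008, 3, 1), None),
    GeneRecord("S3", "Verified", date(2006, 5, 1), None),
    GeneRecord("S4", "Verified", date(2011, 2, 1), date(2011, 5, 1)),
    GeneRecord("S5", "Uncharacterized", None, None),
]
print(curation_queue(genes, stale_before=date(2010, 1, 1)))  # ['S1', 'S5', 'S3', 'S2']
```

In this example S4 is skipped even though it has a newer paper, because it is Verified and was reviewed after the cutoff; a real queue would likely weigh such cases differently.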

Presentations and Publications

* Papers with substantial GO content
** PMID 21411447 Using computational predictions to improve literature-based Gene Ontology annotations: a feasibility study. Costanzo MC, Park J, Balakrishnan R, Cherry JM, Hong EL. Database (Oxford). 2011 Mar 15;2011:bar004. Print 2011.

* Presentations including Talks, Tutorials and Teaching
** Eurie L. Hong, Using Computational predictions to improve literature-based GO annotations, CSHL Genome Informatics Meeting, November 2011

* Poster presentations
** Using Computational predictions to improve Literature based Gene Ontology (GO) annotations. Park J, Costanzo MC, Balakrishnan R, Cherry JM, Hong EL. Presented at the CSHL Yeast Cell Biology Meeting, 2011

Other Highlights

A. Ontology Development Contributions:

Karen Christie has been working on improving the transcription portion of the ontology. She has also written a curation manual for reannotation using the transcription portion of the ontology.

Participation in SourceForge requests since October 2010:
- New term requests and ontology changes submitted by SGD: 41
- New term requests via TermGenie: at least 24

B. Annotation Outreach and User Advocacy Efforts:

E. Hong is part of the rotation that answers user email sent to gohelp.
R. Balakrishnan, E. Wong and B. Hitz participate in the WebPresence and AmiGO Hub working groups.
R. Balakrishnan is a manager of the Annotation Advocacy working group.
K. Christie participates in the OBO-Edit working group.
SGD curators participate in the annotation conference calls and curation jamborees.

C. GOMine

Kalpana Karra has been working on building GOMine, an implementation of InterMine. GOMine (http://goad.stanford.edu:8080/gomine/begin.do) is intended to serve as a fast search and retrieval tool for GO data that does not require users to know SQL or the database table structure.

D. Software

SGD continues to serve as the production server for the GO database and for the AmiGO web application.
Craig Amundsen is working with Chris et al. on the GOLD database.

SGD November 2011
