Saccharomyces Genome Database Summary, Dec 2013 - March 2014

Staff

Rama Balakrishnan, Gail Binkley, J. Michael Cherry, Maria Costanzo, Selina Dwight, Stacia Engel, Ben Hitz, Stuart Miyasato, Matt Simison, Rob Nash, Marek Skrzypek, Shuai Weng, Edith Wong, Paul Lloyd, Janos Demeter, Diane Inglis, Kelley Paskov

[please include FTEs working on GOC tasks, designating as well how many FTEs are funded by the GOC NHGRI grant]

Annotation Progress

GO Aspect	Number of Annotations Added	Number of genes updated	Number of publications used
Biological Process	385	254	186
Molecular Function	149	121	76
Cellular Component	244	199	85

Note that these numbers count only manually curated and high-throughput annotations for ORFs that are Verified or Uncharacterized (Dubious ORFs are excluded), for RNA genes (ncRNA, rRNA, snRNA, snoRNA, or tRNA), and for genes encoded within transposable elements. The counts may include both new annotations and updated annotations that replaced older ones.

State of GO annotations genome-wide

Type	Counts as of March 2014
GP with Any Annotation	6381
GP with Manual Annotation	6381
GP with Experimental and Computational Evidence	5449
GP with Computational Evidence	5449
GP with Curator Evidence (TAS, NAS, IC)	1243
GP with No Data (ND) in MF	1945
GP with No Data (ND) in BP	1142
GP with No Data (ND) in CC	745
All annotations	92625
Total Annotations with Manual curation	47752
Total Annotations with Computational Evidence	44873
Annotations with Curator Evidence (TAS, NAS, IC)	2954
Annotations with No Data (ND)	3832

Methods and strategies for annotation (please note % effort on literature curation vs. computational annotation methods)

a. Literature curation: 100% of SGD’s effort is dedicated to manual curation based on the published literature for budding yeast genes and their products.

b. Computational annotation strategies: SGD does not employ automated methods to assign annotations; instead, we incorporate the computationally predicted annotations made by the UniProtKB GOA project for S. cerevisiae. The IEA annotations are loaded into the SGD database from the GOA gene association file after each release. All of these annotations are included in the gene_association.sgd file, which represents a significant expansion of the types of evidence codes and data sources provided by SGD.

c. Priorities for annotation: The highest priority is to capture annotations where new information is available for an Uncharacterized gene product. These papers are identified during the literature triage process. In addition, we update older annotations. SGD records the date on which the annotations for a gene were reviewed, and uses this date to select older annotations to check for consistency with the current literature.

d. SGD has incorporated phylogeny-based annotations made by PAINT. These annotations are now part of SGD's gene_association.sgd file.

f. SGD curators are routinely creating terms via the new TermGenie interface to speed up the process of annotation.
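As an illustration of item b, the following minimal sketch separates IEA annotations from all others in a gene association file. It assumes the file follows the tab-delimited GAF 2.0 layout, in which header lines start with "!" and the evidence code is the seventh column. The function name, the "manual"/"computational" labels, and the sample rows are hypothetical and are not taken from SGD's loading pipeline; in particular, treating every non-IEA code as "manual" is a simplifying assumption.

```python
import csv
import io
from collections import Counter

# Zero-based index of the Evidence Code column (column 7 in GAF 2.0).
EVIDENCE_COL = 6


def count_by_evidence_class(gaf_text):
    """Count annotation rows as 'computational' (IEA) or 'manual' (any other code).

    Assumption for this sketch: IEA is the only computational evidence code.
    """
    counts = Counter()
    reader = csv.reader(io.StringIO(gaf_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines and header/comment lines, which start with '!'.
        if not row or row[0].startswith("!"):
            continue
        # Skip malformed rows that are too short to hold an evidence code.
        if len(row) <= EVIDENCE_COL:
            continue
        kind = "computational" if row[EVIDENCE_COL] == "IEA" else "manual"
        counts[kind] += 1
    return counts


def _sample_row(evidence, assigned_by):
    """Build one made-up GAF-style row; all identifiers are placeholders."""
    return "\t".join([
        "SGD", "S000000001", "GENE1", "", "GO:0000001", "PMID:0",
        evidence, "", "P", "", "", "gene", "taxon:559292", "20140301",
        assigned_by,
    ])


# Invented sample input, for illustration only (not real annotation data).
SAMPLE = "\n".join([
    "!gaf-version: 2.0",
    _sample_row("IDA", "SGD"),
    _sample_row("IEA", "UniProtKB"),
    _sample_row("IBA", "GO_Central"),
])

if __name__ == "__main__":
    print(count_by_evidence_class(SAMPLE))
```

Run against a real gene_association.sgd file, a tally of this kind could be compared with the manual and computational annotation totals reported above, bearing in mind that the report's own counting rules may differ from this simple IEA-versus-other split.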

Presentations and Publications

Papers with substantial GO content

none

Posters

none

Other Highlights

A. Annotation Outreach

SGD curators participate in annotation conference calls and curation jamborees.
Janos Demeter, Paul Lloyd and Diane Inglis are involved in testing AmiGO releases.
R. Balakrishnan is a manager for the Annotation Advocacy working group.
R. Balakrishnan is part of the rotation that answers user email from gohelp.
R. Balakrishnan is part of a working group involved in the redesign of the GOC website.
R. Balakrishnan is part of the PAINT curation team.

E. Software

Stanford continues to serve as the Production server for the GO Database and for the AmiGO web application.
Migration and release of AmiGO 2 from Stanford servers was completed in March 2014.
- The AmiGO 2 service consists of two web servers running on KVM/QEMU virtual machines (VMs) behind a hardware load balancer. Two more VMs provide a “reverse proxy” service to prevent the public from accessing “private” portions of the server. The VMs run on separate physical servers to maximize service uptime in the event of hardware failure. There is also a physical back-end “loading” machine that downloads the source data files, processes the raw data, and creates search indexes that allow for very fast data queries. These servers run on Dell hardware. The VMs run the CentOS 6.4 operating system, while the loading server and the physical machines hosting the VMs run the Red Hat Enterprise Linux 6.4 operating system. A NetApp appliance provides the VMs with disk storage via NFS, which allows for live migration (no downtime) of VMs between physical hosts if needed. The processed data and indexes are generated daily and copied to the web servers, which allows subsequent loads to be performed without impacting web server availability. This process is fully automated and requires no manual intervention.

B. Hitz, S. Miyasato, G. Binkley, and Kalpana Karra are on the go-software mailing list.
