SGD December 2014

From GO Wiki

Saccharomyces Genome Database Summary, 2014


Rama Balakrishnan, Gail Binkley, J. Michael Cherry, Maria Costanzo, Janos Demeter, Stacia Engel, Ben Hitz, Stuart Miyasato, Rob Nash, Kelley Paskov, Travis Sheppard, Matt Simison, Marek Skrzypek, Shuai Weng, Edith Wong

[Please include the FTEs working on GOC tasks, indicating how many of these FTEs are funded by the GOC NIHGRI grant.]

Annotation Progress

GO Aspect             Annotations added   Genes updated   Publications used
Biological Process    899                 622             466
Molecular Function    449                 372             195
Cellular Component    646                 527             195

Note that these numbers count manually curated and high-throughput annotations only for ORFs that are Verified or Uncharacterized (Dubious ORFs are excluded), for RNA genes (ncRNA, rRNA, snRNA, snoRNA, or tRNA), and for genes encoded within transposable elements. These annotations may also include both new annotations and updated annotations that replaced older ones.

State of GO annotations genome-wide

Type (GP = gene product)                            Count as of December 11, 2014
GPs with any annotation                             6379
GPs with manual annotation                          6379
GPs with experimental and computational evidence    5727
GPs with computational evidence                     5727
GPs with experimental evidence                      5651
GPs with curator evidence (TAS, NAS, IC)            1165
GPs with No Data (ND) in MF                         1904
GPs with No Data (ND) in BP                         1111
GPs with No Data (ND) in CC                         722
All annotations                                     86405
Total annotations with manual curation              45393
Total annotations with computational evidence       41012
Annotations with curator evidence (TAS, NAS, IC)    2276
Annotations with No Data (ND)                       5991

Methods and strategies for annotation (please note % effort on literature curation vs. computational annotation methods)

a. Literature curation: 100% of SGD's effort is dedicated to manual curation of budding yeast genes and their products based on the published literature.

b. Computational annotation strategies: SGD does not employ automated methods to assign annotations; rather, we incorporate the computationally predicted annotations made by the UniProtKB GOA project for S. cerevisiae. The IEA annotations are loaded into the SGD database from the GOA gene association file after each release. All of these annotations are included in the gene_association.sgd file, which represents a significant expansion of the types of evidence codes and data sources provided by SGD.
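Selecting the IEA rows from a GOA gene association file can be sketched as below. This is a minimal illustration, not SGD's actual loader: it assumes the standard tab-separated GAF 2.x layout (comment lines start with `!`, column 7 holds the evidence code), and the function name `iea_annotations` is hypothetical.

```python
from typing import Iterable, List, Tuple

# 0-based indices into a GAF 2.x row (GAF columns 2, 5, 7 and 15).
DB_OBJECT_ID, GO_ID, EVIDENCE, ASSIGNED_BY = 1, 4, 6, 14


def iea_annotations(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (object id, GO id, assigned_by) for every IEA row.

    Hypothetical helper for illustration only; it assumes GAF 2.x input.
    """
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) < 15:
            continue  # malformed row: too few columns to be a GAF row
        if cols[EVIDENCE] == "IEA":
            result.append((cols[DB_OBJECT_ID], cols[GO_ID], cols[ASSIGNED_BY]))
    return result
```

A real loader would also need to handle the remaining columns, duplicate rows, and identifier mapping to SGD IDs; the sketch only shows the evidence-code filter.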

c. Priorities for annotation: The highest priority is to capture annotations where new information is available for an Uncharacterized gene product. These papers are identified during the literature triage process. SGD also collects information from authors about the data in their papers (aka FastTrack) and prioritizes papers based on the authors' responses. In addition, we update older annotations: SGD records the date on which the annotations for each gene were last reviewed, and this date is used to select older annotations to check for consistency with the current literature.

d. SGD has incorporated phylogeny-based annotations made by PAINT. These annotations are now part of SGD's gene_association.sgd file.

e. SGD curators are routinely creating terms via the new TermGenie interface to speed up the process of annotation.

f. SGD curators are capturing more specificity in annotations by adding substrates, targets, etc., in the Annotation Extension column (aka column 16) using the Protein2GO interface.
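To show what an Annotation Extension value looks like in a file, here is a small parser sketch. It assumes the GAF 2.0 column-16 conventions: `|` separates independent alternative extensions, `,` joins conditions that all hold, and each unit has the form `relation(ID)`. The function name `parse_extension` and the example IDs are hypothetical and not part of Protein2GO or SGD software.

```python
import re
from typing import List, Tuple

# One extension unit: relation_name(ID), e.g. has_input(SGD:S000000001).
_UNIT_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\(([^()]+)\)$")


def parse_extension(col16: str) -> List[List[Tuple[str, str]]]:
    """Split a column-16 value into alternatives of (relation, id) pairs.

    '|' separates independent alternatives; ',' joins conditions that
    must all hold. Raises ValueError on a unit that is not relation(ID).
    """
    groups: List[List[Tuple[str, str]]] = []
    if not col16.strip():
        return groups  # an empty column means no extension
    for alternative in col16.split("|"):
        pairs = []
        for unit in alternative.split(","):
            match = _UNIT_RE.match(unit.strip())
            if match is None:
                raise ValueError(f"malformed extension unit: {unit!r}")
            pairs.append((match.group(1), match.group(2)))
        groups.append(pairs)
    return groups
```

For example, `has_input(SGD:S000000001),occurs_in(GO:0005634)|has_input(SGD:S000000002)` parses into two alternatives: the first combines two conditions, and the second has one.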

g. SGD curators have also been keeping up with the GAF checks on Jenkins and the Protein2GO error checks.

Function Summaries: New Feature

SGD curators have recently started summarizing the GO annotations for each gene in a few sentences; these summaries are entered directly into Protein2GO and are associated with each gene in the Gene Product Information (GPI) file. They will be displayed on the Locus Summary page and on the GO Annotations page in SGD in the coming months.

Presentations and Publications


  • Marcus C. Chibucos, Christopher J. Mungall, Rama Balakrishnan, Karen R. Christie, Rachael P. Huntley, Owen White, Judith A. Blake, Suzanna E. Lewis and Michelle Giglio. Standardized description of scientific evidence using the Evidence Ontology (ECO). Database (Oxford) 2014. doi: 10.1093/database/bau075


  • Rama Balakrishnan, Chris Mungall, Heiko Dietze, Jane Lomax and Tony Sawford. GPAD/GPI: Next generation file format for GO annotations. Biocuration Meeting, Toronto, Canada, 2014.

Other Highlights

Annotation Outreach

  • SGD curators participate in Annotation conference calls
  • R. Balakrishnan is a manager for the Annotation Advocacy working group
  • R. Balakrishnan manages the go-helpdesk rotation and is part of the rotation that answers user emails from gohelp.
  • R. Balakrishnan is part of the PAINT curation team
  • R. Balakrishnan is working with PseudoCAP to assist them in providing an updated annotation file.
  • R. Balakrishnan is working with the Z. mays group to obtain their GO annotations.


  • Stanford continues to serve as the production server for the GO Database and for the AmiGO web application.
  • The new website for the GOC was released (hosted on the Amazon cloud).
  • We moved the SVN repository and the GO wiki to new virtual machines this year.
  • Migration and release of AmiGO2 from Stanford servers was completed in March 2014
    • We load the GOlr indices for AmiGO2 and address any issues related to GOlr loading
    • There have been several feature updates and releases since the initial release.
  • We monitor the GO MySQL cron jobs to make sure they run successfully. Because the run time is somewhat unpredictable, and the files could be copied before the cron job had finished, the copy-to-FTP job for go-lite and go-full is run manually at Stanford after loading finishes.
  • B. Hitz, S. Miyasato, Kalpana Karra, G. Binkley are on the go-software mailing list