SGD December 2013


Saccharomyces Genome Database Summary, 2013

Staff

Rama Balakrishnan, Gail Binkley, J. Michael Cherry, Maria Costanzo, Selina Dwight, Stacia Engel, Dianna Fisk, Ben Hitz, Stuart Miyasato, Matt Simison, Rob Nash, Marek Skrzypek, Shuai Weng, Edith Wong, Paul Lloyd, Janos Demeter, Diane Inglis, Kelley Paskov

[please include FTEs working on GOC tasks, also indicating how many FTEs are funded by the GOC NHGRI grant]

Annotation Progress

GO Aspect | Number of annotations added | Number of genes updated | Number of publications used
Biological Process | 1586 | 615 | 797
Molecular Function | 935 | 510 | 431
Cellular Component | 619 | 472 | 310

Note that these numbers count only manually curated and high-throughput annotations for ORFs that are Verified or Uncharacterized (Dubious ORFs are excluded), for RNA genes (ncRNA, rRNA, snRNA, snoRNA, or tRNA), and for genes encoded within transposable elements. The counts may include both new annotations and updated annotations that replaced older ones.

State of GO annotations Genome wide

Type | Counts as of December 18, 2012
GP with Any Annotation | 6382
GPs with Manual annotation | 6382
GP with Experimental and Computational Evidence | 5701
GP with Computational Evidence | 5701
GP with Curator Evidence (TAS, NAS, IC) | 1416
GP with No Data (ND) in MF | 1971
GP with No Data (ND) in BP | 1188
GP with No Data (ND) in CC | 731
All annotations | 83284
Total Annotations with Manual curation | 43223
Total Annotations with Computational Evidence | 40061
Annotations with Curator Evidence (TAS, NAS, IC) | 2936
Annotations with No Data (ND) | 3890
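
As a quick sanity check on the counts above, the manual and computational totals can be added and compared with the overall total. The following is a minimal sketch in Python with the three figures copied from the table; the variable names are illustrative and are not part of any SGD tooling.

```python
# Counts copied from the table above.
all_annotations = 83284
manual = 43223
computational = 40061

# In these figures the two totals add up to the overall total.
assert manual + computational == all_annotations

# Share of all annotations that are manually curated, as a percentage.
manual_share = round(100 * manual / all_annotations, 1)
print(manual_share)  # prints 51.9
```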

Methods and strategies for annotation (please note % effort on literature curation vs. computational annotation methods)

a. Literature curation: 100% of SGD’s effort is dedicated to manual curation based on the published literature for budding yeast genes and their products.

b. Computational annotation strategies: SGD does not employ automated methods to assign annotations; instead, we incorporate the computationally predicted annotations made by the UniProtKB GOA project for S. cerevisiae. The IEA annotations are loaded into the SGD database from the GOA gene association file after each release. All of these annotations are included in the gene_association.sgd file, which significantly expands the types of evidence codes and data sources provided by SGD. Note: The computationally predicted annotations generated by the integrated bioinformatic analysis of high-throughput data from the Roth and Troyanskaya labs (Tian et al., 2008; Huttenhower and Troyanskaya, 2008) were removed from SGD because those predictions were not being refreshed or updated.
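
The IEA loading step described above can be pictured as a filter over a gene association file. The sketch below is not SGD's loader; it only assumes the standard GAF layout (tab-separated columns, comment lines starting with "!", evidence code in column 7). The function name `iea_rows` and the two sample rows are invented for illustration, and the rows are truncated to their first nine columns.

```python
import csv
import io

EVIDENCE_COL = 6  # evidence code is column 7 of a GAF file (zero-based index 6)

def iea_rows(handle):
    """Yield rows whose evidence code is IEA, skipping '!' comment lines."""
    lines = (line for line in handle if not line.startswith("!"))
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > EVIDENCE_COL and row[EVIDENCE_COL] == "IEA":
            yield row

# Two made-up, truncated rows used only to exercise the filter.
sample = io.StringIO(
    "!gaf-version: 2.0\n"
    "SGD\tS000000001\tGENE1\t\tGO:0005524\tGO_REF:0000002\tIEA\tInterPro:IPR000001\tF\n"
    "SGD\tS000000002\tGENE2\t\tGO:0006412\tPMID:1\tIDA\t\tP\n"
)
rows = list(iea_rows(sample))
print(len(rows))  # prints 1
```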

c. Priorities for annotation: The highest priority is to capture annotations where new information is available for an Uncharacterized gene product. These papers are identified during the literature triage process. In addition, we update older annotations. SGD records the date on which the annotations for a gene were reviewed, and uses this date to select older annotations to check for consistency with the current literature.
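
One way to picture the use of the date reviewed is as a queue of genes ordered from the least recently reviewed. The sketch below is hypothetical: the gene names, dates, cutoff, and the `review_queue` function are invented for illustration and do not describe SGD's database or curation software.

```python
from datetime import date

# Hypothetical gene -> date-reviewed records (invented values).
date_reviewed = {
    "GENE_A": date(2006, 3, 14),
    "GENE_B": date(2012, 11, 2),
    "GENE_C": date(2009, 7, 30),
}

def review_queue(records, cutoff):
    """Return genes reviewed before `cutoff`, oldest review first."""
    stale = [(reviewed, gene) for gene, reviewed in records.items() if reviewed < cutoff]
    return [gene for reviewed, gene in sorted(stale)]

queue = review_queue(date_reviewed, cutoff=date(2010, 1, 1))
print(queue)  # prints ['GENE_A', 'GENE_C']
```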

d. SGD has incorporated phylogeny-based annotations made by PAINT. These annotations are now part of SGD's gene_association.sgd file.

e. SGD curators recently concluded a large comparison study of manual vs. computationally predicted annotations and are using the results of this study to prioritize work and develop curation strategies.

f. SGD curators are routinely creating terms via the new TermGenie interface to speed up the process of annotation.

Presentations and Publications

  • Papers with substantial GO content

Dutkowski J, Kramer M, Surma MA, Balakrishnan R, Cherry JM, Krogan NJ, Ideker T. 2013. A gene ontology inferred from molecular networks. Nat Biotechnol. 2013 Jan;31(1):38-45. PMID: 23242164

Balakrishnan R, Harris MA, Huntley R, Van Auken K, Cherry JM. 2013. A guide to best practices for Gene Ontology (GO) manual annotation. Database (Oxford). 2013 Jul 9;2013:bat054. PMC3706743

Tripathi S, Christie KR, Balakrishnan R, Huntley R, Hill DP, Thommesen L, Blake JA, Kuiper M, Lægreid A. 2013. Gene Ontology annotation of sequence-specific DNA binding transcription factors: setting the stage for a large-scale curation effort. Database (Oxford). 2013 Aug 27;2013:bat062. PMC3753819


Other Highlights

A. Migration to protein2GO: We migrated to the protein2GO tool, developed and maintained by UniProtKB, for annotating GO data, replacing our internal interface. This required exporting our existing annotations into the protein2GO database, complying with its QC checks, training curators to use the protein2GO interface, and then integrating the annotations back into SGD. The migration has allowed us to capture GO data with more specificity, such as substrate information for enzymes, regulation targets for certain processes, and temporal aspects of a process. This round-trip process took 8 months to complete.

B. Col-16 curation: Migrating to protein2GO has given us a tool for capturing annotation extensions (also known as col-16 data). We currently have about 2000 annotations (covering 160 proteins) with col-16 data.
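
For readers unfamiliar with col-16 data, the sketch below shows how a column-16 value from a GAF file could be split into its parts. It assumes the usual layout of that column, relation(entity) expressions joined by "," within one extension and by "|" between alternative extensions. The function name `parse_extension` and the example string are illustrative only and are not taken from SGD's annotations.

```python
import re

# One relation(entity) expression, e.g. occurs_in(GO:0005634).
EXPRESSION = re.compile(r"^\s*([A-Za-z_]+)\((.+)\)\s*$")

def parse_extension(field):
    """Split a column-16 value into alternatives ('|') of conjoined (',') expressions."""
    alternatives = []
    for group in field.split("|"):
        parsed_group = []
        for expr in group.split(","):
            match = EXPRESSION.match(expr)
            if match is None:
                raise ValueError(f"unrecognised expression: {expr!r}")
            parsed_group.append((match.group(1), match.group(2)))
        alternatives.append(parsed_group)
    return alternatives

# Made-up value: two conjoined expressions, then one alternative.
example = "has_regulation_target(SGD:S000000001),occurs_in(GO:0005634)|part_of(GO:0000001)"
parsed = parse_extension(example)
print(len(parsed))  # prints 2
```

The simple split on "," assumes that entity identifiers themselves contain no commas, which holds for the identifiers used in this example.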

C. Annotation Outreach

  • R. Balakrishnan is a manager for the Annotation Advocacy working group.
  • R. Balakrishnan is part of the rotation that answers user email from gohelp.
  • SGD curators participate in annotation conference calls and curation jamborees.
  • R. Balakrishnan is part of a working group involved in the redesign of the GOC website.
  • R. Balakrishnan is part of the PAINT curation team.
  • R. Balakrishnan is part of the AmiGO working group.

D. GOMine

  • Kalpana Karra has built GOMine, an implementation of InterMine. GOMine (http://gomine.geneontology.org) is intended to serve as a fast search and retrieval tool for GO data that does not require knowledge of SQL or the database table structure. The tool has been live since the beginning of 2013.

E. Software

  • SGD continues to serve as the production server for the GO Database and for the AmiGO web application.
  • B. Hitz, S. Miyasato, G. Binkley are on the go-software mailing list