GO Reference Genome Meeting

September 26-27, 2007, Princeton, New Jersey

Wednesday, September 26, 2007

Meeting Location: Frist Campus Center - Multipurpose Room C
Lunch Meal Location: Frist Campus Center - Food Gallery
Coffee/Tea/Water service for 30 people during the morning and afternoon sessions
Lunch - $8.50/person on meal cards to be used in the Food Gallery
Dinner - TBA

Thursday, September 27, 2007

Meeting Location: Frist Campus Center - Multipurpose Room B
Lunch Meal Location: Frist Campus Center - Food Gallery
Coffee/Tea/Water service for 25 people during the morning and afternoon sessions
Lunch - $8.50/person on meal cards to be used in the Food Gallery
Dinner - TBA

High Level Topics Identified for Discussion

Strategies to identify orthologs
How to prioritize genes
How to assess the progress made towards curation of reference genome genes; strategies for improvement
Discussions regarding metrics, including making a plan for how to use metrics
Review of progress toward database and tool development
Annotation consistency discussion
Outreach

Agenda

Morning of the 26th

Orthology

Moderator: Kara Dolinski

Resources:

The Reference Genome groups have provided descriptions of their current methodology for establishing orthologs for their gene sets. These are available on the Orthology discussion page.

Discussion points:

Is it necessary for all groups to use the same methodology in order to create orthology/homology data sets for the reference genome project?
Standardize/describe procedure for identification across MODs
How stringently and consistently do we want to use the terms 'orthology' and 'homology' in our documentation and discussion?
How will we update our orthology sets with new genome builds? Or do we consider the current genomes in the reference genome project to be essentially complete?
What is the impact of closely-related paralogs that have different functions on this project?
Which model organisms are available in which databases? For example, Dicty is not in TreeFam; zebrafish and chicken are not in YOGY
Use-case examples (Kimberley, WormBase; also Donghui?)
Emily: GOA discussion about inheriting annotations

Papers of Interest:

Alexeyenko, A., Lindberg, J., Perez-Bercoff, A., and Sonnhammer, E.L.L. 2006. Overview and comparison of ortholog databases. Drug Discovery Today: Technologies 3:137-143.
Dolinski, K. and Botstein, D. 2007. Orthology and Functional Conservation in Eukaryotes. Annu. Rev. Genet. 41:463-507.
Hulsen, T., Huynen, M.A., de Vlieg, J., and Groenen, P.M.A. 2006. Benchmarking ortholog identification methods using functional genomics data. Genome Biology 7:R31.
Wapinski, I., Pfeffer, A., Friedman, N., and Regev, A. 2007. Automatic genome-wide reconstruction of phylogenetic gene trees. Bioinformatics 23:i549-i558.

GRIN conference call

  * GRIN (Genome Research Informatics Network) conference call, tentatively at 11 am

Afternoon of the 26th

  * Metrics - Moderator: Michael Ashburner
  * Priorities - Moderators: Rex Chisholm and Pascale Gaudet
  * Methods - Moderator: Suzi Lewis
  * Tools - Moderator: Chris Mungall

Metrics

Moderator: Michael Ashburner

Metrics are required to measure our own annotation progress. We will use both functional and structural information in these metrics.

Karen: Structural sequence annotations, measured by comparing the GFF3 files provided by the reference genome groups.
- Each reference genome group must provide its sequence annotation as a GFF3 file. See the table of reference genome MOD GFF3 files.
- Plan for metric resource: GO_Reference_Genome_Meeting_Metric_Plan
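
As a rough illustration of the kind of structural comparison mentioned above, the sketch below tallies feature types (column 3) in GFF3 text, so that per-group counts could be set side by side. This is only an assumption about what a simple structural metric might look like, not the plan linked above; the function name `count_feature_types` and the sample records are hypothetical.

```python
from collections import Counter

def count_feature_types(gff3_lines):
    """Count GFF3 features by type (column 3), skipping directives, comments and blanks."""
    counts = Counter()
    for raw in gff3_lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank line, comment, or '##' directive
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a well-formed nine-column feature line
        counts[columns[2]] += 1
    return counts

# Hypothetical example records, not taken from any reference genome file.
sample = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\texample\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\texample\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=mrna1",
]
counts = count_feature_types(sample)
for feature_type, n in sorted(counts.items()):
    print(feature_type, n)
```

To apply the same function to a real file, pass an open file handle in place of `sample`; any comparison across groups would still need agreement on which feature types to count.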

Chris: Review of our progress to date by examining what is actually in the database

Mike: Discussion of additional metrics and their consistent use

Ruth had some thoughts on literature measures: Metrics:_breath_and_depth_of_annotations

Priorities

How to prioritize genes

By Disease
- (Rex and Pascale): OMIM Morbid Map; occasionally we also find genes that are not in the Morbid Map but have strong evidence for involvement in a disease
- There is an effort to cluster genes involved in the same disease or with the same or related function to facilitate the curatorial effort
- Questions: Is there a more systematic way? Should we target some diseases more specifically? What about multigene diseases?
Discuss pathways as an alternative method of prioritizing genes

Methods

Moderator: Suzi Lewis

Expounders: Judy Blake and Rex Chisholm

Discussion points:

How to balance curation of experimental literature against ISS (Inferred from Sequence or Structural Similarity) annotation work?
How to balance prioritization of genes by importance to human disease processes and by presence in yeast and smaller organisms?
How to measure 'comprehensiveness' of annotation and to know when sufficient curation of literature has occurred?
How to prioritize new curation for already 'done' genes, for example, hot new papers that report new information about already 'completed' genes (Bmp4 and Cav, for example, are currently very 'hot')?

Tools

Moderator: Chris Mungall

Chris, Sohel and Mary are developing a web-based tool that will replace the current Google spreadsheet
Demonstration of the tool (link to a page with the tool coming soon)
Sohel's last version of the tool
Curator input for further development
Database
AmiGO

Morning of the 27th

  * Annotation Consistency
  * Promotion of Resource

Annotation Consistency

Moderator: Pascale Gaudet

Promotion of Resource

Moderator: Susan Tweedie

Discussion Points:

Public view of the reference genome project and its annotations through the GO website and AmiGO
Publication
Other promotion efforts