GO Reference Genome Meeting
September 26-27, 2007, Princeton, New Jersey
Wednesday, September 26, 2007
- Meeting Location: Frist Campus Center - Multipurpose Room C
- Lunch Meal Location: Frist Campus Center - Food Gallery
- Coffee/Tea/Water service for 30 people during the morning and afternoon sessions
- Lunch - $8.50/person on meal cards to be used in the Food Gallery
- Dinner - TBA
Thursday, September 27, 2007
- Meeting Location: Frist Campus Center - Multipurpose Room B
- Lunch Meal Location: Frist Campus Center - Food Gallery
- Coffee/Tea/Water service for 25 people during the morning and afternoon sessions
- Lunch - $8.50/person on meal cards to be used in the Food Gallery
- Dinner - TBA
High Level Topics Identified for Discussion
- Strategies to identify orthologs
- How to prioritize genes
- How to assess the progress made towards curation of reference genome genes; strategies for improvement
- Discussions regarding metrics, including making a plan for how to use metrics
- Review of progress toward database and tool development
- Annotation consistency discussion
- Outreach
Agenda
Morning of the 26th
Orthology
Moderator: Kara Dolinski
Resources:
- The Reference Genome groups have provided descriptions of their current methodology for establishing orthologs for their gene sets. These are available on the Orthology discussion page.
Discussion points:
- Is it necessary for all groups to use the same methodology in order to create orthology/homology data sets for the reference genome project?
- Standardize/describe procedure for identification across MODs
- How stringently and consistently do we want to use the terms 'orthology' and 'homology' in our documentation and discussion?
- How will we update our orthology sets for new genome builds, or do we consider the current genomes in the reference genome project to be essentially complete?
- What is the impact of closely-related paralogs that have different functions on this project?
- Which model organisms are available in which databases? E.g., Dicty is not in TreeFam; zebrafish and chicken are not in YOGY
- Use-case examples (Kimberley, WormBase; also Donghui?)
- Emily: GOA discussion about inheriting annotations
Papers of Interest:
- Alexeyenko, A., Lindberg, J., Perez-Bercoff, A., and Sonnhammer, E.L.L. 2006. Overview and comparison of ortholog databases. Drug Discovery Today: Technologies 3:137-143.
- Dolinski, K. and Botstein, D. 2007. Orthology and Functional Conservation in Eukaryotes. Annu. Rev. Genet. 41:463-507.
- Hulsen, T., Huynen, M.A., de Vlieg, J., and Groenen, P.M.A. 2006. Benchmarking ortholog identification methods using functional genomics data. Genome Biology 7:R31.
- Wapinski, I., Pfeffer, A., Friedman, N., and Regev, A. 2007. Automatic genome-wide reconstruction of phylogenetic gene trees. Bioinformatics 23:i549-i558.
GRIN conference call
- GRIN (Genome Research Informatics Network) conference call, tentatively at 11 am
Afternoon of the 26th
- Priorities - Moderators: Rex Chisholm and Pascale Gaudet
- Methods - Moderator: Suzi Lewis
- Metrics - Moderator: Mike Cherry
- Tools - Moderator: Chris Mungall
Priorities
How to prioritize genes
- By Disease
- (Rex and Pascale): OMIM Morbid Map; occasionally we also find genes not in the Morbid Map that have strong evidence for involvement in a disease
- There is an effort to cluster genes involved in the same disease or with the same or related function to facilitate the curatorial effort
- Questions: Is there a more systematic way? Should we target some diseases more specifically? What about multigene diseases?
- Discuss pathways as an alternative method of prioritizing genes
Methods
Moderator: Suzi Lewis
Expounders: Judy Blake and Rex Chisholm
Discussion points:
- How to balance curation of experimental literature with ISS inference annotation work?
- How to balance prioritization of genes by importance to human disease processes and by presence in yeast and smaller organisms?
- How to measure 'comprehensiveness' of annotation and to know when sufficient curation of literature has occurred?
- How to prioritize new curation for already 'done' genes, for example, hot new papers that report new information about already 'completed' genes [e.g., Bmp4 and Cav, currently very 'hot']
Metrics
Moderator: Ruth Lovering
Metrics are required to measure our own annotation progress. We will use both functional and structural information in these metrics.
- Karen: Structural sequence annotations by comparison of the GFF3 provided by the reference genome groups.
- Each reference genome group must provide its sequence as a GFF3 file (see the table of the reference genome MODs' GFF3 files).
- Chris: Review of our progress to date by examining what is actually in the database
- Mike: Discussion of additional metrics and their consistent use
- Ruth had some thoughts on literature measures: http://gocwiki.geneontology.org/index.php/Metrics:_breath_and_depth_of_annotations
Tools
Moderator: Chris Mungall
- Chris, Sohel and Mary are developing a web-based tool that will replace the current Google spreadsheet
- Demonstration of the tool (link to a page with the tool coming soon); Sohel's last version of the tool
- Curator input for further development
- Database
- AmiGO
Morning of the 27th
- Annotation consistency
- Promotion of resource
Annotation Consistency
Moderator: Pascale Gaudet
Promotion of Resource
Moderator: Susan Tweedie
Discussion Points:
- Public view of reference genome project and annotations through GO website and AmiGO.
- Publication
- Other promotion efforts