Reference Genome Annotation Meeting: Difference between revisions

From GO Wiki
==Agenda==


===Tentative Schedule===
----
===Karen E: Metrics===


*Metrics are required to measure our own annotation progress. We will use both functional and structural information in these metrics.


*Karen: Structural sequence annotations, compared using the GFF3 files provided by the reference genome groups.
**Each reference genome must provide its sequence as a GFF3 file. [http://wiki.geneontology.org/index.php/Reference_Genome_sequence_annotation View the table of reference genome MOD GFF3 files]


*Chris: Review of our progress to date by examining what is actually in the database


====Morning of the 27th ====
*Suzi: Discussion of additional metrics and their consistent use
  * annotation consistency
----
  * promotion of resource
 


===Strategies to identify orthologs===
===Judy: Strategies to identify orthologs===
* Procedures different databases are using can be found on the [[Orthology discussion page]]
* We'd like to have an expert explain the different tools: how the algorithms work, which is better (if any), what to do in case of disagreement between tools and how to manually find orthologs if the tools fail to give any results
----


===How to prioritize disease genes===
===Michael: How to prioritize genes===
* Currently (Rex and Pascale): OMIM morbid map; also occasionally we find genes not in Morbid Map that have strong evidence for involvement in a disease
* Rex and Pascale: By Disease
* There is an effort to cluster genes involved in the same disease or with the same or related function to facilitate the curatorial effort
**Currently: OMIM morbid map; we also occasionally find genes not in Morbid Map that have strong evidence for involvement in a disease
**There is an effort to cluster genes involved in the same disease or with the same or related function to facilitate the curatorial effort
* Questions: Is there a more systematic way? Should we target some diseases more specifically? What about multigene diseases?
**Questions: Is there a more systematic way? Should we target some diseases more specifically? What about multigene diseases?
----
* Suzi: Discuss pathways as an alternative method of prioritizing genes
 
===How to assess the progress made towards curation of reference genome genes; strategies for improvement===


----
===Discussions regarding metrics, including making a plan for how to use metrics===


*Refresh our idea of what the metrics are for. Are they to measure our own annotation progress, or to be used as a tool for outside users to understand annotations, or both?
===Chris: Review of progress toward database and tool development===
 
*Collectively we have a lot of data. Can we use statistics to help us understand what we have? For example, can we attach a p-value to the completeness of an annotation?
 
*Structural sequence annotations - what can we learn? Can we compare sequence annotations?
 
----
 
===Review of progress toward database and tool development===
* Chris, Sohel and Mary are developing a web-based tool that will replace the current [http://dcn.spreadsheets.google.com/ccc?id=o16926456948884040128.4584390909151853752.07000735126025259412.442372083524637957 Google spreadsheet]
* Demonstration of the tool (link to a page with the tool coming soon)
* Curator input for further development
----


* Integrating both functional and structural information into the metrics we develop. How are we going to integrate sequence into this pipeline?
===Pascale: Annotation consistency discussion===
* Each reference genome must provide its sequence as a GFF3 file. [http://wiki.geneontology.org/index.php/Reference_Genome_sequence_annotation View the table of reference genome MOD GFF3 files]
*How to assess the progress made towards curation of reference genome genes
*Strategies for improvement
----


===Annotation consistency discussion===
===Rex: Outreach possibilities===
 
* Write a paper describing the reference genome effort. Right now we have >200 genes annotated.
----
* Contact NCBI to ask them to add reference genome tags to GenBank records
===Outreach===
* We should write a paper describing the reference genome effort. Right now we have >200 genes annotated.
* We should contact NCBI to ask them to add reference genome tags to GenBank records

Revision as of 16:51, 20 September 2007

General Info

The first Reference Genome Annotation Meeting will be held September 26-27, 2007 in Princeton, NJ, right after the GO Consortium and GO advisors meeting.

Agenda


Karen E: Metrics

  • Metrics are required to measure our own annotation progress. We will use both functional and structural information in these metrics.
  • Chris: Review of our progress to date by examining what is actually in the database
  • Suzi: Discussion of additional metrics and their consistent use

Judy: Strategies to identify orthologs

  • Procedures different databases are using can be found on the Orthology discussion page
  • We'd like to have an expert explain the different tools: how the algorithms work, which is better (if any), what to do in case of disagreement between tools and how to manually find orthologs if the tools fail to give any results
  • Standardize the procedure for ortholog identification across MODs
  • Which model organisms are available in which databases, e.g. Dicty is not in TreeFam; zebrafish and chicken are not in YOGY
  • Use-case examples (Kimberley, WormBase; also Donghui?)
  • Emily: GOA discussion about inheriting annotations

Michael: How to prioritize genes

  • Rex and Pascale: By Disease
    • Currently: OMIM morbid map; we also occasionally find genes not in Morbid Map that have strong evidence for involvement in a disease
    • There is an effort to cluster genes involved in the same disease or with the same or related function to facilitate the curatorial effort
    • Questions: Is there a more systematic way? Should we target some diseases more specifically? What about multigene diseases?
  • Suzi: Discuss pathways as an alternative method of prioritizing genes

Chris: Review of progress toward database and tool development

  • Chris, Sohel and Mary are developing a web-based tool that will replace the current Google spreadsheet
  • Demonstration of the tool (link to a page with the tool coming soon)
  • Curator input for further development

Pascale: Annotation consistency discussion

  • How to assess the progress made towards curation of reference genome genes
  • Strategies for improvement

Rex: Outreach possibilities

  • Write a paper describing the reference genome effort. Right now we have >200 genes annotated.
  • Contact NCBI to ask them to add reference genome tags to GenBank records