2012 Annotation Meeting Stanford

From GO Wiki

Revision as of 03:30, 17 December 2011

Agenda

Goals

GO annotations are the primary product of the GO: we need a defined process for how they will be produced efficiently and at high quality

  • To set new annotation goals
    • rate of production - # of annotations/quarter (ACTION ITEM: picking realistic GOALS for increasing this # over the next 5 years)
    • quality standards and metrics, what is minimally required (ACTION ITEM: publishing these requirements)
    • what additional information shall we aim to collect to enrich the annotations (ACTION ITEM: List of new data types in relation to existing annotation)
  • To review and evaluate the current components of a process
    • what are the current bottlenecks (ACTION ITEM: list of areas we will next address)
    • what changes will we make in our process to eliminate the current bottlenecks (ACTION ITEM: a prioritized list of these steps to work on over the next 3 months)

Proposed Agenda

Sunday afternoon: 1pm until 4:30pm

Current Status of Annotation Production (Mike Cherry) i.e. Where are we now with basic EXP annotation?

  • What is our current rate of new annotation production
  • What is our current rate of annotation loss (due to sunset clause, average % of annotations that can't be loaded, etc.)
  • Annotation Information Content assessment (how detailed are the existing annotations and what has been the trend over time)
  • Completeness and adherence to standards of "gp2protein" files

Discussion: Minimal requirement for GO annotations (Paul Thomas & Emily Dimmer) i.e. What does an annotation consist of, both minimally and ideally

  • Minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
  • What would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
  • LEGO annotation framework

Outline of the current EXP-based annotation process. (Suzi Lewis)

  1. Knowledge extraction (both into the ontology and new annotations)
    • Curators working directly from the literature
    • Curators working with experts in the field (who summarize and provide links to the primary literature)
  2. Knowledge capture
    • Ontology requests for change
    • MODs
    • others
  3. Quality Assurance
  4. Data flow
    • Current
    • Straw-man proposal

Process and Lessons learned from current EXP-based curation efforts

Focus is on how the processes might be generalized, with specific details only as supporting examples.

  • Domain-specific curation and ontology development (Knowledge extraction)
    • apoptosis annotation (Paola Roncaglia and Emily Dimmer)
    • transcription overhaul (Karen Christie & Varsha Khodiyar)
  • Approaches for recording annotations
    • Wiki-based annotation in CACAO: proposed improvements and potential generalizations

    • CANTO experiences (Val Wood): CANTO is a PomBase tool, developed by Kim Rutherford, that is being extended to meet GOC requirements so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium. These annotations would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop on developments.)

  • Integral Quality Control
    • Dual perspectives and cross-checking
    • Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT

Monday

Towards a biological topic-focused approach

  • How and who should select targeted gene sets
    • Is 9 genes a reasonable # to tackle per annotation milestone?
    • What criteria should be used to declare that a milestone has been reached? (comprehensively annotated gene products, final PAINT approval, QC checks)
  • What skills are needed among the members of the annotation focus group and what tools do they need
    • Ontology expert, biological expertise, GO curator
    • What tools would assist?

Towards a common annotation framework

  • Kimberly to report on the spec for the CAF. Kimberly is currently talking to all curation groups about their individual GO annotation tools: what features they have and what features curators would like. By the GO Consortium meeting, Kimberly will therefore be able to present the features that GOC curators feel are most important.

    • Any other aspects curators would require in an annotation tool
    • What additional data should be supplied by annotation groups
    • How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso)
    • Project plan, development goals for next 3 months

Phylogenetic inference process

  • PAINT

  • Breadth of annotation: How can MODs achieve full genome coverage?

A focused annotation session for ~10 GO annotators (limit decided due to need for the session to be manageable and productive). Led by Pascale.

    • Annotators would be selected on the basis of:
      • previous training in PAINT annotation (e.g. Mike L., Rama, Li Ni, Donghui)
      • no training, but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)

    • Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups that is now in the GO database, to terms from the ontology which have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort)
    • Time required: minimum 5 hours.

Tuesday

Evaluating efficiency

metrics discussion:

    • How best to measure annotation progress?
    • Possible stats: Count of new terms used in annotation? Count of comprehensively annotated gene products? Count of EXP-evidenced annotations? Count of species with new annotation sets? Count of new checks implemented?
    • What combination of stats would best reflect our curation efforts?
    • How can the selected set of metrics be most effectively created? What information do groups need to be ready to supply to the GOC?
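
As a starting point for the metrics discussion, a few of the candidate stats above could be computed by comparing two annotation snapshots. The following is only a minimal sketch: the `Annotation` record, the `annotation_stats` function and the set of evidence codes treated as experimental are illustrative assumptions, not an agreed GOC definition or an existing tool.

```python
from collections import namedtuple

# Hypothetical, simplified annotation record (not the full annotation file format).
Annotation = namedtuple("Annotation", "gene_product go_id evidence taxon")

# Assumption: these evidence codes are the ones counted as "EXP-evidenced".
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def annotation_stats(previous, current):
    """Compare two annotation snapshots and return a few candidate metrics."""
    prev_set, curr_set = set(previous), set(current)
    new = curr_set - prev_set    # annotations present only in the newer snapshot
    lost = prev_set - curr_set   # annotations that disappeared since the older one
    prev_terms = {a.go_id for a in prev_set}
    prev_taxa = {a.taxon for a in prev_set}
    return {
        "new_annotations": len(new),
        "lost_annotations": len(lost),
        "new_exp_annotations": sum(a.evidence in EXPERIMENTAL_CODES for a in new),
        "new_terms_used": len({a.go_id for a in curr_set} - prev_terms),
        "new_species": len({a.taxon for a in curr_set} - prev_taxa),
    }
```

Running such a comparison once per quarter would give a production rate and a loss rate from the same two inputs. Counts that need curator judgement, such as "comprehensively annotated gene products", are deliberately left out of this sketch.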

Making annotation public

    • How do we enable intelligent consumption of GO and annotations, especially of new functionality/expressivity?



Preparation needed in advance of GO Consortium meeting

  1. Develop proposal for annotation process (Suzi, Rama, Emily, Kimberly)
    • using examples from transcription overhaul (Karen)
    • apoptosis (Paola)
    • CACAO (Jim Hu)
    • Phylogenetic annotation (Pascale)
  2. Develop proposal for metrics (GO-tops, managers)