2012 Annotation Meeting Stanford

Revision as of 10:20, 21 December 2011 by Paul Thomas (talk | contribs) (Goals)




GO annotations are the primary product of the GO, and curator time is our most valuable resource, so we need a defined process for producing annotations efficiently and at high quality.

  • Overall objective: to briefly review the existing GO annotation streams, and delve into how we can make GO annotations better, faster.
  • We will define new processes to identify and set annotation goals by production numbers and by domain coverage.
    • rate of production by annotation: # of annotations/quarter (ACTION ITEM: pick a realistic GOAL for increasing this # over the next 5 years)
    • rate of production by domain: # of domain evaluations / quarter (ACTION ITEM: pick a goal for first quarter and evaluate for future goals)
    • rate of refinement: increase in information content/year (ACTION ITEM: pick a realistic GOAL for increasing this # over the next 5 years)
    • quality standards and metrics: minimal requirements for community contributions (ACTION ITEM: publishing these requirements)
    • what additional information shall we aim to collect to enrich the annotations coming from GO funded efforts (ACTION ITEM: List of new data types in relation to existing annotation)
  • We will review and evaluate the current components of a process
    • what are the current bottlenecks (ACTION ITEM: list of areas we need to address)
    • what changes will we make in our process to eliminate bottlenecks and improve quality (ACTION ITEM: a prioritized list of these steps to work on over the next 3 months)
    • what changes will we make to our data flow (ACTION ITEM: data flow diagram)
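
As an illustration of the "# of annotations/quarter" measure listed above, the following is a minimal sketch rather than an agreed GOC method. It assumes annotations are available as tab-delimited GAF lines, with the evidence code in column 7 and the date (YYYYMMDD) in column 14. The function name `annotations_per_quarter` and the optional evidence filter are hypothetical. Note that the GAF date reflects when an annotation was made or last updated, so this only approximates a production rate.

```python
from collections import Counter


def annotations_per_quarter(gaf_lines, evidence_codes=None):
    """Count annotations per calendar quarter from GAF-formatted lines.

    Assumptions (not taken from the agenda): column 7 holds the evidence
    code and column 14 holds the date as YYYYMMDD, as in the GAF format.
    """
    counts = Counter()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14:
            continue  # malformed row: too few columns to carry a date
        evidence, date = cols[6], cols[13]
        if evidence_codes is not None and evidence not in evidence_codes:
            continue
        if len(date) != 8 or not date.isdigit():
            continue  # date missing or not in YYYYMMDD form
        year, month = int(date[:4]), int(date[4:6])
        if not 1 <= month <= 12:
            continue
        quarter = (month - 1) // 3 + 1
        counts[f"{year}-Q{quarter}"] += 1
    # Return quarters in chronological order.
    return dict(sorted(counts.items()))
```

Passing a set such as `{"IDA", "IMP"}` as `evidence_codes` would restrict the count to those evidence codes, which may be useful when only experimental annotations are of interest.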

Proposed Agenda

Sunday afternoon: 1pm until 4:30pm

General Discussion of Integrated Annotation Pipeline (Mike Cherry 20-23 minutes)

  • Can we define data flow from publication to PAINT annotation?
  • Will each curator, in general, work from a gene perspective or from a domain perspective?
  • Will each curator, in general, complete the whole chain from experimental data through phylo-inference, or how will these processes flow from one to the other?
  • GO SWAT teams (PaulT and Judy)

Current Status of Annotation Production (20-30 minutes) i.e. Where are we now with basic EXP annotation?

  • What is our current rate of new annotation production
  • What is our current rate of annotation loss (due to sunset clause, average % of annotations that can't be loaded, etc.)
  • Annotation Information Content assessment (how detailed are the existing annotations and what has been the trend over time)
  • Completeness and adherence to standards of "gp2protein" files
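
One possible way to put a number on "how detailed are the existing annotations" is sketched below. It is an assumption, not a measure defined in this agenda: each term's information content is taken as -log2 of its share of all annotations, and the score is the average over annotations. The function name `mean_information_content` is hypothetical, and the sketch deliberately ignores the ontology structure (annotations are not propagated to ancestor terms), which a real assessment would probably need to handle.

```python
import math
from collections import Counter


def mean_information_content(term_ids):
    """Average information content (in bits) over a list of annotated GO term IDs.

    Assumption: IC(term) = -log2(n_term / N), where n_term is the number of
    direct annotations to the term and N is the total number of annotations.
    No propagation up the ontology graph is performed.
    """
    counts = Counter(term_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Weight each term's IC by how many annotations use that term.
    weighted = sum(-math.log2(n / total) * n for n in counts.values())
    return weighted / total
```

Computing this value separately for the annotations dated in each year would give a rough view of the trend over time.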
Goal for this Session

Outline of the current EXP-based annotation process. (20-30 minutes)

  1. Knowledge extraction (both into the ontology and new annotations)
    • Curators working directly from the literature
    • Curators working with experts in the field (who summarize and provide links to the primary literature)
  2. Knowledge capture
    • Ontology requests for change
    • MODs
    • others
  3. Quality Assurance
  4. Data flow
    • Current
    • Straw-man proposal
Goal for this Session

Lessons learned from recent EXP-based curation efforts ( >2 hours)

Focus is on how the processes might be generalized, with specific details only as supporting examples.

  • Domain-specific curation and ontology development (Knowledge extraction)
    • apoptosis annotation (Paola Roncaglia and Emily Dimmer)
    • transcription overhaul (Karen Christie & Varsha Khodiyar)
  • Approaches for recording annotations (Knowledge capture)
    • Wiki-based annotation in CACAO: proposed improvements and potential generalizations (Jim Hu and Brenley)
    • Integrating annotations coming from multiple sources for a single organism (Experiences at Swiss-Prot and GOA)
    • CANTO experiences (Val Wood)

CANTO is a PomBase tool, developed by Kim Rutherford, that is being extended to meet GOC requirements so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium. These annotations would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop on developments.)

  • Discussion of annotation strategy
    • What are the bottlenecks
Goal for this Session

Monday: 9am to 5pm

Towards a common annotation framework (Kimberly and Chris)

  • Kimberly to report on user requirements for CAF.

Kimberly is currently talking to all curation groups about individual GO annotation tools, what features they have and what features curators would like. Therefore, by the GO Consortium meeting, Kimberly will be able to present the features that GOC curators feel are most important.

  • Discussion
    • any other aspects curators would require in an annotation tool.
    • What additional data should be supplied by annotation groups
    • How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso)
  • Project plan, development goals for next 3-6 months (Chris Mungall)

Phylogenetic inference process (Paul Thomas and Pascale/Suzi)

  • PAINT as used for Quality Assurance
    • Dual perspectives (biological topic focus)
    • cross-checking annotations
    • Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT
  • PAINT: How MODs can achieve full breadth of genome coverage
    • Focused annotation session for ~10 GO annotators per group
    • Led by Pascale and/or PaulT with Mike L., Rama, Li Ni, Donghui, Huaiyu handling groups individually
    • small groups to make each session manageable and productive
    • Mixed groups
      • those with previous training in PAINT annotation
      • those with no training, but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)
    • Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups that are now in the GO database, to terms from the ontology which have been reviewed and likely to remain stable (e.g. from the recent transcription annotation effort)
    • Time required: minimum 5 hours.
Goal for this Session

Tuesday: 8:30am to 4pm

Minimal requirement for GO annotations (Emily Dimmer and Harold Drabkin) i.e. What does an annotation consist of, both minimally and ideally

  • Minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
  • What would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
  • LEGO annotation framework

Towards a biological topic-focused approach

  • Who should select targeted gene sets, and how
    • Is 9 genes a reasonable # to tackle per annotation milestone
    • What criteria should be used to declare that a milestone has been reached (comprehensively annotated gene products, final PAINT approval, other QC checks)
  • What skills are needed among the members of the annotation focus group and what tools do they need
    • Ontology expert, biological expertise, GO curator
    • Efficient ways of leveraging the community
    • What tools and other infrastructure would assist.
  • Next steps towards integral quality control
    • Define our specific tactical approach to new annotation strategy

Responsibilities and review of milestones for next meeting

Preparation needed in advance of GO Consortium meeting (incomplete, see agenda)

  1. Develop proposal for annotation process (Harold, Rama, Emily, Kimberly)
    • using examples from transcription overhaul (Karen & Varsha)
    • apoptosis (Paola & Emily)
    • CACAO (Jim Hu)
    • Integrating multiple sources for annotations on a single species (Rolf or Claire)
    • Phylogenetic annotation (Pascale and/or PaulT)
  2. Preparation for each GO annotation group
    • How does GO annotation fit into your overall curation process? Ideally as a high-level flowchart
    • What is your process for GO annotation? Ideally as a detailed flowchart
      • What software tools do you use for GO annotation?
      • Do you regularly make both literature and inferred (e.g. ISS) annotations?
      • How do you prioritize which papers, genes, etc. are targeted for GO annotation?
      • How do you create a GAF file for submission to the GOC?
    • What information do you want to capture in a controlled vocabulary that you currently CANNOT capture with GO terms?