Difference between revisions of "2012 Annotation Meeting Stanford"

From GO Wiki
m (Lessons learned from recent EXP-based curation efforts ( >2 hours))
Line 15: Line 15:
 
#*Phylogenetic annotation (Pascale and/or PaulT)

==Goals==
GO annotations are the primary product of the GO, and curator time is our most valuable resource: we need a defined process for producing them efficiently and at high quality.
* Overall objective: to briefly review the existing GO annotation streams, and to explore how we can make GO annotations better and faster.
* We will define new processes to identify and set annotation goals by production numbers and by domain coverage.
** rate of production by annotation: # of annotations/quarter (ACTION ITEM: pick a realistic GOAL for increasing this # over the next 5 years)
** rate of production by domain: # of domain evaluations/quarter (ACTION ITEM: pick a goal for the first quarter and evaluate for future goals)
** rate of refinement: increase in information content/year (ACTION ITEM: pick a realistic GOAL for increasing this # over the next 5 years)
** quality standards and metrics: minimal requirements for community contributions (ACTION ITEM: publish these requirements)
** what additional information shall we aim to collect to enrich the annotations coming from GO-funded efforts (ACTION ITEM: list of new data types in relation to existing annotation)
* We will review and evaluate the current components of the process
** what are the current bottlenecks (ACTION ITEM: list of areas we need to address)
** what changes will we make in our process to eliminate bottlenecks and improve quality (ACTION ITEM: a prioritized list of these steps to work on over the next 3 months)
** what changes will we make to our data flow (ACTION ITEM: data flow diagram)
  
==Proposed Agenda==

===One idea for structuring the meeting===
*Sunday Afternoon – The Vision: from experimental data to structured knowledge
**1. Collective vision of idealized work flow (30 minutes open discussion: David Hill, discussion leader) 20 min
**2. Overview of Annotations Now (Mike Cherry, discussion leader) 20 min
**3. Overview of Experimental Literature Curation Now (Judy Blake, discussion leader) 20 min
**4. Overview of synergies of curation and ontology development Now (Suzi Lewis, discussion leader) 20 min
**5. Overview of Phylogenetic Annotations (Paul Thomas, discussion leader) 20 min
**6. Overview of current curation tools and procedures (Paul Sternberg, discussion leader) 20 min
**7. Collective vision revisited (30 minutes open discussion: Rama Balakrishnan, discussion leader)

*Monday Morning – What can we collectively do to make curation of experimental data more efficient?
*Monday Afternoon – What can we collectively do to make phylogenetically based curation more efficient?
*Tuesday Morning A – What are our goals, how do we measure completeness, what is a minimal GO annotation?
*Tuesday Morning B – Workshops on 1) Common Annotation Tools (Kimberly and Val) and 2) PAINT curation (Pascale)
*Tuesday Afternoon – Action Items for Curation Pipeline: building the data flow matrix together.

==='''''Sunday afternoon: 1pm until 4:30pm'''''===
====GOC Vision - From experimental data to structured knowledge for biologists====
* The hourglass model: gathering in of data, munging data, broad spectrum of output for consumption by biologists
* Overview - Experimental Data In - Where does it come from? How can we as a community be more efficient?
** Most experimental data currently comes from biomedical literature
** Priorities can be either gene set or domain based
** Data could also come from data centers - pros and cons
** Define a minimal GO annotation (see notes from Emily)
** GO curators should be able to annotate genes from all organisms (unlike MOD curators) and need support to do this.
*PRIMARY MEETING ACTION ITEM: What can we collectively do to make curation of experimental data more efficient?
** Not expected to be answered now, but at the end of the meeting; this needs to be held in mind throughout the meeting.
** Tracking experimental annotations, QC metrics (further discussions later)
* Building on Experimental Annotations
** Defining data flow after deposit of experimental annotations
** How often can/should we update phylogenetic analysis?
** How can we stream PAINT annotations effectively to consumers?
** How can we measure completeness of the annotation stream both by gene and by domain?
* Focus of Meeting
** Finding efficiencies in generating experimental annotations
** Exploring the GO SWAT option, where GOC staff oversee a domain effort supported by distributed curation
** How can curators focused on experimental literature contribute to a PAINT-like annotation stream?
*PRIMARY MEETING ACTION ITEM: What is the measure of completeness by gene or by domain that we want to employ?
** In contrast to others, vertebrates have tens of members of the thousands of small gene families.
** In contrast to others, vertebrates have hundreds of publications for the primarily studied set of ~10,000 genes.
*PRIMARY MEETING ACTION ITEM: What do different sets of GO users want? How do we provide this?
** 'What does this gene do?'
** How do sets of genes coordinate actions? In what spatial/temporal context?
** What do we know that is shared between experimental organisms vs. what is unique for a given organism or class of organisms?
  
====General Discussion of Integrated Annotation Pipeline (Mike Cherry 20-23 minutes)====
* Can we define data flow from publication to PAINT annotation?
* Will, in general, each curator work from a gene or from a domain perspective?
* Will, in general, each curator complete the chain from experimental data through phylo-inference, or how will these processes flow from one to the other?
* GO SWAT teams (PaulT and Judy)
  
====Current Status of Annotation Production (20-30 minutes), i.e. where are we now with basic EXP annotation?====
* What is our current rate of new annotation production?
* What is our current rate of annotation loss (due to the sunset clause, the average % of annotations that can't be loaded, etc.)?
* Annotation Information Content assessment (how detailed are the existing annotations, and what has been the trend over time?)
* Completeness and adherence to standards of "gp2protein" files
=====Goal for this Session=====

====Outline of the current EXP-based annotation process (20-30 minutes)====
# Knowledge extraction (both into the ontology and new annotations) (David, Val)
#* Curators working directly from the literature
#* Curators working with experts in the field (who summarize and provide links to the primary literature)
# Knowledge capture
#* Ontology requests for change
#* MODs
#* others
# Quality Assurance
# Data flow
#* Current
#* Straw-man proposal
=====Goal for this Session=====

====Lessons learned from recent EXP-based curation efforts ( >2 hours)====
Focus is on how the processes might be generalized, with specific details only as supporting examples.
* '''Domain-specific curation and ontology development (Knowledge extraction)'''
** apoptosis annotation (Paola Roncaglia and Emily Dimmer)
** transcription overhaul (Karen Christie & Varsha Khodiyar)
* '''Approaches for recording annotations (Knowledge capture)'''
** Wiki-based annotation in CACAO: proposed improvements and potential generalizations (Jim Hu and Brenley)
** Integrating annotations coming from multiple sources for a single organism (Experiences at UniProt) (Emily Dimmer)
** CANTO experiences (Val Wood)
This is a PomBase tool being developed by Kim Rutherford to incorporate GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these annotations would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group informed of developments.)
* '''Discussion of annotation strategy'''
** What are the bottlenecks?
=====Goal for this Session=====

==='''''Monday: 9am to 5pm'''''===

====Towards a common annotation framework (Kimberly and Chris)====
* '''Kimberly to report on user requirements for CAF.'''
Kimberly is currently talking to all curation groups about their individual GO annotation tools: what features they have and what features curators would like. Therefore, by the GO Consortium meeting, Kimberly will be able to present the features that GOC curators feel are most important.
* '''Discussion'''
** any other aspects curators would require in an annotation tool.
** What additional data should be supplied by annotation groups?
** How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso)
* '''Project plan, development goals for next 3-6 months (Chris Mungall)'''

====Phylogenetic inference process (Paul Thomas and Pascale/Suzi)====
* '''PAINT as used for Quality Assurance'''
**Dual perspectives (biological topic focus)
**cross-checking annotations
**Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT
* '''PAINT: How MODs can achieve full breadth of genome coverage'''
** Focused annotation session for ~10 GO annotators per group
** Led by Pascale and/or PaulT with Mike L., Rama, Li Ni, Donghui and Huaiyu handling groups individually
** small groups to make each session manageable and productive
** Mixed groups
*** those with previous training in PAINT annotation
*** no training, but likely to use PAINT later to create GO annotations (e.g. NIH-funded GO curators)
** Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups that is now in the GO database, to ontology terms that have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort)
** Time required: minimum: 5 hours.
=====Goal for this Session=====
  
==='''''Tuesday: 8:30am to 4pm'''''===
  
====Minimal requirement for GO annotations (Emily Dimmer, Harold Drabkin, Rama) i.e. What does an annotation consist of, both minimally and ideally ====
* Minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
* What would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
* LEGO annotation framework
  
====Towards a biological topic-focused approach====
* '''How and who should select targeted gene sets'''
** Is 9 genes a reasonable # to tackle per annotation milestone?
** What criteria should be used to declare that a milestone has been reached? (comprehensively annotated gene products, final PAINT approval, other QC checks)
* '''What skills are needed among the members of the annotation focus group and what tools do they need'''
** Ontology expert, biological expertise, GO curator
** Efficient ways of leveraging the community
** What tools and other infrastructure would assist.
* '''Next steps towards integral quality control'''
** Define our specific tactical approach to new annotation strategy
  
====Responsibilities and review of milestones for next meeting====

Revision as of 10:04, 15 February 2012

Agenda

Preparation needed in advance of GO Consortium meeting

  1. Preparation for each GO annotation group
    • How does GO annotation fit into your overall curation process? Ideally as a high-level flowchart
    • What is your process for GO annotation? Ideally as a detailed flowchart
      • What software tools do you use for GO annotation?
      • Do you regularly make both literature and inferred (e.g. ISS) annotations?
      • How do you prioritize which papers, genes, etc. are targeted for GO annotation?
      • How do you create a GAF file for submission to the GOC?
    • What information do you want to capture in a controlled vocabulary that you currently CANNOT capture with GO terms?
  2. Develop a proposal for the annotation process (Suzi, Harold, Rama, Emily, Kimberly, Paul)
    • Literature-based annotation process using examples from transcription overhaul (Karen & Varsha) and apoptosis (Paola & Emily)
      • Ontology development support of annotation (David, Chris, Paola)
      • Prioritization, e.g. biological domain focus (Suzi, Emily, Rama)
    • Phylogenetic annotation (Pascale and/or PaulT)
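For the GAF question in the preparation list, one way a group might serialize its annotations is sketched below: each annotation becomes one tab-separated GAF 2.0 line with 17 columns, and the file starts with a `!gaf-version: 2.0` header. This is a minimal illustration only, not any group's actual pipeline; the `Annotation` class, the helper names and any example identifiers are hypothetical, and a real submission would also need to pass the GOC's own file checks.

```python
from dataclasses import dataclass, field
from datetime import date

GAF_HEADER = "!gaf-version: 2.0"


@dataclass
class Annotation:
    # Required GAF 2.0 columns (column number in the comment).
    db: str                # 1  source database, e.g. "UniProtKB"
    db_object_id: str      # 2  identifier of the gene product in that database
    db_object_symbol: str  # 3  gene/gene product symbol
    go_id: str             # 5  GO term, e.g. "GO:0006915"
    reference: str         # 6  DB:Reference, e.g. a PMID
    evidence: str          # 7  evidence code, e.g. "IDA"
    aspect: str            # 9  P (process), F (function) or C (component)
    taxon: str             # 13 e.g. "taxon:9606"
    assigned_by: str       # 15 group that made the annotation
    # Optional columns; empty strings are written as empty fields.
    qualifier: str = ""               # 4  e.g. "NOT"
    with_from: str = ""               # 8
    db_object_name: str = ""          # 10
    synonyms: tuple = ()              # 11 written pipe-separated
    db_object_type: str = "protein"   # 12
    annotation_date: date = field(default_factory=date.today)  # 14 YYYYMMDD
    extension: str = ""               # 16 annotation extension
    gene_product_form_id: str = ""    # 17


def to_gaf_line(a: Annotation) -> str:
    """Serialize one annotation as a tab-separated GAF 2.0 line."""
    if a.aspect not in {"P", "F", "C"}:
        raise ValueError(f"aspect must be P, F or C, got {a.aspect!r}")
    columns = [
        a.db, a.db_object_id, a.db_object_symbol, a.qualifier, a.go_id,
        a.reference, a.evidence, a.with_from, a.aspect, a.db_object_name,
        "|".join(a.synonyms), a.db_object_type, a.taxon,
        a.annotation_date.strftime("%Y%m%d"), a.assigned_by,
        a.extension, a.gene_product_form_id,
    ]
    return "\t".join(columns)


def write_gaf(annotations, path):
    """Write the version header followed by one line per annotation."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(GAF_HEADER + "\n")
        for a in annotations:
            handle.write(to_gaf_line(a) + "\n")
```

Keeping the column order in a single list makes it easy to compare against the format specification when checking a submission.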

Overview of goals of meeting

At this meeting we will discuss specific proposals, with the prime objective of reaching agreement on an implementation path for efficiently and accurately creating high-quality functional annotations that capture molecular information about gene products. Ideally, we will achieve the following:

  1. We will agree on specific annotation goals relative to the experimental literature. These numbers will be reassessed at subsequent GOC meetings and used to track progress.
    1. Each annotation group will track completeness
      1. Globally: Out of the potential GO annotations that could be made, what fraction are completed? What fraction of all relevant papers (or genes, or however the group prioritizes GO annotation) have been curated?
      2. By gene: Which genes are currently “comprehensive” with respect to the available experimental literature?
    2. The dedicated annotation staff will track adherence to annotation guidelines
    3. The dedicated annotation staff will track the number of non-redundant, accepted annotations each period
    4. The dedicated annotation staff will track the information content (specificity) of the accepted annotations
  2. We will extend our quality control methods and strategies by creating a working group dedicated to this role.
    1. Agreement that all contributed annotations will undergo secondary QA steps before being accepted
    2. Agreement that dedicated QA curators have the authority to reject contributed annotations
    3. Contributing curators agree to follow annotation guidelines and remove or revise annotations that do not meet these guidelines
    4. The dedicated QA curators will agree to use and maintain the documented methodologies for inferred annotation
    5. The dedicated QA curators will agree to use the prioritization scheme that is available on the GO site.
  3. We will define concrete steps for implementation of required infrastructure for meeting these annotation goals.
    1. Contributing groups will sign off on the new annotation flow proposal and their ability and plans for implementation.
    2. Sign off on the initial proposed requirements for the common annotation tool.
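The goals above call for tracking the information content (specificity) of accepted annotations, but the agenda does not say how it will be measured. One common choice, assumed here, is the annotation-frequency definition IC(t) = log2(N / n(t)), where N is the total number of annotations and n(t) is the number made to term t or to any of its descendants, so rarer, more specific terms score higher. The sketch below computes it over a hypothetical toy graph; the function names and the toy terms are illustrative, and a real assessment would use the full GO graph and the consortium-wide annotation set.

```python
import math
from collections import Counter


def with_ancestors(term, parents):
    """Return the term together with all of its ancestors.

    `parents` maps each term to the terms directly above it (a DAG).
    """
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def information_content(annotated_terms, parents):
    """IC(t) = log2(N / n(t)); n(t) counts annotations to t or any descendant."""
    counts = Counter()
    for term in annotated_terms:
        counts.update(with_ancestors(term, parents))
    total = len(annotated_terms)
    return {term: math.log2(total / n) for term, n in counts.items()}


def mean_specificity(annotated_terms, ic):
    """Average IC of a set of annotations: one possible per-period metric."""
    return sum(ic[t] for t in annotated_terms) / len(annotated_terms)


# Hypothetical toy graph: A is the root; B and C are children of A; D is a child of B.
PARENTS = {"B": ["A"], "C": ["A"], "D": ["B"]}
ANNOTATIONS = ["A", "B", "D", "C"]
```

Computing `mean_specificity` for each period's newly accepted annotations would show whether annotations are becoming more specific over time.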

Day 1 (half day) - Overview

Current GO annotation status

  • Overview of current pipeline experimental information submitted by annotation groups (Paul T & Mike C)
  • Overview of rates of GO annotation production (break out metrics separately by group and by ontology aspect; for BP, also report metrics separately for cellular process and multicellular organismal process; for CC, also report metrics separately for macromolecular complex)
  • Current annotation status and the trend over time will be presented here, along with the proposed report page. For example: the current rate of new annotation production; annotation information content assessment; the current rate of annotation loss (due to the sunset clause, the average % of annotations that can't be loaded, etc.); and annotation “completeness”, i.e. how many genes have annotations in all three aspects when considering: EXP only; EXP+ISS; EXP+ISS+IEA; ND only (Amelia, Chris and Suzi)
  • Completeness and adherence to standards of "gp2protein" files (Tony or Eleanor? or Paul T?)
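The completeness measure and the per-group production counts above can be made concrete with a small sketch. It assumes the annotations for one reporting period have already been reduced to (gene, aspect, evidence code, contributing group) rows. The `Row` type, the function names and the set of codes treated as experimental are illustrative assumptions, and "ND only" is read here as genes with ND annotations in all three aspects.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    gene: str
    aspect: str        # "P", "F" or "C"
    evidence: str      # GO evidence code
    assigned_by: str   # contributing group


# Assumed evidence-code groupings; the agenda does not define them precisely.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
EVIDENCE_SETS = {
    "EXP only": EXPERIMENTAL,
    "EXP+ISS": EXPERIMENTAL | {"ISS"},
    "EXP+ISS+IEA": EXPERIMENTAL | {"ISS", "IEA"},
    "ND only": {"ND"},
}
ALL_ASPECTS = {"P", "F", "C"}


def genes_in_all_aspects(rows, allowed_codes):
    """Genes with at least one annotation in each aspect, using only allowed codes."""
    aspects_by_gene = defaultdict(set)
    for row in rows:
        if row.evidence in allowed_codes:
            aspects_by_gene[row.gene].add(row.aspect)
    return {gene for gene, seen in aspects_by_gene.items() if seen >= ALL_ASPECTS}


def completeness_report(rows):
    """Number of genes covered in all three aspects for each evidence grouping."""
    rows = list(rows)
    return {label: len(genes_in_all_aspects(rows, codes))
            for label, codes in EVIDENCE_SETS.items()}


def production_by_group_and_aspect(rows):
    """Annotation counts broken out by contributing group and ontology aspect."""
    return Counter((row.assigned_by, row.aspect) for row in rows)
```

Running the same report on successive releases would give the trend over time that this session is meant to present.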

Vision for GO annotation process

  • Overall management group reorganization and roles (Paul x 2, Judy, Mike, Suzi)
  • Annotation submission (Suzi, Paul T., Chris and Emily)
    • minimum annotation
    • ideal, “complete” annotation
    • current difference between ‘minimum’ and ‘complete’ annotations, and why
  • Overview of the common annotation framework (Chris and Paul T. - 20 minutes)
    • Data flow proposal.

Day 2 (full day)

Case studies of state-of-the-art annotation approaches

Each of these presentations will describe the presenters' experiences and the bottlenecks and issues they have encountered. We will use these lessons to determine the best path forward to streamline and enrich the process.

  • Community approaches for recording annotations
    • Wiki-based annotation in CACAO: proposed improvements and potential generalizations (Jim Hu and Brenley McIntosh)
    • CANTO experiences (Val Wood). CANTO is a PomBase tool being developed by Kim Rutherford to incorporate GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these annotations would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group informed of developments.)
    • What additional gene function information is used to supplement GO annotation?
      • at Swiss-Prot and GOA, e.g. UniPath (Claire & Rolf)
      • at MGI, e.g. cell types; temporal-spatial (David & Harold)
    • GOC Domain-specific curation and ontology development
      • apoptosis annotation (Paola Roncaglia and Emily Dimmer)
      • transcription overhaul (Karen Christie & Varsha Khodiyar)
      • summary of recommendations (Paul and Suzi)

Proposal for GOC use of common annotation framework to support literature annotation and subsequent phylogenetic annotation process

  • Details of the common annotation framework (Paul x 2, Judy, Mike, Suzi)
  • Common annotation software prototype demos (Kimberly et al.)
    • Kimberly is currently talking to all curation groups about individual GO annotation tools, what features they have and what features curators would like. By the GO Consortium meeting, Kimberly will present the features that GOC curators feel are most important.
      • any other aspects curators would require in an annotation tool.
      • What additional data should be supplied by annotation groups
      • How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso)
      • CAT Project development goals for next 3-6 months (Kimberly & Chris Mungall)
      • PAINT as used for Quality Assurance: Dual perspectives (biological topic focus); cross-checking annotations; Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT
      • Integration and prioritization of phylogenetic annotations within the framework of experimental annotations.

Day 3 (full day)

Concurrent sessions (4 hr)

  • Software group will go off for a concurrent meeting (Chris, Seth, Ben, Mary, Heiko, Hans-Michael)
  • PAINT training (Suzi and/or Huaiyu)

How MODs can achieve full breadth of genome coverage: a focused annotation session for ~5-7 GO annotators per group, led by Pascale, Paul and Huaiyu, each handling a group individually. Small groups will keep each session manageable and productive. Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups that is now in the GO database, to ontology terms that have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort).

    • Mixed groups
      • those with previous training in PAINT annotation
      • no training, but likely to use PAINT later to create GO annotations (e.g. NIH-funded GO curators)
      • Group 1: Pascale, Rama, Kimberly, - Prudence, Petra, Stacia, Dianna, Doug, Susan, Varsha, Aurore, Steven, Martha
      • Group 2: Paul, Li, David, Tanya, - Julie, Karen, Cindy, Peter, Brenley, Paola, Ruth, Rex N, Lucas, Rajni
      • Group 3: Huaiyu, Donghui, Harold, - Yasmin, Rob, Selina, Kalpana, Jim, Val, Jane, Emily, Carson, Diane

Decisions made, reorganization of manager groups and their coordinators.

  • Defined tasks that will occur over the next six months.
  • The PIs will affirm that the Managers are responsible for communicating the tasks required to reach the project's agreed-upon goals.
    • The PIs empower the Managers to actively monitor the working groups' progress and to report to the directors any problems that inhibit reaching our goals.
    • This is not a volunteer consortium, and there are commitments that all members need to meet for the GOC to be a success.
  • To conclude the meeting, we will finalize our goals and what we have agreed to do going forward.
    • Annotation productivity working group (Rama & Emily)
      • Pick a realistic goal for increasing the # of annotations/quarter over the next 5 years
      • Pick a realistic goal for increasing the information content (specificity) of the annotations over the next 5 years
      • Define and publish what is minimally required for community contributions and mechanisms to ensure quality standards are met. That is, minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
      • Define what additional information we would ideally aim to collect to enrich the annotations coming from GO-funded efforts. That is, what would an ideal annotation consist of, and what are the responsibilities of GO curators (for projects and MODs *funded* via the GOC)?
      • How to select targeted gene sets
      • Efficient ways of leveraging the community
      • What tools and other infrastructure would assist.
    • Annotation & ontology review working group (David & Huaiyu)
      • Independent secondary checks of annotations
        • Automated checks
        • Semi-manual checks
        • Manual checking through random sampling
        • What criteria should be used: comprehensively annotated gene products via final PAINT approval, other QC checks
      • Ontology modeling, consistency and review
    • Framework working groups (Chris & Jane)
      • Database infrastructure (Chris):
        • Define concrete steps for implementation of required infrastructure for meeting these annotation goals.
        • LEGO annotation framework
        • Sign off on the new annotation flow proposal.
        • Mechanism for sending GO annotations to the MODs. GO annotations will be generated by dedicated GO curators using the central CAT tool, and resultant
      • GO annotation tools working group (Kimberly)
      • User and community interactions (Jane)
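The review working group's "manual checking through random sampling" could be organized per contributing group, so that every group's annotations are represented in the review queue. The sketch below draws a reproducible stratified sample; the function name, the `key` argument and the sampling fraction are illustrative assumptions, not an agreed QA procedure.

```python
import random
from collections import defaultdict


def stratified_review_sample(annotations, key, fraction, seed=0):
    """Pick a reproducible random sample of annotations from each group.

    `key` maps an annotation to its group (e.g. the contributing database);
    every non-empty group contributes at least one annotation for review.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    groups = defaultdict(list)
    for annotation in annotations:
        groups[key(annotation)].append(annotation)
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn
    sample = {}
    for group in sorted(groups):
        items = groups[group]
        k = max(1, round(len(items) * fraction))
        sample[group] = rng.sample(items, k)
    return sample
```

With a 5% fraction, for instance, each group would contribute roughly one annotation in twenty to manual review, and small groups would still contribute at least one.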