2012 Annotation Meeting Stanford: Difference between revisions

From GO Wiki
 
(124 intermediate revisions by 13 users not shown)
=Agenda=
==Goals==
GO annotations are the primary product of the GO, and curator time is our most valuable resource: we need a defined process for producing annotations efficiently and at high quality. The primary objective of this meeting is to identify and define processes and procedures that will allow us to work collectively in a coordinated and efficient manner.
* We will define new processes to identify and set annotation goals
** rate of production by annotation - # of annotations/quarter (ACTION ITEM: pick a realistic GOAL for increasing this # over the next 5 years)
** rate of production by domain - # of domain evaluations/quarter (ACTION ITEM: pick a goal for the first quarter and evaluate for future goals)
** rate of refinement - increase in information content/year (ACTION ITEM: pick a realistic GOAL for increasing this # over the next 5 years)
** quality standards and metrics, what is minimally required for community contributions (ACTION ITEM: publish these requirements)
** what additional information shall we aim to collect to enrich the annotations coming from GO-funded efforts (ACTION ITEM: list of new data types in relation to existing annotation)
* We will review and evaluate the current components of a process
** what are the current bottlenecks (ACTION ITEM: list of areas we need to address)
** what changes will we make in our process to eliminate bottlenecks and improve quality (ACTION ITEM: a prioritized list of these steps to work on over the next 3 months)
** what changes will we make to our data flow (ACTION ITEM: data flow diagram)


==Proposed Agenda==


==='''''Sunday afternoon: 1pm until 4:30pm'''''===


====Current Status of Annotation Production (Mike Cherry 20-30 minutes) i.e. Where are we now with basic EXP annotation? ====
* What is our current rate of new annotation production
* What is our current rate of annotation loss (due to sunset clause, average % of annotations that can't be loaded, etc.)
* Annotation Information Content assessment (how detailed are the existing annotations and what has been the trend over time)
* Completeness and adherence to standards of "gp2protein" files
=====Goal for this Session=====


====Outline of the current EXP-based annotation process (Judy Blake 20-30 minutes)====
# Knowledge extraction (both into the ontology and new annotations)
#* Curators working directly from the literature
#* Curators working with experts in the field (who summarize and provide links to the primary literature)
# Knowledge capture
#* Ontology requests for change
#* MODs
#* others
# Quality Assurance
# Data flow
#* Current
#* Straw-man proposal
=====Goal for this Session=====


====Lessons learned from recent EXP-based curation efforts (Paul Sternberg >2 hours)====
Focus is on how the processes might be generalized, with specific details only as supporting examples.
* '''Domain-specific curation and ontology development (Knowledge extraction)'''
** apoptosis annotation (Paola Roncaglia and Emily Dimmer)
** transcription overhaul (Karen Christie & Varsha Khodiyar)
* '''Approaches for recording annotations (Knowledge capture)'''
** Wiki-based annotation in CACAO: proposed improvements and potential generalizations (Jim Hu and Brenley)
** Integrating annotations coming from multiple sources for a single organism (Experiences at Swiss-Prot and GOA)
** CANTO experiences (Val Wood)
This PomBase tool is being developed by Kim Rutherford to include GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these annotations would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop as to developments.)
* '''Discussion of annotation strategy'''
** What are the bottlenecks
=====Goal for this Session=====


==='''''Monday: 9am to 5pm'''''===


====Towards a common annotation framework (Kimberly and Chris)====
* '''Kimberly to report on user requirements for CAF.'''
Kimberly is currently talking to all curation groups about their individual GO annotation tools, what features they have and what features curators would like, so that by the GO Consortium meeting she will be able to present the features that GOC curators feel are most important.
* '''Discussion'''
** Any other aspects curators would require in an annotation tool
** What additional data should be supplied by annotation groups
** How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso)
* '''Project plan, development goals for the next 3-6 months (Chris Mungall)'''


====Phylogenetic inference process (Paul Thomas and Pascale/Suzi)====
* '''PAINT as used for Quality Assurance'''
**Dual perspectives (biological topic focus)
**cross-checking annotations
**Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT
* '''PAINT: How MODs can achieve full breadth of genome coverage'''
** Focused annotation session for ~10 GO annotators per group
** Led by Pascale and/or Paul with Mike L., Rama, Li Ni, Donghui, Huaiyu handling groups individually
** small groups to make each session manageable and productive
** Mixed groups
*** those with previous training in PAINT annotation
*** no training, but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)
** Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups, now in the GO database, to ontology terms that have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort)
** Time required: minimum 5 hours.
=====Goal for this Session=====


==='''''Tuesday: 8:30am to 4pm'''''===


====Minimal requirement for GO annotations (Paul Thomas & Emily Dimmer) i.e. What does an annotation consist of, both minimally and ideally ====
* Minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
* What would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
* LEGO annotation framework


====Towards a biological topic-focused approach====
* '''How and who should select targeted gene sets'''
** Is 9 genes a reasonable # to tackle per annotation milestone
** What criteria should be used to declare that a milestone has been reached (comprehensively annotated gene products, final PAINT approval, other QC checks)
* '''What skills are needed among the members of the annotation focus group and what tools do they need'''
** Ontology expert, biological expertise, GO curator
** Efficient ways of leveraging the community
** What tools and other infrastructure would assist.
* '''Next steps towards integral quality control'''
** Define our specific tactical approach to new annotation strategy


====Responsibilities and review of milestones for next meeting====


==Preparation needed in advance of GO Consortium meeting (incomplete, see agenda)==
#Develop proposal for annotation process (Suzi, Rama, Emily, Kimberly)
#*using examples from transcription overhaul (Karen & Varsha)
#*apoptosis (Paola & Emily)
#*CACAO (Jim Hu)
#*Integrating multiple sources for annotations on a single species (Rolf or Claire)
#*Phylogenetic annotation (Pascale and/or Paul)


Latest revision as of 10:08, 15 April 2019

=Agenda=
==Preparation/Goals for the meeting ==
[[Over all goals for the meeting are available here ]]
==Group Photo==
[[Image: GOC-Stanford-2012.jpg|800px]]


==Remote Attendees==
We will use the GO phone conference line and Webex. An email will be sent to the GO Consortium mailing list about how to join the Webex. <br>
Toll-free USA number 1-866-953-9688 (US toll number 1-212-548-2460 in case of problems with the 866 number)<br>
Toll-free UK 0808 238 6001 (toll number: 646 834-9311)<br>
Toll-free Switzerland 0800 562 830 (toll number: 646 834-9311)<br>
Participant Pin: (801-561)<br>


==Feb 26, 2012, Day 1 (half day) - Overview==
<font size = "+1">Arrive at: 11:00AM<br>
Meeting: 12:00PM</font>
===Current GO annotation status===
Minutes: Emily, Yasmin [[File:GOConsortiummeetingminutesFeb26_am.pdf]]
* Overview of the current pipeline for experimental information submitted by annotation groups (Paul T & Mike) [[File: GOC_SurveyQ2.pdf]], [[File:GOCquestionnaire3.pdf]], [[File:GOCquestionnaire4.pdf]]
** Paul's summary slides [[File: survey_summary_3E_thru_4.pdf]]
* Overview of rates of GO annotation production (break out metrics separately by group and by ontology aspect; for BP also report metrics separately for cellular process and multicellular organismal process; for CC also report metrics separately for macromolecular complex)
* Current annotation status and the trend over time will be presented here along with the proposed report page, e.g. our current rate of new annotation production, annotation information content assessment, current rate of annotation loss (due to sunset clause, average % of annotations that can't be loaded, etc.), and annotation “completeness” (how many genes have annotations in all three aspects considering: EXP only; EXP+ISS; EXP+ISS+IEA; ND only) (Amelia, Chris and Suzi)
* Completeness and adherence to standards of [http://wiki.geneontology.org/index.php/Gp2protein_file "gp2protein" files] (Paul T). <br>
GPI format: (http://wiki.geneontology.org/index.php/Gene_Product_Association_Data_%28GPAD%29_Format#Proposed_Gene_Product_Information_.28GPI.29_file_format)
===Vision for GO annotation process===
Minutes: Rama, Karen
* Annotation submission (Suzi, Paul T., Chris and Emily) [[File:min_ideal_GO_annotation_requirements.pdf]]
** minimum annotation
** enhanced expressivity for GO annotations (Paul T.) [[File:enhanced_expressivity.pdf]]
Minutes: [[File:GOC2012MinutesSession2.pdf]]


==Feb 27, 2012, Day 2 (full day)==
<font size = "+1">Breakfast: 8:00AM<br>
Meeting: 9:00AM</font>
===Case studies of state-of-the-art annotation approaches===
[[minutes_annotation_approaches_stanford_2012 | Minutes]]: Jane, Paola<br>
Each of these presentations will describe the presenters' experiences and the bottlenecks and issues they have encountered. We will use these lessons to determine the best path forward to streamline and enrich the process.
* Community approaches for recording annotations
** [[Media:GO_201202_CACAO.pdf|Wiki-based annotation in CACAO: proposed improvements and potential generalizations]] (Jim Hu and Brenley McIntosh)
** CANTO experiences (Val Wood). CANTO is a PomBase tool being developed by Kim Rutherford to include GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these annotations would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop as to developments.)
* What additional gene function information is used to supplement GO annotation?
** at UniProt (Claire)
** at MGI, e.g. cell types; temporal-spatial (David & Harold) [[File:MGI_GO_EI_2.pdf]]
* GOC Domain-specific curation and ontology development
** apoptosis annotation (Paola and Emily)
** transcription overhaul (Rama & Karen) [[Media:TxnOH-OntDev-Report.pdf]], [[Media:TxnOH-AnnotReport.pdf]]
** summary of recommendations (Mike)


===Proposal for GOC use of common annotation framework to support literature annotation and subsequent phylogenetic annotation process===
Minutes: Chris, Seth
* Overview of the common annotation framework (Chris and <b>Paul T.</b>)
** Data flow proposal. [[File: current_and_proposed_annotation_flow.pdf]]
* Details of the common annotation framework (<b>Paul S</b>, Paul T, Judy, Mike, Suzi)
* Common annotation software prototype demos (Kimberly et al.).
** Kimberly is currently talking to all curation groups about their individual GO annotation tools, what features they have and what features curators would like. By the GO Consortium meeting, Kimberly will present the features that GOC curators feel are most important.
*** Any other aspects curators would require in an annotation tool.
*** What additional data should be supplied by annotation groups.
*** How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso).
*** CAT Project development goals for the next 3-6 months. (Kimberly & Chris Mungall)
*** PAINT as used for Quality Assurance: Dual perspectives (biological topic focus); cross-checking annotations; Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT.
*** Integration and prioritization of phylogenetic annotations within the framework of experimental annotations.


==Feb 28, 2012, Day 3 (full day)==
<font size = "+1">Breakfast: 8:00AM<br>
Meeting: 9:00AM</font>
===Concurrent sessions (4 hr)===
* The software group will hold a concurrent meeting (Chris, Seth, Ben, Mary, Heiko, Hans-Michael)
* '''Instructions to download and launch PAINT, and general user guide:''' http://wiki.geneontology.org/index.php/PAINT_User_Guide
* PAINT training (Suzi and/or Huaiyu)
** PAINT background and intro to function evolution in gene families (Paul T.) [[File:basic_PAINT_annotation_background.pdf]]
How MODs can achieve full breadth of genome coverage: a focused annotation session for ~5-7 GO annotators per group, led by Pascale, Paul, and Huaiyu, each handling a group individually; small groups keep each session manageable and productive. Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups, now in the GO database, to ontology terms that have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort).

:* Mixed groups
:** those with previous training in PAINT annotation
:** no training, but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)
:** Group 1: Pascale, Rama, Kimberly - Prudence, Petra, Stacia, Dianna, Doug, Susan, Varsha, Aurore, Steven, Martha
:** Group 2: Paul, Li, David, Tanya - Julie, Karen, Cindy, Peter, Brenley, Paola, Ruth, Rex N, Lucas, Rajni
:** Group 3: Huaiyu, Donghui, Harold, Emily - Yasmin, Rob, Selina, Kalpana, Jim, Val, Jane, Carson, Diane

===Decisions made, reorganization of manager groups and their coordinators.===

Minutes: David, Kimberly, Harold

'''Minutes:''' [[File:Tuesday_02_28_Minutes.pdf]]

* Defined tasks that will occur over the next six months.

* The PIs will affirm that the Managers are responsible for communicating the tasks required to reach the project's agreed-upon goals.
** The Directors empower the Managers to actively monitor the working groups' progress and report any problems that inhibit reaching our goals.
** There are commitments that need to be met by all members for the GOC to be a success.

* To conclude the meeting, we will finalize our goals and what we have agreed to do going forward. (GO Directors)
** Annotation and Ontology productivity working group
*** Pick a realistic goal for increasing the # of annotations/quarter over the next 5 years
*** Pick a realistic goal for increasing the information content (specificity) of the annotations over the next 5 years
*** Define and publish what is minimally required for community contributions and mechanisms to ensure quality standards are met. That is, minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
*** Efficient ways of leveraging the community
*** What tools and other infrastructure would assist.
*** Respond in a timely manner to community & SourceForge term requests
** Annotation Integration & ontology review working group
*** Quality Control: Independent secondary checks of annotations
**** Automated checks
**** Semi-manual checks
**** Manual checking through random sampling
**** What criteria should be used: comprehensively annotated gene products via final PAINT approval, other QC checks
*** How to select targeted gene sets
*** Define what additional information we ideally would aim to collect to enrich the annotations coming from GO-funded efforts. That is, what would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
*** Ontology modeling, consistency and review
** Framework working groups
*** Database infrastructure:
**** Define concrete steps for implementation of required infrastructure for meeting these annotation goals.
**** LEGO annotation framework
**** Sign off on the new annotation flow proposal.
**** Mechanism for sending GO annotations to the MODs. GO annotations will be generated by dedicated GO curators using the central CAT tool, and resultant
*** GO annotation tools working group
*** User and community interactions
** Ad hoc groups
*** Fixed timeframe
*** Fixed deliverable(s)
** GO Directors (Paul T, Paul S, Suzi, Mike, & Judy)
*** Decision making
*** Set priorities for each working group
*** Attend working group calls
[[Category:Workshops]]
