2012 Annotation Meeting Stanford: Difference between revisions

mNo edit summary
 
(201 intermediate revisions by 13 users not shown)
Line 1: Line 1:
==Agenda==
=Agenda=
===Goals===
==Preparation/Goals for the meeting ==
* Intent is not to go over annotation specifics (how to annotate, what evidence etc, all of which can be done well over annotation conf. calls). Should focus on Annotation as a process and not the thing. How to facilitate the process of annotation?
[[Over all goals for the meeting are available here ]]
* Should enable developers to gather ideas for the CAF
* Possibly open protein2GO or the Pombase community annotation tool to curators so they can give feedback on what works and what doesn't
* Pick a subprocess as the theme and see the evolution of the ontology and annotations related to the subprocess
* dedicated session on PAINT training (hands on training) to propagate annotations made on the subprocess
* Possibly develop a model for annotating complexes and build on col-16 curation


==Group Photo==
[[Image: GOC-Stanford-2012.jpg|800px]]


===Ideas from PIs===
==Remote Attendees==
TITLE: Supporting Curation of Annotations in the GOC
We will use the GO phone conference line and webex. An email will be sent to the GO consortium mailing list about how to join the Webex. <br>
Toll-free USA number 1-866-953-9688 (US Toll number 1-212-548-2460 in case of problems with 866 number)<br>
Toll-free UK 0808 238 6001 (toll number: 646 834-9311)<br>
Toll-free Switzerland 0800 562 830 (toll number: 646 834-9311)<br>


1. Common Annotation Tool; Streaming annotations into AmiGO
Participant Pin: (801-561)<br>
*Establishing requirements and their priorities (Kimberly)
*software implementation strategy (Chris)


2. Process-focused Annotation Approach
==Feb 26, 2012, Day 1 (half day) - Overview==
*incorporating experiment, ontology development, domain experts, phylogenetic inferencing
<font size = "+1">Arrive at: 11:00AM<br>
*using transcription and sub-process of apoptosis
Meeting: 12:00PM</font>
===Current GO annotation status===
Minutes: Emily, Yasmin [[File:GOConsortiummeetingminutesFeb26_am.pdf‎]]


3. Resolving Perennial Annotation Issues ( one or more) (what are they?  list !!)
* Overview of current pipeline experimental information submitted by annotation groups (Paul T & Mike) [[File: GOC_SurveyQ2.pdf]], [[File:GOCquestionnaire3.pdf‎]], [[File:GOCquestionnaire4.pdf‎]]
*pre-developed proposal essential
** Paul's summary slides [[File: survey_summary_3E_thru_4.pdf]]
*decision making process has to be clarified
* Overview of rates of GO annotation production (break out metrics separately by group and by ontology aspect; for BP also report metrics separately for cellular process, multicellular organismal process; for CC also report metrics separately for macromolecular complex)
*need to have input from parties
* Current annotation status and the trend over time will be presented here along with the proposed report page. E.g. What is our current rate of new annotation production, annotation information content assessment, current rate of annotation loss (due to sunset clause, average % of annotations that can't be loaded, etc.), annotation “completeness” (how many genes with annotations in all three aspects considering: EXP only; EXP+ISS; EXP+ISS+IEA; ND only)  (Amelia, Chris and Suzi)
* Completeness and adherence to standards of [http://wiki.geneontology.org/index.php/Gp2protein_file "gp2protein" files] (Paul T). <br>
GPI format: (http://wiki.geneontology.org/index.php/Gene_Product_Association_Data_%28GPAD%29_Format#Proposed_Gene_Product_Information_.28GPI.29_file_format)


4. Metrics: Quality and Completeness of Annotations
===Vision for GO annotation process===
*quality control of annotation streams
Minutes: Rama, Karen
*capturing evidence and source
* Annotation submission (Suzi, Paul T., Chris and Emily) [[File:min_ideal_GO_annotation_requirements.pdf]]
*help for users in using annotations correctly
** minimum annotation
*ranking literature for curation
** enhanced expressivity for GO annotations (Paul T.) [[File:enhanced_expressivity.pdf]]
*evaluating annotation 'currency'


Minutes: [[File:GOC2012MinutesSession2.pdf]]


----
==Feb 27, 2012, Day 2 (full day)==
=Proposed Agenda=
<font size = "+1">Breakfast: 8:00AM<br>
Meeting: 9:00AM</font>
===Case studies of state-of-the-art annotation approaches===
[[minutes_annotation_approaches_stanford_2012 | Minutes]]: Jane, Paola<br>


Each of these presentations will consider the presenters' experiences and the bottlenecks and issues they have encountered. We will use these lessons to determine the best path forward to streamline and enrich the process.


==Day 1 "How do we become an Efficient Annotation Factory?"==
* Community approaches for recording annotations
** [[Media:GO_201202_CACAO.pdf|Wiki-based annotation in CACAO: proposed improvements and potential generalizations]] (Jim Hu and Brenley McIntosh)
** CANTO experiences (Val Wood). CANTO is a PomBase tool being developed by Kim Rutherford to include GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these submissions would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop on developments.)
* What additional gene function information is used to supplement GO annotation?
** at UniProt (Claire)
** at MGI, e.g. cell types; temporal-spatial (David & Harold) [[File:MGI_GO_EI_2.pdf]]
* GOC Domain-specific curation and ontology development
** apoptosis annotation (Paola and Emily )
** transcription overhaul (Rama & Karen) [[Media:TxnOH-OntDev-Report.pdf]], [[Media:TxnOH-AnnotReport.pdf]]
** summary of recommendations (Mike)


* Goals and expectations for the next GO NIH grant [PI presentation]
===Proposal for GOC use of common annotation framework to support literature annotation and subsequent phylogenetic annotation process===
Minutes: Chris, Seth
* Overview of the common annotation framework (Chris and <b>Paul T.</b>)
** Data flow proposal. [[File: current_and_proposed_annotation_flow.pdf]]
* Details of the common annotation framework  (<b>Paul S</b>, Paul T, Judy, Mike, Suzi)
* Common annotation Software prototype demos (Kimberly et al.).
** Kimberly is currently talking to all curation groups about individual GO annotation tools, what features they have and what features curators would like. By the GO Consortium meeting, Kimberly will present the features that GOC curators feel are most important.
*** Any other aspects curators would require in an annotation tool.
*** What additional data should be supplied by annotation groups.
*** How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso).
*** CAT Project development goals for next 3-6 months. (Kimberly & Chris Mungall)
*** PAINT as used for Quality Assurance: Dual perspectives (biological topic focus); cross-checking annotations; Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT.
*** Integration and prioritization of phylogenetic annotations within the framework of experimental annotations.


** Change in paradigm - rather than groups contributing to GOC, groups will take annotations from GOC (all annotations go into a central GO database and then groups take annotations for their taxonIDS).
==Feb 28, 2012, Day 3 (full day)==
<font size = "+1">Breakfast: 8:00AM<br>
Meeting: 9:00AM</font>


===Discussion: Top Priorities for Improving the GO annotation set.===
===Concurrent sessions (4 hr)===
* Software group will go off for a concurrent meeting (Chris, Seth, Ben, Mary, Heiko, Hans-Michael)
* '''Instructions to download and launch PAINT, and general user guide:''' http://wiki.geneontology.org/index.php/PAINT_User_Guide
* PAINT training (Suzi and/or Huaiyu)
** PAINT background and intro to function evolution in gene families (Paul T.) [[File:basic_PAINT_annotation_background.pdf]]
How MODs can achieve full breadth of genome coverage: a focused annotation session for ~5-7 GO annotators per group, led by Pascale, Paul, & Huaiyu, who will handle groups individually. Groups are kept small to make each session manageable and productive. Annotations to transfer would be selected from recent annotation work by GO Consortium groups that is now in the GO database, to terms from the ontology which have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort).


** The essential data needed from annotation groups.


** Minimal frequency of data provision
:* Mixed groups
:** those with previous training in PAINT annotation
:** those with no training but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)
:** Group 1: Pascale, Rama, Kimberly - Prudence, Petra, Stacia, Dianna, Doug, Susan, Varsha, Aurore, Steven, Martha
:** Group 2: Paul, Li, David, Tanya - Julie, Karen, Cindy, Peter, Brenley, Paola, Ruth, Rex N, Lucas, Rajni
:** Group 3: Huaiyu, Donghui, Harold, Emily - Yasmin, Rob, Selina, Kalpana, Jim, Val, Jane, Carson, Diane


**How can the GOC better support curators with under-powered annotation tools before the CAF becomes available?
===Decisions made, reorganization of manager groups and their coordinators.===


=== Discussion: Improving the Annotation Process===
Minutes: David, Kimberly, Harold


**  efficient in terms of ontology development, PAINT inferencing etc.
'''Minutes:'''[[File:Tuesday_02_28_Minutes.pdf]]


** what sub-components are needed to make the process efficient? For example, how do we facilitate identifying literature? Are there tools (e.g. Textpresso)?
* Defined tasks that will occur over the next six months.


===Making the GO annotation tool of the future===
* The PIs will affirm that the Managers are responsible for communicating the tasks required to reach the project's agreed-to goals.
** The Directors empower the Managers to actively monitor the working groups' progress and report any problems that inhibit reaching our goals.
'''* Kimberly to report on spec for CAF.'''
** There are commitments that need to be met by all members for the GOC to be a success.


* Kimberly is currently talking to all curation groups about individual GO annotation tools, what features they have and what features curators would like.
* To conclude the meeting we will finalize goals and what we’ve agreed to do going forward. (GO Directors)
 
** Annotation and Ontology productivity working group
* Therefore, by the GO Consortium meeting, Kimberly will be able to present the features that GOC curators feel are most important.
*** Pick a realistic goal for increasing the # of annotations/quarter over the next 5 years
 
*** Pick a realistic goal for increasing the information content (specificity) of the annotations over the next 5 years
*Subsequent discussion on:
*** Define and publish what is minimally required for community contributions and mechanisms to ensure quality standards are met. That is, minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
 
*** Efficient ways of leveraging the community
** any other aspects curators would require in an annotation tool.
*** What tools and other infrastructure would assist.
 
*** Respond in a timely manner to community & sourceforge term requests
** What additional data should be supplied by annotation groups
** Annotation Integration & ontology review working group
 
*** Quality Control: Independent secondary checks of annotations
** How best to use textmining in the CAF for prioritizing curation work (e.g. Textpresso)
**** Automated checks
 
**** Semi-manual checks
'''* Val to report on the Community Annotation Tool (PomBase)'''
**** Manual checking through random sampling
 
**** What criteria should be used: comprehensively annotated gene products via final paint approval, other QC checks
** This is a PomBase tool being developed by Kim to include GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these submissions would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop on developments.)
*** How to select targeted gene sets
 
*** Define what additional information we ideally would aim to collect to enrich the annotations coming from GO funded efforts. That is, What would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
** Discussion on how best to advertise tool to community and how to manage annotation submissions within the Consortium.
*** Ontology modeling, consistency and review
 
** Framework working groups
===1  Coordinated Annotation, Ontology and Software Development in the GOC===
*** Database infrastructure:
 
**** Define concrete steps for implementation of required infrastructure for meeting these annotation goals.
*Goals and Lessons learnt from previous efforts to coordinate ontology and annotation efforts
**** LEGO annotation framework
 
**** Sign off on the new annotation flow proposal.
*Specific Example: interfacing with ontology development, topics arising from the current Apoptosis annotation effort. (Paola and Emily?)
**** Mechanism for sending GO annotations to the MODs. GO annotations will be generated by dedicated GO curators using the central CAT tool, and resultant
 
*** GO annotation tools working group
==Specific developments to improve the current GO annotation format==
*** User and community interactions
 
** Ad hoc groups
'''Improving the information content'''
*** Fixed timeframe
 
*** Fixed deliverable(s)  
1. Proposal for the definition/documentation of the default gene product to GO term relationships (Chris?)
** GO Directors (Paul T, Paul S, Suzi, Mike, & Judy)
*** Decision making
2. GO Annotation above and below the gene product: Developing the annotation format for Protein Complexes
*** Set priorities for each working group
** Rama, Harold and Emily to present guidance on how protein complex identifiers could be annotated with GO terms
*** Attend working group calls
** Proposal for redefinition of the ‘contributes_to’ qualifier so that it can be used consistently by all groups
[[Category:Workshops]]
** Outcome of pilot project for annotating to GO protein complex ids using the ‘integral_to’ qualifier. (UniProtKB, SGD, MGI?)
 
3. Data represented in the Annotation Extension Field (column 16). An increasingly important part of the annotation format.
** work to develop guidelines, QC checks, Relationship ontology developments.
** Guideline proposal from GO Ontology group for when curators should use column 16 instead of making a GO Term request
** Discussion of appropriate display of column 16 data in GO browsers, e.g. display as column 16 – or interpret as an extension of the GO term?
===Improving GO annotation consistency through ontology development===
 
'''* protein binding (Jane Lomax)'''
 
** Proposal to be presented on new guidance for terms describing binding.
** Focusing on the importance of keeping functional information in this node that is seen as important by users/curators but also with the aim of improving annotation consistency.
 
===3.  Transferring annotations to non-Model Organisms===
 
'''* PAINT'''
 
A focused annotation session for ~10 GO annotators (limit decided due to need for the session to be manageable and productive). Led by Pascale.
** Annotators would be selected on the basis:
 
- previous training in PAINT annotation (e.g. Mike L., Rama, Li Ni, Donghui)
- no training, but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)
 
** Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups that are now in the GO database, to terms from the ontology which have been reviewed and likely to remain stable (e.g. from the recent transcription annotation effort)
 
** Time required: minimum: 5 hours.
 
 
==='''5.  Making annotation public'''===
 
'''* Summary of new annotation QCs agreed.'''
 
** Discussion: Any contentious annotation QCs that need to be discussed further
 
** Resolution of GO annotation filtering by species on the GOC site, progress since last GOC meeting
 
** Development of a GPAD annotation file directory and the ECO resource (action items from last GOC meeting)
 
==='''6. Evaluating efficiency'''===
 
'''Process Discussions:'''
 
** can we use Jim Hu's students for some peripheral curation
** How to respond best to community annotation requests
 
'''metrics discussion:'''
 
** how best to measure annotation progress?
 
** Possible stats: Count of new terms used in annotation? Count of comprehensively annotated gene products? Count of EXP-evidenced annotations, Count of species with new annotation sets? Count of new checks implemented?
 
** what combination of stats would best reflect our curation efforts?
 
** How can the selected set of metrics be most effectively created, what information do groups need to be ready to supply the GOC with?
 
 
----
 
==Preparation needed in advance of GO Consortium meeting==
 
'''GO annotation calls.'''
 
There are only 5 annotation calls scheduled before the GOC meeting. Therefore we need to use this time wisely. If we have one major topic per call, perhaps it could be:
 
1. Annotation to ‘contributes_to’
 
2. Default gp-GO term Relationship definitions
 
3. Protein Binding
 
4. Column 16
 
5. Focused apoptosis annotation discussion.
 
...with a regular slot for a couple of QC checks, to get the uncontentious ones agreed upon and, if possible, implemented before the GOC meeting.
 
'''Work outside of the GO annotation calls to be discussed on GO list?'''
 
* ISS/IC issue brought up by Ruth at the GOC meeting. A proposal almost ready to be emailed to GO list (Emily, Ruth)
 
* Column 17 concerns; developing the GAF spec; from recent emails by Amelia (Amelia, Mike, Chris)
 
* Documentation for GPAD format, creation of regularly updated directory on GOC site using (Tony, Chris, Amelia)
 
* Resolving annotation filtering on GOC site where groups responsible for  a species are not (Mike, UniProt-GOA)
 
* IGI and ‘with’ field? (new item raised by SGD)
 
* Documentation that needs to be created to support wider use of IKR (Emily, UniProt-GOA)

Latest revision as of 10:08, 15 April 2019

Agenda

Preparation/Goals for the meeting

Overall goals for the meeting are available here

Group Photo

Remote Attendees

We will use the GO phone conference line and webex. An email will be sent to the GO consortium mailing list about how to join the Webex.
Toll-free USA number 1-866-953-9688 (US Toll number 1-212-548-2460 in case of problems with 866 number)
Toll-free UK 0808 238 6001 (toll number: 646 834-9311)
Toll-free Switzerland 0800 562 830 (toll number: 646 834-9311)

Participant Pin: (801-561)

Feb 26, 2012, Day 1 (half day) - Overview

Arrive at: 11:00AM
Meeting: 12:00PM

Current GO annotation status

Minutes: Emily, Yasmin File:GOConsortiummeetingminutesFeb26 am.pdf

  • Overview of current pipeline experimental information submitted by annotation groups (Paul T & Mike) File:GOC SurveyQ2.pdf, File:GOCquestionnaire3.pdf, File:GOCquestionnaire4.pdf
  • Overview of rates of GO annotation production (break out metrics separately by group and by ontology aspect; for BP also report metrics separately for cellular process, multicellular organismal process; for CC also report metrics separately for macromolecular complex)
  • Current annotation status and the trend over time will be presented here along with the proposed report page. E.g. What is our current rate of new annotation production, annotation information content assessment, current rate of annotation loss (due to sunset clause, average % of annotations that can't be loaded, etc.), annotation “completeness” (how many genes with annotations in all three aspects considering: EXP only; EXP+ISS; EXP+ISS+IEA; ND only) (Amelia, Chris and Suzi)
  • Completeness and adherence to standards of "gp2protein" files (Paul T).
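
As a rough illustration of the annotation "completeness" tally described above, the following minimal sketch counts genes annotated in all three ontology aspects under each evidence tier. It rests on assumptions that are not part of the agenda: annotations are simplified to (gene, aspect, evidence code) tuples, the aspect letters P, F and C follow the GAF convention, each tier is taken literally as the set of codes in its name, and "ND only" is read as genes whose every annotation carries the ND code. The gene names and the function name are hypothetical.

```python
from collections import defaultdict

# GAF-style aspect letters: P = biological process, F = molecular function,
# C = cellular component.
ASPECTS = {"P", "F", "C"}

# Assumption: each tier is read literally as the evidence codes in its name.
TIERS = {
    "EXP only": {"EXP"},
    "EXP+ISS": {"EXP", "ISS"},
    "EXP+ISS+IEA": {"EXP", "ISS", "IEA"},
}


def completeness(annotations):
    """Count genes covered in all three aspects for each evidence tier.

    `annotations` is an iterable of (gene, aspect, evidence_code) tuples.
    """
    by_gene = defaultdict(set)
    for gene, aspect, evidence in annotations:
        by_gene[gene].add((aspect, evidence))

    counts = {}
    for tier, allowed in TIERS.items():
        # A gene counts for a tier when the aspects reachable through the
        # allowed evidence codes include P, F and C.
        counts[tier] = sum(
            1
            for pairs in by_gene.values()
            if {a for a, e in pairs if e in allowed} >= ASPECTS
        )

    # Assumption: "ND only" means every annotation for the gene is ND.
    counts["ND only"] = sum(
        1 for pairs in by_gene.values() if all(e == "ND" for _, e in pairs)
    )
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    ("geneA", "P", "EXP"), ("geneA", "F", "EXP"), ("geneA", "C", "EXP"),
    ("geneB", "P", "EXP"), ("geneB", "F", "ISS"), ("geneB", "C", "IEA"),
    ("geneC", "P", "ND"), ("geneC", "F", "ND"), ("geneC", "C", "ND"),
]
print(completeness(sample))
```

A real report would likely expand the EXP tier to the full set of experimental evidence codes and read the tuples from the submitted annotation files; the tier definitions are kept in one dictionary so that they can be adjusted without touching the counting logic.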

GPI format: (http://wiki.geneontology.org/index.php/Gene_Product_Association_Data_%28GPAD%29_Format#Proposed_Gene_Product_Information_.28GPI.29_file_format)

Vision for GO annotation process

Minutes: Rama, Karen

Minutes: File:GOC2012MinutesSession2.pdf

Feb 27, 2012, Day 2 (full day)

Breakfast: 8:00AM
Meeting: 9:00AM

Case studies of state-of-the-art annotation approaches

Minutes: Jane, Paola

Each of these presentations will consider the presenters' experiences and the bottlenecks and issues they have encountered. We will use these lessons to determine the best path forward to streamline and enrich the process.

  • Community approaches for recording annotations
    • Wiki-based annotation in CACAO: proposed improvements and potential generalizations (Jim Hu and Brenley McIntosh)
    • CANTO experiences (Val Wood). CANTO is a PomBase tool being developed by Kim Rutherford to include GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium; these submissions would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop on developments.)
  • What additional gene function information is used to supplement GO annotation?
  • GOC Domain-specific curation and ontology development

Proposal for GOC use of common annotation framework to support literature annotation and subsequent phylogenetic annotation process

Minutes: Chris, Seth

  • Overview of the common annotation framework (Chris and Paul T.)
  • Details of the common annotation framework (Paul S, Paul T, Judy, Mike, Suzi)
  • Common annotation Software prototype demos (Kimberly et al.).
    • Kimberly is currently talking to all curation groups about individual GO annotation tools, what features they have and what features curators would like. By the GO Consortium meeting, Kimberly will present the features that GOC curators feel are most important.
      • Any other aspects curators would require in an annotation tool.
      • What additional data should be supplied by annotation groups.
      • How best to use text-mining in the CAF for prioritizing curation work (e.g. Textpresso).
      • CAT Project development goals for next 3-6 months. (Kimberly & Chris Mungall)
      • PAINT as used for Quality Assurance: Dual perspectives (biological topic focus); cross-checking annotations; Phylogenetic inference: Synthesis, QA and inference across organisms using PAINT.
      • Integration and prioritization of phylogenetic annotations within the framework of experimental annotations.

Feb 28, 2012, Day 3 (full day)

Breakfast: 8:00AM
Meeting: 9:00AM

Concurrent sessions (4 hr)

How MODs can achieve full breadth of genome coverage: a focused annotation session for ~5-7 GO annotators per group, led by Pascale, Paul, & Huaiyu, who will handle groups individually. Groups are kept small to make each session manageable and productive. Annotations to transfer would be selected from recent annotation work by GO Consortium groups that is now in the GO database, to terms from the ontology which have been reviewed and are likely to remain stable (e.g. from the recent transcription annotation effort).


  • Mixed groups
    • those with previous training in PAINT annotation
    • those with no training but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH-funded curators)
    • Group 1: Pascale, Rama, Kimberly - Prudence, Petra, Stacia, Dianna, Doug, Susan, Varsha, Aurore, Steven, Martha
    • Group 2: Paul, Li, David, Tanya - Julie, Karen, Cindy, Peter, Brenley, Paola, Ruth, Rex N, Lucas, Rajni
    • Group 3: Huaiyu, Donghui, Harold, Emily - Yasmin, Rob, Selina, Kalpana, Jim, Val, Jane, Carson, Diane

Decisions made, reorganization of manager groups and their coordinators.

Minutes: David, Kimberly, Harold

Minutes:File:Tuesday 02 28 Minutes.pdf

  • Defined tasks that will occur over the next six months.
  • The PIs will affirm that the Managers are responsible for communicating the tasks required to reach the project's agreed-to goals.
    • The Directors empower the Managers to actively monitor the working groups' progress and report any problems that inhibit reaching our goals.
    • There are commitments that need to be met by all members for the GOC to be a success.
  • To conclude the meeting we will finalize goals and what we’ve agreed to do going forward. (GO Directors)
    • Annotation and Ontology productivity working group
      • Pick a realistic goal for increasing the # of annotations/quarter over the next 5 years
      • Pick a realistic goal for increasing the information content (specificity) of the annotations over the next 5 years
      • Define and publish what is minimally required for community contributions and mechanisms to ensure quality standards are met. That is, minimal requirements for submitting GO annotations (for projects and MODs *not* funded via the GOC)
      • Efficient ways of leveraging the community
      • What tools and other infrastructure would assist.
      • Respond in a timely manner to community & sourceforge term requests
    • Annotation Integration & ontology review working group
      • Quality Control: Independent secondary checks of annotations
        • Automated checks
        • Semi-manual checks
        • Manual checking through random sampling
        • What criteria should be used: comprehensively annotated gene products via final paint approval, other QC checks
      • How to select targeted gene sets
      • Define what additional information we ideally would aim to collect to enrich the annotations coming from GO funded efforts. That is, What would an ideal annotation consist of and what are the responsibilities for GO curators (for projects and MODs *funded* via the GOC)
      • Ontology modeling, consistency and review
    • Framework working groups
      • Database infrastructure:
        • Define concrete steps for implementation of required infrastructure for meeting these annotation goals.
        • LEGO annotation framework
        • Sign off on the new annotation flow proposal.
        • Mechanism for sending GO annotations to the MODs. GO annotations will be generated by dedicated GO curators using the central CAT tool, and resultant
      • GO annotation tools working group
      • User and community interactions
    • Ad hoc groups
      • Fixed timeframe
      • Fixed deliverable(s)
    • GO Directors (Paul T, Paul S, Suzi, Mike, & Judy)
      • Decision making
      • Set priorities for each working group
      • Attend working group calls