Difference between revisions of "2012 Annotation Meeting Stanford"

Revision as of 10:26, 14 December 2011

Agenda

Goals

  • The intent is not to go over annotation specifics (how to annotate, what evidence to use, etc.), all of which can be handled well on annotation conference calls. The meeting should focus on annotation as a process rather than on the annotations themselves: how can the process of annotation be facilitated?
  • Should enable developers to gather ideas for the CAF
  • Possibly open Protein2GO or the PomBase community annotation tool to curators so they can give feedback on what works and what doesn't
  • Pick a subprocess as the theme and see the evolution of the ontology and annotations related to the subprocess
  • Dedicated hands-on PAINT training session to propagate annotations made on the subprocess
  • Possibly develop a model for annotating complexes and build on column 16 curation


Ideas from PIs

TITLE: Supporting Curation of Annotations in the GOC

1. Common Annotation Tool; Streaming annotations into AmiGO

  • Establishing requirements and their priorities (Kimberly)
  • software implementation strategy (Chris)

2. Process-focused Annotation Approach

  • incorporating experiment, ontology development, domain experts, phylogenetic inferencing
  • using transcription and sub-process of apoptosis

3. Resolving Perennial Annotation Issues (one or more) (what are they? list!)

  • pre-developed proposal essential
  • decision making process has to be clarified
  • need to have input from parties

4. Metrics: Quality and Completeness of Annotations

  • quality control of annotation streams
  • capturing evidence and source
  • help for users in using annotations correctly
  • ranking literature for curation
  • evaluating annotation 'currency'



Proposed Agenda

Sunday afternoon: 1pm until 4:30pm

"How do we become an Efficient Annotation Factory?"

Discussion: Goals: the "gold standard" for GO annotations

  • Depth of annotation: What are the components of a "gold standard" GO annotation?
  • Breadth of annotation: How can MODs achieve full genome coverage?
  • Where are we now?
  • Post-meeting action item: practical plan for getting from the current state to the "gold standard" (or as close to it as possible)


Discussion: what should the annotation process be?

  • Process and Lessons learned from previous efforts to coordinate ontology and annotation efforts. Focus is on how the processes might be generalized, with specific details only as supporting examples.
    • domain-specific curation and ontology development
      • transcription overhaul
      • apoptosis (domain-specific curation)
    • Swiss-Prot GO term curation process
    • What sub-components are needed to make the process efficient? For example, how do we facilitate identifying literature? Are there tools (e.g. Textpresso)?
    • efficiency in terms of ontology development, PAINT inferencing etc.
    • Coordinated Annotation, Ontology and Software Development in the GOC
    • Specific Example: interfacing with ontology development, topics arising from the current Apoptosis annotation effort. (Paola and Emily?)


Monday

Specific developments to improve the current GO annotation format

Improving the information content

1. Proposal for the definition/documentation of the default gene product to GO term relationships (Chris?) (note: we will discuss this item only if it is deemed necessary for making a Core Annotation)

2. GO Annotation above and below the gene product: Developing the annotation format for Protein Complexes

    • Rama, Harold and Emily to present guidance on how protein complex identifiers could be annotated with GO terms
    • Proposal for redefinition of the ‘contributes_to’ qualifier so that it can be used consistently by all groups
    • Outcome of pilot project for annotating to GO protein complex ids using the ‘integral_to’ qualifier. (UniProtKB, SGD, MGI?)

3. Data represented in the Annotation Extension Field (column 16), an increasingly important part of the annotation format. (note: we will discuss this item only if it is deemed necessary for making a Core Annotation)

    • Work to develop guidelines, QC checks and Relationship ontology developments.
    • Guideline proposal from GO Ontology group for when curators should use column 16 instead of making a GO Term request
    • Discussion of appropriate display of column 16 data in GO browsers, e.g. display as column 16 – or interpret as an extension of the GO term?
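
As a starting point for the column 16 discussion, the sketch below shows one way a tool might split an Annotation Extension value into relation/identifier pairs before deciding how to display it. It assumes the `relation(ID)` syntax, with commas joining expressions that apply together and pipes separating independent alternatives. The function name `parse_extension` and the example value are illustrative only and are not taken from any GOC tool.

```python
import re

# Assumed syntax for one extension expression: relation(ID), e.g. part_of(CL:0000084).
EXTENSION_RE = re.compile(r"^\s*(\w+)\(([^()]+)\)\s*$")

def parse_extension(field):
    """Split a column 16 value into groups of (relation, identifier) pairs.

    Pipes separate independent alternatives; commas join expressions
    that apply together (both are assumptions of this sketch).
    """
    groups = []
    for alternative in field.split("|"):
        if not alternative.strip():
            continue  # an empty field yields no groups
        group = []
        for expression in alternative.split(","):
            match = EXTENSION_RE.match(expression)
            if match is None:
                raise ValueError(f"malformed extension: {expression!r}")
            group.append((match.group(1), match.group(2)))
        groups.append(group)
    return groups

# Hypothetical example value, for illustration only.
parsed = parse_extension(
    "part_of(CL:0000084),occurs_in(GO:0005739)|part_of(CL:0000576)"
)
```

A browser could then show each group either as a raw column 16 value or folded into the displayed GO term, which is the choice raised in the last discussion point.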


Improving GO annotation consistency through ontology development

* protein binding (Jane Lomax)

    • Proposal to be presented on new guidance for terms describing binding.
    • The focus is on keeping the functional information in this node that users and curators see as important, while also improving annotation consistency.


Making the GO annotation tool of the future

* Kimberly to report on spec for CAF.

  • Kimberly is currently talking to all curation groups about individual GO annotation tools, what features they have and what features curators would like.
  • Therefore, by the GO Consortium meeting, Kimberly will be able to present the features that GOC curators feel are most important.
  • Subsequent discussion on:
    • any other aspects curators would require in an annotation tool.
    • What additional data should be supplied by annotation groups
    • How best to use textmining in the CAF for prioritizing curation work (e.g. Textpresso)

* Val to report on the Community Annotation Tool (PomBase)

    • This is a PomBase tool that Kim is developing to include GOC requirements, so that it can be made available to community experts who would like to submit small sets of GO annotations to the GO Consortium. These submissions would then need to be reviewed by GOC groups. (Kim and PomBase will keep Kimberly and the CAF working group in the loop on developments.)
    • Discussion on how best to advertise the tool to the community and how to manage annotation submissions within the Consortium.

Tuesday

Transferring annotations to non-Model Organisms

* PAINT

A focused annotation session for ~10 GO annotators (the limit keeps the session manageable and productive). Led by Pascale.

    • Annotators would be selected on the basis of either:
      • previous training in PAINT annotation (e.g. Mike L., Rama, Li Ni, Donghui), or
      • no training, but a strong possibility of using PAINT later on to create GO annotations (e.g. GO NIH funded curators)

    • Annotations to transfer would be selected on the basis of recent annotation work by GO Consortium groups that are now in the GO database, to terms from the ontology which have been reviewed and likely to remain stable (e.g. from the recent transcription annotation effort)
    • Time required: minimum 5 hours.

Making annotation public

* Summary of new annotation QCs agreed.

    • Discussion: Any contentious annotation QCs that need to be discussed further
    • Resolution of GO annotation filtering by species on the GOC site, progress since last GOC meeting
    • Development of a GPAD annotation file directory and the ECO resource (action items from last GOC meeting)

Evaluating efficiency

Process Discussions:

    • Can we use Jim Hu's students for some peripheral curation?
    • How best to respond to community annotation requests

Metrics discussion:

    • How best to measure annotation progress?
    • Possible stats: count of new terms used in annotation; count of comprehensively annotated gene products; count of EXP-evidenced annotations; count of species with new annotation sets; count of new checks implemented.
    • What combination of stats would best reflect our curation efforts?
    • How can the selected set of metrics be created most effectively, and what information do groups need to be ready to supply to the GOC?
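
To make the metrics discussion concrete, here is a minimal sketch of how a few of the candidate stats could be computed from a tab-delimited GAF 2.0 file (GO ID in column 5, evidence code in column 7, taxon in column 13). It assumes "EXP-evidenced" means the experimental evidence codes as a group. The function name `annotation_stats` and the sample rows are made up for illustration and are not real annotations.

```python
from collections import Counter

# Experimental evidence codes (assumed reading of "EXP-evidenced").
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

def annotation_stats(gaf_lines):
    """Compute a few candidate progress metrics from GAF 2.0 lines."""
    terms = set()
    taxa = set()
    evidence = Counter()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # skip rows missing the mandatory columns
        terms.add(cols[4])                # column 5: GO ID
        evidence[cols[6]] += 1            # column 7: evidence code
        taxa.add(cols[12].split("|")[0])  # column 13: taxon (first entry)
    experimental = sum(
        count for code, count in evidence.items() if code in EXPERIMENTAL_CODES
    )
    return {
        "total_annotations": sum(evidence.values()),
        "distinct_terms": len(terms),
        "experimental_annotations": experimental,
        "species": len(taxa),
    }

def _row(object_id, go_id, evidence_code, taxon):
    # Build a 17-column placeholder row; every value is hypothetical.
    return "\t".join([
        "DB", object_id, "symbol", "", go_id, "PMID:0", evidence_code, "",
        "P", "", "", "protein", taxon, "20111214", "DB", "", "",
    ])

SAMPLE = [
    "!gaf-version: 2.0",
    _row("ID1", "GO:0000001", "IDA", "taxon:4896"),
    _row("ID2", "GO:0000002", "ISS", "taxon:9606"),
    _row("ID1", "GO:0000001", "IMP", "taxon:4896"),
]

stats = annotation_stats(SAMPLE)
```

Comparing these counts between two releases of a group's file would give the "new terms" and "new species" figures. Counts of comprehensively annotated gene products and of new checks need information that the annotation file alone does not carry.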



Preparation needed in advance of GO Consortium meeting

GO annotation calls.

There are only 5 annotation calls scheduled before the GOC meeting. Therefore we need to use this time wisely. If we have one major topic per call, perhaps it could be:

1. Annotation to ‘contributes_to’

2. Default gp-GO term Relationship definitions

3. Protein Binding

4. Column 16

5. Focused apoptosis annotation discussion.

...with a regular slot for a couple of QC checks, to get the uncontentious ones agreed upon and, if possible, implemented before the GOC meeting.

Work outside of the GO annotation calls to be discussed on GO list?

  • ISS/IC issue brought up by Ruth at the GOC meeting. A proposal almost ready to be emailed to GO list (Emily, Ruth)
  • Column 17 concerns; developing the GAF spec; from recent emails by Amelia (Amelia, Mike, Chris)
  • Documentation for GPAD format, creation of regularly updated directory on GOC site using (Tony, Chris, Amelia)
  • Resolving annotation filtering on GOC site where groups responsible for a species are not (Mike, UniProt-GOA)
  • IGI and ‘with’ field? (new item raised by SGD)
  • Documentation that needs to be created to support wider use of IKR (Emily, UniProt-GOA)