Reference Genome ProgressReport April 2010 (Archived)

From GO Wiki

Objectives and metrics

  • Sven: annotation status tracking tool (see below)

Curation Progress: Ref Genome Genes

The number of genes per species is based on the Google spreadsheets, as parsed by Mary Dolan, and excludes the Jan/Feb targets. The complete list is on the FTP site: ftp://ftp.informatics.jax.org/pub/curatorwork/GODB/refg_id_list.txt

Group Number of curation targets Number of genes curated for ref genome
AgBase 570 250
BHF-UCL
dictyBase 463
Ecoli 158 142
FlyBase 422
GOA 1018 678
MGI 955 824
pombe 344
RGD 920 888
SGD 513 492
TAIR 705 619
WormBase 640 431
ZFIN 927
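
As a rough illustration of how the two columns relate, the sketch below computes the percentage of targets curated for the groups that list both numbers. It assumes the first number in each row is the number of curation targets and the second is the number of genes curated, following the column order of the header. Groups with only one number (or none) are left out rather than guessed at. The function names are illustrative only.

```python
# Rows from the table above that list both a target count and a curated count.
# Assumption: the first value is "curation targets", the second "genes curated".
progress = {
    "AgBase": (570, 250),
    "Ecoli": (158, 142),
    "GOA": (1018, 678),
    "MGI": (955, 824),
    "RGD": (920, 888),
    "SGD": (513, 492),
    "TAIR": (705, 619),
    "WormBase": (640, 431),
}


def percent_curated(targets: int, curated: int) -> float:
    """Return curated genes as a percentage of curation targets."""
    return 100.0 * curated / targets


def summarize(rows: dict) -> dict:
    """Map each group to its percentage curated, rounded to one decimal."""
    return {
        group: round(percent_curated(targets, curated), 1)
        for group, (targets, curated) in rows.items()
    }


if __name__ == "__main__":
    # Print groups from most to least complete.
    ranked = sorted(summarize(progress).items(), key=lambda kv: -kv[1])
    for group, pct in ranked:
        print(f"{group:10s} {pct:5.1f}%")
```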

Curation Progress: PANTHER families via PAINT

About eleven protein families have been curated with the beta version of PAINT, primarily by Mike Livstone.

This has added annotations for 1,138 Ref Genome proteins. Over 280 inferences have been made, using 245 GO terms (see MikeLpresentation).

PAINT

PAINT is a software tool for functional annotation of gene families retrieved from the PANTHER database. Recent updates include the following:

  • A progress bar shows the user the status of long-running processes
  • Added comment/evidence notes for annotations
  • Saving annotation session locally
  • Customizable colors for tree nodes
  • All viewers are now able to display different aspects, which are chosen through radio buttons
  • Annotation matrix gives the full picture of all annotations available and allows new annotations to be generated through a simple drag-and-drop mechanism
  • Implemented various rules for creating positive and NOT annotations
  • Searching is now allowed on both gene attributes and terms
  • Various bug fixes

Protein sets

Human, mouse, rat, chicken and zebrafish proteomes from UniProt are augmented with Ensembl proteins. In total, 51 species are covered, including all Reference Genome species. Plasmodium falciparum and Ixodes scapularis will appear in the 3rd release.

Release 3 fixes a bug that produced duplicate entries when UniProt and Ensembl records link to the same UniProtKB/Swiss-Prot entry. It also greatly improves the chicken proteome, which, by combining UniProt and Ensembl, is now the best available representation of that proteome. A better understanding of the Ensembl pipeline identified a database input (IPI) that was not included in the QfO proteome generation pipeline; with the addition of this extra database identifier, the proteomes are more complete.
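
The duplicate problem described above can be pictured with a small sketch. This is not the QfO pipeline code. It only assumes that each proteome entry records its source database and, where available, the UniProtKB/Swiss-Prot accession it links to. The `ProteinEntry` type, the function name, the identifiers in the usage lines and the choice to prefer the UniProt record are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class ProteinEntry(NamedTuple):
    source: str                   # "UniProt" or "Ensembl" (assumed labels)
    identifier: str               # identifier within the source database
    swissprot_acc: Optional[str]  # linked Swiss-Prot accession, if any


def drop_cross_source_duplicates(entries: Iterable[ProteinEntry]) -> List[ProteinEntry]:
    """Keep one entry per linked Swiss-Prot accession.

    Entries without a Swiss-Prot link are always kept. When several entries
    share an accession, the UniProt record is preferred (an assumption).
    """
    # Stable sort: UniProt entries first, original order otherwise preserved.
    ordered = sorted(entries, key=lambda e: e.source != "UniProt")
    seen = set()
    kept = []
    for entry in ordered:
        if entry.swissprot_acc is None:
            kept.append(entry)
        elif entry.swissprot_acc not in seen:
            seen.add(entry.swissprot_acc)
            kept.append(entry)
    return kept


# Made-up identifiers, for illustration only.
example = [
    ProteinEntry("Ensembl", "ENS_EXAMPLE_1", "ACC_A"),
    ProteinEntry("UniProt", "UP_EXAMPLE_1", "ACC_A"),
    ProteinEntry("Ensembl", "ENS_EXAMPLE_2", None),
]
deduplicated = drop_cross_source_duplicates(example)
```

Here the Ensembl entry that links to the same accession as the UniProt entry is dropped, while the Ensembl entry with no Swiss-Prot link is retained.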

Full details of Release 2 are here: http://www.ebi.ac.uk/~dbarrell/qfo/

Release 3 will be ready for the end of March.

Concurrent annotation

We have switched from annotating unrelated genes to annotating genes involved in a narrower biological process. This approach:

  1. facilitates coordination with ontology development
  2. makes annotation easier, because we are addressing a single general area of biology
  3. makes it possible to ask experts to review the annotations and ensure that nothing is missing.

Topics covered

  • Lung branching morphogenesis (November 2009 - March 2010)
  • Heart development (April 2010 - )

Annotation status reporting tool

Sven Heinicke (in Kara Dolinski's group) has started work on a tool that will display the annotation status of the Panther families. A regularly updated web page will show the following information:

  1. Panther family ID
  2. Date selected for concurrent annotation
  3. Date Panther family last annotated (tree annotations)
  4. Number of members
  5. Number of RefG members
  6. Number of members with EXP
  7. Date most recent member last annotated
  8. 'Date comprehensively annotated' for groups that can provide this information
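
To make the planned fields concrete, one possible record layout is sketched below. It is not the tool's actual data model: the class name, field names and the helper method are assumptions that simply mirror the eight items listed above, and the family ID in the usage lines is a made-up placeholder.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FamilyStatus:
    """One row of a hypothetical annotation status report."""
    family_id: str                                 # 1. Panther family ID
    selected_for_concurrent: Optional[date]        # 2. date selected
    family_last_annotated: Optional[date]          # 3. tree annotations
    n_members: int                                 # 4. number of members
    n_refg_members: int                            # 5. number of RefG members
    n_members_with_exp: int                        # 6. members with EXP
    member_last_annotated: Optional[date]          # 7. most recent member
    comprehensively_annotated: Optional[date] = None  # 8. if the group provides it

    def exp_fraction(self) -> float:
        """Fraction of members with experimental annotation (0.0 if empty)."""
        if self.n_members == 0:
            return 0.0
        return self.n_members_with_exp / self.n_members


# Placeholder values, for illustration only.
status = FamilyStatus(
    family_id="PTHR00000",
    selected_for_concurrent=None,
    family_last_annotated=None,
    n_members=10,
    n_refg_members=4,
    n_members_with_exp=5,
    member_last_annotated=None,
)
```

Dates are optional because not every group will be able to supply every field, as item 8 already notes.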

Prototype

TreeView

Electronic annotation jamborees

To allow curators to discuss annotation consistency, we hold electronic annotation jamborees three times per year, at each of which we discuss the annotation of two different genes. The latest was held in February 2010. We discussed SLIT2 (thought to act as a molecular guidance cue in cellular migration; its function appears to be mediated by interaction with roundabout homolog receptors) and NIPBL (which probably plays a structural role in chromatin and is involved in sister chromatid cohesion, possibly by interacting with the cohesin complex).

The major action item from this meeting was the need to improve the representation of neuronal processes.

GO annotation camp

The SIB group in Geneva will host a meeting in June 2010 where annotators will meet to resolve specific annotation issues. The outcome of this meeting will be clear documentation about a number of focused topics. Unless we decide to change those at the GOC meeting, the plan is to discuss:

  • Binding and complexes (finalize protein binding documentation, see Binding_terms_working_group)
  • Use of regulation
  • "Response to" terms: how these will relate to signaling terms and to the final cellular effect
  • How a downstream effect is defined (i.e., when not to capture phenotypes)

Advertising ref genomes at the MODs

New annotators

We are very pleased to announce that Swiss-Prot annotators are now doing GO annotation. Emily Dimmer and Rachael Huntley trained 34 annotators during the past year.