DictyBase Nov2011: Difference between revisions

From GO Wiki

Latest revision as of 17:27, 31 January 2012

dictyBase Statement of Year 2011 objectives (March 1, 2011 – February 29, 2012)


PI: Rex Chisholm
Annotators: Petra Fey, Robert Dodson, Pascale Gaudet (part time, ref genome)
Developers: Siddhartha Basu, Yulia Bushmanova (until July 2011)

All dictyBase staff contribute to GO activities. Together they account for 4.1 FTE positions, of which funding from the GO grant supports about 1.05 FTE.

Annotation. Gene Ontology annotation is integral to the curation process at dictyBase. Gene products are annotated to GO terms concurrently with the curation of literature, strains, phenotypes and, until June 2011, sequences. All curators work to annotate gene products of the Dictyostelium genome.

We have recently moved to storing GO annotations in Chado and implemented the ability to import OBO and GAF 2.0 format files. This move made our own GO annotation tool obsolete, which coincided with a curator focus on gene model curation until June 2011. Because of our small staff, we decided to use the Protein2GO tool at the EBI, and GO annotation with that tool started in August 2011.
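As an illustration of what importing a GAF 2.0 file involves, the following is a minimal sketch of a reader for the 17 tab-separated columns defined by the GAF 2.0 format. It is not the dictyBase Chado loader; the function name `parse_gaf2`, the dictionary keys, and the example record (including its identifiers) are assumptions made for this sketch.

```python
import io

# Column order as defined by the GAF 2.0 format (17 tab-separated columns).
# The key names themselves are chosen for this sketch.
GAF2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf2(handle):
    """Yield one dict per annotation line, skipping blank and '!' comment lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) != len(GAF2_COLUMNS):
            raise ValueError(f"expected 17 columns, got {len(fields)}")
        yield dict(zip(GAF2_COLUMNS, fields))


# Hypothetical record: the gene ID, symbol and reference are placeholders.
example_fields = [
    "dictyBase", "DDB_G0000001", "geneA", "", "GO:0005737",
    "PMID:0000000", "IDA", "", "C", "", "", "gene",
    "taxon:44689", "20110815", "dictyBase", "", "",
]
example_file = "!gaf-version: 2.0\n" + "\t".join(example_fields) + "\n"

rows = list(parse_gaf2(io.StringIO(example_file)))
print(rows[0]["go_id"], rows[0]["evidence_code"], rows[0]["aspect"])
```

Rejecting lines with the wrong column count, rather than padding them, is a deliberate choice here: malformed rows are easier to fix at import time than after they reach the database.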

The process of moving to the Protein2GO tool required changes and 'cleanup' on many levels. For example, we deleted all our ISS annotations to InterPro because most were outdated and invalid; we fixed many 'with' IDs that were incorrect because free-text entry had introduced many mistakes; and we mapped our internal references to GO references, for which one had to be added and others needed to be widened or changed. Countless smaller issues arose, and Emily Dimmer and Tony Sawford from the EBI have been, and continue to be, tremendously helpful!
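The kind of check that catches free-text 'with' values can be sketched as follows. This is only an illustration, not the cleanup procedure that was actually used: the regular expression is an assumed approximation of a `DB:ID` identifier, the function name is hypothetical, and multiple entries are assumed to be pipe-separated as in GAF 2.0.

```python
import re

# Assumed shape of a valid entry: a database prefix, a colon,
# then a non-empty local ID containing no whitespace.
DB_ID = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*:\S+$")


def invalid_with_entries(with_field):
    """Return the pipe-separated entries that do not look like DB:ID."""
    if not with_field:
        return []
    return [entry for entry in with_field.split("|") if not DB_ID.match(entry)]


# Hypothetical values for illustration.
print(invalid_with_entries("UniProtKB:P12345|InterPro:IPR000001"))
print(invalid_with_entries("UniProtKB:P12345|similar to actin"))
```

A pattern check like this only flags malformed entries; it cannot tell whether a well-formed identifier actually exists in the named database.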

Semi-automated annotation: We are starting to use Textpresso to suggest GO terms for annotation to cellular component terms (Van Auken et al., 2009, BMC Bioinformatics 10:228). This tool has been trained to evaluate the semantic context of GO terms to enrich for terms in sections of the papers describing methods or results, providing a high recall rate for experimentally supported GO annotations. We are currently establishing a pipeline that will run a full-text search on GO cellular component terms and provide curators with suggestions for annotation. A set of uncurated 2010 Dicty papers has been selected for a pilot run of the CC pipeline. WormBase is currently working on 'cloning' the curation interface for dictyBase curators, so that annotations can be approved, rejected or modified once the pipeline has been run. This will increase the efficiency of GO curation by reducing the time curators spend on literature mining.

Reference Genome Project: Pascale is manager of the Reference Genome project. All dictyBase curators annotate reference genome genes and are up to date with the selected orthologs. A paper describing the project was published in 2010: Mi et al. 2010. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. NAR Database issue [1]

Petra is a member of the Newsletter group and the Reference Genome annotation group, and has prepared the dictyBase annotations for curation in Protein2GO with the help of Emily Dimmer.

Petra, Robert and Siddhartha are working with Arun Rangarajan and Kimberly Van Auken towards semi-automatic CC annotations using Textpresso.

Siddhartha is part of the Software group and is working with Tony Sawford on the pipeline for the biweekly import of our annotations from the EBI.

Pascale is manager of the Reference Genome group.

Other dictyBase contributions to GO:

All dictyBase curators work with GO editors and other curators in the field to improve the GO, and contribute to discussions on the GO email list and SourceForge. In the course of the Reference Genome Project, Pascale has worked with Suzi Lewis, Kara Dolinski and Paul Thomas (PANTHER) to develop a tree-based orthology inference system, resulting in the software tool PAINT. This tool is being used to make the tree-based annotations, and Pascale is a PAINT annotator. A second paper on the project was published in 2011: Gaudet P, Livstone MS, Lewis SE, Thomas PD (2011) Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Briefings in Bioinformatics. 12(5):449-62. PMID: 21873635; PMCID: PMC3178059.

The genomes of other species related to Dictyostelium discoideum, including D. purpureum and D. fasciculatum, have been completed. dictyBase created a multi-genome environment, which now contains a D. purpureum website; the release of the D. fasciculatum and P. pallidum websites is imminent. In the future, we plan to semi-automatically transfer experimental GO annotations by ISS to 1:1 orthologs in these species.
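The planned transfer could look roughly like the sketch below, which proposes an ISS annotation for each experimental annotation whose gene has a 1:1 ortholog. This is an illustration under stated assumptions, not the dictyBase pipeline: the function name, the record layout, the ortholog map and all gene IDs are hypothetical, and the set of evidence codes treated as experimental is the standard GO experimental group.

```python
# GO experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def transfer_by_iss(annotations, one_to_one_orthologs):
    """Propose ISS annotations on orthologs of experimentally annotated genes.

    annotations: list of dicts with 'gene_id', 'go_id', 'evidence_code'.
    one_to_one_orthologs: dict mapping a source gene ID to its single ortholog.
    """
    proposals = []
    for ann in annotations:
        target = one_to_one_orthologs.get(ann["gene_id"])
        if target is None or ann["evidence_code"] not in EXPERIMENTAL_CODES:
            continue  # no 1:1 ortholog, or not experimentally supported
        proposals.append({
            "gene_id": target,
            "go_id": ann["go_id"],
            "evidence_code": "ISS",
            # The 'with' field records the gene the annotation was inferred from.
            "with_from": ann["gene_id"],
        })
    return proposals


# Hypothetical input data; the IDs are placeholders.
source = [
    {"gene_id": "dictyBase:DDB_G0000001", "go_id": "GO:0005737", "evidence_code": "IDA"},
    {"gene_id": "dictyBase:DDB_G0000002", "go_id": "GO:0005634", "evidence_code": "IEA"},
]
orthologs = {
    "dictyBase:DDB_G0000001": "dictyBase:DPU_G0000001",
    "dictyBase:DDB_G0000002": "dictyBase:DPU_G0000002",
}
proposals = transfer_by_iss(source, orthologs)
print(proposals)
```

Returning proposals rather than writing annotations directly keeps the process semi-automatic: a curator can still review each suggested transfer before it is accepted.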

Annotation Progress

Table 1: Number of Annotations

Category                       09/2010    03/2011    % Change
Total number of annotations    31109      31604      +2%
Function                       13240      13278      +0.3%
Process                        9980       10001      +0.2%
Component                      7889       8325       +5%

Table 2: Number of non-IEA Annotations

Category                       09/2010    03/2011    % Change
Total number of annotations    20410      20985      +2%
Function                       6878       6959       +1%
Process                        7304       7358       +0.7%
Component                      6228       6668       +7%
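For readers who want to check the figures, the short sketch below recomputes the percent changes from the counts in Tables 1 and 2 and confirms that the three aspects add up to each total. The variable and function names are chosen for this example only.

```python
# Counts copied from Table 1 (all annotations) and Table 2 (non-IEA annotations):
# category -> (09/2010, 03/2011)
TABLE_1 = {
    "Total": (31109, 31604),
    "Function": (13240, 13278),
    "Process": (9980, 10001),
    "Component": (7889, 8325),
}
TABLE_2 = {
    "Total": (20410, 20985),
    "Function": (6878, 6959),
    "Process": (7304, 7358),
    "Component": (6228, 6668),
}


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100


def aspects_sum_to_total(table):
    """Check, for both dates, that Function + Process + Component equals Total."""
    return all(
        sum(table[aspect][i] for aspect in ("Function", "Process", "Component"))
        == table["Total"][i]
        for i in (0, 1)
    )


for name, table in (("Table 1", TABLE_1), ("Table 2", TABLE_2)):
    assert aspects_sum_to_total(table)
    for category, (before, after) in table.items():
        print(f"{name} {category}: {percent_change(before, after):+.1f}%")
```

Computed this way, the Component change in Table 1 comes to about 5.5% and the Total change in Table 2 to about 2.8%, slightly higher than the rounded figures listed; the other entries agree with the tables after rounding.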

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.

Literature curation

In addition to gene product, strain and phenotype annotation, dictyBase curators extract GO annotations from Dictyostelium publications. GO annotations are added using the Protein2GO tool provided by the EBI.

Use of Textpresso to annotate cellular components is imminent. Extension of the Textpresso pipeline to capture GO molecular functions is currently under development by WormBase.

Automated methods

IEAs will be imported from GOA and assigned to the respective gene products on a biweekly schedule.

Quality control measures

dictyBase curators work closely to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures) to which curators adhere. Curators discuss consistency issues as they arise and decisions are recorded in the Standard Operating Procedures.