DictyBase December 2014

From GO Wiki
Revision as of 16:11, 11 December 2014 by Pfey03

dictyBase, December 2014

Staff:

PI: Rex Chisholm

Annotators: Petra Fey, Robert Dodson

Developers: Siddhartha Basu, David Jimenez-Morales

All dictyBase staff contribute to GO activities. Staffing currently totals 3 FTE positions, of which the GO grant funds 0.6 FTE.

Annotation:

Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, strains, phenotypes, and general nomenclature. Both curators work to annotate gene products of the Dictyostelium genome.

At dictyBase we have used the Protein2GO tool from the EBI to annotate GO for Dictyostelium proteins since June 2012. Since February 2013, we have re-imported annotations monthly from GOA back into dictyBase, including electronic annotations. We then append our GO annotations to RNAs and send the GAF file to the GO consortium.

Using Protein2GO continues to be a great advantage: it lets us react to changing annotation practices without a direct impact on our database. For example, dictyBase curators regularly add annotation extensions in Protein2GO, although these cannot yet be stored or displayed at dictyBase. We expect to be able to provide annotation extensions to our users in 2015.

Semi-automated annotation: We have been working towards using Textpresso to suggest GO cellular component terms for annotation (Van Auken et al., BMC Bioinformatics 2009, 10:228). We participated in the BioCreative Workshop Track III in 2012, for which we compared purely manually curated cellular component annotations with Textpresso-assisted annotations. However, implementation at dictyBase was delayed because of the GO consortium's wise decision to connect Textpresso annotations directly to Protein2GO. We tested the updated Textpresso annotation tool and provided valuable feedback to Kimberly Van Auken. We also provide a monthly updated GPI file containing all dictyBase protein names and identifiers to support the text mining. This December, we started incorporating Textpresso into our curation workflow.

  • Petra coordinates all dictyBase GO annotation issues relating to Protein2GO with the EBI and attends the GO meetings. She also provided detailed information on our experience with Protein2GO for the EBI case study conducted by Neal Beagrie.
  • Robert and Petra are working with Kimberly Van Auken on the semi-automated CC annotations using Textpresso.
  • Siddhartha is working on implementing the GPAD and GPI files for the pipelines between the EBI, dictyBase and the GO consortium.
  • David provides a GPI file to Textpresso, updated monthly.
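As a rough illustration of the GPI export mentioned above, the following minimal sketch writes gene product records as tab-separated rows. The column order is an assumption modeled on the GPI 1.2 layout and should be checked against the GO consortium's file format specification; the function name `write_gpi` and the record keys are hypothetical and do not describe the actual dictyBase pipeline.

```python
import io

# Assumed column order, modeled on GPI 1.2; verify against the GO specification.
GPI_COLUMNS = [
    "db", "object_id", "symbol", "name", "synonyms",
    "object_type", "taxon", "parent_id", "xrefs", "properties",
]


def write_gpi(records, handle):
    """Write gene product records (dicts) to `handle` as tab-separated GPI rows."""
    handle.write("!gpi-version: 1.2\n")
    for rec in records:
        row = []
        for col in GPI_COLUMNS:
            value = rec.get(col, "")
            # Multi-valued fields (e.g. synonyms) are joined with a pipe.
            if isinstance(value, (list, tuple)):
                value = "|".join(value)
            row.append(str(value))
        handle.write("\t".join(row) + "\n")


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = [{
        "db": "dictyBase",
        "object_id": "DDB_G0000001",
        "symbol": "exaA",
        "name": "example protein A",
        "synonyms": ["exa1", "exa-A"],
        "object_type": "protein",
        "taxon": "taxon:44689",
    }]
    buffer = io.StringIO()
    write_gpi(example, buffer)
    print(buffer.getvalue(), end="")
```

Missing fields are written as empty columns, so every row keeps the same number of tab-separated fields.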

Other dictyBase contributions to GO

Both dictyBase curators work with GO editors and other curators in the field to improve the GO, and contribute to discussions on the GO email list, in the bi-weekly annotation calls, and on SourceForge. When necessary, they use TermGenie or SourceForge to create new GO terms. In 2014, 14 new GO terms were added.

dictyBase is currently working on a complete database and software overhaul. Once that is completed, we will use GPAD/GPI files instead of GAF in our Protein2GO-dictyBase-GO consortium pipeline.

Annotation Progress

The total and non-IEA annotation numbers include annotations that are not from dictyBase, mostly from the PAINT project, with a few from IntAct and SwissProt. At this point we cannot easily query them by source. However, the experimental annotations are nearly all from dictyBase, and the indicated annotation extensions (column 16) have been annotated exclusively by dictyBase curators (Table 3).
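One way to break such totals down by source is to tally the GAF file directly. The sketch below assumes the standard GAF 2.x column layout (column 7 evidence code, column 9 aspect, column 15 Assigned_By, column 16 annotation extension) and assumes that "EXP annotations" means the experimental evidence codes listed in the code. The function name `tally_gaf` is hypothetical; this is not how the numbers in this report were produced.

```python
from collections import Counter

# 0-based indexes of the GAF 2.x columns used here.
EVIDENCE, ASPECT, ASSIGNED_BY, EXTENSION = 6, 8, 14, 15

# Assumption: "experimental" means these GO evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def tally_gaf(lines):
    """Count annotations by source and aspect, plus non-IEA, experimental and extension totals."""
    by_source = Counter()
    by_aspect = Counter()
    non_iea = experimental = with_extension = 0
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) <= ASSIGNED_BY:
            continue  # too few columns to be a usable annotation row
        by_source[cols[ASSIGNED_BY]] += 1
        by_aspect[cols[ASPECT]] += 1
        if cols[EVIDENCE] != "IEA":
            non_iea += 1
        if cols[EVIDENCE] in EXPERIMENTAL_CODES:
            experimental += 1
        if len(cols) > EXTENSION and cols[EXTENSION]:
            with_extension += 1
    return {
        "by_source": by_source,
        "by_aspect": by_aspect,
        "non_iea": non_iea,
        "experimental": experimental,
        "with_extension": with_extension,
    }
```

The function accepts any iterable of lines, so an open file handle for a GAF file can be passed in directly.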

Table 1: Number of Annotations

                               2013     2014    % Change
Total number of annotations   57925    62961    +8.7%
Function                      21968    22569    +2.7%
Process                       19581    22748    +16.1%
Component                     16376    17644    +7.7%


Table 2: Number of non-IEA Annotations

                               2013     2014    % Change
Total number of annotations   20367    24983    +22.6%
Function                       4961     6245    +25.9%
Process                        9248    11407    +23.3%
Component                      6158     7331    +19%


Table 3: Additional Numbers

                               2013     2014    % Change
Total EXP annotations          5598     6366    +13.7%
Annotation extensions           187      235    +25.6%
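The "% Change" column in the tables above is the year-over-year relative change. A minimal sketch of that arithmetic follows, using the 2013 and 2014 counts from the tables. The helper name `percent_change` is hypothetical, and because the report does not state its rounding convention, recomputed values may differ from the reported ones in the last digit.

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


# (2013, 2014) counts copied from Tables 1-3.
counts = {
    "Total annotations": (57925, 62961),
    "Total non-IEA annotations": (20367, 24983),
    "Total EXP annotations": (5598, 6366),
    "Annotation extensions": (187, 235),
}

if __name__ == "__main__":
    for label, (y2013, y2014) in counts.items():
        print(f"{label}: {percent_change(y2013, y2014):+.1f}%")
```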

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation represent nearly 100% of the curation activities at dictyBase.


Literature curation

GO annotations are added during Dictyostelium literature curation, along with strain, phenotype, nomenclature and gene product information. GO is annotated using the Protein2GO tool provided by the EBI. Annotations are currently imported into dictyBase and supplied to the GO consortium monthly.

We have only just started using Textpresso to annotate cellular components. We do not yet have data on changes in efficiency or in curation procedures.


Automated methods

IEA annotations are imported from GOA monthly and assigned to the respective gene products.


Quality control measures

dictyBase curators work closely to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures) to which curators adhere. Curators discuss consistency issues as they arise and decisions are recorded in the Standard Operating Procedures.