DictyBase December 2015

dictyBase December 2015

1. Staff:

PI: Rex Chisholm

Annotators: Petra Fey, Robert Dodson

Developers: Siddhartha Basu, David Jimenez-Morales (until Oct. 16, 2015)


All dictyBase staff contribute to GO activities. This is currently a total of 3 FTE positions. Of these we receive sufficient funding from the GO grant to support 0.5 FTEs.


Annotation:


Gene Ontology annotation is integral to the curation process at dictyBase. Annotation of gene products to GO terms is done concurrently with curation of literature, strains, phenotypes, and general nomenclature. Both curators work to annotate gene products of the Dictyostelium genome.

At dictyBase we have used the Protein2GO tool from the EBI to annotate GO for Dictyostelium proteins since June 2012. Since February 2013, we have re-imported annotations monthly from GOA back into dictyBase, including electronic annotations. We currently still append our GO annotations to RNAs and send the GAF file to the GO consortium.

Using Protein2GO continues to be a great advantage: it lets us react to changing annotation practices without any direct impact on our database. For example, dictyBase curators regularly add annotation extensions in Protein2GO although these cannot yet be stored or displayed at dictyBase. We expect to be able to provide annotation extensions to our users in early 2016.

Semi-automated annotation: we have done preliminary work with the intention of eventually using Textpresso to suggest GO terms for annotation to cellular component terms (Van Auken et al., BMC Bioinformatics 2009, 10:228). However, in an evaluation of the final test phase we determined that Textpresso doesn't fit with our current annotation flow, and have decided not to pursue use of Textpresso at dictyBase for the moment.

Petra coordinates all dictyBase GO annotation issues concerning Protein2GO with the EBI, attends the GO meetings, and contributes to the GO Help Desk. She also gave a presentation for graduate students in Berlin, Germany (Nov. 25, 2015), which included an introduction to GO.

Petra and Bob update their annotations quickly when GO updates make annotations (e.g. annotation extensions) invalid. They also participate in the GO annotation consistency calls.

Siddhartha is working on implementing the GPAD and GPI files for the pipelines between the EBI, dictyBase and the GO consortium. He has written the GPAD loader software that completes the GO annotations pipeline between Protein2GO and dictyBase.
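To illustrate the kind of parsing a GPAD loader has to perform, here is a minimal sketch that splits one GPAD line into named fields. It assumes the 12-column, tab-delimited GPAD 1.1 layout; the function name, the field names and the sample record are hypothetical, and this is not the dictyBase loader itself.

```python
# Field names are illustrative labels for the 12 GPAD 1.1 columns (assumption).
GPAD_FIELDS = (
    "db", "db_object_id", "qualifier", "go_id", "reference",
    "evidence_eco", "with_from", "interacting_taxon", "date",
    "assigned_by", "annotation_extension", "annotation_properties",
)


def parse_gpad_line(line):
    """Return a dict for one GPAD annotation line, or None for comments/blanks."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    # Pad short lines so that every field name gets a value.
    cols += [""] * (len(GPAD_FIELDS) - len(cols))
    return dict(zip(GPAD_FIELDS, cols[:len(GPAD_FIELDS)]))


# Invented sample record, for illustration only.
sample_line = "\t".join([
    "dictyBase", "DDB_G0000000", "enables", "GO:0005515", "PMID:0000000",
    "ECO:0000353", "dictyBase:DDB_G0000001", "", "20151201", "dictyBase",
    "", "",
])
record = parse_gpad_line(sample_line)
print(record["db_object_id"], record["go_id"], record["evidence_eco"])
```

Returning a plain dictionary keeps the sketch independent of any database schema; a real loader would map these fields onto its own storage model.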


Other dictyBase contributions to GO:

Both dictyBase curators work to improve the GO with GO editors and other curators in the field, and contribute to discussions on the GO email list, in the bi-weekly annotation calls, and on GitHub. When necessary, they use TermGenie or GitHub to create new GO terms. In 2014, 8 new GO terms were added.

dictyBase is currently working on a complete database and software overhaul. Once that is completed, we will use GPAD/GPI files instead of GAF in our Protein2GO-dictyBase-GO consortium pipeline.

Annotation Progress

The total and non-IEA annotation numbers include some annotations that are not from dictyBase, with most of these from the PAINT project and a few from INTACT and SwissProt. At this point we cannot easily query them by source; however, the experimental annotations are nearly all from dictyBase and the indicated annotation extensions (column 16) have been exclusively annotated by dictyBase curators (Table 3).
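In the GAF format itself, column 15 (Assigned_By) names the group that made each annotation, so a breakdown by source can in principle be tallied from an exported file. The following is a minimal sketch, assuming a tab-delimited GAF 2.x file; the function names and the sample rows are invented for illustration, and this is not a dictyBase tool.

```python
from collections import Counter


def count_by_source(gaf_lines):
    """Tally annotations by the Assigned_By field (GAF column 15)."""
    counts = Counter()
    for line in gaf_lines:
        # Skip blank lines and '!' header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 15:
            counts[cols[14]] += 1  # column 15, zero-based index 14
    return counts


def example_row(evidence, assigned_by):
    """Build an invented 17-column GAF row (hypothetical identifiers)."""
    return "\t".join([
        "dictyBase", "DDB_G0000000", "exampleA", "", "GO:0005515",
        "PMID:0000000", evidence, "", "F", "", "", "protein",
        "taxon:44689", "20151201", assigned_by, "", "",
    ])


sample = [
    "!gaf-version: 2.0",
    example_row("IDA", "dictyBase"),
    example_row("IMP", "dictyBase"),
    example_row("IBA", "GO_Central"),
]
source_counts = count_by_source(sample)
print(source_counts.most_common())
```

For a real file, the same function can be fed an open file handle instead of the sample list.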


Table 1: Number of Annotations

                               Dec 2014   Dec 2015   % Change
Total number of annotations       62961      69130      +9.8%
Function                          22569      22915      +1.5%
Process                           22748      24209      +6.4%
Component                         17644      22006     +24.7%

Table 2: Number of non-IEA Annotations

                               Dec 2014   Dec 2015   % Change
Total number of annotations       24983      29547     +18.3%
Function                           6245       7572     +21.2%
Process                           11407      13267     +16.3%
Component                          7331       8708     +18.8%

Table 3: Additional Numbers

                               Dec 2014   Dec 2015   % Change
Total EXP annotations              6366       7049     +10.7%
Annotation extensions               235        367       +56%
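Counts of this kind can be derived from a GAF file by reading the evidence code (column 7), the aspect (column 9) and the annotation extension (column 16). The sketch below is one way to do that, not the procedure used to produce the tables. It assumes a GAF 2.x file and treats "EXP annotations" as the six experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP), which is an interpretation rather than something stated in the report. Function names and sample rows are hypothetical.

```python
from collections import Counter

# Assumption: "EXP annotations" means all experimental evidence codes.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def tally_gaf(gaf_lines):
    """Count total, non-IEA, experimental and extension-bearing annotations."""
    stats = Counter()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue
        evidence, aspect = cols[6], cols[8]  # columns 7 and 9
        stats["total"] += 1
        stats["total_" + aspect] += 1
        if evidence != "IEA":
            stats["non_iea"] += 1
            stats["non_iea_" + aspect] += 1
        if evidence in EXPERIMENTAL_CODES:
            stats["experimental"] += 1
        if len(cols) > 15 and cols[15].strip():  # column 16
            stats["with_extension"] += 1
    return stats


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def example_row(evidence, aspect, extension=""):
    """Build an invented 17-column GAF row (hypothetical identifiers)."""
    return "\t".join([
        "dictyBase", "DDB_G0000000", "exampleA", "", "GO:0000000",
        "PMID:0000000", evidence, "", aspect, "", "", "protein",
        "taxon:44689", "20151201", "dictyBase", extension, "",
    ])


sample = [
    "!gaf-version: 2.0",
    example_row("IDA", "C"),
    example_row("IMP", "P", "occurs_in(GO:0005634)"),
    example_row("IEA", "F"),
    example_row("IBA", "C"),
]
stats = tally_gaf(sample)
print(dict(stats))
# Percentage change for the total annotation counts reported in Table 1.
print(round(percent_change(62961, 69130), 1))
```

The percentage change is computed as (new - old) / old x 100, so the totals in Table 1 give (69130 - 62961) / 62961 x 100, or about 9.8%.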

Methods and strategies for annotation

(please note % effort on literature curation vs. computational annotation methods)

Literature and other manual curation currently represent 100% of the curation activities at dictyBase.


Literature curation

GO annotations are added during Dictyostelium literature curation, along with strain, phenotype, nomenclature and gene product information. GO is annotated using the Protein2GO tool provided by the EBI. Annotations are currently imported into dictyBase and supplied to the GO consortium monthly.


Automated methods

IEAs are being imported from GOA and assigned to the respective gene products on a monthly schedule.


Quality control measures

dictyBase curators work closely to ensure that annotations are consistent between curators and conform to the guidelines set in the annotation documentation. We also have a set of internal guidelines recorded in the dictyBase Standard Operating Procedures (http://wiki.dictybase.org/dictywiki/index.php/Standard_Operating_Procedures) to which curators adhere. Curators discuss consistency issues as they arise and new decisions are recorded in the Standard Operating Procedures.