Software Group progress report for 2013

From GO Wiki

Specific Aim 4: We will create and implement a common annotation framework (CAF) for the GO curation community. We are pursuing a multi-pronged approach to this development. First, we are having participating groups migrate to UniProt's Protein2GO annotation tool, which provides a centralized curation form and data storage with extensive integrity checks. We have also incorporated an existing text-mining tool into Protein2GO via web services. Meanwhile, we are developing components of the next-generation curation tool; in particular, we have developed a prototype Paper Viewer.

Progress from the Caltech Team (GOC, WormBase, Textpresso):

1- As part of the migration to a common annotation framework, WormBase-Caltech completed its round-trip data migration from UniProt's Protein2GO annotation tool to the WormBase database.

2- WormBase and Textpresso completed development of a new Textpresso for Cellular Component Curation (CCC) tool with additional features, including auto-completion of GO terms, mapping of gene names and synonyms in text to MOD and UniProtKB IDs, and enhanced searching of sentence source files and annotations. Most importantly, the new CCC tool and Protein2GO are now fully integrated: annotations made with the CCC tool are automatically sent to Protein2GO via web services.

3- In collaboration with Textpresso, Tony Sawford added a Literature Search link to the Protein2GO tool that allows GO curators to perform keyword searches on nine different Textpresso corpora from within Protein2GO.

4- In anticipation of expanding the Protein2GO tool to support annotation of non-coding RNAs, protein complexes, and orphan, genetically defined loci, we have asked curation groups for lists of relevant entity identifiers. We have held an initial conference call to discuss the details of adding non-UniProtKB entity identifiers to Protein2GO.

5- Progress on molecular function curation. We have begun gathering specific requirements for a Textpresso for Molecular Function curation tool, which will be adapted from the CCC tool described in item 2.

6- Progress on paper viewer. One key component of the CAF will be a Paper Viewer that allows curators to see the results of text mining, highlight parts of papers to support annotations, and provide feedback on the text-mining output. In collaboration with the Textpresso project, we have rebuilt the way Textpresso marks up text so that it supports a paper viewer and multiple overlapping highlights of text.

7- We completed development of a literature annotation tool, GOAT (GO Annotation Tool), that allows curators to highlight sentences in the full text of HTML documents and associate that text with GO annotations. Annotations are saved as hyperlinks that, when clicked, display the annotation details. Nine GO curators used GOAT to annotate 200 full-text articles, creating a gold-standard corpus for text mining as part of BioCreative IV Task 4 (http://www.biocreative.org/tasks/biocreative-iv/track-4-GO/2013).