Common Annotation Framework report for 2014

We will develop a Common Annotation Framework.

Accomplishments

Noctua

Noctua is a new project that allows users to collaborate simultaneously on the creation of complex annotations using a graph-based interface. It runs entirely in the user’s web browser (JavaScript), making use of jsPlumb and Socket.io. We have made substantial progress in the past year, taking Noctua from design concept, through prototypes, to a usable alpha that is currently undergoing usability evaluation. We are working against a development roadmap that aims to meet future community needs based on the feedback we are receiving from our alpha testers. Our progress to date includes:

  • Evaluation of a range of possible technologies for Noctua development, including the creation of early demonstration prototypes to evaluate technology from both user and developer perspectives
  • Design and implementation of a three-part architecture to maximize flexibility and minimize component complexity: a web client, a coordination layer (Barista), and a data engine (Minerva)
  • We have added new features and enhancements to the Noctua web client interface and issued fixes in response to user feedback. Noctua reuses AmiGO-based widgets for term and gene product searches, ensuring easier maintenance and lower development costs.
  • The ‘Barista’ coordinator implementation includes:
      • a messaging and authorization server that coordinates communication across multiple clients (using Socket.io) and relays requests to the Minerva data engine (a simplified relay sketch appears after this list)
      • a login and authorization service (via Mozilla Foundation’s Persona identity tokens)
  • The ‘Minerva’ data engine component provides the data store, data model management, and all logical operations on the models, and reports the status of attempted operations and of the models themselves. While our future plans call for the use of a triplestore, the data engine currently uses an in-memory model, the filesystem, and Amazon S3 as its data store. The data engine is currently seeded in multiple ways to support legacy annotations and to migrate existing GO annotations:
      • Seeding of LEGO models from existing annotation (GAF) files and the ontology
      • Conversely, we extended our OWLTools library to convert LEGO models to GPAD/GPI and GAF files (a simplified sketch of this mapping appears after this list). This backwards compatibility is automatically maintained by a Jenkins job as a standard part of the GO data pipeline
  • Additional extensions to the BBOP JS library (supporting both AmiGO and Noctua) to make it more generic and to add new functionality. The enhanced implementation of service agents supports fine-grained client/server interactions in Noctua
  • Met with Dexter Pratt (Ideker group) to initiate discussions on integration with NDEx and the OpenBEL format
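
To illustrate the message-relay role of the coordination layer described above, the sketch below shows a minimal Socket.io relay in TypeScript: it accepts a model-edit request from one browser client, forwards it to a data-engine endpoint, and broadcasts the updated model to every connected client. The event names, port, and endpoint URL are hypothetical, and the sketch uses current Socket.io and Node.js APIs rather than Barista’s actual implementation.

 import { Server } from "socket.io";

 // Minimal relay sketch: accept edit requests from browser clients, forward
 // them to a (hypothetical) data-engine HTTP endpoint, and broadcast the
 // updated model so that all collaborators stay in sync.
 const io = new Server(3400);                          // port is arbitrary
 const DATA_ENGINE_URL = "http://localhost:6800/api";  // hypothetical endpoint

 io.on("connection", (socket) => {
   socket.on("model-edit", async (edit: { modelId: string; requests: unknown[] }) => {
     // Forward the edit to the data engine and wait for the updated model.
     const response = await fetch(DATA_ENGINE_URL, {
       method: "POST",
       headers: { "Content-Type": "application/json" },
       body: JSON.stringify(edit),
     });
     const updatedModel = await response.json();
     // Broadcast to every connected client (including the sender) so each
     // open editor re-renders the shared annotation graph.
     io.emit("model-update", updatedModel);
   });
 });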
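
The GPAD/GAF export itself is implemented in our OWLTools library (Java). Purely as a sketch of the underlying mapping, the TypeScript fragment below renders a toy, in-memory activity (a molecular-function node enabled by a gene product, with supporting evidence) as a tab-separated GAF 2.0 line; the interface and field values are illustrative and do not reflect the OWLTools data structures.

 // A toy activity from a LEGO-style model: a molecular function node
 // enabled by a gene product, with supporting evidence.
 interface Activity {
   geneProductId: string;    // e.g. "UniProtKB:P00000" (placeholder)
   geneProductSymbol: string;
   functionTermId: string;   // a GO molecular function term
   evidenceCode: string;     // e.g. IDA, IMP
   reference: string;        // e.g. "PMID:00000000" (placeholder)
   taxonId: string;          // e.g. "taxon:6239"
 }

 // Render one activity as a GAF 2.0 line (17 tab-separated columns).
 function toGafLine(a: Activity, assignedBy: string, date: string): string {
   const [db, localId] = a.geneProductId.split(":");
   return [
     db,                   // 1  DB
     localId,              // 2  DB Object ID
     a.geneProductSymbol,  // 3  DB Object Symbol
     "",                   // 4  Qualifier
     a.functionTermId,     // 5  GO ID
     a.reference,          // 6  DB:Reference
     a.evidenceCode,       // 7  Evidence Code
     "",                   // 8  With/From
     "F",                  // 9  Aspect (F = molecular function)
     "",                   // 10 DB Object Name
     "",                   // 11 Synonym
     "protein",            // 12 DB Object Type
     a.taxonId,            // 13 Taxon
     date,                 // 14 Date (YYYYMMDD)
     assignedBy,           // 15 Assigned By
     "",                   // 16 Annotation Extension
     "",                   // 17 Gene Product Form ID
   ].join("\t");
 }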

Protein2GO

  • Protein2GO now supports the creation of GO annotations to protein complexes: complex identifiers obtained from the IntAct protein complex portal at the EBI can be used as annotation subjects (an illustrative example appears after this list). Annotation guidance for complex annotation will evolve as more annotations are created.
  • Protein2GO now has an ‘Author Contact’ feature, which allows curators to email corresponding authors after curating their papers. The emails are sent out at release time and invite authors to view the annotations created from their publication and to provide feedback. Since the introduction of this feature, we have received positive emails from authors regarding the annotations created and the usefulness of GO in capturing information from their publications.
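
For illustration only, selected columns of a GAF 2.0 annotation line whose subject is a protein complex might look like the following; the complex accession, symbol, reference, date, and assigning group are placeholders rather than real curated data.

 DB                IntAct
 DB Object ID      EBI-0000000        (placeholder complex accession)
 DB Object Symbol  example_complex    (placeholder)
 GO ID             GO:0006260         (DNA replication)
 DB:Reference      PMID:00000000      (placeholder)
 Evidence Code     IDA
 Aspect            P
 DB Object Type    protein_complex
 Taxon             taxon:9606
 Date              20141219
 Assigned By       IntAct             (placeholder)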

Text Mining

  • WormBase and Textpresso developed a new support vector machine (SVM) document classifier for a subclass of the Molecular Function ontology: catalytic activity. This SVM is included in the WormBase data-flagging pipeline and will be incorporated into the Textpresso Central suite of curation tools (see below); a toy sketch of this kind of classifier appears after this list.
  • MGI, WormBase, and Textpresso are collaborating on a document classification pipeline that uses an SVM classifier to distinguish mouse from non-mouse papers, helping MGI identify papers suitable for curation. The initial SVM has been developed, and further work will aim at identifying mouse markers (genes) associated with experimental data in these papers.
  • Textpresso started developing a literature curation platform, Textpresso Central, that enables curators to perform full-text literature searches, view and curate research papers, and train and apply machine learning and text mining algorithms for semantic analysis and curation purposes. Textpresso Central provides capabilities to select, edit and store lists of papers, sentences, terms and categories in order to perform training and mining. Textpresso uses state-of-the-art software packages and frameworks such as the Unstructured Information Management Architecture (http://uima.apache.org), Lucene (http://lucene.apache.org), and Wt (http://www.webtoolkit.eu/wt). The corpus of papers can be built from full-text articles available in PDF format (http://en.wikipedia.org/wiki/Portable_Document_Format) or NXML (http://dtd.nlm.nih.gov/).
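
As a toy illustration of the SVM document classifiers mentioned above (and not the WormBase/Textpresso implementation, which relies on its own tooling), the TypeScript sketch below trains a linear SVM on bag-of-words features using stochastic sub-gradient descent on the hinge loss and then scores new documents; a positive score marks a document as relevant (for example, as discussing catalytic activity).

 // Toy linear SVM text classifier: bag-of-words features, trained with
 // stochastic sub-gradient descent on an L2-regularized hinge loss.
 type Example = { text: string; label: 1 | -1 };  // +1 = relevant, -1 = not

 function tokenize(text: string): string[] {
   return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
 }

 // Assign each token seen in the training corpus an integer index.
 function buildVocab(examples: Example[]): Map<string, number> {
   const vocab = new Map<string, number>();
   for (const ex of examples) {
     for (const tok of tokenize(ex.text)) {
       if (!vocab.has(tok)) vocab.set(tok, vocab.size);
     }
   }
   return vocab;
 }

 // Sparse bag-of-words vector: token index -> count.
 function featurize(text: string, vocab: Map<string, number>): Map<number, number> {
   const x = new Map<number, number>();
   for (const tok of tokenize(text)) {
     const idx = vocab.get(tok);
     if (idx !== undefined) x.set(idx, (x.get(idx) ?? 0) + 1);
   }
   return x;
 }

 // Train a weight vector; no bias term, for brevity.
 function trainSvm(examples: Example[], vocab: Map<string, number>,
                   epochs = 20, rate = 0.1, lambda = 0.01): number[] {
   const w = new Array<number>(vocab.size).fill(0);
   for (let epoch = 0; epoch < epochs; epoch++) {
     for (const ex of examples) {
       const x = featurize(ex.text, vocab);
       let margin = 0;
       for (const [i, v] of x) margin += w[i] * v;
       // Hinge-loss sub-gradient: update only when the margin is violated.
       if (ex.label * margin < 1) {
         for (const [i, v] of x) w[i] += rate * ex.label * v;
       }
       // L2 regularization applied as weight decay.
       for (let i = 0; i < w.length; i++) w[i] *= 1 - rate * lambda;
     }
   }
   return w;
 }

 // Decision function: a positive score classifies the document as relevant.
 function score(text: string, w: number[], vocab: Map<string, number>): number {
   const x = featurize(text, vocab);
   let s = 0;
   for (const [i, v] of x) s += w[i] * v;
   return s;
 }

 // Usage sketch (training data is hypothetical):
 // const vocab = buildVocab(trainingSet);
 // const w = trainSvm(trainingSet, vocab);
 // const isCatalytic = score(abstractText, w, vocab) > 0;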