GOlr

From GO Wiki

Overview

This page describes the status of the public Solr index for the GO. This index will replace some of the query functionality for GOOSE as well as become the new backend for AmiGO and other services. A preview server is now active at http://golr.berkeleybop.org.

Use

Feel free to use the preview server and to start writing code against it. Because this is still early, we are only loading GO, CL, and taxslim, along with the MGI and dictyBase GAFs. Please see the AmiGO_Labs caveats, as well as the downtime information on this page.
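As a starting point for writing against the server, here is a minimal Python sketch that builds a Solr select URL. It assumes the select handler is exposed at /select on the nginx front, and it uses document_category as the field that holds the document type; both are assumptions, not something this page confirms. The helper name build_select_url is made up for illustration.

```python
from urllib.parse import urlencode

# Preview server named on this page.
GOLR_BASE = "http://golr.berkeleybop.org"


def build_select_url(query, rows=10, base=GOLR_BASE):
    """Return a Solr select URL that asks for a JSON response."""
    params = [
        ("q", query),    # standard Solr query parameter
        ("rows", rows),  # number of documents to return
        ("wt", "json"),  # response writer: JSON
    ]
    # Assumption: the select handler lives at /select behind the proxy.
    return base.rstrip("/") + "/select?" + urlencode(params)


# Assumption: "document_category" is the field holding the document type.
url = build_select_url("document_category:ontology_class", rows=5)
print(url)
```

The URL can then be fetched with any HTTP client; only select requests are expected to get through the proxy.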

Implementation Progress

We are now loading properly from owltools.

We have evaluated nginx as a reverse proxy, both for speed and to prevent unauthorized access to non-select URLs on the Solr server, and found that it works well.

The GOlr server is now active at http://golr.berkeleybop.org (nginx front on stove).

We now also have http://amigo2.berkeleybop.org working on stove and backed by golr.berkeleybop.org.

We are now on Solr 3.6.

Schema Progress

We are currently working towards a flexible schema as defined in the owltools code. As we roll it out for the AmiGO backend and as a replacement for common GOOSE queries, we expect to find holes in the schema; those found so far are documented below.

Currently, the index is populated entirely through an owltools command-line program, using YAML files for configuration. The Solr schema.xml generation looks like:

owltools --solr-config \
  /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ont-config.yaml \
  /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/bio-config.yaml \
  /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ann-config.yaml \
  /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ann_ev_agg-config.yaml \
  --solr-schema-dump

With the actual population along the lines of:

owltools \
  http://purl.obolibrary.org/obo/ncbitaxon/subsets/taxslim.owl \
  http://purl.obolibrary.org/obo/cl.owl \
  http://purl.obolibrary.org/obo/go.owl \
  http://purl.obolibrary.org/obo/eco.owl \
  --solr-url http://localhost:8080/solr/ \
  --solr-purge \
  --solr-config /home/bbop/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ont-config.yaml \
  --solr-load-ontology \
  --solr-load-gafs /srv/tmp/gene_association.mgi
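For scripted rebuilds, the load command above could be assembled as an argument list instead of a single long string. The following is only a sketch: build_load_command is a hypothetical helper, it uses only the flags shown in the command above, and it keeps them in the same order on the assumption that owltools processes its arguments in sequence.

```python
import shlex


def build_load_command(ontologies, solr_url, config, gaf=None, purge=False):
    """Assemble an owltools load command as a list of arguments.

    Hypothetical helper: only flags that appear on this page are used.
    """
    cmd = ["owltools", *ontologies, "--solr-url", solr_url]
    if purge:
        cmd.append("--solr-purge")  # clear the index before loading
    cmd += ["--solr-config", config, "--solr-load-ontology"]
    if gaf is not None:
        cmd += ["--solr-load-gafs", gaf]
    return cmd


cmd = build_load_command(
    ontologies=[
        "http://purl.obolibrary.org/obo/go.owl",
        "http://purl.obolibrary.org/obo/eco.owl",
    ],
    solr_url="http://localhost:8080/solr/",
    config="ont-config.yaml",  # placeholder path; use the real location
    gaf="/srv/tmp/gene_association.mgi",
    purge=True,
)
# Print a shell-safe rendering for inspection; pass `cmd` to
# subprocess.run(cmd, check=True) to actually execute it.
print(shlex.join(cmd))
```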

Rewriting old GOOSE query examples for the new GOlr Schema

Example queries on the wiki.

  • Some old queries require Solr facets, plus some modification of the code Seth has written to present simpler query results (i.e. the facet results are not displayed).
  • Note: hierarchical queries that 'combine' facet queries can be handled by the facet.pivot functionality that is available in Solr 4. In the meantime, we'll see if we can devise a work-around.
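To make the facet notes above concrete, here is a hedged Python sketch. It builds the parameters for a plain facet query, turns Solr's default flat facet list (value, count, value, count, ...) into a dictionary, and shows one possible stand-in for facet.pivot: one follow-up facet request per outer value, narrowed with fq. The function names and the field names "source" and "taxon" are assumptions for illustration, the sample response is made up, and this is not necessarily the workaround that will be adopted.

```python
def facet_params(query, facet_field):
    """Parameters for a facet-only request (rows=0 returns no documents)."""
    return {
        "q": query,
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": facet_field,
    }


def facet_counts(response, facet_field):
    """Convert Solr's flat [value, count, value, count, ...] list to a dict."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return dict(zip(flat[0::2], flat[1::2]))


def pivot_workaround_params(query, outer_field, outer_values, inner_field):
    """One inner facet request per outer value, each restricted with fq."""
    return [
        dict(facet_params(query, inner_field), fq='%s:"%s"' % (outer_field, value))
        for value in outer_values
    ]


# Made-up response fragment, for illustration only (not real GOlr data).
sample = {"facet_counts": {"facet_fields": {"source": ["value_a", 3, "value_b", 1]}}}
outer = facet_counts(sample, "source")
requests_to_send = pivot_workaround_params("*:*", "source", sorted(outer), "taxon")
print(outer)
print(requests_to_send[0]["fq"])
```

The cost of this approach is one extra request per outer facet value, so it is only reasonable when the outer field has few distinct values.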

Current Issues and Problems with the GOlr Schema

  • No PANTHER data
    • We'll look at creating a new document type from Suzi's PAINT code.
    • Looking to add PANTHER data as extra fields.
  • A few suggestions to make use of the document categories easier:
    • bioentity 'source' was not loaded correctly in the version I was testing. (FIXED)
    • annotation should have a 'with' field (I believe Seth said this is coming but thought I'd get it written down here). (FIXED)
    • ontology_class should have 'synonyms' and 'closure' (I believe Seth said this is coming). (Should be in now.)
    • annotation_aggregate should have 'go_id' as a separate field (I believe Seth said this is coming as 'alternate_id'). (Should be annotation_class*; fixed.)
    • all id/label closure pairs need a corresponding JSON structure to map the two to each other
    • JSON blob for graph structure
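One possible shape for the id/label mapping requested in the list above is a single JSON object keyed by id. The sketch below assumes the closure ids and closure labels are available as two parallel, equally ordered lists; that layout and the helper name closure_map are assumptions, not the schema's actual design.

```python
import json


def closure_map(ids, labels):
    """Serialize parallel id and label lists as one JSON object."""
    if len(ids) != len(labels):
        raise ValueError("closure ids and labels must have the same length")
    return json.dumps(dict(zip(ids, labels)), sort_keys=True)


blob = closure_map(
    ["GO:0008150", "GO:0009987"],
    ["biological_process", "cellular process"],
)
print(blob)
# prints {"GO:0008150": "biological_process", "GO:0009987": "cellular process"}
```

Storing the map as a string field would let a client recover the label for any closure id with a single JSON parse.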

Downtime

Rebuilds start at 10pm PDT (6am BST; 5pm NZST), so there may be some data gaps around that time. The gaps are most significant for document types other than ontology_class, which gets rebuilt rather quickly.

At the time of this writing, this process is taking under an hour.
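A client that wants to avoid the rebuild could check the time before querying. This is a rough sketch only: it treats PDT as a fixed UTC-7 offset (so it ignores the switch to PST), it takes one hour as an upper bound for the rebuild based on the note above, and in_rebuild_window is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-7 offset; does not follow the PST/PDT switch.
PDT = timezone(timedelta(hours=-7), "PDT")
REBUILD_START_HOUR = 22                # 10pm PDT, as stated above
REBUILD_LENGTH = timedelta(hours=1)    # upper bound: "under an hour"


def in_rebuild_window(moment):
    """Return True if a timezone-aware datetime falls inside the rebuild."""
    local = moment.astimezone(PDT)
    start = local.replace(hour=REBUILD_START_HOUR, minute=0, second=0, microsecond=0)
    if local < start:
        # Before today's start: compare against yesterday's rebuild.
        start -= timedelta(days=1)
    return local - start < REBUILD_LENGTH


BST = timezone(timedelta(hours=1), "BST")
print(in_rebuild_window(datetime(2012, 6, 1, 6, 30, tzinfo=BST)))   # 10:30pm PDT
print(in_rebuild_window(datetime(2012, 6, 1, 12, 0, tzinfo=BST)))   # 4am PDT
```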