Difference between revisions of "GOlr"

From GO Wiki

Revision as of 11:43, 11 April 2012

Overview

This page describes the status of the public Solr index for the GO. This index will replace some of the query functionality for GOOSE as well as become the new backend for AmiGO and other services.

Implementation Progress

We have evaluated nginx as a reverse proxy, both for speed and to prevent unauthorized access to non-select URLs on the Solr server, and found that it works well.

We are now working on readying stove.lbl.gov to act as the public GO Solr server (GOlr) and AmiGO 2 client. The URLs will eventually be golr.berkeleybop.org and amigo2.berkeleybop.org.

Schema Progress

We are currently working towards a flexible schema as defined in the owltools code. As we roll it out for the AmiGO backend and as a replacement for common GOOSE queries, we expect to find holes in the schema, documented below.

Currently, the index is populated entirely through an owltools command-line program, using YAML files for configuration. The Solr schema.xml generation looks like:

owltools --solr-config /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ont-config.yaml /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/bio-config.yaml /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ann-config.yaml /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ann_ev_agg-config.yaml --solr-schema-dump

With the actual population along the lines of:

owltools /srv/tmp/go.owl /srv/tmp/cl.owl /srv/tmp/taxslim.owl --solr-url http://localhost:8080/solr/ --solr-purge --solr-config /home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources/ont-config.yaml --solr-load-ontology --solr-load-gafs /srv/tmp/gene_association.mgi
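
Since both commands repeat the same long resource path, they could be assembled from a single variable. The following is only a sketch under that assumption: CONF_DIR, SOLR_CONFIGS, SCHEMA_CMD and LOAD_CMD are illustrative names and not part of owltools, while the paths and flags are copied from the commands above. It prints the commands for review rather than running them.

```shell
# Sketch only: the variable names here are illustrative, not part of owltools.
CONF_DIR=/home/sjcarbon/local/src/svn/owltools/OWLTools-Solr/src/main/resources

# Collect the four YAML configuration files used for the schema dump.
SOLR_CONFIGS=""
for name in ont bio ann ann_ev_agg; do
  SOLR_CONFIGS="$SOLR_CONFIGS $CONF_DIR/${name}-config.yaml"
done

# Assemble both commands as strings (paths and flags copied from the commands above).
SCHEMA_CMD="owltools --solr-config$SOLR_CONFIGS --solr-schema-dump"
LOAD_CMD="owltools /srv/tmp/go.owl /srv/tmp/cl.owl /srv/tmp/taxslim.owl --solr-url http://localhost:8080/solr/ --solr-purge --solr-config $CONF_DIR/ont-config.yaml --solr-load-ontology --solr-load-gafs /srv/tmp/gene_association.mgi"

# Dry run: print the commands instead of executing them.
echo "$SCHEMA_CMD"
echo "$LOAD_CMD"
```

Once the printed commands look right, they can be pasted into a shell on a machine where owltools is installed.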

Rewriting old GOOSE query examples for the new GOlr Schema

  • Example queries [1]
    • Some old queries require the use of Solr facets and some modification of code Seth has written to present simpler query results.
    • Note: hierarchical queries that 'combine' facet queries can be handled by the facet.pivot functionality that is available in Solr 4. In the meantime, we'll see if we can devise a work-around.
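
As a rough illustration of the facet approach mentioned above, the sketch below builds two select URLs: one with a plain field facet and one with a Solr 4 facet.pivot. The Solr parameters (q, rows, wt, facet, facet.field, facet.pivot) are standard, but the field names used here ('source' and 'document_category') are assumptions and may not match the final GOlr schema. The base URL is the local one from the population command.

```shell
# Sketch only: field names are assumptions, not confirmed GOlr schema fields.
SOLR_URL="http://localhost:8080/solr"
FACET_FIELD="source"
PIVOT_FIELDS="document_category,source"

# Count documents per value of one field; rows=0 returns only the facet counts.
FACET_URL="$SOLR_URL/select?q=*:*&rows=0&wt=json&facet=true&facet.field=$FACET_FIELD"

# Solr 4 only: nested ("pivot") counts over two fields.
PIVOT_URL="$SOLR_URL/select?q=*:*&rows=0&wt=json&facet=true&facet.pivot=$PIVOT_FIELDS"

# Print the URLs; uncomment the curl lines to run them against a live server.
echo "$FACET_URL"
echo "$PIVOT_URL"
# curl -s "$FACET_URL"
# curl -s "$PIVOT_URL"
```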

Current Issues and Problems with the GOlr Schema

  • No PANTHER data
    • We'll look at creating a new document type from Suzi's PAINT code.
  • A few suggestions to make use of the document categories easier:
    • bioentity 'source' was not loaded correctly in the version I was testing
    • annotation should have a 'with' field (I believe Seth said this is coming but thought I'd get it written down here)
    • ontology_class should have 'synonyms' and 'closure' (I believe Seth said this is coming)
    • annotation_aggregate should have 'go_id' as a separate field (I believe Seth said this is coming as 'alternate_id')