Full Text Indexing Progress

From GO Wiki
Revision as of 17:47, 21 September 2010

Overview

There are two separate fronts of progress for full text indexing (FTI). The first is the indexing system itself ("system"): the software used (Solr, Jetty, etc.), schema, deployment, hardware, and other low-level issues that are unlikely to matter much to end users and application programmers. The second is the consumption and use of FTI ("software"): integration into various pieces of software, services built around FTI, and (possibly) abstraction APIs.

While this distinction has some blurry points (e.g. where does a JSON service built directly into the engine belong?), it should provide a logical way to divide most of the problems that will be faced.

System Progress

Goals

We would like to have a

Current

Schema

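The production schema [http://geneontology.svn.sourceforge.net/viewvc/geneontology/java/solr/solr/go-data-config.xml?revision=2881&view=markup] is essentially the set of SQL commands used to generate the data for Lucene, expressed in XML format.

The Lucene schema [http://geneontology.svn.sourceforge.net/viewvc/geneontology/java/solr/solr/schema.xml?revision=2934&view=markup] defines how the GO data (extracted by the production schema) is interpreted for use in Lucene.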
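As a rough illustration of how the two schemas relate, the sketch below runs an SQL query against a relational source and collapses the joined rows into one flat document per term. Everything in it is an assumption made for the example: the table and column names, the sample rows, and the helper names `build_flat_documents` and `demo` are hypothetical, and an in-memory SQLite database stands in for the real GO database. It is not taken from go-data-config.xml or schema.xml.

```python
import sqlite3

# Hypothetical miniature of a relational source. Table names, column names,
# and sample rows are illustrative assumptions, not the production schema.
SOURCE_SQL = """
CREATE TABLE term (id INTEGER PRIMARY KEY, acc TEXT, name TEXT, term_type TEXT);
CREATE TABLE term_synonym (term_id INTEGER, term_synonym TEXT);
INSERT INTO term VALUES (1, 'EX:0000001', 'example process', 'process');
INSERT INTO term VALUES (2, 'EX:0000002', 'example function', 'function');
INSERT INTO term_synonym VALUES (1, 'alpha synonym');
INSERT INTO term_synonym VALUES (1, 'beta synonym');
"""

# The kind of SELECT a data-import configuration might hold for one entity.
EXPORT_QUERY = """
SELECT t.acc, t.name, t.term_type, s.term_synonym
FROM term t LEFT JOIN term_synonym s ON s.term_id = t.id
ORDER BY t.id, s.term_synonym
"""


def build_flat_documents(conn):
    """Collapse joined rows into one flat document per term accession."""
    docs = {}
    for acc, name, term_type, synonym in conn.execute(EXPORT_QUERY):
        doc = docs.setdefault(
            acc, {"acc": acc, "name": name, "type": term_type, "synonym": []}
        )
        # Terms without synonyms yield a single row with a NULL synonym.
        if synonym is not None:
            doc["synonym"].append(synonym)
    return list(docs.values())


def demo():
    """Build the toy source in memory and return its flat documents."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SOURCE_SQL)
        return build_flat_documents(conn)
    finally:
        conn.close()
```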

Right now, it is a very flat and basic schema. However,

Hardware

Solr on Jetty is currently installed on a BBOP development workstation. Although it is not generally available for public use, it is being used to test ways of integrating FTI into core software (see below).
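For integration testing against such an instance, a client mostly needs to build a query URL and read the JSON that comes back. The following is a minimal sketch, assuming Solr's standard `/select` handler with `wt=json`. The base URL is a placeholder (8983 is only Solr's customary default port), the field names `name` and `acc` are hypothetical, and `build_select_url` and `parse_doc_ids` are illustrative helpers, not part of any existing GO software.

```python
from urllib.parse import urlencode

# Placeholder location for a development Solr instance (an assumption).
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(text, field="name", rows=10, base=SOLR_BASE):
    """Return a URL for Solr's /select handler that asks for JSON output."""
    # Treat the user text as a phrase; escape backslashes and double quotes.
    phrase = text.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{phrase}"', "rows": rows, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def parse_doc_ids(response, id_field="acc"):
    """Pull one field out of each hit in a decoded Solr JSON response."""
    docs = response.get("response", {}).get("docs", [])
    return [doc[id_field] for doc in docs if id_field in doc]
```

A caller would fetch the URL with any HTTP client, decode the body with `json.loads`, and pass the resulting dictionary to `parse_doc_ids`. Keeping URL construction and response parsing separate from the network call makes both easy to test without a running server.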

Past Experiments

Software Progress

Design Progress

One

Experimental


Two

???

Target

Software

Current

Past

  • ...
  • ...
  • ...