Full Text Indexing Progress
Revision as of 19:57, 21 September 2010
Overview
There are two separate fronts of progress for FTI. The first is in the indexing system itself ("system"); this would include things like software used (Solr, Jetty, etc.), schema, deployment, hardware, and other low-level issues that are probably not going to be hugely important to end-of-the-line users and programmers. The second is the consumption and use of FTI ("software"). This would include the integration into various pieces of software, services built up around FTI, and (possibly) abstraction APIs.
While there are some blurry points in this distinction (e.g. what about a JSON service built directly into the engine), hopefully it will provide a logical way to divide most of the problems that will be faced.
Goals
A changeable list of goals as we progress:
- Produce a basic stand-alone FTI based on Solr.
- Make sure it's better than the previous attempts (benchmark).
- Convert services currently consuming old FTI to Solr.
- Likely replace current autocomplete with Solr proxy calls.
- Move to new/public hardware.
- Create public interface.
- Produce version with "complicated" schema and test for practical speed.
- "Big join" test.
- See if scaling is practical.
- Try other proxies/balancers (Nginx, Cherokee, etc.).
- Functional as virtualized service (see Virtualization).
- Create rich searching interfaces using new engine.
- Final will need to be combined with "ontology engine".
System Progress
Installation
Solr on Jetty is currently installed on a BBOP development workstation:
- http://accordion.lbl.gov:8080/solr
Currently, it is not terribly useful unless you're sending it the right commands. It probably won't be worked on again until at least AmiGO 1.8 is out and we try to switch the search backend and autocomplete over for 1.9.
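As a rough illustration of what "sending it the right commands" could look like, here is a minimal Python sketch that builds a request against Solr's standard `/select` handler and asks for JSON output. The base URL is the development install mentioned above; the helper names (`build_select_url`, `fetch_results`) and the example query text are hypothetical, and the fields actually searchable depend on the Lucene schema, which is not reproduced here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Development install noted above; assumed reachable only from the local network.
SOLR_BASE = "http://accordion.lbl.gov:8080/solr"


def build_select_url(query, rows=10, base=SOLR_BASE):
    """Build a URL for Solr's /select handler with a JSON response.

    q, rows, and wt are standard Solr query parameters; the query
    string itself is whatever the schema supports (an assumption here).
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return base + "/select?" + urlencode(params)


def fetch_results(query, rows=10, base=SOLR_BASE):
    """Send the query and return the decoded JSON response (needs network access)."""
    with urlopen(build_select_url(query, rows, base)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_results() to actually hit the server.
    print(build_select_url("kinase activity", rows=5))
```

Keeping URL construction separate from the network call makes it easy to inspect or log the exact request before anything is sent, which may help while the install is still experimental.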
Schema
The production schema [1] is essentially the SQL commands used to generate the data for Lucene, in XML format.
The Lucene schema [2] is how the GO data (taken by the production schema) is interpreted for use in Lucene.
Right now, it is a very flat and basic schema. However,
Hardware
Software Progress
Past Experiments
Past experiments for FTI have included various combinations of:
- Perl/CLucene
- Xapian
- Apache mod_perl
- FCGI
- Ruby/Ferret