SearchEngine, Version 1
Author: dread
Timestamp: 02/24/10 11:50:12
= CKAN Search Engine =

== Use Cases ==
As a user of a CKAN instance, I want to be able to make complex searches that reference specific data fields.

== Design ==

Search technology: Apache SOLR has been selected.

Architecture: SOLR should be able to work either alongside the existing full-text search in CKAN or as a replacement for it.

There are two main options for getting data into SOLR:

 * POST the records to SOLR in XML format ([http://wiki.apache.org/solr/UpdateXmlMessages docs])
 * Direct connection: set up SOLR to read from the database itself ([http://wiki.apache.org/solr/DataImportHandler docs])
  * Provide SELECT statements that SOLR runs to query the records
  * The import is initiated by doing a GET to a particular SOLR URL

The first option is preferred, because this abstraction gives more flexibility in the database and more control over what gets indexed.

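
A minimal sketch of the preferred option, using only Python's standard library rather than any particular SOLR client library. Each CKAN package is assumed to be a plain dict, and the field names (`id`, `name`, `title`, `tags`) are illustrative assumptions that would have to match the fields declared in schema.xml. `build_add_xml` and `post_to_solr` are hypothetical helper names; the `<add>`/`<doc>`/`<field>` message shape and the `/update` handler follow the UpdateXmlMessages docs linked above.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(packages):
    """Build a SOLR Update XML <add> message from a list of package dicts."""
    add = ET.Element("add")
    for pkg in packages:
        doc = ET.SubElement(add, "doc")
        for name, value in pkg.items():
            # A list value (e.g. tags) becomes one <field> element per item,
            # which SOLR accepts for multiValued fields.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)  # ElementTree escapes &, < and > here
    return ET.tostring(add, encoding="unicode")


def post_to_solr(solr_url, xml_body):
    """POST an Update XML message (e.g. <add> or <commit/>) to SOLR's /update handler."""
    request = urllib.request.Request(
        solr_url.rstrip("/") + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `post_to_solr(url, build_add_xml(packages))` followed by `post_to_solr(url, "<commit/>")` would make the new documents searchable, where `url` is something like `http://localhost:8983/solr` (an assumed example address). Building the XML with ElementTree rather than string formatting takes care of escaping characters such as `&` in package titles.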
When should a package be indexed? Currently we index it on the database after_insert and after_update triggers. However, this could seriously slow down a large data import, because each indexing operation requires an HTTP POST to SOLR. One option is to keep the triggers but turn them off during a batch import, and then run the indexing manually afterwards. Alternatively, changes could be stored up and indexed by an hourly cron job.

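
The indexing options above (index on every trigger, switch the triggers off for a batch import, or index periodically) could be captured in one small component. This is a hypothetical sketch: `IndexQueue`, `package_changed`, `flush` and `deferred` are illustrative names, not existing CKAN code, and `index_fn` stands in for whatever actually sends a batch of packages to SOLR (for example by POSTing an Update XML message).

```python
from contextlib import contextmanager
from typing import Callable, List, Set


class IndexQueue:
    """Hypothetical helper deciding when changed packages are sent to SOLR."""

    def __init__(self, index_fn: Callable[[List[str]], None], immediate: bool = True):
        self.index_fn = index_fn    # sends a batch of package ids to SOLR
        self.immediate = immediate  # True: index on every trigger; False: defer
        self.pending: Set[str] = set()

    def package_changed(self, package_id: str) -> None:
        """Call this from the after_insert / after_update triggers."""
        if self.immediate:
            self.index_fn([package_id])
        else:
            # A set means a package changed many times is indexed only once.
            self.pending.add(package_id)

    def flush(self) -> int:
        """Index all deferred packages in one batch; return how many were sent."""
        if not self.pending:
            return 0
        batch = sorted(self.pending)
        self.index_fn(batch)
        self.pending.clear()
        return len(batch)

    @contextmanager
    def deferred(self):
        """Turn per-trigger indexing off for a batch import, then flush at the end."""
        previous = self.immediate
        self.immediate = False
        try:
            yield self
        finally:
            # Restore the previous mode and index whatever changed, even if the
            # import stopped part-way (those changes may already be committed).
            self.immediate = previous
            self.flush()
```

An hourly cron job would simply call `flush()` on a queue created with `immediate=False`, while a batch import would wrap its work in `with queue.deferred():` so that each changed package is indexed once, in a single batch, at the end.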
== Tickets ==

 1. Get a SOLR instance running, using a basic configuration.
 2. Get indexing and searching working with the name and title fields only:
   * Use one of the three Python SOLR libraries to send CKAN packages to SOLR as Update XML (triggered from the command line).
   * Write tests that send data with the SOLR library and query it through SOLR's JSON interface.
 3. Get it working with all package fields, optimising the field definitions in schema.xml.
 4. Trigger the indexing sensibly (using one of the options discussed under Design).
 5. Provide an option to connect CKAN's search WUI (web user interface) to the SOLR back end.
 6. Write CKAN developer docs describing how to set up the SOLR link and schema.xml.
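
For the tests in ticket 2, and for field-referencing searches like those in the use case, queries can be sent to SOLR's `select` handler with `wt=json`, which returns a JSON object whose `response` member holds `numFound` and `docs`. Fielded queries such as `name:water AND title:health` use the standard Lucene query syntax. This is a minimal standard-library sketch; the base URL (e.g. `http://localhost:8983/solr` for a single-core example setup), the default field list and the helper names are assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(solr_url, query, fields=("id", "name", "title"), rows=10):
    """Build a SOLR select URL that asks for a JSON response (wt=json)."""
    params = {"q": query, "wt": "json", "fl": ",".join(fields), "rows": rows}
    return solr_url.rstrip("/") + "/select?" + urllib.parse.urlencode(params)


def parse_select_response(body):
    """Return (numFound, docs) from a SOLR JSON select response."""
    result = json.loads(body)["response"]
    return result["numFound"], result["docs"]


def search(solr_url, query, **kwargs):
    """Run a query against a live SOLR instance, e.g. query='name:water'."""
    with urllib.request.urlopen(build_select_url(solr_url, query, **kwargs)) as response:
        return parse_select_response(response.read().decode("utf-8"))
```

In a test, one would index a known package, commit, and then assert on the result of `search(url, "name:water")`. Keeping URL building and response parsing separate lets those parts be unit-tested without a running SOLR instance.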