Solr Configuration - Schema, Data Importer, and Nutch Crawler
Solr Configuration
Schema.xml
Data Importer
How to map the data importer to the schema
Understanding Solr Schema
A Solr index is filled with documents (docs), and each doc is made up of fields. The schema.xml file defines the field types and the fields. A Solr client creates a Solr doc and fills in the fields it needs. The Solr data import handler has its own configuration file that maps database columns to Solr doc fields.
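To make the relationship between schema fields and data importer columns concrete, here is a minimal Python sketch. It is not the data import handler itself: the schema fragment, the column names (ITEM_ID and so on), and the helper functions are all hypothetical and exist only to show how a database row could be turned into a doc whose fields are declared in the schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema.xml fragment; the field names are illustrative only.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="price" type="float" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
""".strip()

# Hypothetical mapping from database column to Solr field, similar in spirit
# to the column-to-field pairs in a data import handler configuration.
COLUMN_TO_FIELD = {"ITEM_ID": "id", "ITEM_TITLE": "title", "ITEM_PRICE": "price"}


def schema_fields(schema_xml):
    """Return {field name: field type} for every <field> in the fragment."""
    root = ET.fromstring(schema_xml)
    return {f.get("name"): f.get("type") for f in root.iter("field")}


def row_to_doc(row, mapping, fields):
    """Build a Solr doc (a plain dict) from one database row."""
    doc = {}
    for column, value in row.items():
        field = mapping.get(column)
        if field is None:
            continue  # unmapped columns are simply not indexed
        if field not in fields:
            raise KeyError(f"field {field!r} is not defined in the schema")
        doc[field] = value
    return doc


fields = schema_fields(SCHEMA_XML)
row = {"ITEM_ID": "42", "ITEM_TITLE": "Blue kettle", "ITEM_PRICE": 19.5, "INTERNAL_NOTE": "x"}
doc = row_to_doc(row, COLUMN_TO_FIELD, fields)
print(doc)
```

The unmapped INTERNAL_NOTE column is dropped, and a mapping that points at a field missing from the schema fails early instead of producing a doc the index would reject.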
A quick tutorial for schema.xml
SchemaXml wiki: http://wiki.apache.org/solr/SchemaXml
The SchemaDesign wiki offers some general tips for designing a schema.
"Using Solr in online travel to improve user experience" highlights some schemas for auto-complete and spell check.
Unique Id
Do we need a unique id for the index data? To answer this question, we need to figure out the answers to the following questions:
- Does any API depend on it?
- How do we update an existing entry in the index? Through the API.
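One reason the unique id matters for updates: when the schema declares a uniqueKey, adding a doc whose key already exists replaces the earlier doc instead of creating a duplicate. The toy Python class below only models that behavior in memory; TinyIndex and its methods are hypothetical names and not part of any Solr client.

```python
class TinyIndex:
    """Toy in-memory stand-in for an index keyed by a unique id field."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        """Store the doc; return True if it replaced an existing one."""
        if self.unique_key not in doc:
            raise ValueError(f"doc is missing the unique key {self.unique_key!r}")
        key = doc[self.unique_key]
        replaced = key in self.docs
        self.docs[key] = dict(doc)  # copy so later edits do not leak in
        return replaced


index = TinyIndex(unique_key="id")
first = index.add({"id": "42", "title": "Blue kettle"})
second = index.add({"id": "42", "title": "Red kettle"})  # same id: an update
print(first, second, len(index.docs), index.docs["42"]["title"])
```

Without a unique key there is nothing to match the second add against, so an update would have to be expressed as a delete followed by an add.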
Understanding Solr Config
solrconfig.xml sets up the following:
- Data Directory Location
- Cache Parameters
- Request Handlers
- Search Components
A quick tutorial: solrconfig.xml.
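As a rough illustration of where those four settings live, the Python sketch below parses a small solrconfig.xml fragment and lists them. The fragment is a made-up example: the data directory path, cache sizes, and handler names are assumptions, not recommended values.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment; paths, sizes and names are examples.
SOLRCONFIG_XML = """
<config>
  <dataDir>/var/solr/data</dataDir>
  <query>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  </query>
  <requestHandler name="/select" class="solr.SearchHandler"/>
  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler"/>
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent"/>
</config>
""".strip()


def summarize(config_xml):
    """Collect the data directory, caches, handlers and components."""
    root = ET.fromstring(config_xml)
    return {
        "data_dir": root.findtext("dataDir"),
        "caches": [el.tag for el in root.find("query")],
        "request_handlers": [el.get("name") for el in root.findall("requestHandler")],
        "search_components": [el.get("name") for el in root.findall("searchComponent")],
    }


summary = summarize(SOLRCONFIG_XML)
for key, value in summary.items():
    print(key, "=", value)
```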
Solr Client
An application needs a Solr client to do the following work:
- connect to the Solr server.
- add/commit docs to the Solr server.
- query/search docs from the Solr server.
SolrJ is the Java client; we also need a PHP client, a Python client, and so on.
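The three client tasks above can be sketched in Python with only the standard library. This is not an existing client library: the base URL, the core name "example", and both function names are assumptions, and the sketch only builds the HTTP requests (a JSON post to the update handler and a query against the select handler) without sending them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed server location and core name; adjust for a real installation.
BASE_URL = "http://localhost:8983/solr/example"


def build_add_request(docs, commit=True):
    """Build a POST that adds docs and, optionally, commits them."""
    url = f"{BASE_URL}/update"
    if commit:
        url += "?" + urlencode({"commit": "true"})
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


def build_search_request(query, rows=10):
    """Build a GET against the select handler, asking for JSON output."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(f"{BASE_URL}/select?{params}")


docs = [{"id": "42", "title": "Blue kettle"}]
add_req = build_add_request(docs)
search_req = build_search_request("title:kettle", rows=5)
print(add_req.get_method(), add_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

To actually talk to a running server, each request object can be passed to urllib.request.urlopen and the response body decoded with json.loads.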
Solr APIs
Nutch Crawler
http://nutch.apache.org
Data Import Handler
http://wiki.apache.org/solr/DataImportHandler