Solr Configuration - Schema, Data Importer, and Nutch Crawler

Solr Configuration

Schema.xml

Data Importer

How do we map the data importer to the schema?

Understanding Solr Schema

A Solr index is filled with documents (docs), and each doc holds a set of fields. The file schema.xml defines the field types and the fields. A Solr client creates a doc and fills in each field as required. The Solr Data Import Handler has a configuration file that maps database columns to doc fields.
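The column-to-field mapping described above can be sketched in a few lines of Python. This is only an illustration, not the Data Import Handler itself: the column names, the field names, and the helpers `row_to_doc` and `doc_to_add_xml` are all hypothetical. The sketch renames the columns of one database row to field names and serializes the result in Solr's XML `<add>` update format.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from database column names to Solr field names.
# In a real setup every field name here must be declared in schema.xml.
COLUMN_TO_FIELD = {
    "page_id": "id",
    "page_title": "title",
    "page_text": "content",
}


def row_to_doc(row, mapping=COLUMN_TO_FIELD):
    """Rename the columns of one database row to Solr field names.

    Columns without an entry in the mapping are dropped.
    """
    return {mapping[col]: value for col, value in row.items() if col in mapping}


def doc_to_add_xml(doc):
    """Serialize one doc as a Solr XML <add> update message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


row = {"page_id": 42, "page_title": "Solr notes", "internal_flag": 1}
print(doc_to_add_xml(row_to_doc(row)))
# <add><doc><field name="id">42</field><field name="title">Solr notes</field></doc></add>
```

The unmapped column `internal_flag` never reaches the doc, which mirrors the idea that only fields known to the schema should be sent to the index.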

A quick tutorial for schema.xml

SchemaXML Wiki: http://wiki.apache.org/solr/SchemaXml.

The SchemaDesign wiki page offers some general tips for designing a schema.

The article "Using Solr in online travel to improve user experience" highlights some schemas for auto-complete and spell check.

Unique Id

Do we need a unique id for the indexed data? To answer this question, we need to figure out the answers to the following questions:

  • Does any API depend on it?
  • How do we update an existing doc in the index? Through the API.
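One way to think about the update question is a toy model in Python. This is not Solr code, and the `ToyIndex` class is hypothetical; it only illustrates the behaviour that when a unique key is defined, adding a doc whose key already exists replaces the stored doc instead of creating a duplicate.

```python
class ToyIndex:
    """Toy in-memory model of an index keyed by a unique id (not Solr code)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # A doc whose key is already present replaces the old version,
        # so "update" is simply "add again with the same id".
        self.docs[doc[self.unique_key]] = doc


index = ToyIndex()
index.add({"id": "page-1", "title": "Draft"})
index.add({"id": "page-1", "title": "Final"})
print(len(index.docs), index.docs["page-1"]["title"])  # prints: 1 Final
```

Without a unique id there is no key to match on, so the client would have to delete the old doc itself before adding the new one.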

Understanding Solr Config

solrconfig.xml sets up the following things:

  • Data Directory Location
  • Cache Parameters
  • Request Handlers
  • Search Components

A quick tutorial: solrconfig.xml.

Solr Client

An application needs a Solr client to do the following work:

  • Connect to the Solr server.
  • Add docs to the Solr server and commit them.
  • Query/search docs from the Solr server.

SolrJ is the Java client; we also need a PHP client, a Python client, and so on.
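As a rough illustration of what such a client does, here is a minimal Python sketch that uses only the standard library. Several things are assumptions: the base URL `http://localhost:8983/solr`, the helper names, and the use of the `/update` handler with a JSON body (older Solr releases may expose JSON updates at `/update/json` instead). The builders only construct requests; `send` is the one function that talks to a server.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of a local Solr instance; adjust host, port, and core.
SOLR_URL = "http://localhost:8983/solr"


def build_select_url(query, rows=10, base=SOLR_URL):
    """Build the URL for a search request against the /select handler."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/select?{params}"


def build_update_request(docs, base=SOLR_URL):
    """Build (but do not send) a JSON request that adds docs and commits."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base}/update?commit=true&wt=json",
        data=body,  # a request with a body is sent as POST
        headers={"Content-Type": "application/json"},
    )


def send(request_or_url):
    """Send a request to the server and decode the JSON response."""
    with urllib.request.urlopen(request_or_url) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_select_url("title:solr"))
# http://localhost:8983/solr/select?q=title%3Asolr&rows=10&wt=json
```

With a server running, `send(build_update_request([{"id": "page-1", "title": "Final"}]))` would add and commit one doc, and `send(build_select_url("title:Final"))` would search for it.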

Solr APIs

Nutch Crawler

http://nutch.apache.org

Data Import Handler

http://wiki.apache.org/solr/DataImportHandler
