zc.buildout recipe to configure a Solr instance

The recipe configures an instance of the Solr indexing server. Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface.

Git Repository and issue tracker: https://github.com/collective/collective.recipe.solrinstance

Note: This version of the recipe only supports Solr 3.5. Please use a release from the 2.x series if you are using Solr 1.4.

Supported options

The recipe supports the following options:

solr-location
Path to the location of the Solr installation. This should be the top-level installation directory.
host
Name or IP address of the Solr server, e.g. some.server.com. Defaults to ‘localhost’.
port
Server port. Defaults to 8983.
basepath

Base path to the Solr service on the server. The final URL to the Solr service will be made of

$host:$port/$basepath

to which the actual commands will be appended. Defaults to ‘/solr’.
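
As a minimal sketch of how these three options combine (the part name solr, the installation path and the host name are placeholders chosen for illustration, not defaults of the recipe):

```ini
[solr]
recipe = collective.recipe.solrinstance
# Placeholder path to an unpacked Solr distribution.
solr-location = /opt/solr
# Hypothetical host name; the recipe default is localhost.
host = search.example.com
port = 8080
basepath = /solr
```

Following the pattern above, clients would then address the service at search.example.com:8080/solr.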

config-destination
Optional override for the directory where the solrconfig.xml file will be generated. Defaults to the Solr default location.
config-template
Optional override for the template used to generate the solrconfig.xml file. Defaults to the template contained in the recipe, i.e. templates/solrconfig.xml.tmpl.
jetty-template
Optional override for the jetty.xml template. Defaults to templates/jetty.xml.tmpl.
logging-template
Optional override for the logging.properties template. Defaults to templates/logging.properties.tmpl.
schema-destination
Optional override for the directory where the schema.xml file will be generated. Defaults to the Solr default location.
schema-template
Optional override for the template used to generate the schema.xml file. Defaults to the template contained in the recipe, i.e. templates/schema.xml.tmpl.
stopwords-template
Optional override for the template used to generate the stopwords.txt file. Defaults to the template contained in the recipe, i.e. templates/stopwords.txt.tmpl.
jetty-destination
Optional override for the directory where the jetty.xml file will be generated. Defaults to the Solr default location.
extra-field-types
Configures the extra field types that can be used in the index option. You can create custom field types with special analyzers and tokenizers; see Solr’s complete reference: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
filter
Configures additional filters for the default field types. Each filter is configured on a separate line. Each line contains an index params pair, where index is one of the existing index types and params contains [key]:[value] items to configure the filter. The available filters are listed in Solr’s docs: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#TokenFilterFactories
index

Configures the different types of index fields provided by the Solr instance. Each field is configured on a separate line. Each line contains a whitespace-separated list of [key]:[value] pairs which define options associated with the index. Common field options are detailed at http://wiki.apache.org/solr/SchemaXml#Common_field_options and are illustrated in the examples further down.

A special [key]:[value] pair supports Copy Fields: if you specify copyfield:dest_field, a <copyField> declaration is included in the schema that copies the given field into dest_field.
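
A small sketch of the copyfield pair in use. The field names (uid, title, body, fulltext) are made up for illustration, and ${solr-download:location} assumes a download part like the one in the examples further down:

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
unique-key = uid
index =
    name:uid type:string indexed:true stored:true required:true
    name:title type:text indexed:true stored:true copyfield:fulltext
    name:body type:text indexed:true stored:false copyfield:fulltext
    name:fulltext type:text indexed:true stored:false multivalued:true
```

Going by the description above, the generated schema should then contain one <copyField> declaration for title and one for body, both pointing at fulltext. The destination is declared multivalued here because it receives values from more than one source field.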

unique-key
Optional override for declaring a field to be unique for all documents. See http://wiki.apache.org/solr/SchemaXml for more information. Defaults to ‘uid’.
default-search-field
Configure a default search field, which is used when no field was explicitly given. See http://wiki.apache.org/solr/SchemaXml.
max-num-results
The maximum number of results the Solr server returns. Defaults to 500.
section-name
Name of the product-config section to be generated for zope.conf. Defaults to ‘solr’.
zope-conf

Optional override for the configuration snippet that is generated to be included in zope.conf by other recipes. Defaults to:

<product-config ${part:section-name}>
    address ${part:host}:${part:port}
    basepath ${part:basepath}
</product-config>
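
A hedged sketch of how another part might pick up the generated snippet. It assumes the snippet can be referenced as ${solr:zope-conf} and uses the zope-conf-additional option of plone.recipe.zope2instance; the part names solr and instance and the credentials are placeholders:

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
section-name = solr
index =
    name:uid type:string indexed:true stored:true required:true

[instance]
recipe = plone.recipe.zope2instance
# Placeholder credentials for illustration only.
user = admin:admin
# Appends the <product-config> snippet generated by the solr part.
zope-conf-additional = ${solr:zope-conf}
```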
default-operator
The default operator to use for queries. Valid values are AND and OR. Defaults to OR.
additional-solrconfig
Optional additional configuration to be included inside the solrconfig.xml. For instance, <requestHandler /> directives.
additional-schema-config
Optional additional configuration to be included inside the schema.xml. For instance, custom <copyField /> directives and anything else that’s part of the schema configuration (see http://wiki.apache.org/solr/SchemaXml).
maxWarmingSearchers
Maximum number of searchers that may be warming in the background. Defaults to 4. For read-only slaves, a value of 1 or 2 is recommended.
useColdSearcher
If a request comes in without a warm searcher available, immediately use one of the warming searchers to handle the request. Defaults to false.
mergeFactor
Specify the index defaults merge factor. This value determines how many segments of equal size exist before being merged to a larger segment. With the default of 10, nine segments of 1000 documents will be created before they are merged into one containing 10000 documents, which in turn will be merged into one containing 100000 documents once that size is reached.
ramBufferSizeMB
Sets the amount of RAM that may be used by Lucene indexing for buffering added documents and deletions before they are flushed to the directory. Defaults to 16mb.
unlockOnStartup
If true (the recipe’s default), unlock any held write or commit locks on startup. This defeats the locking mechanism that allows multiple processes to safely access a Lucene index.
spellcheckField
Configures the field used as a source for the spellcheck search component. Defaults to default.
autoCommitMaxDocs
Lets you enable auto-commit handling and force a commit once at least the given number of documents has been added. This is disabled by default.
autoCommitMaxTime
Lets you enable auto-commit handling after a specified time in milliseconds. This is disabled by default.
requestParsers-multipartUploadLimitInKB
Optional <requestParsers /> parameter, useful if you are submitting very large documents to Solr. This may be the case if Solr is indexing binaries extracted from requests.
vardir
Optional override for the location of the directory where Solr stores its indexes and log files. Defaults to ${buildout:directory}/var/solr. This option and the script option make it possible to create multiple Solr instances in a single buildout and dedicate one or more of the instances to automated functional testing.
logdir
Optional override for the location of the Solr logfiles. Defaults to ${buildout:directory}/var/solr.
extralibs

Optional includes of custom Java libraries. The option takes one path and one regular expression per line, separated by a colon. The regular expression is optional and defaults to .*.jar (all jar files in a directory). Example:

extralibs =
    /my/global/java/path
    some/special/libs:.*\.jarx
script
Optional override for the name of the generated Solr instance control script. Defaults to solr-instance. This option and the vardir option make it possible to create multiple Solr instances in a single buildout and dedicate one or more of the instances to automated functional testing.
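
A minimal sketch of two instances in one buildout, one of them reserved for functional tests. The part names, the second port and the directory names are assumptions for illustration; both parts would also have to be listed in the parts option of the [buildout] section:

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
port = 8983
index =
    name:uid type:string indexed:true stored:true required:true

[solr-test]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
# A separate port, data directory and control script keep the
# test instance from interfering with the main one.
port = 8984
vardir = ${buildout:directory}/var/solr-test
logdir = ${buildout:directory}/var/solr-test
script = solr-test
index =
    name:uid type:string indexed:true stored:true required:true
```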
java_opts

Optional. Parameters to pass to the Java Virtual Machine (JVM) used to run Solr. Each option is specified on a separate line. For example:

[solr-instance]
...
java_opts =
  -Xms512M
  -Xmx1024M
...
cores
Optional. If collective.recipe.solrinstance:mc is specified as the recipe, a multicore Solr instance is created in which every section named in cores gets its own configuration.
default-core-name
Optional. If collective.recipe.solrinstance:mc is specified as the recipe, then this option controls which core is set as the default for incoming requests that do not specify a core name. This corresponds to the defaultCoreName option described at http://wiki.apache.org/solr/CoreAdmin#cores.
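
Several of the tuning options above can be combined in a single part. The following is only a sketch: the values are illustrative and not recommendations, and the part name solr is a placeholder:

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
index =
    name:uid type:string indexed:true stored:true required:true
default-operator = AND
maxWarmingSearchers = 2
useColdSearcher = true
ramBufferSizeMB = 32
unlockOnStartup = false
# Commit after 1000 added documents or after 60 seconds.
autoCommitMaxDocs = 1000
autoCommitMaxTime = 60000
```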

Cache options

The recipe offers fine-grained control of query caching as described at http://wiki.apache.org/solr/SolrCaching.

The supported options are:

  • filterCacheSize
  • filterCacheInitialSize
  • filterCacheAutowarmCount
  • queryResultCacheSize
  • queryResultCacheInitialSize
  • queryResultCacheAutowarmCount
  • documentCacheSize
  • documentCacheInitialSize
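
The cache options are set like any other option of the part. The numbers below are illustrative assumptions, not the recipe's defaults:

```ini
[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
index =
    name:uid type:string indexed:true stored:true required:true
filterCacheSize = 16384
filterCacheInitialSize = 4096
filterCacheAutowarmCount = 1024
queryResultCacheSize = 512
queryResultCacheInitialSize = 128
queryResultCacheAutowarmCount = 32
documentCacheSize = 512
documentCacheInitialSize = 128
```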

Example single solr

A simple example of what a single Solr instance could look like:

[buildout]
parts = solr-download
        solr

[solr-download]
recipe = hexagonit.recipe.download
strip-top-level-dir = true
url = http://mirrorservice.nomedia.no/apache.org//lucene/solr/3.5.0/apache-solr-3.5.0.zip

[solr]
recipe = collective.recipe.solrinstance
solr-location = ${solr-download:location}
host = 127.0.0.1
port = 1234
max-num-results = 500
section-name = SOLR
unique-key = uniqueID
index =
    name:uniqueID type:string indexed:true stored:true required:true
    name:Foo type:text copyfield:Baz
    name:Bar type:date indexed:false stored:false required:true multivalued:true omitnorms:true copyfield:Baz
    name:Foo bar type:text
    name:Baz type:text
    name:Everything type:text
filter =
    text solr.LowerCaseFilterFactory
additional-schema-config =
    <copyField source="*" dest="Everything"/>

Example multicore solr

To get multicore working, you need to use the collective.recipe.solrinstance:mc recipe. A simple example of what a multicore Solr instance could look like:

[buildout]
parts = solr-download
        solr-mc

[solr-download]
recipe = hexagonit.recipe.download
strip-top-level-dir = true
url = http://mirrorservice.nomedia.no/apache.org//lucene/solr/3.5.0/apache-solr-3.5.0.zip

[solr-mc]
recipe = collective.recipe.solrinstance:mc
solr-location = ${solr-download:location}
host = 127.0.0.1
port = 1234
section-name = SOLR
cores = core1 core2

[core1]
max-num-results = 99
unique-key = uniqueID
index =
    name:uniqueID type:string indexed:true stored:true required:true
    name:Foo type:text copyfield:Baz
    name:Bar type:date indexed:false stored:false required:true multivalued:true omitnorms:true copyfield:Baz
    name:Foo bar type:text
    name:Baz type:text
    name:Everything type:text
filter =
    text solr.LowerCaseFilterFactory
additional-schema-config =
    <copyField source="*" dest="Everything"/>

[core2]
max-num-results = 66
unique-key = uid
index =
    name:uid type:string indexed:true stored:true required:true
    name:La type:text
    name:Le type:date indexed:false stored:false required:true multivalued:true omitnorms:true
    name:Lau type:text
filter =
    text solr.LowerCaseFilterFactory
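
To route requests that do not name a core to one of the cores, the default-core-name option described earlier can be added to the multicore part. A sketch that reuses the part above and picks core1 as an arbitrary choice:

```ini
[solr-mc]
recipe = collective.recipe.solrinstance:mc
solr-location = ${solr-download:location}
host = 127.0.0.1
port = 1234
section-name = SOLR
cores = core1 core2
# Requests without a core name are handled by core1.
default-core-name = core1
```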

Change History

3.8 (2012-08-09)

  • Support default-core-name for specifying the name of a core to use for incoming Solr requests that do not specify a core. See http://wiki.apache.org/solr/CoreAdmin#cores [reinhardt]
  • Add ability to add arbitrary configuration to schema.xml using additional-schema-config option. [davidjb]
  • Add documentation and tests for copyfield option for indexes to test and clarify that this option is available. [davidjb]

3.7.1 (2012-02-28)

  • Fixed package missing files, without a MANIFEST.in we need setuptools-git. [jod]

3.7 (2012-02-28)

  • Fixed tests. [jod]
  • Added option abortOnConfigurationError (makes config error diagnostics a lot easier). [gweis]
  • Add support for field options termVectors, termPositions and termOffsets. [gweis]
  • Use parts location to find additional jars. [gweis]
  • Copy dist and contrib folder for Multicore setup (just like for Singlecore). [gweis]
  • Disabled elevate.xml; Solr would fail to work if this is enabled. [gweis]

3.6 (2011-12-07)

  • Account for new schema validation in Solr 3.4 related to omitNorms field. [hannosch]
  • Update generated config files to match and require Solr 3.5. [hannosch]
  • Fix solr-instance purge to work with hosts/ports other than localhost:8983 [csenger]
  • Added new extralibs option to include custom Java libraries

3.5 (2011-07-10)

  • Removed the cacheSize option in favor of 8 specific options to configure every aspect of the query caches on their own. [hannosch]
  • Added new spellcheckField option, to configure the source field for the spellcheck search component. [hannosch]
  • Removed the example tvrh, terms and elevate request handlers. [hannosch]
  • Removed the example spell request handler and enabled spell checking based on the default field for the search request handler. [hannosch]
  • Clean up solrconfig template and remove an example firstSearcher query. [hannosch]
  • Added new mergeFactor, ramBufferSizeMB, unlockOnStartup options. [hannosch]

3.4 (2011-07-09)

  • Update generated config files to match and require Solr 3.3. [hannosch]
  • Add solr.WordDelimiterFilterFactory to the standard text field, to split on intra-word delimiters such as -_:. [hannosch]

3.3 (2011-06-25)

  • Increase the requestParsers-multipartUploadLimitInKB default value from 2mb to 100mb to allow the update/extract handler to accept large files. [hannosch]
  • Increase Jetty’s maxFormContentSize from 1mb to 100mb to allow indexing large files. [hannosch]
  • Changed the field definition of the text type to avoid filters specific to the English language and instead use a default filter config that should work with most languages, based on the ICU tokenizer and folding filter. [hannosch]
  • Load the analysis-extras libraries, so we can use the ICU-based filters and tokenizers. [hannosch]
  • Removed the clustering request handlers from the default config, as they didn’t work anyways without us loading the contrib/clustering libraries. [hannosch]
  • Enable Tika data extraction and Solr Cell libraries. Data is extracted into a field called tika_content unless specified otherwise in each request via the fmap.content= argument. All extracted fields which aren’t in the schema are put into dynamic fields prefixed with tika_. [tom_gross, hannosch]
  • Removed the Velocity driven /browse request handler. The example config we generated didn’t match the schema. [hannosch]

3.2 (2011-06-23)

  • Added a new option stopwords-template which allows you to specify a custom stopwords file. [hannosch]

3.1 (2011-06-06)

  • Updated templates to match default found in Solr 3.2. [hannosch]

3.0 (2011-06-04)

  • We no longer require elementtree. [hannosch]
  • Use the standard library’s doctest module. [hannosch]
  • Increase the max-num-results default value from 10 to 500 to avoid restricting search results on this low level. The application layer should be responsible for making such restrictions. [hannosch]

3.0a2 (2011-05-26)

  • Added new logging-template option and instruct Jetty to use the logging.properties file. The default logging level is set to WARNING. [hannosch]
  • Pass the host option to the Jetty config, so it can be configured to listen only on localhost or a specific IP. [hannosch]
  • Disabled Jetty request log. [hannosch]
  • Updated jetty.xml template to match new defaults found in the Solr 3.1 release. [hannosch]
  • Fixed syntax error introduced around httpCaching directive. [hannosch]

3.0a1 (2011-05-26)

  • Updated the solrconfig.xml template to match the template from Solr 3.1. [hannosch]

  • Updated the default schema.xml to the Solr 3.1 format. The schema version is now 1.3 instead of 1.2. The schema is no longer compatible with Solr 1.4. Please use a recipe version from the 2.x series for that.

    Changes to the schema include:

    • Fields no longer have a compressed option.
    • The default schema defines three new field types: point, location and geohash useful for geospatial data.

    If you have an older Solr 1.4 index, you should be able to continue using it without a full reindex. [hannosch]

2.1 (2011-04-12)

  • Fixed reStructuredText. [jod]

2.0 (2011-04-12)

  • Added default to filter attributes. [jod]

  • Multicore recipe collective.recipe.solrinstance:mc. [jod]

    • Refactored to get multicore working.
    • Pinned buildout version to get tests working.

1.1 (2011-04-04)

  • Make jetty.xml.tmpl honor the host parameter. [davidblewett]
  • Support for Windows [bluszcz]

1.0 (2010-12-12)

  • No changes.

1.0b5 (2010-09-03)

  • Actually provide the default value for the cacheSize option. [hannosch]

1.0b4 (2010-08-12)

  • Added jetty-template option. [ajung]

1.0b3 (2010-07-23)

  • Don’t kill Solr after the script finishes when the script is just used for starting Solr as a daemon. [do3cc]

1.0b2 (2010-06-01)

  • Actually do something in the update call. Now the configuration is updated when you run buildout again. [fschulze]
  • Handle termination signal in the wrapper script, so the solr instance is killed when the wrapper dies. [fschulze]

1.0b1 (2010-05-25)

  • Added new autoCommitMaxDocs and autoCommitMaxTime options. [hannosch]
  • logdir option internal bugfix: buildout does not allow None options values (__setitem__). [anguenot]

1.0a7 (2010-05-17)

  • Fixed syntax error in new logdir code. [ajung]

1.0a6 (2010-05-17)

  • Added logdir option. [ajung]

1.0a5 (2010-05-11)

  • Added more options: maxWarmingSearchers, useColdSearcher and cacheSize. [hannosch]

1.0a4 (2010-05-05)

  • Added back JMX configuration. See http://wiki.apache.org/solr/SolrJmx for more details. You can enable it by adding -Dcom.sun.management.jmxremote to the java_opts option. [hannosch]

1.0a3 (2010-03-23)

  • Added back a field type called integer with the same properties as the int type. This ensures basic schemas created by collective.solr won’t need any schema changes, though they still need a full reindex. [hannosch]

1.0a2 (2010-03-22)

  • Fixed invalid reStructuredText format in the changelog. [hannosch]

1.0a1 (2010-03-22)

  • Replaced the gettableFiles option in the admin section with the new *.admin.ShowFileRequestHandler approach. By default your entire SOLR_HOME/conf except for the scripts.conf is exposed. [hannosch]

  • Updated the default schema.xml to the Solr 1.4 format. The schema version is now 1.2 instead of 1.1. The schema is no longer compatible with Solr 1.3. Please use a recipe version from the 0.x series for that.

    Changes to the schema include:

    • The integer field is now called int.
    • New field type attribute omitTermFreqAndPositions introduced. This is true by default except for text fields.
    • New binary and random field types.
    • The int, float, long, double and date fields now use the solr.Trie* classes. These are more efficient in general.
    • New tint, tfloat, tlong, tdouble and tdate fields. These are solr.Trie* fields with a precisionStep configured. You can use them for fields that see a lot of range queries.
    • The old sint, slong, sfloat and sdouble fields are no longer configured.
    • The example fields text_greek, textTight and alphaOnlySort are no longer configured by default.
    • The text field uses the SnowballPorterFilterFactory with a language of English instead of the EnglishPorterFilterFactory.
    • The ignored field is now multiValued.
    • No dynamic fields are configured by default.

    If you have an older Solr 1.3 configuration, you might need to adjust it to match some of the new defaults. You will also have to do a full reindex of Solr, if the type of any of the fields changed, like with int or date fields. [hannosch]

  • Simplify solrconfig.xml and unconfigure example handlers that rely on a specific schema. Other changes include:

    • Indexes are now flushed when the ramBufferSizeMB is exceeded, defaulting to 32mb instead of every 1000 documents. The maxBufferedDocs is deprecated.
    • The new reopenReaders option causes IndexReaders to be reopened instead of closed and then opened.
    • The filterCache uses the solr.FastLRUCache instead of the solr.LRUCache.
    • The queryResultWindowSize defaults to 30 instead of 10.
    • The requestHandler uses the new solr.SearchHandler, which supports a defType argument to turn it into a dismax handler, instead of having two separate classes for the two handlers.

    There are a number of new handlers in Solr 1.4, which aren’t enabled by default. Read the Solr documentation for the examples. [hannosch]

  • Updated jetty.xml and solrconfig.xml to Solr 1.4 defaults. The *.jetty.Request.maxFormContentSize has been set to allow POST requests of 1mb by default. [hannosch]

  • Made the tests pass again, by installing more packages into the test buildout environment. [hannosch]

0.4 (2010-02-18)

  • Some package metadata cleanup. [hannosch]
  • Added optional java_opts parameter to pass to the Java Virtual Machine (JVM) used to run Solr. [anguenot]
  • Fixed to create the solr.log file inside the log folder. [deo]
  • Made sure to display the invalid index attribute name when raising the related error. [deo]
  • Added support for defining custom field types. [deo]
  • Added a restart command to the solr instance control script. [deo]

0.3 (2009-09-10)

  • Added requestParsers-multipartUploadLimitInKB allowing one to adjust the request parsers limit. [anguenot]
  • Added additional-solrconfig allowing one to extend the solrconfig.xml. [anguenot]
  • Support whitespace in schema index attributes values. [anguenot]
  • Added default-operator. [swampmonkey]
  • Added config-template for allowing an alternate template to be used for generating the solrconfig.xml file. [cguardia]
  • Added the vardir and script options, making it possible to install multiple Solr instances in a single buildout. [hathawsh]

0.2 (2008-08-08)

  • Improved stop command by using SIGTERM instead of SIGHUP. [guido_w]
  • Redirect stdout and stderr to a log file when daemonizing the Solr instance. [guido_w]
  • Added support for setting Solr filters. [deo]

0.1 (2008-07-07)

  • First public release. [dokai]

Contributors

  • Andreas Zeidler
  • Carlos de la Guardia
  • Dorneles Tremea
  • Florian Schulze
  • Guido Wesdorp
  • Hanno Schlichting
  • Jan Murre
  • Joshua LaPlace
  • Julien Anguenot
  • Kai Lautaportti
  • Shane Hathaway
  • Tarek Ziade
  • Tom Gross
  • Andreas Jung
  • David Blewett
  • Josip Delic
  • Carsten Senger
  • Gerhard Weis

Download files

collective.recipe.solrinstance-3.8.zip (61.1 kB), file type: Source
