Solr integration for external indexing and searching.

Introduction

collective.solr integrates the Solr search engine with Plone. It provides an indexing processor for use with collective.indexing as well as a search API similar to the standard portal catalog. GenericSetup profiles can be applied to set up content indexing in Solr and use it as a backend for Plone’s site and live search facilities.

Current Status

The code is used in production on many sites and is considered stable. This add-on can be installed in a Plone 4.x site to enable indexing operations as well as searching (site and live search) using Solr. Doing so will not only significantly improve search performance, especially for a large number of indexed objects, but also reduce the memory footprint of your Plone instance by allowing you to remove the SearchableText index from the portal catalog, at least for most sites. A sample buildout is provided for your convenience.

For outstanding issues and features remaining to be implemented, please see the to-do list included in the package as well as its issue tracker.

Installation

The following buildout configuration may be used to get started quickly:

[buildout]
extends =
  buildout.cfg
  http://svn.plone.org/svn/collective/collective.solr/trunk/buildout/solr-1.4.cfg

[instance]
eggs += collective.solr

After saving this to, say, solr.cfg, buildout can be run and the Solr server and Plone instance started:

$ python bootstrap.py
$ bin/buildout -c solr.cfg
...
$ bin/solr-instance start
$ bin/instance start

Next, the “collective.solr (site search)” profile should be applied via the portal setup or when creating a fresh Plone site. After activating and configuring the integration in the Plone control panel and initially indexing any existing content using the provided maintenance view:

http://localhost:8080/plone/@@solr-maintenance/reindex

facet information should appear in Plone’s search results page.
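If you would rather trigger the maintenance view from a script than from a browser, the following Python 3 sketch shows one possible way to do it. It is only an illustration: the helper names, the site URL and the credentials are hypothetical, and it assumes the view is reachable with HTTP basic authentication for a user with sufficient permissions.

```python
import base64
import urllib.request


def maintenance_url(site_url, action):
    """Build the URL of a @@solr-maintenance action such as "reindex" or "sync"."""
    return "{}/@@solr-maintenance/{}".format(site_url.rstrip("/"), action)


def build_request(site_url, action, user, password):
    """Prepare a request carrying a basic authentication header (hypothetical helper)."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    request = urllib.request.Request(maintenance_url(site_url, action))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def run_maintenance(site_url, action, user, password):
    """Call the view and print its output line by line as it arrives."""
    request = build_request(site_url, action, user, password)
    with urllib.request.urlopen(request) as response:
        for line in response:
            print(line.decode("utf-8").rstrip())


# Only the URL is built here; no request is sent.
print(maintenance_url("http://localhost:8080/plone", "reindex"))
```

Calling `run_maintenance("http://localhost:8080/plone", "reindex", "admin", "secret")` would then send the request, provided both the Plone instance and the Solr server are running.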

FAQs / Troubleshooting

“AssertionError: cannot use multiple direct indexers; please enable queueing”

Symptom

When activating additional add-ons or applying a GenericSetup profile you get the following error:

AssertionError: cannot use multiple direct indexers; please enable queueing

Problem

Early versions of the add-on used a persistent local utility, which is still present in your ZODB. This utility has meanwhile been replaced so that there are currently two instances present. However, without queued indexing being enabled, only one such indexer is allowed at a time.

Solution

Please re-install the add-on via the quick installer in the Zope Management Interface. Note that this will reset all your configuration but won’t change any data in Solr.

Searches only return up to 10 results

Symptom

Searches don’t display more than 10 results even though there are more matches and “Maximum search results” is set to “0” (to always return all results).

Problem

With the default setting for “Maximum search results” (i.e. “0”), no rows parameter is included when sending queries to Solr. As a result, Solr’s default setting is applied, and both its internal default (used when the parameter is removed from solrconfig.xml) and the “max-num-results” option in collective.recipe.solrinstance end up with a value of 10.

Solution

Please update your buildout to use a higher setting for “max-num-results”. It should be higher than or equal to the maximum number of total search results you’d like to get from your site. The sample configuration uses a value of “1000”.
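The interplay between the two settings can be sketched in Python as follows. This is a simplified illustration and not the add-on’s actual code: the function names and the SOLR_DEFAULT_ROWS constant are hypothetical and only model the behaviour described above.

```python
# Assumption: the value Solr falls back to when a query carries no "rows" parameter.
SOLR_DEFAULT_ROWS = 10


def build_query_params(query, max_results=0):
    """Build request parameters for Solr (hypothetical helper).

    A "Maximum search results" value of 0 means "no limit" on the Plone
    side, so no "rows" parameter is added to the request.
    """
    params = {"q": query}
    if max_results > 0:
        params["rows"] = max_results
    return params


def effective_rows(params, server_default=SOLR_DEFAULT_ROWS):
    """Return how many rows Solr would hand back for these parameters."""
    return params.get("rows", server_default)


unlimited = build_query_params("SearchableText:plone")
print("rows" in unlimited)                            # the parameter is omitted
print(effective_rows(unlimited))                      # server default applies
print(effective_rows(unlimited, server_default=1000)) # after raising max-num-results
```

Raising the server-side default is therefore what lifts the limit when the Plone setting is left at “0”.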

Credits

This code was inspired by enfold.solr by Enfold Systems as well as work done at the snowsprint’08. The solr.py module is based on the original Python integration package from Solr itself.

Development was kindly sponsored by Elkjop and the Nordic Council and Nordic Council of Ministers.

Changelog

2.0a1 - 2011-01-10

  • Handle utf-8 encoded data correctly in utils.isWildCard. [hannosch]

  • Gracefully handle exceptions raised during index data retrieval. [tom_gross, hannosch]

  • Added zopectl.command entry points for three new scripts. solr_clear_index will remove all entries from Solr. solr_dump_catalog will efficiently dump the content of the catalog onto the filesystem and solr_import_dump will import the dump into Solr. This can be used to bootstrap an empty Solr index or update it when the boost logic has changed. All scripts will either take the first Plone site found in the database or accept an unnamed command line argument to specify the id. The Solr server needs to be running and the connection info needs to be configured in the Plone site. Example use: bin/instance solr_dump_catalog Plone. In this example the data would be stored in var/instance/solr_dump_plone. The data can be transferred between machines and calling solr_dump_catalog multiple times will append new data to the existing dump. To get Solr up-to-date you should still call @@solr-maintenance/sync. [hannosch, witsch]

  • Changed search pattern syntax to use str.format syntax and make both {value} and {base_value} available in the pattern. [hannosch]

  • Add possibility to calculate site-specific boost values via a skin script. [hannosch, witsch]

  • Fix wildcard searches for patterns other than just ending with an asterisk. [hannosch, witsch]

  • Require Plone 4.x, declare package dependencies & remove BBB bits. [hannosch, witsch]

  • Add configurable setting for custom search pattern for simple searches, allowing multiple fields to be included with specific boost values. [hannosch, witsch]

  • Don’t modify search parameters during indexing. [hannosch, witsch]

  • Fixed auto-commit support to actually send the data to Solr, but omit the commit message. [hannosch]

  • Added support for commitWithin on add messages as per SOLR-793. This feature requires a Solr 1.4 server. [hannosch]

  • Split out 404 auto-suggestion tests into a separate file and disabled them under Plone 4 - the feature is no longer part of Plone. [hannosch]

  • Fixed error handling code to deal with different exception string representations in Python 2.6. [hannosch]

  • Made tests independent of the Large Folder content type, as it no longer exists in Plone 4. [hannosch]

  • Avoid using the incompatible TestRequest from zope.publisher inside Zope 2. [hannosch]

  • Fixed undefined variables in search.pt for Plone 4 compatibility. [hannosch]

1.0 - Released September 14, 2010

  • Enable multi-field “fq” statements. [tesdal, witsch]

  • Prevent logging of “unknown” search attributes for use_solr and the infamous -C Zope startup parameter. [witsch]

1.0rc3 - Released September 9, 2010

  • Add logging of queries without explicit “rows” parameter. [witsch]

  • Add configuration to exclude user from allowedRolesAndUsers for better cacheability. [tesdal, witsch]

  • Add configuration for effective date steps. [tesdal, witsch]

  • Handle python datetime and date objects. [do3cc, witsch]

  • Fixed a grammar error in error.pt. [hannosch]

1.0rc2 - Released August 31, 2010

  • Fix regression about catalog fallback with required, but empty parameters. [tesdal, witsch]

1.0rc1 - Released July 30, 2010

1.0b24 - Released July 29, 2010

  • Fix security issue with getObject on Solr flares, which used unrestricted traversal on the entire path, potentially leading to information leaks. Refs http://plone.org/products/collective.solr/issues/27 [pilz, witsch]

  • Add missing CreationDate method to flares. This fixes http://plone.org/products/collective.solr/issues/16 [witsch]

  • Add logging for slow queries along with the query time as reported by Solr. [witsch]

  • Limit number of matches looked up during live search for speedier replies. [witsch]

  • Renamed the batch parameters to b_start and b_size to avoid conflicts with index names and be consistent with existing template code. [do3cc]

  • Added a new config option auto-commit which is enabled by default. You can disable this, which prevents any explicit commit messages from being sent to the Solr server by the client. You have to configure commit policies on the server side instead. [hannosch]

  • Added support for a special query key use_solr which forces queries to be sent to Solr even though none of the required keys match. This can be used to send individual catalog queries to Solr. [hannosch]

1.0b23 - Released May 15, 2010

  • Add support for batching, i.e. only fetch and parse items from Solr, which are part of the currently handled batch. [witsch]

  • Fix quoting of operators for multi-word search terms. [witsch]

  • Use the faster C implementations of elementtree/xml.etree if available. [hannosch, witsch]

  • Grant restricted code access to the search results, e.g. skin scripts. [do3cc, witsch]

  • Fix handling of ‘depth’ argument when querying multiple paths. [reinhardt, witsch]

  • Don’t break when filter queries should be used for all parameters. [reinhardt, witsch]

  • Always provide values for all metadata columns like the catalog does. [witsch]

  • Always fall back to portal catalog for “navtree” queries so the set of required query parameters can be empty. This refs http://plone.org/products/collective.solr/issues/18 [reinhardt, witsch]

  • Prevent parsing errors for dates from before 1000 A.D. in combination with 32-bit systems and Solr 1.4. [reinhardt, witsch]

  • Don’t process content with its own indexing methods, e.g. reindexObject, via the reindex maintenance view. [witsch]

  • Let query builder handle sets of possible boolean values as passed by boolean topic criteria for example. [hannosch, witsch]

  • Recognize new solr.TrieDateField field type and handle it in the same way as we handle the older solr.DateField. [hannosch]

  • Warn about missing search indices and non-stored sort parameters. [witsch]

  • Fix issue when reindexing objects with empty date fields. [witsch]

  • Changed the default schema for is_folderish to store the value. The reference browser search expects it on the brain. [hannosch]

  • Changed the GenericSetup export/import handler for the Solr manager to ignore non-persistent utilities. [hannosch]

  • Add support for LinguaPlone. [witsch]

  • Update sample Solr buildout configuration and documentation to recommend a high enough default setting for maximum search results returned by Solr. This refs http://plone.org/products/collective.solr/issues/20 [witsch]

1.0b22 - Released February 23, 2010

  • Split out a BaseSolrConnectionConfig class, to be used for registering a non-persistent connection configuration. [hannosch]

  • Fix bug regarding timeout locking. [witsch]

  • Convert test setup to collective.testcaselayer. [witsch]

  • Only apply timeout decorator when actually committing changes to Solr, also re-enabling the use of query parameters for maintenance views again. [witsch]

  • We also need to change the SearchDispatcher to use the original method in case Solr isn’t active. [hannosch]

  • Changed the searchResults monkey to store and use the method found on the class instead of assuming it comes from the base class. This makes things work with LinguaPlone which also patches this method. [hannosch]

  • Add Dutch translation. [WouterVH]

  • Refactor buildout to allow running tests against Plone 4.x. [witsch]

  • Optimize reindex behavior when populating the Solr index for the first time. [hannosch, witsch]

  • Only register indexable attributes the old way on Plone 3.x. [jcbrand]

  • Fix timeout decorator to work ttw. [hannosch, witsch]

  • Add “z3c.autoinclude.plugin” entry point, so in Plone 3.3+ you can avoid loading the ZCML file. [hannosch]

1.0b21 - Released February 11, 2010

  • Fix unindexing to not fetch more data from the objects than necessary. [witsch]

  • Use decorator to lock timeouts and make sure the lock is always released. [witsch]

  • Fix maintenance views to work without setting up a Solr connection first. [witsch]

1.0b20 - Released January 26, 2010

  • Fix reindexing to always provide data for all fields defined in the schema as support for “updateable/modifiable documents” is only planned for Solr 1.5. See https://issues.apache.org/jira/browse/SOLR-139 for more info. [witsch]

  • Fix CSS issues regarding facet display on IE6. [witsch]

1.0b19 - Released January 24, 2010

  • Fix partial reindexing to preserve data for indices that are not stored. [witsch]

  • Help with improved logging of auto-flushes for easier performance tuning. [witsch]

1.0b18 - Released January 23, 2010

  • Work around layout issue regarding facet counts on IE6. [witsch]

1.0b17 - Released January 21, 2010

  • Don’t confuse pre-configured filter queries with facet selections. [witsch]

  • Always display selected facets, even, or especially, without search results. [witsch]

1.0b16 - Released January 11, 2010

  • Remove catalogSync maintenance view since it would need to fetch additional data (for non-stored indices) from the objects themselves in order to work correctly. [witsch]

  • Fix reindex maintenance view to preserve data that cannot be fetched from Solr during partial indexing, i.e. indices that are not stored. [witsch]

  • Use wildcard searches for simple search terms to reflect Plone’s default behaviour. [witsch]

  • Fix drill-down for facet values containing white space. [witsch]

  • Add support for partial syncing of catalog and solr indexes. [witsch]

1.0b15 - Released October 12, 2009

1.0b14 - Released September 17, 2009

  • Fix query builder to use explicit ORs so that it becomes possible to change Solr’s default operator to AND. [witsch]

  • Remove relevance information from search results as it doesn’t make sense to the user. [witsch]

1.0b13 - Released August 20, 2009

  • Fix reindex and catalogSync maintenance views to not pass invalid data back to Solr when indexing an explicit list of attributes. [witsch]

1.0b12 - Released August 15, 2009

  • Fix reindex maintenance view to keep any existing data when indexing a given list of attributes. [witsch]

  • Add support for facet dependencies: Specifying a facet “foo” like “foo:bar” only makes it show up when a value for “bar” has been previously selected. [witsch]

  • Allow indexer methods to raise AttributeError to prevent an attribute from being indexed. [witsch]

1.0b11 - Released July 2, 2009

  • Fix maintenance view for adding/syncing single indexes using catalog data. [witsch]

  • Allow configuring the query parameters for which filter queries should be used (see http://wiki.apache.org/solr/FilterQueryGuidance for more info). [fschulze, witsch]

  • Encode unicode strings when building facet links. [fschulze, witsch]

  • Fix facet display to try to keep the given order of facets. [witsch]

  • Allow facet values to be translated. [witsch]

1.0b10 - Released June 11, 2009

  • Range queries must not be quoted with the new query parser. [witsch]

  • Disable socket timeouts during maintenance tasks. [witsch]

  • Close the response object after searching in order to avoid ResponseNotReady errors triggering duplicate queries. [witsch]

  • Use proper way of accessing jQuery & fix IE6 syntax error. [fschulze]

  • Format relevance value for search results. [witsch]

1.0b9 - Released May 12, 2009

1.0b8 - Released May 4, 2009

1.0b7 - Released April 28, 2009

  • Fix unintended (de)activation of the Solr integration during profile (re)application. [witsch]

  • Fix display of facet information with no active facets. [witsch]

  • Register import and export steps using ZCML. [witsch]

1.0b6 - Released April 20, 2009

  • Add support for faceted searches. [witsch]

  • Update code to comply with PEP8 style guidelines. [witsch]

  • Expose additional information provided by Solr - for example about headers and search facets. [witsch]

  • Handle edge cases like invalid range queries by quoting. [tesdal]

  • Parse and quote the query to filter invalid query syntax. [tesdal]

  • In solrSearchResults, if the passed in request is a dict, look up request to enable adaptation into PloneFlare. [tesdal]

  • Added support for objects with a ‘query’ attribute as search values. [tmog]

1.0b5 - Released December 16, 2008

  • Fix and extend logging in “sync” maintenance view. [witsch]

1.0b4 - Released November 23, 2008

  • Filter control characters to prevent indexing errors. This fixes http://plone.org/products/collective.solr/issues/1 [witsch]

  • Avoid using brains when getting all objects from the catalog for sync runs. [witsch]

  • Prefix output from maintenance views with a time-stamp. [witsch]

1.0b3 - Released November 12, 2008

  • Fix url fallback during schema retrieval. [witsch]

  • Fix issue regarding quoting of white space when searching. [witsch]

  • Make indexing operations more robust in case the schema is missing a unique key or couldn’t be parsed. [witsch]

1.0b2 - Released November 7, 2008

  • Make schema retrieval slightly more robust to not let network failures prevent access to the site. [witsch]

1.0b1 - Released November 5, 2008

  • Initial release [witsch]
