OpenSearch integration with CastleCMS and Plone

wildcard.hps

CastleCMS and Plone integration with OpenSearch

This product was forked from collective.elasticsearch to provide integration with OpenSearch instead of ElasticSearch. OpenSearch is itself a fork of ElasticSearch and is compatible with at least the ES 7.10.x series of releases (as of opensearch-py 1.1.0). Compatibility may diverge in the future: collective.elasticsearch will likely continue to track ElasticSearch, while wildcard.hps is intended to track OpenSearch.

Quickstart

First, start an OpenSearch instance (for official guides, see the OpenSearch project documentation):

$ docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:latest
$ curl -XGET https://localhost:9200 -u 'admin:admin' -k

Second, set up Plone/CastleCMS:

  1. add wildcard.hps to the eggs section of your buildout
  2. run buildout
  3. restart your instance, using the relevant environment variables to connect to your OpenSearch instance
  4. install the 'Wildcard HPS' product
  5. under the 'Wildcard HPS' control panel, click 'Convert Catalog' then 'Rebuild Catalog'

Configuration settings are passed as environment variables. See the "Configuration" section below for more details.

Overview

This package aims to index every field that portal_catalog indexes. That lets you delete the Title, Description and SearchableText indexes from the catalog, which can significantly improve performance and reduce RAM usage.

OpenSearch queries are ONLY used when Title, Description and SearchableText are in the query. Otherwise, Plone's default catalog is used, because it is faster than OpenSearch on normal queries.
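
The routing rule above can be pictured with a small sketch. This is not the package's code: the function names are hypothetical, and it assumes that the presence of any one of the three full-text keys is enough to send a query to OpenSearch.

```python
# Index names taken from the paragraph above.
FULL_TEXT_INDEXES = ("Title", "Description", "SearchableText")


def uses_full_text(query):
    """Return True when the query dict touches a full-text index.

    Assumption: any one of the three keys is enough; the real check
    inside wildcard.hps may be stricter or look at other details.
    """
    return any(name in query for name in FULL_TEXT_INDEXES)


def choose_backend(query):
    """Name the backend a catalog query would go to (illustrative only)."""
    return "opensearch" if uses_full_text(query) else "portal_catalog"
```

Under that assumption, a query such as `{"SearchableText": "budget", "portal_type": "Document"}` would go to OpenSearch, while `{"portal_type": "Document"}` would stay on Plone's catalog.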

Configuration

Configuration for OpenSearch connections and custom index naming is done through environment variables. This allows per-instance customization without modifying site data, and lets many deployments share the same cluster(s) without per-site customized index names.
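
As a rough illustration of how an environment-driven prefix keeps deployments apart on a shared cluster, the sketch below prepends the value of HPS_INSTANCE_INDEX_PREFIX to a base index name. The function name, the underscore separator, and the base name are assumptions made for the example; the package may join the prefix differently.

```python
def prefixed_index_name(base_name, environ):
    """Prepend the instance prefix, if any, to an index name.

    `environ` is any mapping of environment variables, for example
    os.environ. The "_" separator is an assumption for illustration.
    """
    prefix = environ.get("HPS_INSTANCE_INDEX_PREFIX")
    if not prefix:
        # No prefix configured: the index name is used unchanged.
        return base_name
    return "{0}_{1}".format(prefix, base_name)
```

With this sketch, two instances started with different prefix values would write to differently named indexes even though they use the same site data.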

Available Environment Variable Options:

  • HPS_ZOPE_CONF_PATH
    • path to a zope.conf to get a Zope app instance
    • NOTE: this is only needed for the reindex_hps script that gets installed. See wildcard/hps/scripts/reindex.py.
  • HPS_REINDEX_SCROLL
    • amount of time to tell opensearch to hold a scroll in memory, defaults to '2s'
    • NOTE: this is only useful for the reindex_hps script that gets installed. See wildcard/hps/scripts/reindex.py.
  • HPS_OVERRIDE_LOGGING
    • if present, will tell the reindex_hps script to override the root logging configuration, and print logging to console at INFO level.
  • HPS_FORCE_ENABLE
    • default: no
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • will force the "enabled" lookup to be True
  • HPS_INSTANCE_INDEX_PREFIX
    • default: None
    • a string value prepended to index names used by the Plone instances this addon is installed into
  • HPS_INCLUDE_TRASHED_BY_DEFAULT
    • default: no
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • will default searchResults to include trashed entries (which are not included by default)
  • HPS_FORCE_EXTERNAL_INDEXES
    • default: None
    • a list of object properties that will be included in the externally indexed object (i.e. the object indexed in OpenSearch)
  • OPENSEARCH_HOSTS
    • default: https://admin:admin@localhost:9200
    • a list of RFC-1738 formatted URLs. Multiple URLs can be specified by putting a space between them.
    • NOTE: for now, opensearch-py (1.1.0) does not respect HTTP auth info embedded in the URL. Instead, use OPENSEARCH_HTTP_USERNAME and OPENSEARCH_HTTP_PASSWORD to pass the same HTTP auth with each request to any node listed as a host.
  • OPENSEARCH_HTTP_USERNAME
    • default: None
    • a username to use in all connections to any node in the OPENSEARCH_HOSTS list
  • OPENSEARCH_HTTP_PASSWORD
    • default: None
    • a password to use in all connections to any node in the OPENSEARCH_HOSTS list
  • OPENSEARCH_TIMEOUT
    • default connection timeout
  • OPENSEARCH_RETRY_ON_TIMEOUT
    • default: Off
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • retry connection to different node when connection fails
  • OPENSEARCH_DISABLE_HOST_INFO_CALLBACK
    • default: False
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • if enabled, will effectively disable all sniffing and force the use of the specific hosts given by OPENSEARCH_HOSTS
  • OPENSEARCH_SNIFF_ON_START
    • default: False
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • refresh nodes before doing anything
  • OPENSEARCH_SNIFF_ON_CONNECTION_FAIL
    • default: False
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • refresh nodes after a node fails to respond
  • OPENSEARCH_SNIFFER_TIMEOUT
    • default: None
    • refresh the node list at this interval (in seconds). Leave this unset if you want to completely disable sniffing.
  • OPENSEARCH_SNIFF_TIMEOUT
    • default: 0.1
    • timeout of sniff request
  • OPENSEARCH_USE_SSL
    • default: False
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • connections to OpenSearch will use SSL
  • OPENSEARCH_VERIFY_CERTS
    • default: True
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • verify SSL certificates when using SSL connections to OpenSearch
  • OPENSEARCH_SSL_SHOW_WARN
    • default: True
    • accepted values (all other values are equivalent to False): Yes, True, 1, On
    • when verifying SSL certificates is disabled, then a warning will be shown by default
  • OPENSEARCH_CERTS_PATH
    • default: None
    • a path to a directory containing CA Certificates used in SSL verification
  • OPENSEARCH_CLIENT_CERT_PATH
    • default: None
    • a path to a PEM formatted SSL client certificate for SSL client auth
  • OPENSEARCH_CLIENT_CERT_KEY
    • default: None
    • a path to a PEM formatted SSL client key for SSL client auth

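To make the value conventions in the list above concrete, here is a minimal sketch of how such variables could be read. It is not the package's implementation: the helper names are hypothetical, case-insensitive matching of the accepted values is an assumption, and only a handful of the variables are covered. The resulting keys (`hosts`, `http_auth`, `use_ssl`, `verify_certs`, `retry_on_timeout`) are keyword arguments that the opensearch-py client accepts.

```python
import os

# Values documented above as true; everything else counts as False.
# Case-insensitive matching is an assumption, not stated in the README.
TRUTHY = ("yes", "true", "1", "on")

DEFAULT_HOSTS = "https://admin:admin@localhost:9200"


def env_bool(environ, name, default=False):
    """Read a boolean option; an unset variable falls back to `default`."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def env_hosts(environ):
    """Split OPENSEARCH_HOSTS on whitespace into a list of URLs."""
    return environ.get("OPENSEARCH_HOSTS", DEFAULT_HOSTS).split()


def connection_kwargs(environ=None):
    """Collect a few client keyword arguments from the environment."""
    environ = os.environ if environ is None else environ
    kwargs = {
        "hosts": env_hosts(environ),
        "use_ssl": env_bool(environ, "OPENSEARCH_USE_SSL"),
        "verify_certs": env_bool(environ, "OPENSEARCH_VERIFY_CERTS", default=True),
        "retry_on_timeout": env_bool(environ, "OPENSEARCH_RETRY_ON_TIMEOUT"),
    }
    username = environ.get("OPENSEARCH_HTTP_USERNAME")
    password = environ.get("OPENSEARCH_HTTP_PASSWORD")
    if username is not None and password is not None:
        # Credentials are passed separately because, per the note above,
        # auth info embedded in the URL is not respected.
        kwargs["http_auth"] = (username, password)
    return kwargs
```

Passing a plain dict instead of `os.environ` makes the behaviour easy to try out, for example `connection_kwargs({"OPENSEARCH_USE_SSL": "On", "OPENSEARCH_VERIFY_CERTS": "no"})`.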
Compatibility

Only tested with Plone 5 with Dexterity types.

Only compatible with versions of OpenSearch (and ElasticSearch) compatible with the opensearch-py library.

For ElasticSearch integration, see collective.elasticsearch.

State

All index column types are supported EXCEPT the DateRecurringIndex index column type. A full text search combined with a query on a DateRecurringIndex column will not work.

Celery support

This package comes with Celery support: all indexing operations are pushed into Celery to be run asynchronously.

Please see instructions for collective.celery to see how this works.

Running tests

First, start an instance of OpenSearch.

Second, build the environment and run the tests:

$ virtualenv ./env
$ ./env/bin/pip install -r requirements.txt
$ ./env/bin/buildout -c buildout.cfg
$ ./bin/test

Changelog

1.4.6 (2025-02-12)

  • make scroll configurable for reindex script

1.4.5 (2025-01-10)

  • add explicit env var for disabling collection of host info for nodes during opensearch sniffing
  • explicitly set the sniffer_timeout, sniff_on_start, and sniff_on_connection_fail parameters
  • move fetching connection kwargs to its own method

1.4.4 (2023-10-11)

  • missing '.items()'

1.4.3 (2023-10-11)

  • handle unicode for index data derived from IAdditionalIndexDataProvider adapters

1.4.2 (2023-05-15)

  • abstract unicode handling code for hook when getting index data, and handle tuples, lists, and dict values

1.4.1 (2023-05-11)

  • handle unicode error and fix bug in hook when getting index data

1.4.0 (2022-11-04)

  • allow a custom prefix to be defined for fetching connection settings from the environment (default to the previous hard-coded 'OPENSEARCH_' value)

1.3.0 (2022-08-17)

  • add HPS_FORCE_EXTERNAL_INDEXES
  • update default set returned when external indexes setting is not configured yet

1.2.1 (2022-06-23)

  • fix some view names in the control panel templates

1.2.0 (2022-05-25)

  • add HPS_INCLUDE_TRASHED_BY_DEFAULT env for disabling a filter on searchResults from WildcardHPSCatalog (see readme entry for HPS_INCLUDE_TRASHED_BY_DEFAULT)

1.1.1 (2022-05-12)

  • add property on wildcard.hps.opensearch.WildcardHPSCatalog for the instance prefix

1.1.0 (2022-05-12)
