Connect PostgreSQL and Elastic Search with this Foreign Data Wrapper
Project description
PostgreSQL Elastic Search foreign data wrapper
This allows you to index data in Elastic Search and then search it from PostgreSQL. You can write as well as read.
SYNOPSIS
Supported Versions
Elastic Search | Dependency Installation Command |
---|---|
5 | sudo pip install "elasticsearch>=5,<6" |
6 | sudo pip install "elasticsearch>=6,<7" |
7 | sudo pip install "elasticsearch>=7,<8" |
PostgreSQL | Dependency Installation Command |
---|---|
9.4 | sudo apt-get install postgresql-9.4-python-multicorn |
9.5 | sudo apt-get install postgresql-9.5-python-multicorn |
9.6 | sudo apt-get install postgresql-9.6-python-multicorn |
10 | sudo apt-get install postgresql-10-python-multicorn |
11 | sudo apt-get install postgresql-11-python-multicorn |
12 | sudo apt-get install postgresql-12-python3-multicorn |
Please note that the Debian package for Multicorn on PostgreSQL 12 requires Python 3.
Installation
This requires installation on the PostgreSQL server, and has system level dependencies. You can install the dependencies with:
sudo apt-get install python python-pip
You should install the version of multicorn that is specific to your postgres version. See the table in Supported Versions for installation commands. The multicorn package is also only available from Ubuntu Xenial (16.04) onwards. If you cannot install multicorn in this way then you can use pgxn to install it.
This uses the Elastic Search client which has release versions that correspond
to the major version of the Elastic Search server. You should install the
elasticsearch
dependency separately. See the table in Supported Versions
for installation commands.
Once the dependencies are installed you can install the foreign data wrapper using pip:
sudo pip install pg_es_fdw
Usage
A running configuration for this can be found in the docker-compose.yml
within this folder.
The basic steps are:
- Load the extension
- Create the server
- Create the foreign table
- Populate the foreign table
- Query the foreign table...
Load extension and Create server
CREATE EXTENSION multicorn;
CREATE SERVER multicorn_es FOREIGN DATA WRAPPER multicorn
OPTIONS (
wrapper 'pg_es_fdw.ElasticsearchFDW'
);
Create the foreign table
CREATE FOREIGN TABLE articles_es
(
id BIGINT,
title TEXT,
body TEXT,
metadata JSON,
query TEXT,
score NUMERIC
)
SERVER multicorn_es
OPTIONS
(
host 'elasticsearch',
port '9200',
index 'article-index',
type 'article',
rowid_column 'id',
query_column 'query',
score_column 'score',
timeout '20',
username 'elastic',
password 'changeme'
)
;
Elastic Search 7 and greater does not require the type
option, which
corresponds to the doc_type
used in prior versions of Elastic Search.
This corresponds to an Elastic Search index which contains a title
and body
fields. The other fields have special meaning:
- The
id
field is mapped to the Elastic Search document id - The
query
field accepts Elastic Search queries to filter the rows - The
score
field returns the score for the document against the query - The
timeout
field specifies the connection timeout in seconds - The
username
field specifies the basic auth username used - The
password
field specifies the basic auth password used
These are configured using the rowid_column
, query_column
,
score_column
, timeout
, username
and password
options.
All of these are optional.
To use basic auth you must provide both a username and a password, even if the password is blank.
JSON and JSONB
When elasticsearch returns nested data it is serialized to TEXT as json before being returned. This means you can create columns with JSON or JSONB types and the data will be correctly converted on read. If you write to a JSON or JSONB column then the data is passed to elasticsearch as json.
As the data is converted on the fly per query the benefits of using JSONB over JSON are limited.
Elastic Search Authentication
Currently basic auth is supported for authentication.
You can provide the username and password by setting the username
and password
options when creating the table.
You must provide both, even if the password is blank.
If you do not provide them then basic auth is disabled for the table.
If you need to use other forms of authentication then please open an issue.
Populate the foreign table
INSERT INTO articles_es
(
id,
title,
body,
metadata
)
VALUES
(
1,
'foo',
'spike',
'{"score": 3}'::json
);
It is possible to write documents to Elastic Search using the foreign data wrapper. This feature was introduced in PostgreSQL 9.3.
Query the foreign table
To select all documents:
SELECT
id,
title,
body,
metadata
FROM
articles_es
;
To filter the documents using a query:
SELECT
id,
title,
body,
metadata,
score
FROM
articles_es
WHERE
query = 'body:chess'
;
This uses the URI Search from Elastic Search.
Caveats
Elastic Search does not support transactions, so the elasticsearch index is not guaranteed to be synchronized with the canonical version in PostgreSQL. Unfortunately this is the case even for serializable isolation level transactions. It would however be possible to check against Elastic Search version field and locking.
Rollback is currently not supported.
Tests
There are end to end tests that use docker to create a PostgreSQL and Elastic Search database. These are then populated with data and tests are run against them.
These require docker and docker-compose. These also require python packages which you can install with:
pip install -r tests/requirements.txt
The makefile will test all versions if you run make test
:
➜ make test
poetry run tests/run.py --pg 9.4 9.5 9.6 10 11 12 --es 5 6 7
Testing PostgreSQL 9.4 with Elasticsearch 5
PostgreSQL 9.4 with Elasticsearch 5: Test read - PASS
PostgreSQL 9.4 with Elasticsearch 5: Test nested-read - PASS
PostgreSQL 9.4 with Elasticsearch 5: Test query - PASS
Testing PostgreSQL 9.4 with Elasticsearch 6
PostgreSQL 9.4 with Elasticsearch 6: Test read - PASS
PostgreSQL 9.4 with Elasticsearch 6: Test nested-read - PASS
PostgreSQL 9.4 with Elasticsearch 6: Test query - PASS
Testing PostgreSQL 9.4 with Elasticsearch 7
PostgreSQL 9.4 with Elasticsearch 7: Test read - PASS
PostgreSQL 9.4 with Elasticsearch 7: Test nested-read - PASS
PostgreSQL 9.4 with Elasticsearch 7: Test query - PASS
Testing PostgreSQL 9.5 with Elasticsearch 5
PostgreSQL 9.5 with Elasticsearch 5: Test read - PASS
PostgreSQL 9.5 with Elasticsearch 5: Test nested-read - PASS
PostgreSQL 9.5 with Elasticsearch 5: Test query - PASS
Testing PostgreSQL 9.5 with Elasticsearch 6
PostgreSQL 9.5 with Elasticsearch 6: Test read - PASS
PostgreSQL 9.5 with Elasticsearch 6: Test nested-read - PASS
PostgreSQL 9.5 with Elasticsearch 6: Test query - PASS
Testing PostgreSQL 9.5 with Elasticsearch 7
PostgreSQL 9.5 with Elasticsearch 7: Test read - PASS
PostgreSQL 9.5 with Elasticsearch 7: Test nested-read - PASS
PostgreSQL 9.5 with Elasticsearch 7: Test query - PASS
Testing PostgreSQL 9.6 with Elasticsearch 5
PostgreSQL 9.6 with Elasticsearch 5: Test read - PASS
PostgreSQL 9.6 with Elasticsearch 5: Test nested-read - PASS
PostgreSQL 9.6 with Elasticsearch 5: Test query - PASS
Testing PostgreSQL 9.6 with Elasticsearch 6
PostgreSQL 9.6 with Elasticsearch 6: Test read - PASS
PostgreSQL 9.6 with Elasticsearch 6: Test nested-read - PASS
PostgreSQL 9.6 with Elasticsearch 6: Test query - PASS
Testing PostgreSQL 9.6 with Elasticsearch 7
PostgreSQL 9.6 with Elasticsearch 7: Test read - PASS
PostgreSQL 9.6 with Elasticsearch 7: Test nested-read - PASS
PostgreSQL 9.6 with Elasticsearch 7: Test query - PASS
Testing PostgreSQL 10 with Elasticsearch 5
PostgreSQL 10 with Elasticsearch 5: Test read - PASS
PostgreSQL 10 with Elasticsearch 5: Test nested-read - PASS
PostgreSQL 10 with Elasticsearch 5: Test query - PASS
Testing PostgreSQL 10 with Elasticsearch 6
PostgreSQL 10 with Elasticsearch 6: Test read - PASS
PostgreSQL 10 with Elasticsearch 6: Test nested-read - PASS
PostgreSQL 10 with Elasticsearch 6: Test query - PASS
Testing PostgreSQL 10 with Elasticsearch 7
PostgreSQL 10 with Elasticsearch 7: Test read - PASS
PostgreSQL 10 with Elasticsearch 7: Test nested-read - PASS
PostgreSQL 10 with Elasticsearch 7: Test query - PASS
Testing PostgreSQL 11 with Elasticsearch 5
PostgreSQL 11 with Elasticsearch 5: Test read - PASS
PostgreSQL 11 with Elasticsearch 5: Test nested-read - PASS
PostgreSQL 11 with Elasticsearch 5: Test query - PASS
Testing PostgreSQL 11 with Elasticsearch 6
PostgreSQL 11 with Elasticsearch 6: Test read - PASS
PostgreSQL 11 with Elasticsearch 6: Test nested-read - PASS
PostgreSQL 11 with Elasticsearch 6: Test query - PASS
Testing PostgreSQL 11 with Elasticsearch 7
PostgreSQL 11 with Elasticsearch 7: Test read - PASS
PostgreSQL 11 with Elasticsearch 7: Test nested-read - PASS
PostgreSQL 11 with Elasticsearch 7: Test query - PASS
Testing PostgreSQL 12 with Elasticsearch 5
PostgreSQL 12 with Elasticsearch 5: Test read - PASS
PostgreSQL 12 with Elasticsearch 5: Test nested-read - PASS
PostgreSQL 12 with Elasticsearch 5: Test query - PASS
Testing PostgreSQL 12 with Elasticsearch 6
PostgreSQL 12 with Elasticsearch 6: Test read - PASS
PostgreSQL 12 with Elasticsearch 6: Test nested-read - PASS
PostgreSQL 12 with Elasticsearch 6: Test query - PASS
Testing PostgreSQL 12 with Elasticsearch 7
PostgreSQL 12 with Elasticsearch 7: Test read - PASS
PostgreSQL 12 with Elasticsearch 7: Test nested-read - PASS
PostgreSQL 12 with Elasticsearch 7: Test query - PASS
PASS
If you want to run the tests for specific versions then you can then run the
tests using tests/run.py
. This takes the PostgreSQL version(s) to test using
the --pg
argument and the Elastic Search versions to test with the --es
argument. The currently supported versions of PostgreSQL are 9.4 through to 12.
The currently supported versions of Elastic Search are 5 to 7. You can pass
multiple versions to test it against all of them.
Test Failure Messages
Error starting userland proxy: listen tcp 0.0.0.0:5432: bind: address already in use
You are already running something that listens to 5432. Try stopping your running postgres server:
sudo /etc/init.d/postgresql stop
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Your system does not have the appropriate limits in place to run a production ready instance of elasticsearch. Try increasing it:
sudo sysctl -w vm.max_map_count=262144
This setting will revert after a reboot.
Migrating from <=0.6.0
In version 0.7.0 the TEXT representation of json objects changed from HSTORE to JSON. If you have been mapping json objects to HSTORE columns then you should change the column type to JSON. The arrow operator exists for json so queries should not need rewriting.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pg-es-fdw-0.7.4.tar.gz
.
File metadata
- Download URL: pg-es-fdw-0.7.4.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.0 CPython/3.7.4 Linux/4.15.0-23-generic
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 1c7975b20c7ddf33efedb8ed0998215856299e96de01687a0effc043c23cbd19 |
|
MD5 | 381312bd406e251790104d2dd9dabd4b |
|
BLAKE2b-256 | 9db4beb7b1d4626f5381262683138a6c203cbba3fbc427d4489333ee4d70824f |
Provenance
File details
Details for the file pg_es_fdw-0.7.4-py2.py3-none-any.whl
.
File metadata
- Download URL: pg_es_fdw-0.7.4-py2.py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.0 CPython/3.7.4 Linux/4.15.0-23-generic
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 9e2aa2a09751e99bd40799a4489aa4db9ce93cec602490f286e257741dc121fb |
|
MD5 | 10f3aa40454843efd988d2ebe4a7ddd4 |
|
BLAKE2b-256 | d6d9bd970fc24edb18ad92af7ef90047d7b1cbc2942a7a2da30377c3518ebe04 |