Postgres/MySQL/MariaDB to Elasticsearch/OpenSearch sync
Project description
PostgreSQL/MySQL/MariaDB to Elasticsearch/OpenSearch sync
- PGSync is a middleware for syncing data from Postgres to Elasticsearch/OpenSearch or OpenSearch.
- It allows you to keep Postgres as your source of truth data source and expose structured denormalized documents in Elasticsearch/OpenSearch.
Requirements
- Python 3.9+
- Postgres 9.6+ or MySQL 8.0.0+ or MariaDB 12.0.0+
- Redis 3.1.0+ or Valkey 7.2.0+
- Elasticsearch 6.3.1+ or OpenSearch 1.3.7+
- SQLAlchemy 1.3.4+
Postgres Setup
Enable logical decoding in your Postgres setting.
-
You also need to set up two parameters in your Postgres config postgresql.conf
wal_level = logicalmax_replication_slots = 1
MySQL / MariaDB setup
-
Enable binary logging in your MySQL / MariaDB setting.
-
You also need to set up the following parameters in your MySQL / MariaDB config my.cnf, then restart the database server.
server-id = 1# any non-zero unique IDlog_bin = mysql-binbinlog_row_image = FULL# recommended; if not supported on older MariaDB, omit -
optional housekeeping:
binlog_expire_logs_seconds = 604800# 7 days -
You need to create a replication user with REPLICATION SLAVE and REPLICATION CLIENT privileges
CREATE USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'password'; GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'%'; FLUSH PRIVILEGES;
Installation
You can install PGSync from PyPI:
$ pip install pgsync
Config
Create a schema for the application named e.g schema.json
Example spec
.. code-block::
[
{
"database": "[database name]",
"index": "[Elasticsearch or OpenSearch index]",
"nodes": {
"table": "[table A]",
"schema": "[table A schema]",
"columns": [
"column 1 from table A",
"column 2 from table A",
... additional columns
],
"children": [
{
"table": "[table B with relationship to table A]",
"schema": "[table B schema]",
"columns": [
"column 1 from table B",
"column 2 from table B",
... additional columns
],
"relationship": {
"variant": "object",
"type": "one_to_many"
},
...
},
{
... additional children
}
]
}
}
]
Environment variables
Setup environment variables required for the application
SCHEMA='/path/to/schema.json'
ELASTICSEARCH_HOST=localhost
ELASTICSEARCH_PORT=9200
PG_HOST=localhost
PG_USER=i-am-root # this must be a postgres superuser or replication user
PG_PORT=5432
PG_PASSWORD=*****
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
REDIS_AUTH=*****
Running
Bootstrap the database (one time only)
- $ bootstrap --config schema.json
Run pgsync as a daemon
- $ pgsync --config schema.json --daemon
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pgsync-7.0.0.tar.gz.
File metadata
- Download URL: pgsync-7.0.0.tar.gz
- Upload date:
- Size: 138.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a3c3723dd8f91014439de06989c1f42adcc549cfd7a6e9b4134346ee4a26f416
|
|
| MD5 |
7c9eeb424229cd2863c27f987d90d2f8
|
|
| BLAKE2b-256 |
c4397201ee1b079a8a40ebf280e89ca4a1ce6941af39988e8ad289e48342d87a
|
File details
Details for the file pgsync-7.0.0-py3-none-any.whl.
File metadata
- Download URL: pgsync-7.0.0-py3-none-any.whl
- Upload date:
- Size: 79.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
85c78afa0247b223ccd040d48466bda364773a3767bacf5042bbf8da33ae153e
|
|
| MD5 |
a70fc0294a797ea474f74020b0a3a790
|
|
| BLAKE2b-256 |
9463f30262c4e8383eb56db4c49a64a2c5bad79b59d1a4d308993df29a06c65d
|