Postgres/MySQL/MariaDB to Elasticsearch/OpenSearch sync

PGSync is a middleware for syncing data from PostgreSQL, MySQL, or MariaDB to Elasticsearch or OpenSearch.

Keep your relational database as the source of truth and expose structured denormalized documents in your search engine.

Key Features

  • Real-time sync via logical decoding (PostgreSQL) or binary log (MySQL/MariaDB)

  • Denormalize complex relational data into nested search documents

  • JSON schema-based configuration

  • Support for one-to-one and one-to-many relationships

  • Plugin system for document transformation

  • Multiple operation modes: daemon, polling, or direct WAL streaming

Requirements

Installation

Install from PyPI:

pip install pgsync

Database Setup

PostgreSQL

Enable logical decoding in your PostgreSQL configuration (postgresql.conf):

wal_level = logical
max_replication_slots = 1
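
Before bootstrapping, it can help to confirm that these two settings are really present. The following is a minimal sketch and not part of PGSync: `parse_conf` and `check_logical_decoding` are hypothetical helper names, and the code only inspects `postgresql.conf`-style text in the `key = value` form shown above. It treats a missing setting as a problem even though the server may apply its own default, so querying the running server (for example with `SHOW wal_level;`) remains the authoritative check.

```python
def parse_conf(text):
    """Parse `key = value` lines from postgresql.conf-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch ignores lines without an "=" sign
        settings[key.strip().lower()] = value.strip().strip("'\"")
    return settings


def check_logical_decoding(text):
    """Return a list of problems; an empty list means the settings look fine."""
    settings = parse_conf(text)
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level must be 'logical'")
    try:
        slots = int(settings.get("max_replication_slots", "0"))
    except ValueError:
        slots = 0
    if slots < 1:
        problems.append("max_replication_slots must be at least 1")
    return problems
```

Calling `check_logical_decoding(open("postgresql.conf").read())` returns an empty list when both settings are in place.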

MySQL / MariaDB

Enable binary logging in your MySQL/MariaDB configuration (my.cnf):

server-id = 1
log_bin = mysql-bin
binlog_row_image = FULL
binlog_expire_logs_seconds = 604800

Create a replication user:

CREATE USER 'replicator'@'%' IDENTIFIED WITH mysql_native_password BY 'password';
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;

Configuration

Create a JSON schema file (e.g., schema.json) defining your sync mapping:

[
    {
        "database": "book",
        "index": "book",
        "nodes": {
            "table": "book",
            "columns": ["isbn", "title", "description"],
            "children": [
                {
                    "table": "publisher",
                    "columns": ["name"],
                    "relationship": {
                        "variant": "object",
                        "type": "one_to_one"
                    }
                },
                {
                    "table": "author",
                    "label": "authors",
                    "columns": ["name", "date_of_birth"],
                    "relationship": {
                        "variant": "object",
                        "type": "one_to_many",
                        "through_tables": ["book_author"]
                    }
                }
            ]
        }
    }
]
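
To make the mapping concrete, here is an illustrative sketch of the kind of nested document this schema describes. It is not PGSync's implementation: the in-memory rows, the `publisher_id` foreign key, the layout of the `book_author` through table, and the `build_document` helper are all assumptions made up for the example, and the documents PGSync actually writes may carry additional fields.

```python
# Hypothetical rows standing in for the relational tables.
books = [
    {"isbn": "001", "title": "Example Title",
     "description": "An example.", "publisher_id": 1},
]
publishers = {1: {"name": "Example Press"}}
authors = {
    10: {"name": "A. Writer", "date_of_birth": "1970-01-01"},
    11: {"name": "B. Writer", "date_of_birth": "1980-02-02"},
}
book_author = [("001", 10), ("001", 11)]  # through table: (book isbn, author id)


def build_document(book):
    """Assemble one nested document following the schema above."""
    # Root node: only the listed columns are copied.
    doc = {col: book[col] for col in ("isbn", "title", "description")}
    # one_to_one with variant "object": a single nested object.
    doc["publisher"] = dict(publishers[book["publisher_id"]])
    # one_to_many via the through table: a list of objects under the "authors" label.
    doc["authors"] = [
        dict(authors[author_id])
        for isbn, author_id in book_author
        if isbn == book["isbn"]
    ]
    return doc
```

The point of the sketch is the shape: the root table's columns sit at the top level, the one-to-one child becomes a single object, and the one-to-many child becomes a list stored under its `label`.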

See the examples directory for more schema examples (airbnb, social, rental, etc.).

Environment Variables

Configure PGSync via environment variables:

# Schema
SCHEMA='/path/to/schema.json'

# PostgreSQL
PG_HOST=localhost
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=*****

# Elasticsearch / OpenSearch
ELASTICSEARCH_HOST=localhost
ELASTICSEARCH_PORT=9200

# Redis (optional in WAL mode)
REDIS_HOST=localhost
REDIS_PORT=6379
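
If you launch PGSync from a wrapper script, a quick pre-flight check of these variables can catch typos early. The sketch below is an assumption-laden example, not PGSync's own settings loader: `load_settings` is a hypothetical helper, and the fallback values simply mirror the example values above rather than any documented defaults.

```python
import os


def load_settings(env=None):
    """Collect the variables listed above into a dict, with basic validation."""
    env = os.environ if env is None else env
    schema = env.get("SCHEMA")
    if not schema:
        raise ValueError("SCHEMA must point to the schema JSON file")
    return {
        "schema": schema,
        # Fallbacks below mirror the example values; they are assumptions.
        "pg_host": env.get("PG_HOST", "localhost"),
        "pg_port": int(env.get("PG_PORT", "5432")),
        "pg_user": env.get("PG_USER", "postgres"),
        "pg_password": env.get("PG_PASSWORD"),
        "elasticsearch_host": env.get("ELASTICSEARCH_HOST", "localhost"),
        "elasticsearch_port": int(env.get("ELASTICSEARCH_PORT", "9200")),
        # Redis is optional in WAL mode, so a missing host is left as None.
        "redis_host": env.get("REDIS_HOST"),
        "redis_port": int(env.get("REDIS_PORT", "6379")),
    }
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise in tests.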

Running

Bootstrap (run once to set up triggers and replication slots):

bootstrap --config schema.json

Run as daemon:

pgsync --config schema.json --daemon
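
The two commands can also be driven from Python, for instance in a container entrypoint. This is a rough sketch under stated assumptions: `sync_commands` and `run_sync` are hypothetical helpers, the `first_run` flag is an invented way to honour the "run once" advice for bootstrap, and the only CLI arguments used are the ones shown above. The daemon call is expected to block for as long as the process runs.

```python
import subprocess


def sync_commands(schema_path, first_run=False):
    """Build the argv lists for the commands shown above."""
    commands = []
    if first_run:
        # Bootstrap is only needed once per schema.
        commands.append(["bootstrap", "--config", schema_path])
    commands.append(["pgsync", "--config", schema_path, "--daemon"])
    return commands


def run_sync(schema_path, first_run=False, runner=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for argv in sync_commands(schema_path, first_run):
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(argv, check=True)
```

The `runner` parameter exists so the function can be tested without starting real processes; in normal use, `run_sync("schema.json", first_run=True)` is enough.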

License

MIT License - see LICENSE for details.
