Skip to main content

Postgres to elasticsearch sync

Project description

PostgreSQL to Elasticsearch sync

PGSync <https://pgsync.com>_ is a middleware for syncing data from Postgres <https://www.postgresql.org>_ to Elasticsearch <https://www.elastic.co/products/elastic-stack>.
It allows you to keep Postgres <https://www.postgresql.org>
as your source of truth data source and expose structured denormalized documents in Elasticsearch <https://www.elastic.co/products/elastic-stack>_.

Requirements

  • Python <https://www.python.org>_ 3.6+
  • Postgres <https://www.postgresql.org>_ 9.4+
  • Redis <https://redis.io>_
  • Elasticsearch <https://www.elastic.co/products/elastic-stack>_ 6.3.1+

Postgres setup

Enable logical decoding <https://www.postgresql.org/docs/current/logicaldecoding.html>_ in your Postgres setting.

  • you would also need to set up two parameters in your Postgres config postgresql.conf

    wal_level = logical

    max_replication_slots = 1

Installation

You can install PGSync from PyPI <https://pypi.org>_:

$ pip install pgsync

Config

Create a schema for the application named e.g schema.json

Example schema <https://github.com/toluaina/pg-sync/blob/master/examples/airbnb/schema.json>_

Example spec

.. code-block::

[
    {
        "database": "[database name]",
        "index": "[elasticsearch index]",
        "nodes": [
            {
                "table": "[table A]",
                "schema": "[table A schema]",
                "columns": [
                    "column 1 from table A",
                    "column 2 from table A",
                    ... additional columns
                ],
                "children": [
                    {
                        "table": "[table B with relationship to table A]",
                        "schema": "[table B schema]",
                        "columns": [
                          "column 1 from table B",
                          "column 2 from table B",
                          ... additional columns
                        ],
                        "relationship": {
                            "variant": "object",
                            "type": "one_to_many"
                        },
                        ...
                    },
                    {
                        ... any other additional children
                    }
                ]
            }
        ]
    }
]

Environment variables

Setup required environment variables for the application

SCHEMA='/path/to/schema.json'

ELASTICSEARCH_HOST=localhost
ELASTICSEARCH_PORT=9200

PG_HOST=localhost
PG_USER=i-am-root # this must be a postgres superuser
PG_PORT=5432
PG_PASSWORD=*****

REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
REDIS_AUTH=*****

Running

bootstrap the database (one time only) $ bootstrap --config schema.json run pgsync as a daemon $ pgsync --config schema.json --daemon

======= History

1.0.1 (2020-15-01)

  • First release on PyPI.

1.0.1 (2020-01-01)

  • RC1 release

1.1.0 (2020-04-13)

  • Postgres multi schema support for multi-tennant applications

  • Show resulting Query with verbose mode

  • this release required you to re-bootstrap your database with

    • bootstrap -t
    • bootstrap

1.1.1 (2020-05-18)

  • Fixed authentication with Redis
  • Fixed Docker build

1.1.2 (2020-06-11)

  • Sync multiple indices in the same schema
  • Test for replication or superuser
  • Fix PG_NOTIFY error when payload exceeds 8000 bytes limit

1.1.3 (2020-06-14)

  • Bug fix when syncing multiple indices in the same schema

1.1.4 (2020-06-15)

  • Only create triggers for tables present in schema

1.1.5 (2020-06-16)

  • Bug fix when creating multiple triggers in same schema

1.1.6 (2020-07-31)

  • Bug fix when tearing down secondary schema

1.1.7 (2020-08-16)

  • Fix issue #29: SQLAlchemy err: Neither 'BooleanClauseList' object nor 'Comparator' object has an attribute '_orig'

1.1.8 (2020-08-19)

  • Fix issue #30: Traceback AttributeError: id

1.1.9 (2020-08-26)

  • Fix issue #33: Unable to set redis port via environment variable.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

pgsync-1.1.9-cp38-cp38-manylinux2010_x86_64.whl (4.2 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

pgsync-1.1.9-cp38-cp38-manylinux2010_x86_64.manylinux1_x86_64.whl (4.2 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

pgsync-1.1.9-cp38-cp38-manylinux1_x86_64.whl (4.2 MB view hashes)

Uploaded CPython 3.8

pgsync-1.1.9-cp37-cp37m-manylinux2010_x86_64.whl (3.2 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

pgsync-1.1.9-cp37-cp37m-manylinux2010_x86_64.manylinux1_x86_64.whl (3.2 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.12+ x86-64

pgsync-1.1.9-cp37-cp37m-manylinux1_x86_64.whl (3.2 MB view hashes)

Uploaded CPython 3.7m

pgsync-1.1.9-cp37-cp37m-macosx_10_15_x86_64.whl (799.4 kB view hashes)

Uploaded CPython 3.7m macOS 10.15+ x86-64

pgsync-1.1.9-cp36-cp36m-manylinux2010_x86_64.whl (3.3 MB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

pgsync-1.1.9-cp36-cp36m-manylinux1_x86_64.whl (3.3 MB view hashes)

Uploaded CPython 3.6m

pgsync-1.1.9-cp36-cp36m-manylinux1_x86_64.manylinux2010_x86_64.whl (3.3 MB view hashes)

Uploaded CPython 3.6m manylinux: glibc 2.12+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page