Skip to main content

Postgres to elasticsearch sync

Project description

PostgreSQL to Elasticsearch sync

PGSync <https://pgsync.com>_ is a middleware for syncing data from Postgres <https://www.postgresql.org>_ to Elasticsearch <https://www.elastic.co/products/elastic-stack>.
It allows you to keep Postgres <https://www.postgresql.org>
as your source of truth data source and expose structured denormalized documents in Elasticsearch <https://www.elastic.co/products/elastic-stack>_.

Requirements

  • Python <https://www.python.org>_ 3.6+
  • Postgres <https://www.postgresql.org>_ 9.4+
  • Redis <https://redis.io>_
  • Elasticsearch <https://www.elastic.co/products/elastic-stack>_ 6.3.1+

Postgres setup

Enable logical decoding <https://www.postgresql.org/docs/current/logicaldecoding.html>_ in your Postgres setting.

  • You would also need to set up two parameters in your Postgres config postgresql.conf

    wal_level = logical

    max_replication_slots = 1

Installation

You can install PGSync from PyPI <https://pypi.org>_:

$ pip install pgsync

Config

Create a schema for the application named e.g schema.json

Example schema <https://github.com/toluaina/pgsync/blob/master/examples/airbnb/schema.json>_

Example spec

.. code-block::

[
    {
        "database": "[database name]",
        "index": "[elasticsearch index]",
        "nodes": [
            {
                "table": "[table A]",
                "schema": "[table A schema]",
                "columns": [
                    "column 1 from table A",
                    "column 2 from table A",
                    ... additional columns
                ],
                "children": [
                    {
                        "table": "[table B with relationship to table A]",
                        "schema": "[table B schema]",
                        "columns": [
                          "column 1 from table B",
                          "column 2 from table B",
                          ... additional columns
                        ],
                        "relationship": {
                            "variant": "object",
                            "type": "one_to_many"
                        },
                        ...
                    },
                    {
                        ... any other additional children
                    }
                ]
            }
        ]
    }
]

Environment variables

Setup required environment variables for the application

SCHEMA='/path/to/schema.json'

ELASTICSEARCH_HOST=localhost
ELASTICSEARCH_PORT=9200

PG_HOST=localhost
PG_USER=i-am-root # this must be a postgres superuser
PG_PORT=5432
PG_PASSWORD=*****

REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
REDIS_AUTH=*****

Running

Bootstrap the database (one time only) $ bootstrap --config schema.json Run pgsync as a daemon $ pgsync --config schema.json --daemon

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pgsync-1.1.32.tar.gz (78.7 kB view details)

Uploaded Source

Built Distribution

pgsync-1.1.32-py2.py3-none-any.whl (40.3 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file pgsync-1.1.32.tar.gz.

File metadata

  • Download URL: pgsync-1.1.32.tar.gz
  • Upload date:
  • Size: 78.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for pgsync-1.1.32.tar.gz
Algorithm Hash digest
SHA256 00c7ac57bf4b59817d49b23b52741bdb5c2ef335959d58f2d04d69cfdd8e31e4
MD5 baf608532dd8c3c5ab3ff1939a742212
BLAKE2b-256 317b639ed7d03b7cbd4abdb9feb57d1c8f3f31ef190eef004d2d6cb3a1036c06

See more details on using hashes here.

File details

Details for the file pgsync-1.1.32-py2.py3-none-any.whl.

File metadata

  • Download URL: pgsync-1.1.32-py2.py3-none-any.whl
  • Upload date:
  • Size: 40.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.4

File hashes

Hashes for pgsync-1.1.32-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 e33d661dc3332884f3459e1bd3fcb5d2ab42409ef5c4a18778bef67450e7f96c
MD5 753de1b7e45f2e07e9c6ed84796ab189
BLAKE2b-256 55e78cea69a164edf1e2a3db2f0cae3ebbf9986dbf5484402a0e7576c4654762

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page