⚡ pg2elastic

Enhanced PostgreSQL to Elasticsearch Data Synchronization

📚 Description

Welcome to pg2elastic, a fork of the official pgsync package, designed to provide seamless and efficient data synchronization between PostgreSQL databases and Elasticsearch clusters. Building upon the solid foundation of pgsync, pg2elastic inherits all of its powerful capabilities and takes them a step further.

Key Features:

  • High-Performance Sync: pg2elastic inherits the data synchronization engine from pgsync for fast and reliable transfers.
  • Real-time Indexing: Mirror your PostgreSQL data into Elasticsearch indices and keep them in sync in real time.
  • Schema Mapping: Define and customize how PostgreSQL schemas map to Elasticsearch indices, giving you full control over the data structure.
  • Efficient Data Type Handling: pg2elastic handles data type conversions so that values are represented accurately on both platforms.
  • Continuous Enhancements: We are committed to actively maintaining and enhancing pg2elastic, incorporating the latest advancements in both PostgreSQL and Elasticsearch.

Whether you're working on a data-driven application or performing complex data analysis, pg2elastic gives you a streamlined, feature-rich way to keep your PostgreSQL and Elasticsearch ecosystems in step.

🛠️ Prerequisites

PGSync Requirements


✨ Key Enhancements


Fixes

The numeric value 1000000 is rendered as 1e, which cannot be converted to a float and crashes the process.

  • Fixed by changing 1e to 1e6 so that the conversion to float no longer crashes.
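The failure mode behind this fix can be reproduced in plain Python. The following is a minimal sketch, not the code used in pg2elastic: `to_float` is a hypothetical helper that only shows why a truncated exponent such as `1e` is rejected while `1e6` parses correctly.

```python
def to_float(text):
    """Convert a numeric string to float, reporting malformed input clearly.

    Hypothetical helper for illustration; it is not part of pg2elastic.
    """
    try:
        return float(text)
    except ValueError as exc:
        # A truncated exponent such as "1e" is not a valid float literal.
        raise ValueError(f"not a valid float literal: {text!r}") from exc


print(to_float("1e6"))  # 1000000.0

try:
    to_float("1e")
except ValueError as error:
    print(error)
```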

The process crashes when custom field types are used in the database and the returned field type has the format abc.xyz.

  • Fixed by modifying the regular expression for LOGICAL_SLOT_SUFFIX.
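To illustrate the kind of change involved, here is a minimal sketch of a pattern that accepts dotted type names. It assumes columns arrive as `name[type]:value` text, as in PostgreSQL's test_decoding output. `COLUMN_PATTERN` and `parse_columns` are hypothetical names, and the pattern is not the actual LOGICAL_SLOT_SUFFIX expression from pg2elastic.

```python
import re

# Hypothetical pattern: the "." inside the type character class is what lets
# schema-qualified custom types such as public.order_status match.
COLUMN_PATTERN = re.compile(
    r"""(?P<key>"?\w+"?)\[(?P<type>[\w\s.]+)\]:(?P<value>'(?:[^']|'')*'|\S+)"""
)


def parse_columns(row):
    """Return {column: (type, raw_value)} for a name[type]:value row string."""
    return {
        match.group("key"): (match.group("type"), match.group("value"))
        for match in COLUMN_PATTERN.finditer(row)
    }


row = "id[integer]:1 status[public.order_status]:'shipped' title[character varying]:'Hello world'"
for name, (type_name, value) in parse_columns(row).items():
    print(name, type_name, value)
```

Without the dot in the type character class, the `status` column in this sample row would not match at all, which is the sort of gap that can surface as a crash further downstream.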

The process crashes when a partition notification is received.

  • Fixed by tracking partitions in the library's materialized view and updating the library's trigger to include the partition's parent table.

Inserting a child record does not update the parent document.

  • Fixed by synchronizing the main document whenever a child record is created.
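The idea behind this fix can be sketched as follows. This is an illustration under stated assumptions, not the pg2elastic implementation: the event layout (`tg_op`, `table`, `new`, `old`) and the `parent_key_columns` mapping are hypothetical.

```python
def parent_ids_to_resync(event, parent_key_columns):
    """Return the IDs of parent documents affected by a child-table change.

    event: dict with the assumed keys "tg_op", "table", "new" and "old".
    parent_key_columns: hypothetical mapping of child table name to the
    foreign-key column that points at the parent row.
    """
    if event.get("tg_op") not in {"INSERT", "UPDATE", "DELETE"}:
        return set()
    column = parent_key_columns.get(event.get("table"))
    if column is None:
        return set()
    ids = set()
    # Check both row images so that inserts, updates and deletes are covered.
    for row in (event.get("new"), event.get("old")):
        if row and row.get(column) is not None:
            ids.add(row[column])
    return ids


event = {"tg_op": "INSERT", "table": "review", "new": {"id": 7, "book_id": 42}, "old": None}
print(parent_ids_to_resync(event, {"review": "book_id"}))  # {42}
```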

Environment Variables

PG_SCHEMA

  • Schema to use. Setting it improves performance by eliminating the need to scan all schemas.

REDIS_USERNAME

  • Redis username.

REDIS_PASSWORD

  • Redis password.

REDIS_ENDPOINT

  • Redis connection endpoint. Defaults to localhost.

REDIS_SSL

  • Whether the Redis connection should use SSL. Defaults to true.

REDIS_CLUSTER

  • Whether the Redis connection is clustered. Defaults to true.

REDIS_CHECKPOINT

  • Whether Redis is used to save and restore checkpoints. Defaults to true.

SKIP_BOOTSTRAP

  • Whether the bootstrap command should be skipped. Defaults to true.
  • Use this variable if the bootstrap command has already been run but is still hard-coded in a shell script.
  • Set it to false if there are new indexes or schema changes.
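The variables and defaults listed above could be read along these lines. This is a minimal sketch rather than pg2elastic's own settings code: the `Settings` class, `load_settings`, and the set of strings treated as true are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these strings count as "true"; pg2elastic may parse flags differently.
_TRUE_VALUES = {"1", "true", "yes", "on"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean variable, falling back to the default when unset or empty."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    pg_schema: Optional[str]
    redis_username: Optional[str]
    redis_password: Optional[str]
    redis_endpoint: str
    redis_ssl: bool
    redis_cluster: bool
    redis_checkpoint: bool
    skip_bootstrap: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment using the documented defaults."""
    return Settings(
        pg_schema=env.get("PG_SCHEMA") or None,
        redis_username=env.get("REDIS_USERNAME") or None,
        redis_password=env.get("REDIS_PASSWORD") or None,
        redis_endpoint=env.get("REDIS_ENDPOINT") or "localhost",
        redis_ssl=_flag(env, "REDIS_SSL", True),
        redis_cluster=_flag(env, "REDIS_CLUSTER", True),
        redis_checkpoint=_flag(env, "REDIS_CHECKPOINT", True),
        skip_bootstrap=_flag(env, "SKIP_BOOTSTRAP", True),
    )


# Example: a local, non-clustered Redis without SSL.
print(load_settings({"REDIS_SSL": "false", "REDIS_CLUSTER": "false"}))
```

Passing the environment in as a mapping keeps the loader easy to exercise without touching the real process environment.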

🚀 Deployment

Manual Deployment

How to initialize and run pg2elastic:

  • Create a .env file with cp .env.sample .env, then replace the sample values with your own configuration settings.

  • Install the dependencies with python setup.py develop

  • Start the app by running the pg2elastic file from the bin folder: python3 pg2elastic --schema yourschema.json

If you skip any of these setup steps, you will get errors when running this package.


✅ Testing

$ export PG_SCHEMA=
$ flake8 pg2elastic tests
$ python setup.py test

🔊 Logs

This project uses the loguru module for logging. The loguru configuration can be found in the pg2elastic file in the bin folder.


🚚 Publishing

$ python setup.py sdist bdist_wheel
$ twine upload dist/*
