A powerful database replication and synchronization tool

Database Sync Service

A high-performance, resilient PostgreSQL data replication service designed for selective column synchronization and automated schema evolution.

Overview

This service replicates specific tables and columns from a primary PostgreSQL database to one or more replica databases. It is built for scenarios where you need specialized read replicas, or need to sync data across microservices, while keeping schemas strictly controlled.

Configuration

The service is configured via config/sync.yaml.

primary_db:
  url: postgresql://user:pass@localhost:5432/primary_db?sslmode=disable

replica_dbs:
  - name: replica_1
    url: postgresql://user:pass@localhost:5433/replica_db?sslmode=disable

tables:
  users:
    primary_key: id
    mode: upsert        # Options: insert | upsert
    batch_size: 10000
    
    # Columns to extract and maintain
    columns_to_sync:
      - user_name
      - email
      - metadata
      - updated_at

    # Define if primary and replica column names differ
    column_mapping:
      # primary_col: replica_col
      user_name: username

    # Replica-side columns to update on conflict (only used when mode is upsert)
    conflict_resolution:
      update_columns:
        - username
        - email
        - updated_at

    checksum:
      enabled: true
      columns:
        - email
        - username

  orders:
    primary_key: order_id
    mode: insert
    batch_size: 5000
    columns_to_sync:
      - customer_id
      - total_amount
      - status
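
To illustrate how columns_to_sync, column_mapping, and conflict_resolution could fit together, here is a minimal sketch. The helper names map_row and build_upsert_sql are hypothetical and are not part of syncset-db. The sketch assumes that the replica statement uses PostgreSQL's INSERT ... ON CONFLICT ... DO UPDATE and psycopg-style %s placeholders, which the documentation above does not confirm.

```python
from typing import Mapping, Sequence


def map_row(row: Mapping[str, object],
            columns: Sequence[str],
            mapping: Mapping[str, str]) -> dict:
    """Keep only the configured columns and rename them primary -> replica."""
    return {mapping.get(col, col): row[col] for col in columns}


def build_upsert_sql(table: str,
                     primary_key: str,
                     replica_columns: Sequence[str],
                     update_columns: Sequence[str]) -> str:
    """Build a parameterized upsert statement for one row.

    Identifiers are interpolated unquoted for brevity; real code should
    quote them with the database driver's identifier helpers.
    """
    cols = [primary_key, *replica_columns]
    col_list = ", ".join(cols)
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_columns)
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({primary_key}) DO UPDATE SET {updates}"
    )


# Example values taken from the users table in the configuration above.
row = {"id": 7, "user_name": "ada", "email": "ada@example.com",
       "metadata": None, "updated_at": "2024-01-01", "password_hash": "x"}
mapped = map_row(row, ["user_name", "email", "metadata", "updated_at"],
                 {"user_name": "username"})
sql = build_upsert_sql("users", "id", list(mapped),
                       ["username", "email", "updated_at"])
```

Note that password_hash never reaches the replica because it is not listed in columns_to_sync, and that update_columns refers to the replica-side name username rather than user_name.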

Getting Started

Prerequisites

  • Python 3.10+
  • PostgreSQL instances (Primary and Replica)

Installation

You can install syncset-db using pip:

pip install syncset-db

Running the Service

Run the service with the installed syncset command or by invoking the cli.py script directly.

Using the CLI tool:

# Run with a custom configuration file (Recommended)
syncset --file=sync.yaml

# Run with custom config and dry-run mode
syncset --file=sync.yaml --dry-run

If --file is omitted, the service reads config/sync.yaml.
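
The flag behavior described above can be pictured with a small argparse sketch. This is not the package's actual parser; build_parser is a hypothetical name, and only the --file and --dry-run flags and the config/sync.yaml default come from this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented syncset flags."""
    parser = argparse.ArgumentParser(prog="syncset")
    parser.add_argument(
        "--file",
        default="config/sync.yaml",  # documented default location
        help="path to the YAML configuration file",
    )
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="run in dry-run mode",
    )
    return parser


# Equivalent to: syncset --file=sync.yaml --dry-run
explicit = build_parser().parse_args(["--file=sync.yaml", "--dry-run"])

# Equivalent to: syncset (no arguments)
defaults = build_parser().parse_args([])
```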

Using Python directly:

# Start Sync
python3 cli.py --file=sync.yaml

# Dry Run
python3 cli.py --file=sync.yaml --dry-run

Key Features

  • Selective Replication: Sync only the tables and columns you need.
  • Incremental Sync: Tracks synchronization state via high-watermark primary keys to ensure only new or modified data is processed.
  • Data Integrity: Optional checksum-based validation to ensure rows are truly identical before skipping them.
  • Multi-Replica Support: Synchronize the same primary data to multiple independent targets in parallel.
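
As an illustration of the checksum-based validation, the sketch below hashes the configured checksum columns of a row and compares the primary and replica digests. The function names and the choice of SHA-256 are assumptions; this document does not say which hash or encoding syncset-db uses.

```python
import hashlib
from typing import Mapping, Optional, Sequence


def row_checksum(row: Mapping[str, object], columns: Sequence[str]) -> str:
    """Hash the selected columns of a row in a fixed order (assumed SHA-256)."""
    digest = hashlib.sha256()
    for col in columns:
        value = row.get(col)
        # Encode NULL differently from the empty string.
        token = "\x00" if value is None else str(value)
        digest.update(token.encode("utf-8"))
        digest.update(b"\x1f")  # separator so ("ab", "c") != ("a", "bc")
    return digest.hexdigest()


def needs_write(primary_row: Mapping[str, object],
                replica_row: Optional[Mapping[str, object]],
                columns: Sequence[str]) -> bool:
    """Return True when the replica row is missing or its checksum differs."""
    if replica_row is None:
        return True
    return row_checksum(primary_row, columns) != row_checksum(replica_row, columns)
```

Both rows must already use the same column names (for example the replica-side names after column_mapping has been applied) for the comparison to be meaningful.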

Architecture

The synchronization follows a batched extraction and load pattern:

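  1. Extract: Read rows from the primary database in batches of batch_size, starting after the last primary key recorded in the sync state.
  2. Validate: Perform checksum comparisons (if enabled) against existing replica data to minimize redundant writes.
  3. Load: Execute bulk upserts or inserts into the replica database.
  4. State Update: Persist the highest processed primary key to .sync_state.json.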
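
The batched pattern above can be sketched with in-memory stand-ins for the databases. Everything here is hypothetical: sync_table is not a syncset-db function, the primary is a list of dicts, the replica is a dict keyed by primary key, plain row equality stands in for the checksum comparison, and primary keys are assumed to be orderable and increasing.

```python
from typing import Dict, List


def sync_table(primary_rows: List[dict], replica: Dict[object, dict],
               state: dict, table: str, pk: str, batch_size: int) -> int:
    """Copy rows past the stored high-watermark in batches; return rows written."""
    last_pk = state.get(table)  # None means the table was never synced
    ordered = sorted(primary_rows, key=lambda r: r[pk])
    written = 0
    while True:
        # Extract the next batch of rows beyond the high-watermark.
        batch = [r for r in ordered
                 if last_pk is None or r[pk] > last_pk][:batch_size]
        if not batch:
            return written
        # Validate: skip rows the replica already holds unchanged.
        changed = [r for r in batch if replica.get(r[pk]) != r]
        # Load: write the changed rows into the replica.
        for r in changed:
            replica[r[pk]] = dict(r)
        written += len(changed)
        # State update: remember the highest processed primary key.
        last_pk = batch[-1][pk]
        state[table] = last_pk


primary = [{"id": i, "v": i * 10} for i in range(1, 6)]
replica = {1: {"id": 1, "v": 10}}  # row 1 is already in sync
state = {}
first_run = sync_table(primary, replica, state, "users", "id", batch_size=2)
second_run = sync_table(primary, replica, state, "users", "id", batch_size=2)
```

In this sketch the second run writes nothing, because every primary key is at or below the stored high-watermark.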

State Management

Replication progress is stored in .sync_state.json. To re-trigger a full synchronization for a specific table, remove its entry from this file. To re-synchronize every table, delete the file entirely.
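
Removing a single table's entry can be scripted. The sketch below assumes the state file is a JSON object keyed by table name, which this document does not specify (the real layout may, for example, be nested per replica), so inspect the file before relying on it. reset_table_state is a hypothetical helper, and the demo writes to a temporary directory rather than a real state file.

```python
import json
import tempfile
from pathlib import Path


def reset_table_state(table: str, state_path: str = ".sync_state.json") -> bool:
    """Drop one table's entry from the state file; return True if it was removed.

    Assumes a top-level JSON object keyed by table name.
    """
    path = Path(state_path)
    if not path.exists():
        return False
    state = json.loads(path.read_text(encoding="utf-8"))
    if table not in state:
        return False
    del state[table]
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return True


# Demo against a throwaway file with made-up watermark values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / ".sync_state.json"
    demo_path.write_text(json.dumps({"users": 100, "orders": 50}), encoding="utf-8")
    removed = reset_table_state("users", str(demo_path))
    removed_again = reset_table_state("users", str(demo_path))
    remaining = json.loads(demo_path.read_text(encoding="utf-8"))
```

Run this only while the service is stopped, so that a running sync does not overwrite the edited file.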

Future Plans

  • CDC Support: Implement logical decoding to enable near real-time synchronization.
  • Monitoring: Integration with Prometheus and Grafana for health and performance monitoring.
  • Web Dashboard: A lightweight management UI to monitor sync progress and adjust configuration visually.
  • Multi-Database Support: Extend beyond PostgreSQL to support MySQL, SQLite, and MongoDB as targets.
  • Compression: Add support for data compression during transit for high-latency connections.

License

MIT
