EvolvisHub Data Handler

A robust Change Data Capture (CDC) library for efficient data synchronization across various databases and storage systems.

Features

  • Multi-Database Support: Seamlessly sync data between PostgreSQL, MySQL, SQL Server, Oracle, MongoDB, and more
  • Cloud Storage Integration: Native support for AWS S3, Google Cloud Storage, and Azure Blob Storage
  • File System Support: Handle CSV, JSON, and Parquet files
  • Watermark Tracking: Efficient incremental sync with configurable watermark columns
  • Batch Processing: Optimize performance with configurable batch sizes
  • Error Handling: Robust error recovery and logging
  • Type Safety: Full type hints and validation with Pydantic
  • Extensible: Easy to add new adapters and data sources
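To make the watermark and batch features above concrete, here is a minimal sketch of watermark-based incremental sync using only Python's built-in sqlite3 module. It is not this library's implementation: the `items` table, the `sync_incremental` and `make_db` helpers, and the (updated_at, id) watermark pair are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical table used only for this illustration.
SCHEMA = "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"


def make_db(rows=()):
    """Create an in-memory database with the demo table and optional rows."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def sync_incremental(source, dest, watermark, batch_size=1000):
    """Copy rows newer than `watermark` from source to dest, one batch at a time.

    `watermark` is an (updated_at, id) pair. The id breaks ties between rows
    that share a timestamp, so a batch boundary never skips a row.
    Returns the watermark to store for the next run.
    """
    while True:
        ts, last_id = watermark
        rows = source.execute(
            "SELECT id, name, updated_at FROM items "
            "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
            "ORDER BY updated_at, id LIMIT ?",
            (ts, ts, last_id, batch_size),
        ).fetchall()
        if not rows:
            return watermark
        # Upsert by primary key so re-synced rows overwrite older copies.
        dest.executemany(
            "INSERT OR REPLACE INTO items (id, name, updated_at) VALUES (?, ?, ?)",
            rows,
        )
        dest.commit()
        # Advance the watermark to the last row copied in this batch.
        watermark = (rows[-1][2], rows[-1][0])


source = make_db([
    (1, "a", "2024-01-01 00:00:00"),
    (2, "b", "2024-01-01 00:00:00"),
    (3, "c", "2024-01-02 00:00:00"),
])
dest = make_db()
wm = sync_incremental(source, dest, ("1970-01-01 00:00:00", 0), batch_size=2)
```

Storing the returned watermark and passing it to the next run means only rows changed since the previous run are read again. Smaller batch sizes lower memory use at the cost of more round trips.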

Installation

# Install from PyPI
pip install evolvishub-data-handler

# Install with development dependencies
# (quotes keep shells such as zsh from expanding the brackets)
pip install "evolvishub-data-handler[dev]"

# Install with documentation dependencies
pip install "evolvishub-data-handler[docs]"

Quick Start

  1. Create a configuration file (e.g., config.yaml):
source:
  type: postgresql
  host: localhost
  port: 5432
  database: source_db
  user: source_user
  password: source_password
  watermark:
    column: updated_at
    type: timestamp
    initial_value: "1970-01-01 00:00:00"

destination:
  type: postgresql
  host: localhost
  port: 5432
  database: dest_db
  user: dest_user
  password: dest_password
  watermark:
    column: updated_at
    type: timestamp
    initial_value: "1970-01-01 00:00:00"

sync:
  batch_size: 1000
  interval_seconds: 60
  watermark_table: sync_watermark
  2. Use the library in your code:
from evolvishub_data_handler import CDCHandler

# Initialize the handler
handler = CDCHandler("config.yaml")

# Run one-time sync
handler.sync()

# Or run continuous sync
handler.run_continuous()
  3. Or use the command-line interface:
# One-time sync
evolvishub-cdc -c config.yaml

# Continuous sync
evolvishub-cdc -c config.yaml -m continuous

# With custom logging
evolvishub-cdc -c config.yaml -l DEBUG --log-file sync.log
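If you schedule one-time syncs yourself, you may want to retry transient failures. The sketch below wraps any zero-argument callable, such as handler.sync from step 2, in a retry loop with linear backoff. The `sync_with_retries` helper and its parameters are hypothetical and not part of this library, and the sketch assumes that a failed sync raises an exception, which the documentation above does not confirm.

```python
import logging
import time

logger = logging.getLogger("cdc_retry")


def sync_with_retries(sync_fn, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call sync_fn until it succeeds or `attempts` calls have failed.

    Returns the number of attempts used. Re-raises the last error if every
    attempt fails. `sleep` is injectable so the wait can be skipped in tests.
    """
    for attempt in range(1, attempts + 1):
        try:
            sync_fn()
            return attempt
        except Exception:
            logger.exception("Sync attempt %d of %d failed", attempt, attempts)
            if attempt == attempts:
                raise
            # Linear backoff: wait longer after each failed attempt.
            sleep(delay_seconds * attempt)


# Intended usage (assumes handler.sync raises on failure):
#   handler = CDCHandler("config.yaml")
#   sync_with_retries(handler.sync, attempts=5, delay_seconds=10)
```

Passing the callable instead of the handler keeps the helper independent of the library, so it can be exercised with a stub function.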

Supported Data Sources

Databases

  • PostgreSQL
  • MySQL
  • SQL Server
  • Oracle
  • MongoDB

Cloud Storage

  • AWS S3
  • Google Cloud Storage
  • Azure Blob Storage

File Systems

  • CSV files
  • JSON files
  • Parquet files

Development

Setup

  1. Clone the repository:
git clone https://github.com/evolvishub/evolvishub-data-handler.git
cd evolvishub-data-handler
  2. Create a virtual environment:
make venv
  3. Install development dependencies:
make install
  4. Install pre-commit hooks:
make install-hooks

Testing

Run the test suite:

make test

Code Quality

Format code:

make format

Run linters:

make lint

Building

Build the package:

make build

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

EvolvisHub Data Handler Adapter

A powerful and flexible data handling adapter for Evolvis AI's data processing pipeline. This tool provides seamless integration with various database systems and implements Change Data Capture (CDC) functionality.

About Evolvis AI

Evolvis AI is a leading provider of AI solutions that helps businesses unlock their data potential. We specialize in:

  • Data analysis and decision-making
  • Machine learning implementation
  • Process optimization
  • Predictive maintenance
  • Natural language processing
  • Custom AI solutions

Our mission is to make artificial intelligence accessible to businesses of all sizes, enabling them to compete in today's data-driven environment. As Forbes highlights: "Organizations that strategically adopt AI will have a significant competitive advantage in today's data-driven market."

Author

Alban Maxhuni, PhD
Email: a.maxhuni@evolvis.ai
