core-cdc (CDC, a.k.a. Change Data Capture)

core-cdc provides the core mechanism and required resources to implement Change Data Capture (CDC) services.



Installation

Install from PyPI using pip:

pip install core-cdc
uv pip install core-cdc  # Or using UV...

Features

Multi-Database CDC Support
  • MySQL Binary Log (BinLog) based change capture

  • MongoDB Change Streams for real-time event streaming

  • Extensible processor architecture for additional database engines

Comprehensive Event Handling
  • DML operations: INSERT, UPDATE, DELETE

  • DDL operations: CREATE, ALTER, DROP (schemas and tables)

  • Configurable event filtering by operation type
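
As an illustration of filtering by operation type, here is a minimal sketch. The `Operation` enum, the `DML` set, and the `filter_events` helper are hypothetical names written for this example; they are not taken from the core-cdc API, whose own configuration options may look different.

```python
from enum import Enum


class Operation(str, Enum):
    """Hypothetical operation names; the library's own constants may differ."""
    INSERT = "INSERT"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    CREATE = "CREATE"
    ALTER = "ALTER"
    DROP = "DROP"


# Row-level (DML) operations, as opposed to schema-level (DDL) ones.
DML = frozenset({Operation.INSERT, Operation.UPDATE, Operation.DELETE})


def filter_events(events, allowed):
    """Yield only the events whose "operation" is in the allowed set."""
    allowed = frozenset(allowed)
    for event in events:
        if Operation(event["operation"]) in allowed:
            yield event


events = [
    {"operation": "INSERT", "table": "users"},
    {"operation": "ALTER", "table": "users"},
    {"operation": "DELETE", "table": "orders"},
]
dml_only = list(filter_events(events, DML))
```

Filtering early, before records reach a target, keeps DDL noise out of destinations that only care about row changes.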

Flexible Target Replication
  • Implement your own target by subclassing ITarget

  • Send records to any destination: database, queue, data warehouse, etc.

  • Support for multiple simultaneous targets
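
The target idea can be sketched as follows. `TargetSketch` is a stand-in written for this example, not the library's `ITarget`; the real base class, its method names, and its record type are assumptions here and should be checked against the package source before subclassing.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class TargetSketch(ABC):
    """Stand-in for ITarget (assumption: the real interface may differ)."""

    @abstractmethod
    def save(self, records: List[Dict[str, Any]]) -> None:
        """Persist one batch of change records."""


class InMemoryTarget(TargetSketch):
    """Keeps records in a list; handy for tests."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, Any]] = []

    def save(self, records: List[Dict[str, Any]]) -> None:
        self.rows.extend(records)


class JsonLinesTarget(TargetSketch):
    """Serializes each record to one JSON line, as a queue producer might."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def save(self, records: List[Dict[str, Any]]) -> None:
        self.lines.extend(json.dumps(r, sort_keys=True) for r in records)


def replicate(records, targets):
    """Send the same batch to every configured target."""
    for target in targets:
        target.save(records)


memory, queue = InMemoryTarget(), JsonLinesTarget()
batch = [{"op": "INSERT", "id": 1}]
replicate(batch, [memory, queue])
```

Each destination only has to implement one method, so adding a warehouse or queue does not touch the capture side.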

Standardized Data Format
  • Common Record structure for cross-service integration

  • Includes metadata: timestamps, transaction IDs, source position

  • JSON serialization support for streaming and messaging systems
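
To make the idea of a common record concrete, the sketch below shows one possible shape. `RecordSketch` and all of its field names are hypothetical; the actual `Record` class shipped with core-cdc may use different attributes and serialization helpers.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class RecordSketch:
    """Hypothetical change record; not the library's Record class."""
    event_type: str                      # e.g. "INSERT", "UPDATE", "DELETE"
    schema_name: str
    table_name: str
    event_timestamp: float               # epoch seconds of the source event
    record: Dict[str, Any] = field(default_factory=dict)
    transaction_id: Optional[str] = None
    source_position: Optional[str] = None

    def to_json(self) -> str:
        """Serialize to a JSON string suitable for a message queue."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "RecordSketch":
        """Rebuild a record from the output of to_json()."""
        return cls(**json.loads(payload))


original = RecordSketch(
    event_type="INSERT",
    schema_name="shop",
    table_name="orders",
    event_timestamp=1700000000.0,
    record={"id": 1, "total": 9.5},
)
restored = RecordSketch.from_json(original.to_json())
```

Keeping the payload to plain JSON types is what lets consumers in other services decode it without importing the producer's code.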

Production-Ready Features
  • Built-in error handling and retry mechanisms

  • Comprehensive logging for monitoring and debugging

  • Optional event timestamp column for UPSERT/MERGE operations
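
One way an event timestamp column can support UPSERT/MERGE is a last-write-wins check, sketched below with a plain dictionary standing in for the destination table. The `upsert_latest` helper and the `event_timestamp` column name are assumptions made for this example, not part of the library.

```python
def upsert_latest(table, row, key="id", ts_column="event_timestamp"):
    """Apply `row` unless the stored version is strictly newer.

    Returns True when the row was written, False when it was skipped.
    """
    current = table.get(row[key])
    if current is not None and current[ts_column] > row[ts_column]:
        return False
    table[row[key]] = row
    return True


table = {}
results = [
    upsert_latest(table, {"id": 1, "name": "a", "event_timestamp": 10}),
    upsert_latest(table, {"id": 1, "name": "b", "event_timestamp": 20}),
    # A late-arriving, older event must not overwrite the newer state.
    upsert_latest(table, {"id": 1, "name": "stale", "event_timestamp": 15}),
]
```

The same comparison can be expressed in a SQL MERGE condition, which makes replays and out-of-order delivery safe to apply more than once.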

Quick Start

Installation

Install the package:

pip install core-cdc
uv pip install core-cdc     # Or using UV...
pip install -e ".[dev]"     # For development...

Setting Up Environment

  1. Install the required libraries:

pip install --upgrade pip
pip install virtualenv

  2. Create a Python virtual environment:

virtualenv --python=python3.12 .venv

  3. Activate the virtual environment:

source .venv/bin/activate

Install packages

pip install .
pip install -e ".[dev]"

Optional libraries

pip install '.[all]'    # MySQL + MongoDB
pip install '.[mysql]'  # MySQL BinLog support
pip install '.[mongo]'  # MongoDB Change Streams support

Check tests and coverage

python manager.py run-tests                          # unit tests
python manager.py run-tests --test-type integration  # integration tests
python manager.py run-coverage                       # coverage

Functional Tests

Functional tests require live database servers and are not discovered by pytest or tox automatically (files are named check_*.py to prevent accidental execution).

The helper script tests/functional/quick_test.sh checks connectivity, runs both MySQL and MongoDB test suites, and prints a metrics summary:

bash tests/functional/quick_test.sh

All connection parameters default to the Docker values below and can be overridden via environment variables (MYSQL_HOST, MYSQL_PASSWORD, MONGO_HOST, MONGO_DATABASE, etc.).
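
The override pattern can be sketched in Python as follows. Only the four variable names listed above come from this README; the `load_settings` helper is hypothetical, `mysql_password` mirrors the Docker command in this document, and the `localhost` and `test` defaults are assumptions rather than the script's actual values.

```python
import os


def load_settings(environ=None):
    """Read connection settings, falling back to local Docker-style defaults."""
    env = os.environ if environ is None else environ
    return {
        "mysql": {
            "host": env.get("MYSQL_HOST", "localhost"),               # assumed default
            "password": env.get("MYSQL_PASSWORD", "mysql_password"),  # from the Docker command
        },
        "mongo": {
            "host": env.get("MONGO_HOST", "localhost"),               # assumed default
            "database": env.get("MONGO_DATABASE", "test"),            # assumed default
        },
    }


defaults = load_settings({})
overridden = load_settings({"MYSQL_HOST": "db.internal", "MONGO_DATABASE": "cdc"})
```

Passing a dictionary instead of reading `os.environ` directly keeps the helper easy to test.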

To run individual test files:

python manager.py run-tests --test-type functional --pattern "*.py"

Spinning Up Local Servers with Docker

MongoDB Replica Set (required for Change Streams):

docker network create mongoCluster

docker run -d --rm -p 27017:27017 --name mongo1 --network mongoCluster \
    mongo:5 mongod --replSet myReplicaSet --bind_ip localhost,mongo1

docker run -d --rm -p 27018:27017 --name mongo2 --network mongoCluster \
    mongo:5 mongod --replSet myReplicaSet --bind_ip localhost,mongo2

docker run -d --rm -p 27019:27017 --name mongo3 --network mongoCluster \
    mongo:5 mongod --replSet myReplicaSet --bind_ip localhost,mongo3

docker exec -it mongo1 mongosh --eval "rs.initiate({
  _id: \"myReplicaSet\",
  members: [
    {_id: 0, host: \"mongo1\"},
    {_id: 1, host: \"mongo2\"},
    {_id: 2, host: \"mongo3\"}
  ]
})"

Check cluster status:

docker ps
docker exec -it mongo1 mongosh --eval "rs.status()"

MySQL (binary logging is enabled by default in the official image for MySQL 8.0 and later):

docker run \
  --env=MYSQL_ROOT_PASSWORD=mysql_password \
  --volume=/var/lib/mysql \
  -p 3306:3306 \
  --restart=no \
  -d mysql:latest

Implemented CDC Engines

The following database engines have CDC implementations:

Fully Implemented

MySQL - Binary Log (BinLog) based CDC
  • Uses mysql-replication library

  • Captures INSERT, UPDATE, DELETE operations

  • Supports DDL events (CREATE, ALTER, DROP)

  • Fallback mechanism for column name resolution

  • See: core_cdc/processors/mysql/
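
For orientation, this is roughly what reading row events with the mysql-replication library looks like. It is a sketch of the underlying library, not of the core-cdc processor: `normalize_row` and `stream_changes` are hypothetical helpers, and `stream_changes` needs the `mysql-replication` package plus a reachable server with binary logging enabled.

```python
def normalize_row(kind, schema, table, row):
    """Flatten a mysql-replication row dict into a before/after change dict."""
    if kind == "UPDATE":
        before, after = row["before_values"], row["after_values"]
    elif kind == "INSERT":
        before, after = None, row["values"]
    else:  # DELETE
        before, after = row["values"], None
    return {"operation": kind, "schema": schema, "table": table,
            "before": before, "after": after}


def stream_changes(connection_settings, server_id=100):
    """Yield normalized row changes from the binary log (requires a live server)."""
    # Imported lazily so the pure helper above works without the dependency.
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
        DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent,
    )

    kinds = {WriteRowsEvent: "INSERT", UpdateRowsEvent: "UPDATE",
             DeleteRowsEvent: "DELETE"}
    stream = BinLogStreamReader(
        connection_settings=connection_settings,
        server_id=server_id,          # must be unique among replicas
        only_events=list(kinds),
        blocking=True,                # wait for new events instead of exiting
        resume_stream=True,           # start from the current end of the log
    )
    try:
        for event in stream:
            kind = kinds[type(event)]
            for row in event.rows:
                yield normalize_row(kind, event.schema, event.table, row)
    finally:
        stream.close()


sample = normalize_row(
    "UPDATE", "shop", "orders",
    {"before_values": {"id": 1, "total": 5}, "after_values": {"id": 1, "total": 9}},
)
```

With the Docker container from this README, `connection_settings` would plausibly be `{"host": "localhost", "port": 3306, "user": "root", "passwd": "mysql_password"}`.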

MongoDB - Change Streams based CDC
  • Uses native MongoDB Change Streams

  • Captures INSERT, UPDATE, DELETE operations

  • Requires replica set configuration

  • Real-time event streaming

  • See: core_cdc/processors/mongo/
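
A comparable sketch for Change Streams using PyMongo directly is shown below. Again, this illustrates the underlying driver rather than the core-cdc processor: `normalize_change` and `watch_collection` are hypothetical helpers, mapping `replace` to `UPDATE` is a choice made for this example, and `watch_collection` needs the `pymongo` package and a running replica set.

```python
# Change stream operation types mapped to the DML names used in this README.
OPERATIONS = {
    "insert": "INSERT",
    "update": "UPDATE",
    "replace": "UPDATE",   # assumption: treat a full replacement as an update
    "delete": "DELETE",
}


def normalize_change(change):
    """Reduce a change stream event to a small dict; None for other event types."""
    operation = OPERATIONS.get(change["operationType"])
    if operation is None:
        return None
    return {
        "operation": operation,
        "database": change["ns"]["db"],
        "collection": change["ns"]["coll"],
        "key": change["documentKey"],
        "document": change.get("fullDocument"),  # absent for deletes
    }


def watch_collection(uri, database, collection):
    """Yield normalized changes for one collection (requires a replica set)."""
    # Imported lazily so the pure helper above works without the dependency.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        # "updateLookup" asks the server to attach the current full document
        # to update events, not only the changed fields.
        with client[database][collection].watch(full_document="updateLookup") as stream:
            for change in stream:
                normalized = normalize_change(change)
                if normalized is not None:
                    yield normalized
    finally:
        client.close()


sample = normalize_change({
    "operationType": "insert",
    "ns": {"db": "shop", "coll": "orders"},
    "documentKey": {"_id": 1},
    "fullDocument": {"_id": 1, "total": 9},
})
```

Against the Docker replica set described in this README, a URI such as `mongodb://localhost:27017/?directConnection=true` is a reasonable starting point from the host, assuming the node on that port is the primary.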

Planned / Documentation Only

MS SQL Server and Oracle implementations are not yet included. Reference guides and implementation templates are available in the documentation.

Contributing

Contributions are welcome! Please:

  1. Fork the repository

  2. Create a feature branch

  3. Write tests for new functionality

  4. Ensure all tests pass: python manager.py run-tests --test-type integration

  5. Run linting: pylint core_cdc

  6. Run security checks: bandit -r core_cdc

  7. Submit a pull request

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

For questions or support, please open an issue on GitLab or contact the maintainers.
