Skip to main content

Change Data Capture (CDC) library for Postgres

Project description

pypgcdc

Change Data Capture (CDC) tool for Postgres

This project is a Python implementation of a Postgres CDC client. It is intended to be used as a library for building CDC applications. It is not intended to be used as a standalone application but there is a running example useful as a starting point.

The main problem with CDC and Postgres in practice is that Postgres runs with a hot stand-by used for failover in case of a primary node failure. The change is usually behind a DNS record so clients just re-connect and run fine. The problem is that the replication slot is not copied over to the new primary node and the CDC client will either fail to start (no slot) or create a slot but potentially miss some data (slot created after some data was written).

You can, of course, create a new slot, do the initial sync where you copy over all the available data, and then start replicating as usual. The problem is that the initial sync can take a long time if you have big tables.

This library doesn't really solve the problem, but it provides a way to react to the failover event. The idea is that you add triggers to the tables you want to replicate (published tables) and store the inserts/updates/deletes in separate tables (log tables). The initial sync when you start your CDC app can just select * from the published tables and then tail the changes on these tables. You will need some persistence to store the commits you have already processed so when a failover event occurs, you can use the last processed commit to select any changes from the log tables and then carry on tailing the published tables. This way you can avoid the initial sync and catch up with any pending changes faster.

Env Vars

  • PYPGCDC_DSN, default "postgres://postgres:postgrespw@localhost:5432/test" -- Postgres connection string
  • PYPGCDC_SLOT, default "test_slot" -- Postgres replication slot name
  • PYPGCDC_PUBLICATION, default "test_publication" -- Postgres publication name
  • PYPGCDC_LSN, default 0 -- Postgres LSN to start from
  • PYPGCDC_VERBOSE, default "False" -- A flag used to control print output of the example datastore. Use one of ("1", "true", "yes") to enable more verbose output.

Example

The library comes with an example which can be used to see how it works. The example requires a running Postgres database with some tables and an existing publication. The example will create a replication slot if it doesn't exist and start tailing the changes. The example will print the changes to stdout.

Once you finish with the example, remember to drop the replication slot. Leaving an unused replication slot is dangerous as the WAL files used for replication might not be removed, and you can run out of disk space (not an issue on your local computer but quite a problem on your production servers...).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python-postgres-cdc-0.0.0rc2.tar.gz (18.9 kB view details)

Uploaded Source

Built Distribution

python_postgres_cdc-0.0.0rc2-py3-none-any.whl (20.6 kB view details)

Uploaded Python 3

File details

Details for the file python-postgres-cdc-0.0.0rc2.tar.gz.

File metadata

File hashes

Hashes for python-postgres-cdc-0.0.0rc2.tar.gz
Algorithm Hash digest
SHA256 64de0862c4a253c6acf5fecfb4089aecc833c5399ce71cf5ac3d4c582482d129
MD5 48f9c3b62784f70799306c0fcf95c155
BLAKE2b-256 ac900c79e44c7a2707964ea74e81330d471238844728628b018534bf882fed42

See more details on using hashes here.

File details

Details for the file python_postgres_cdc-0.0.0rc2-py3-none-any.whl.

File metadata

File hashes

Hashes for python_postgres_cdc-0.0.0rc2-py3-none-any.whl
Algorithm Hash digest
SHA256 6bba92112459e7f53abc2ee303ac7c6d6cdb3bf88bf1d4d23d8c177425764be0
MD5 9f70a831513c6a832183b4547aa39b2a
BLAKE2b-256 54e5d688d5da2f80d35939fe8ba251d7d217c37669cca9b0e5f69d7a25cfc10d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page