druzhba

A friendly data pipeline framework

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Druzhba is a friendly framework for building data pipelines. It efficiently copies data from your production / transactional databases to your data warehouse.

A Druzhba pipeline connects one or more source databases to a target database. It pulls data incrementally from each configured source table and writes to a target table (which is automatically created in most cases), tracking incremental state and history in the target database. Druzhba may also be configured to pull using custom SQL, which supports Jinja templating of pipeline metadata.
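
The incremental pull with tracked state described above can be illustrated with a small high-water-mark sketch. This is not Druzhba's implementation: it uses Python's built-in sqlite3 module for both the "source" and the "target", and the names pull_increment, pipeline_state, and events are hypothetical, chosen only for illustration.

```python
import sqlite3


def pull_increment(source, target, table, index_column):
    """Copy rows whose index column exceeds the stored high-water mark."""
    # Hypothetical state table kept in the target, one row per source table.
    target.execute(
        "CREATE TABLE IF NOT EXISTS pipeline_state "
        "(table_name TEXT PRIMARY KEY, last_value INTEGER)"
    )
    row = target.execute(
        "SELECT last_value FROM pipeline_state WHERE table_name = ?", (table,)
    ).fetchone()
    last_value = row[0] if row else -1

    # Identifiers are interpolated directly only because this sketch controls
    # them; values always go through query parameters.
    rows = source.execute(
        f"SELECT id, {index_column} FROM {table} "
        f"WHERE {index_column} > ? ORDER BY {index_column}",
        (last_value,),
    ).fetchall()
    if not rows:
        return 0

    target.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER, {index_column} INTEGER)"
    )
    target.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    # Record the new high-water mark so the next run skips these rows.
    target.execute(
        "INSERT OR REPLACE INTO pipeline_state VALUES (?, ?)",
        (table, rows[-1][1]),
    )
    target.commit()
    return len(rows)


def demo():
    """Run three pulls and return how many rows each one copied."""
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE events (id INTEGER, updated_at INTEGER)")
    source.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10), (2, 20)])
    first = pull_increment(source, target, "events", "updated_at")
    source.execute("INSERT INTO events VALUES (3, 30)")
    second = pull_increment(source, target, "events", "updated_at")
    third = pull_increment(source, target, "events", "updated_at")
    return first, second, third
```

Calling demo() copies two rows on the first pull, one on the second, and none on the third, because the stored mark already covers every source row by then.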

In a typical deployment, Druzhba serves the extract and load steps of an ELT pipeline, although it is capable of limited in-flight transformations through custom extract SQL.

As source databases, Druzhba currently fully supports PostgreSQL and MySQL 5.5-5.7 and provides partial support for Microsoft SQL Server. As a target, Druzhba supports AWS Redshift.

Feature requests, bug reports, and general feedback should be submitted to the issue tracker. Potential security vulnerabilities should be posted to the issue tracker as well. If a security issue report must contain sensitive information, please email the maintainers and, if possible, open a public issue indicating that you have done so.

Please see the full documentation at druzhba.readthedocs.io.

Minimal Example

We’ll set up a pipeline to extract a single table from an example PostgreSQL instance that we’ll call “testsource” and write to an existing Redshift database that we’ll call “testdest”.

See quick start for a more complete example.

Install locally in a Python 3 environment:

pip install druzhba

Druzhba’s behavior is defined by a set of YAML configuration files, plus environment variables for database connections. As a minimal example, create a directory pipeline/ and a file pipeline/_pipeline.yaml that configures the pipeline:

---
connection:
  host: ${REDSHIFT_HOST}
  port: 5439
  database: ${REDSHIFT_DATABASE}
  user: ${REDSHIFT_USER}
  password: ${REDSHIFT_PASSWORD}
index:
  schema: druzhba_raw
  table: pipeline_index
s3:
  bucket: ${S3_BUCKET}
  prefix: ${S3_PREFIX}
iam_copy_role: ${IAM_COPY_ROLE}
sources:
  - alias: testsource
    type: postgres

The _pipeline.yaml file defines the connection to the destination database (via environment variables), the location of Druzhba’s internal tracking table, a working S3 location for temporary files, the IAM copy role, and a single source database called “testsource”.

Create a file pipeline/testsource.yaml representing the source database:

---
connection_string: postgresql://user:password@host:5432/testsource
tables:
  - source_table_name: your_table
    destination_table_name: your_table
    destination_schema_name: druzhba_raw
    index_column: updated_at
    primary_key:
      - id

The testsource.yaml file defines the connection to the testsource database (see the documentation for more secure ways of supplying connection credentials) and a single table to copy over. The contents of your_table in the source database will be copied to your_table in the druzhba_raw schema of the target database. Rows are identified by the value of their id column: rows with a new id are added, and existing rows are replaced if their updated_at value is greater than it was on the previous run.
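
The role of primary_key and index_column in this configuration can be sketched in plain Python. The sketch only illustrates the add-or-replace semantics described above, not how Druzhba actually loads data into Redshift; merge_rows and the sample rows (including the status field) are hypothetical.

```python
def merge_rows(target_rows, pulled_rows, primary_key="id", index_column="updated_at"):
    """Return target rows after applying pulled rows keyed on primary_key.

    A pulled row is added if its key is new, and replaces the existing row
    only if its index_column value is greater.
    """
    merged = {row[primary_key]: row for row in target_rows}
    for row in pulled_rows:
        existing = merged.get(row[primary_key])
        if existing is None or row[index_column] > existing[index_column]:
            merged[row[primary_key]] = row
    return sorted(merged.values(), key=lambda r: r[primary_key])


# ISO-8601 date strings compare correctly as plain strings.
target = [
    {"id": 1, "updated_at": "2020-08-01", "status": "open"},
    {"id": 2, "updated_at": "2020-08-02", "status": "open"},
]
pulled = [
    {"id": 2, "updated_at": "2020-08-05", "status": "closed"},  # newer: replaces
    {"id": 3, "updated_at": "2020-08-06", "status": "open"},    # new id: added
]
result = merge_rows(target, pulled)
```

Here result holds three rows: id 1 unchanged, id 2 replaced by the newer version, and id 3 added.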

Then, you’ll need to set some environment variables corresponding to the template fields in the configuration file above.
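
The _pipeline.yaml example references REDSHIFT_HOST, REDSHIFT_DATABASE, REDSHIFT_USER, REDSHIFT_PASSWORD, S3_BUCKET, S3_PREFIX, and IAM_COPY_ROLE. A pre-flight check along the following lines can catch an unset variable before a run. It is a sketch under the assumption that placeholders always take the ${NAME} form shown above; it is not Druzhba's own substitution logic, and missing_variables and render are hypothetical helpers.

```python
import os
import re

# Matches ${NAME} placeholders like the ones in the example configuration.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_variables(config_text, environ=None):
    """Return sorted names of ${VAR} placeholders that are not set."""
    environ = os.environ if environ is None else environ
    names = set(PLACEHOLDER.findall(config_text))
    return sorted(name for name in names if name not in environ)


def render(config_text, environ=None):
    """Substitute ${VAR} placeholders, raising KeyError if one is unset."""
    environ = os.environ if environ is None else environ
    return PLACEHOLDER.sub(lambda match: environ[match.group(1)], config_text)


example = "host: ${REDSHIFT_HOST}\nport: 5439\nuser: ${REDSHIFT_USER}\n"
env = {"REDSHIFT_HOST": "example-cluster.local"}  # made-up value
```

With the sample env above, missing_variables(example, env) reports REDSHIFT_USER as unset. Passing no environ argument checks the real process environment instead.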

Once your configuration and environment are ready, load into Redshift:

druzhba --database testsource --table your_table

Typically Druzhba’s CLI is run on a cron schedule. Many deployments place the configuration files in source control and use some form of CI for deployment.

Druzhba may also be imported and used as a Python library, for example to wrap pipeline execution with your own error handling.
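
Druzhba's Python API is not described here, so the sketch below does not import the library. Instead it wraps the CLI command shown earlier in a subprocess call with basic error handling, which is a different technique from library use. build_command and run_pipeline are hypothetical helpers; only the --database and --table flags come from the example above.

```python
import subprocess
import sys


def build_command(database, table):
    """Build the druzhba CLI invocation using the flags shown above."""
    return ["druzhba", "--database", database, "--table", table]


def run_pipeline(database, table, runner=subprocess.run):
    """Run the CLI and return True on success, reporting failures on stderr.

    The runner parameter exists so the call can be replaced in tests.
    """
    command = build_command(database, table)
    try:
        runner(command, check=True)
    except FileNotFoundError:
        print("druzhba executable not found on PATH", file=sys.stderr)
        return False
    except subprocess.CalledProcessError as error:
        print(f"druzhba exited with status {error.returncode}", file=sys.stderr)
        return False
    return True
```

A scheduler job could call run_pipeline("testsource", "your_table") and alert or retry when it returns False.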

Documentation

Please see documentation for more complete configuration examples and descriptions of the various options to configure your data pipeline.

Contributing

Druzhba is an ongoing project. Feel free to open feature request issues or PRs.

PRs should be unit-tested and must pass an integration test before they can be merged.

See the docs for instructions on setting up a Docker-Compose-based test environment.

License

This project is licensed under the terms of the MIT license.

Acknowledgements

Many on the SeatGeek team had a hand in building Druzhba, but we would especially like to acknowledge:

- Andy Enkeboll for initial conception and software architecture
- Sam Kritchevsky for hardening the application into something we can share
- Susan Lee for branding and design

Release history

0.2.1 (this version): Aug 13, 2020
0.2.0: Sep 3, 2020
0.2.0rc3 (pre-release): Jun 26, 2020
0.2.0rc2 (pre-release): Jun 25, 2020
0.2.0rc1 (pre-release): Jun 9, 2020

Download files

Source Distribution

druzhba-0.2.1.tar.gz (428.0 kB view details)

Uploaded Aug 13, 2020 Source

Built Distribution

druzhba-0.2.1-py3-none-any.whl (36.3 kB view details)

Uploaded Aug 13, 2020 Python 3

File details

Details for the file druzhba-0.2.1.tar.gz.

File metadata

Download URL: druzhba-0.2.1.tar.gz
Upload date: Aug 13, 2020
Size: 428.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.4.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.0

File hashes

Hashes for druzhba-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`cb15111800e5fc2639ba9ab7f809ae1d7c21803e4deca8c30d91daa26d24340d`
MD5	`3ca4089fdc72e547e4d7b6ed880a5583`
BLAKE2b-256	`f076876cef1c2671dc6aab8beb459a94294caa3a3aa81f0fe6d171bfbd0359ef`

File details

Details for the file druzhba-0.2.1-py3-none-any.whl.

File metadata

Download URL: druzhba-0.2.1-py3-none-any.whl
Upload date: Aug 13, 2020
Size: 36.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.4.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.0

File hashes

Hashes for druzhba-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f3d19bcdc21cf763b3c25954d24d9826efbd5befeb8b5c47fe0017c33c3065db`
MD5	`78819002ae7530beb95523cfe42d2a2c`
BLAKE2b-256	`ed2eaed7e4ee4eb575118fef22c3516cff25fb4f0687649c4ee10c1ae2a41b9c`
