
dlt is an open-source, Python-native, scalable data loading framework that requires no DevOps effort to run.

Project description


data load tool (dlt)

Colab Demo

import dlt
from chess import chess # a utility function that grabs data from the chess.com API

# create a dlt pipeline that will load chess game data to the DuckDB destination
pipeline = dlt.pipeline(
    pipeline_name='chess_pipeline',
    destination='duckdb',
    dataset_name='games_data'
)

# use chess.com API to grab data about a few players
data = chess(['magnuscarlsen', 'rpragchess'], start_month='2022/11', end_month='2022/12')

# extract, normalize, and load the data
pipeline.run(data)

data load tool (dlt) is a simple, open-source Python library that makes data loading easy:

  • Automatically turn the JSON returned by any API into a live dataset stored wherever you want it, as shown in the sketch after this list
  • pip install python-dlt and then include import dlt to use it in your Python loading script
  • The dlt library is licensed under the Apache License 2.0, so you can use it for free forever
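
As a hedged illustration of the first point above (this sketch is not from the dlt docs; the players list, ratings, pipeline name, and table name are made up), any JSON-shaped Python data such as a list of dicts or a parsed API response can be passed straight to pipeline.run:

import dlt

# any JSON-shaped data works: here a hand-written list of records
players = [
    {"username": "magnuscarlsen", "title": "GM", "rating": 2839},
    {"username": "rpragchess", "title": "GM", "rating": 2725},
]

pipeline = dlt.pipeline(
    pipeline_name='players_pipeline',
    destination='duckdb',
    dataset_name='players_data'
)

# dlt infers the schema, normalizes the records, and loads them into a "players" table
pipeline.run(players, table_name='players')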

Read more about it on the dlt Docs

semantic versioning

python-dlt follows semantic versioning with the MAJOR.MINOR.PATCH pattern. Currently we do pre-release versioning with the major version set to 0.

  • a minor version change means breaking changes
  • a patch version change means new features that should be backward compatible
  • any suffix change, e.g. a10 -> a11, is a patch
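
As a sketch of how this ordering behaves (the snippet is not part of python-dlt; it uses the third-party packaging library, which pip itself depends on):

from packaging.version import Version

# a suffix change is a patch-level upgrade
assert Version("0.2.0a10") < Version("0.2.0a11")

# a patch bump should only add backward-compatible features
assert Version("0.2.0a32") < Version("0.2.1a1")

# a minor bump signals breaking changes while the major version is 0
assert Version("0.2.1") < Version("0.3.0")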

development

python-dlt uses poetry to manage, build, and version the package. It also uses make to automate tasks. To start:

make install-poetry  # will install poetry, to be run outside virtualenv

then

make dev  # will install all deps including dev

Executing poetry shell and working inside it is the most convenient workflow at the moment.

python version

Use Python 3.8 for development; it is the lowest version supported by python-dlt. You'll need distutils and venv:

sudo apt-get install python3.8
sudo apt-get install python3.8-distutils
sudo apt install python3.8-venv

You may also use pyenv as poetry suggests.

bumping version

Use poetry version prerelease to bump the patch version, then make build-library to apply the changes. The source of truth for the version is pyproject.toml and we use poetry to manage it.

testing and linting

python-dlt uses mypy and flake8 with several plugins for linting. We do not reorder imports or reformat code.

pytest is used as the test harness. make test-common runs the tests of the common components and does not require any external resources.

testing destinations

To test destinations, use make test. You will need the following external resources:

  1. BigQuery project
  2. Redshift cluster
  3. Postgres instance. A docker-compose setup for a Postgres instance is included in the repository (under tests/load/postgres/); when started, the instance is configured to work with the tests:
cd tests/load/postgres/
docker-compose up --build -d

See tests/.example.env for the expected environment variables and a command line example to run the tests, then create tests/.env from it. You configure the tests the same way you would configure a dlt pipeline. We'll provide you with access to the resources above if you wish to test locally.

To test the local destinations (duckdb and postgres), run make test-local. You can run these tests without additional credentials (just copy .example.env to .env).
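
As an illustration of what the local tests exercise (this is not taken from the test suite; the names below are made up), a pipeline targeting duckdb runs entirely on your machine and needs no credentials at all:

import dlt

# runs fully locally against duckdb, no credentials required
pipeline = dlt.pipeline(
    pipeline_name='local_smoke_test',
    destination='duckdb',
    dataset_name='smoke_data'
)

pipeline.run([{'id': 1, 'value': 'ok'}], table_name='items')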

publishing

  1. Make sure that you are on the devel branch and have the newest code that passed all tests on CI.
  2. Verify the current version with poetry version.
  3. Obtain a PyPI access token, configure it with poetry config pypi-token.pypi your-api-token, then run
make publish-library
  4. Make a release on GitHub, using the version and git tag as the release name.

contributing

To contribute via pull request:

  1. Create an issue with your idea for a feature, etc.
  2. Write your code and tests.
  3. Lint your code with make lint. Test the common modules with make test-common.
  4. If you work on destination code, contact us to get access to test destinations.
  5. Create a pull request.

Project details



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python_dlt-0.2.0a32.tar.gz (242.1 kB)

Uploaded Source

Built Distribution

python_dlt-0.2.0a32-py3-none-any.whl (318.6 kB)

Uploaded Python 3

File details

Details for the file python_dlt-0.2.0a32.tar.gz.

File metadata

  • Download URL: python_dlt-0.2.0a32.tar.gz
  • Upload date:
  • Size: 242.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.11 Linux/4.19.128-microsoft-standard

File hashes

Hashes for python_dlt-0.2.0a32.tar.gz

  • SHA256: 9a85234c99a9cd77cf1774e07dae2d3334b0295d82790f4c592f4d625032af76
  • MD5: 3b9eee3a5371b32438348420bda32f45
  • BLAKE2b-256: ffa13d89a088c9a6f9c05dd57101fe80db36183fccf2b9f80e78126cdfc7bcdb

See more details on using hashes here.
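
If you want to check a download by hand, here is a minimal sketch using only the standard library (it assumes the source distribution above sits in the current directory):

import hashlib

expected = '9a85234c99a9cd77cf1774e07dae2d3334b0295d82790f4c592f4d625032af76'

with open('python_dlt-0.2.0a32.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# compare against the SHA256 digest published on this page
assert digest == expected, 'downloaded file does not match the published hash'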

File details

Details for the file python_dlt-0.2.0a32-py3-none-any.whl.

File metadata

  • Download URL: python_dlt-0.2.0a32-py3-none-any.whl
  • Upload date:
  • Size: 318.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.11 Linux/4.19.128-microsoft-standard

File hashes

Hashes for python_dlt-0.2.0a32-py3-none-any.whl

  • SHA256: 1a91d842ed1352e25a67f54d947721af1ecd5688c7dbeea3b2e43eeb46c7a083
  • MD5: 089254a72068c05488d8f1f6d4c5cc40
  • BLAKE2b-256: a7c7f967752c8aacb6cb082829eac053656efefcceb17babd12525c2334677e6

See more details on using hashes here.
