DLT is an open-source, Python-native, scalable data loading framework that does not require any DevOps effort to run.

Project description

DLT

DLT enables simple, Python-native data pipelining for data professionals.

Quickstart guide

How does it work?

DLT aims to simplify data loading for everyone.

To achieve this, we take into account the progressive steps of data pipelining:

1. Data discovery, typing, schema, metadata

When we create a pipeline, we start by grabbing data from the source.

Source metadata is usually lacking, so we need to inspect the actual data to understand what it contains and how to ingest it.

To facilitate this, DLT includes several features:

Auto-unpack nested JSON, if desired.
Generate an inferred schema with data types, and load data as-is for inspection in your warehouse.
Use an adjusted schema for follow-up loads to better type and filter your data after visual inspection (this also solves the dynamic typing of pandas DataFrames).
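A minimal sketch of the first two features, assuming records arrive as Python dicts parsed from JSON. The function names `unpack` and `infer_schema`, the `__` column separator, the `_parent_id` link column and the coarse type names are illustrative choices, not DLT's API. Child rows are linked through an `id` field that the sketch assumes each parent record carries.

```python
from typing import Any, Dict, Iterator, Tuple

Row = Tuple[str, Dict[str, Any]]  # (table name, flat row)


def infer_type(value: Any) -> str:
    """Map a Python value to a coarse, hypothetical column type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "text"


def unpack(record: Dict[str, Any], table: str, parent_id: Any = None) -> Iterator[Row]:
    """Flatten one level of nested objects; turn lists into child tables."""
    row: Dict[str, Any] = {}
    if parent_id is not None:
        row["_parent_id"] = parent_id
    children = []
    for key, value in record.items():
        if isinstance(value, dict):
            # Nested objects become prefixed columns on the same row.
            for sub_key, sub_value in value.items():
                row[f"{key}__{sub_key}"] = sub_value
        elif isinstance(value, list):
            for item in value:
                # Wrap scalars so every child row is a dict.
                child = item if isinstance(item, dict) else {"value": item}
                children.append((f"{table}__{key}", child))
        else:
            row[key] = value
    yield table, row
    for child_table, child in children:
        # Assumes the parent record has an "id" field to link children to.
        yield from unpack(child, child_table, parent_id=record.get("id"))


def infer_schema(rows: Iterator[Row]) -> Dict[str, Dict[str, str]]:
    """Collect column types per table from the first non-null value seen."""
    schema: Dict[str, Dict[str, str]] = {}
    for table, row in rows:
        columns = schema.setdefault(table, {})
        for name, value in row.items():
            if value is not None:
                columns.setdefault(name, infer_type(value))
    return schema


event = {"id": 1, "user": {"name": "ana", "age": 31}, "items": [{"sku": "a", "qty": 2}]}
rows = list(unpack(event, "events"))
schema = infer_schema(iter(rows))
print(schema)
```

The inferred schema is what you would inspect after the first as-is load; editing it (for example changing a type) gives the adjusted schema used for follow-up loads.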

2. Safe, scalable loading

When we load data, many things can interrupt the process, so we want to be able to retry safely without creating artefacts in the data.

Additionally, the data size is often not known in advance, which makes it hard to match the loading infrastructure to the volume of data.

With good pipelining design, safe loading becomes a non-issue.

Idempotency: The data pipeline supports idempotent loads, so retrying a load does not duplicate data.
Atomicity: The data is either fully loaded or not loaded at all. Partial loads stay in the S3/storage buffer, which is committed to the warehouse/catalogue only once it is complete. If something fails, the buffer is not partially committed.
Data-size agnostic: By using generators (for example, for incremental downloading) and online storage as a buffer, DLT can incrementally process sources of any size without running into the size limits of the worker machine.
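The three properties above can be illustrated with a small sketch. It uses SQLite from the standard library as a stand-in warehouse and a single database transaction in place of the S3/storage buffer, so it shows the idea rather than how DLT implements it. `chunked`, `load_package`, `flaky_rows`, the `events` table and the `_loads` bookkeeping table are hypothetical names.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, List

# Hypothetical names throughout; SQLite stands in for the warehouse and one
# transaction stands in for the storage buffer described above.


def chunked(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Consume a (possibly very large) generator in fixed-size chunks."""
    chunk: List[Dict] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def load_package(conn: sqlite3.Connection, load_id: str,
                 rows: Iterable[Dict], chunk_size: int = 2) -> bool:
    """Load rows under one load_id; return False if that id was already committed."""
    # Tables are created outside the load transaction.
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS _loads (load_id TEXT PRIMARY KEY)")
    conn.commit()
    # Idempotency: retrying an already committed package is a no-op.
    if conn.execute("SELECT 1 FROM _loads WHERE load_id = ?", (load_id,)).fetchone():
        return False
    # Atomicity: the block commits on success and rolls back on any exception.
    with conn:
        for chunk in chunked(rows, chunk_size):
            conn.executemany("INSERT INTO events (id, value) VALUES (:id, :value)", chunk)
        conn.execute("INSERT INTO _loads (load_id) VALUES (?)", (load_id,))
    return True


def flaky_rows() -> Iterator[Dict]:
    """A source that fails after three rows, e.g. a dropped connection."""
    for i in range(3):
        yield {"id": i, "value": str(i)}
    raise ConnectionError("source interrupted")


conn = sqlite3.connect(":memory:")
try:
    load_package(conn, "load_1", flaky_rows())
except ConnectionError:
    pass
count_after_failure = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

rows = [{"id": i, "value": str(i)} for i in range(5)]
first = load_package(conn, "load_1", rows)  # loads all 5 rows
retry = load_package(conn, "load_1", rows)  # skipped: already committed
total = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

After the interrupted attempt the `events` table is empty, because the chunk already inserted is rolled back; the successful load then commits 5 rows and the retry with the same `load_id` adds nothing.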

3. Modelling and analysis

Instantiate a dbt package with the source schema, enabling you to skip the dbt setup part and go right to SQL modelling.
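As a rough illustration of deriving dbt configuration from an inferred schema, the helper below renders a dbt sources file (`version: 2` with `sources`, `tables` and `columns` entries). The function `dbt_sources_yaml`, the source and dataset names, and putting the inferred type in each column's `description` are assumptions for this sketch; what DLT actually generates may differ.

```python
from typing import Dict, List


def dbt_sources_yaml(source_name: str, dataset: str,
                     schema: Dict[str, Dict[str, str]]) -> str:
    """Render a dbt sources file for the tables and columns in an inferred schema."""
    lines: List[str] = [
        "version: 2",
        "",
        "sources:",
        f"  - name: {source_name}",
        f"    schema: {dataset}",  # warehouse schema/dataset the loader wrote to
        "    tables:",
    ]
    for table, columns in schema.items():
        lines.append(f"      - name: {table}")
        lines.append("        columns:")
        for column, data_type in columns.items():
            lines.append(f"          - name: {column}")
            # Keep the inferred type visible to modellers via the description.
            lines.append(f'            description: "inferred type: {data_type}"')
    return "\n".join(lines) + "\n"


schema = {"events": {"id": "bigint", "user__name": "text"}}
yaml_text = dbt_sources_yaml("my_source", "raw_events", schema)
print(yaml_text)
```

With such a file in place, models can refer to the loaded tables as `{{ source('my_source', 'events') }}` and start directly with SQL modelling.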

4. Data contracts

If you use an explicit schema, you can validate incoming data against it. This is particularly useful when ingesting untyped data such as pandas DataFrames, JSON from APIs, or documents from NoSQL databases.
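A minimal sketch of validating rows against an explicit schema, assuming rows are plain dicts (for example, parsed JSON) and the schema maps column names to Python types. `EXPECTED` and `validate` are hypothetical; the check is deliberately strict, so an `int` is not accepted where a `float` is expected.

```python
from typing import Any, Dict, List

# Hypothetical explicit schema: column name -> expected Python type.
EXPECTED: Dict[str, type] = {"id": int, "email": str, "score": float}


def validate(row: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of contract violations; an empty list means the row is valid."""
    errors: List[str] = []
    for column, expected in schema.items():
        value = row.get(column)
        if value is None:
            errors.append(f"{column}: missing")
        elif type(value) is not expected:  # strict: no int -> float, no bool -> int
            errors.append(f"{column}: expected {expected.__name__}, got {type(value).__name__}")
    # Columns the contract does not know about are reported too.
    for column in sorted(row.keys() - schema.keys()):
        errors.append(f"{column}: not in schema")
    return errors


good = {"id": 1, "email": "a@example.com", "score": 0.5}
bad = {"id": "1", "email": "a@example.com", "extra": True}
print(validate(good, EXPECTED))
print(validate(bad, EXPECTED))
```

Rows with violations can then be rejected or routed elsewhere before anything reaches the warehouse.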

5. Maintenance & Updates

Auto schema migration: What do you do when a new field appears, or an existing field changes type? With auto schema migration, you can choose to ingest this data by default, or raise a validation error.
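One way to sketch both behaviours, assuming a schema that maps column names to coarse type names: with `evolve=True` a new field adds a column and a value whose type changed is routed to a separate variant column; with `evolve=False` either case raises an error. The `migrate` function, `SchemaValidationError` and the `<name>__v_<type>` variant naming are hypothetical.

```python
from typing import Any, Dict


class SchemaValidationError(Exception):
    """Raised when a row does not fit the schema and evolution is disabled."""


def infer_type(value: Any) -> str:
    """Map a Python value to a coarse, hypothetical column type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "text"


def migrate(schema: Dict[str, str], row: Dict[str, Any], evolve: bool = True) -> Dict[str, Any]:
    """Update `schema` in place for `row` and return the row as it would be stored."""
    stored: Dict[str, Any] = {}
    for name, value in row.items():
        if value is None:
            stored[name] = None  # nulls carry no type information
            continue
        new_type = infer_type(value)
        known_type = schema.get(name)
        if known_type is None:
            if not evolve:
                raise SchemaValidationError(f"new column {name!r}")
            schema[name] = new_type  # new field: add a column
            stored[name] = value
        elif known_type == new_type:
            stored[name] = value
        else:
            if not evolve:
                raise SchemaValidationError(f"{name!r} changed from {known_type} to {new_type}")
            variant = f"{name}__v_{new_type}"  # hypothetical variant-column naming
            schema.setdefault(variant, new_type)
            stored[variant] = value
    return stored


schema = {"id": "bigint", "amount": "bigint"}
row1 = migrate(schema, {"id": 1, "amount": 10, "currency": "EUR"})
row2 = migrate(schema, {"id": 2, "amount": "10.5"})
try:
    migrate(dict(schema), {"id": 3, "note": "x"}, evolve=False)
    rejected = False
except SchemaValidationError:
    rejected = True
print(schema)
```

Routing a changed type to its own column keeps the existing column consistently typed, so downstream SQL keeps working while the new values remain available for inspection.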

Why?

Data loading is at the base of the data work pyramid.

The current ecosystem of tools follows an old paradigm where the data pipeline creator is a software engineer, while the data pipeline user is an analyst.

In the current world, the data analyst needs to solve problems end to end, including loading.

Currently, there are no simple frameworks for this, only clunky applications that need engineering and DevOps expertise to install, run, manage and scale. The reason for this is often an artificial monetisation model (open source, but pay to manage).

Additionally, these existing loaders only load data sources for which somebody developed an extractor, requiring a software developer once again.

DLT aims to bring loading into the hands of analysts without the unreasonable redundancy and waste of the modern data platform.

Additionally, source schemas will be compatible across the community, making it possible to share reusable analysis and modelling with the open-source community without creating tool-based vendor lock-in.

Release history

Version	Type	Released
0.2.1	release	Apr 18, 2023
0.2.0a32	pre-release	Apr 16, 2023
0.2.0a31	pre-release	Apr 12, 2023
0.2.0a30	pre-release	Apr 9, 2023
0.2.0a29	pre-release	Mar 30, 2023
0.2.0a28	pre-release	Mar 23, 2023
0.2.0a27	pre-release	Mar 19, 2023
0.2.0a26	pre-release	Mar 14, 2023
0.2.0a25	pre-release	Mar 8, 2023
0.2.0a24	pre-release	Mar 7, 2023
0.2.0a23	pre-release	Mar 1, 2023
0.2.0a22	pre-release	Feb 28, 2023
0.2.0a21	pre-release	Feb 21, 2023
0.2.0a20	pre-release	Feb 20, 2023
0.2.0a19	pre-release	Feb 17, 2023
0.2.0a18	pre-release	Feb 15, 2023
0.2.0a17	pre-release	Feb 9, 2023
0.2.0a16	pre-release	Jan 31, 2023
0.2.0a15	pre-release	Jan 8, 2023
0.2.0a14	pre-release	Dec 12, 2022
0.2.0a13	pre-release	Dec 11, 2022
0.2.0a12	pre-release	Dec 10, 2022
0.2.0a11	pre-release	Dec 9, 2022
0.2.0a10	pre-release	Dec 7, 2022
0.2.0a9	pre-release	Dec 5, 2022
0.2.0a8	pre-release	Dec 4, 2022
0.2.0a7	pre-release	Dec 2, 2022
0.2.0a6	pre-release	Nov 30, 2022
0.2.0a5	pre-release	Nov 29, 2022
0.2.0a4	pre-release	Nov 29, 2022
0.2.0a3	pre-release	Nov 29, 2022
0.2.0a2	pre-release	Nov 24, 2022
0.2.0a1	pre-release	Nov 23, 2022
0.1.0rc15	pre-release	Nov 2, 2022
0.1.0rc14	pre-release	Sep 20, 2022
0.1.0rc13	pre-release	Aug 26, 2022
0.1.0rc12	pre-release	Aug 24, 2022
0.1.0rc11	pre-release	Aug 14, 2022
0.1.0rc10	pre-release	Aug 9, 2022
0.1.0rc9	pre-release	Jul 14, 2022
0.1.0rc8	pre-release	Jul 11, 2022
0.1.0rc7	pre-release	Jun 30, 2022
0.1.0rc6	pre-release	Jun 29, 2022
0.1.0rc5	pre-release	Jun 28, 2022
0.1.0rc4	pre-release	Jun 27, 2022
0.1.0rc3	pre-release	Jun 13, 2022
0.1.0rc2	pre-release	Jun 9, 2022
0.1.0rc1	pre-release	Jun 9, 2022
0.1.0a0 (this version)	pre-release	Jun 6, 2022

Download files

Source Distribution

python-dlt-0.1.0a0.tar.gz (63.7 kB)

Uploaded Jun 6, 2022 Source

Built Distribution

python_dlt-0.1.0a0-py3-none-any.whl (86.3 kB)

Uploaded Jun 6, 2022 Python 3

Hashes for python-dlt-0.1.0a0.tar.gz

Algorithm	Hash digest
SHA256	`456ae2d09d4126241439e90a94693f7e097b8985e0dcde0d2a5703e7f4aff5fb`
MD5	`342b93e89a2f96ecc3f63fe6a63b9d3b`
BLAKE2b-256	`0e9f7f1cb577f150c629bf942baf8ce46d2e70fee319fcb6fca4b11b7b5ad511`

Hashes for python_dlt-0.1.0a0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`5733b63519c306c94b4aaa2fd5c7f44b0290e3ff1826d768fb714d309c2e9a5e`
MD5	`07f8b4b203b45a1ca0db4382f9f4c248`
BLAKE2b-256	`6e57896122bd06eeb21b59a145ed9334339996c1ddfa4cb666fb3b13098cecb4`