dtex (data extraction tool) — an open-source Python EL tool: pipelines as configs, connectors as folders, CLI-first.
dtex
dtex ("data extraction tool") is an open-source, self-hosted Python extract-load (EL) tool. It moves data from a source (an API, a database, a file drop) into a destination (a warehouse, a database, an object store) — and nothing more. Transformation is dbt's job.
The pitch in one line: a CLI-first, dbt-shaped extract-load tool — pipelines are configs, connectors are folders, no UI blackbox. The #1 principle is to keep it as simple as possible.
Install
pip install dtex # every baked connector, ready
pip install 'dtex[gcs,s3]' # add gs:// / s3:// filesystem reads
pip install 'dtex[gcp-secrets]' # add the GCP Secret Manager resolver
pip install 'dtex[aws-secrets]' # add the AWS Secrets Manager resolver
pip install 'dtex[vault]' # add the HashiCorp Vault resolver
A plain pip install dtex ships every baked source and destination: the
DuckDB and BigQuery destinations, the filesystem source's local + Parquet
path, the REST, Postgres, ShipHero, and Stripe sources, the engine, and the
CLI. Extras stay opt-in for the cloud-storage paths of the filesystem source
(gs:// and s3://) and for secret managers (only relevant if your
profiles.yml uses secret:// URLs).
dtex requires Python 3.11+. It installs both a CLI (dtex) and an importable
library (import dtex).
Usage
dtex init my_project # scaffold a project
cd my_project
dtex new source my_api # scaffold a source connector
dtex new config my_pipeline # scaffold a pipeline config
dtex validate # check everything
dtex run -p my_pipeline # run the pipeline
dtex runs list -p my_pipeline # show recent run history
A pipeline is one config file binding a source + a destination + a target +
params. Run it with dtex run -p <config>. The library equivalent is
dtex.run(config="my_pipeline") and returns a structured RunResult.
Pre-baked connectors
Sources: filesystem (CSV/JSONL/Parquet from local, GCS, or S3),
rest (paginated REST APIs — 4 pagination strategies, 4 auth modes),
postgres (keyset pagination, no OFFSET), shiphero (GraphQL),
stripe (resource-as-stream over the REST API).
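The "keyset pagination, no OFFSET" approach named for the postgres source can be illustrated with a small sketch. This is not dtex's implementation: it uses the standard-library sqlite3 module as a stand-in for Postgres, and the users table, the keyset_pages function, and the page size are hypothetical names chosen for the example. The idea is that each page resumes from the last key seen (WHERE id > ?) instead of skipping rows with OFFSET.

```python
import sqlite3
from typing import Iterator


def keyset_pages(conn: sqlite3.Connection, page_size: int = 2) -> Iterator[list[tuple]]:
    """Yield pages of (id, name) rows ordered by id, resuming after the last id seen."""
    last_id = None
    while True:
        if last_id is None:
            # First page: no lower bound yet.
            rows = conn.execute(
                "SELECT id, name FROM users ORDER BY id LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            # Later pages: seek past the last key instead of using OFFSET.
            rows = conn.execute(
                "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def demo_keyset() -> list[list[tuple]]:
    """Build a tiny in-memory table (hypothetical data) and page through it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")],
    )
    pages = list(keyset_pages(conn, page_size=2))
    conn.close()
    return pages


if __name__ == "__main__":
    for page in demo_keyset():
        print(page)
```

Because every query seeks on an ordered key, the cost of fetching a page does not grow with how far into the table the reader already is, which is the usual reason to prefer this over OFFSET.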
Destinations: duckdb (zero-config dev default, all 5 capabilities) and
bigquery (production warehouse — Parquet-staged via GCS + LOAD jobs,
MERGE upserts, cursor-based partitioning).
Engine: per-stream commit + atomic transactions (rollback on failure),
state in the destination's _dtex_state table, run records in _dtex_runs,
structured JSON-lines logs per run, secret redaction, schema evolution
(evolve default, strict opt-in), pipeline-level parallelism with
per-destination caps.
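The "per-stream commit + atomic transactions (rollback on failure)" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the dtex engine: sqlite3 stands in for the destination, the load_stream function and the orders and customers tables are hypothetical, and the columns given to _dtex_state here (stream, cursor_value) are invented for the example, since the source only names the table. Each stream's rows and its state row are written in one transaction, so a failure leaves neither behind.

```python
import sqlite3


def load_stream(conn: sqlite3.Connection, stream: str, rows: list[tuple], cursor_value: str) -> bool:
    """Write one stream's rows and its state in a single transaction.

    Returns True if the stream committed, False if it was rolled back.
    """
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.executemany(
                f'INSERT INTO "{stream}" (id, payload) VALUES (?, ?)', rows
            )
            conn.execute(
                "INSERT OR REPLACE INTO _dtex_state (stream, cursor_value) VALUES (?, ?)",
                (stream, cursor_value),
            )
        return True
    except sqlite3.Error:
        return False


def demo_commit():
    """Load one good stream and one failing stream (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, payload TEXT)")
    # Hypothetical state schema: one row per stream.
    conn.execute("CREATE TABLE _dtex_state (stream TEXT PRIMARY KEY, cursor_value TEXT)")

    ok_orders = load_stream(conn, "orders", [(1, "a"), (2, "b")], "2")
    # The duplicate primary key makes the second insert fail mid-stream.
    ok_customers = load_stream(conn, "customers", [(1, "x"), (1, "y")], "1")

    counts = {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in ("orders", "customers")
    }
    state = conn.execute(
        "SELECT stream, cursor_value FROM _dtex_state ORDER BY stream"
    ).fetchall()
    conn.close()
    return ok_orders, ok_customers, counts, state


if __name__ == "__main__":
    print(demo_commit())
```

In this sketch the failing customers stream leaves no partial rows and no state entry, while the orders stream that committed earlier is unaffected.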
Secret managers: GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault — each as an opt-in extra.
Documentation
The full design handbook lives in docs/.
Start with
00 — Vision & Naming,
02 — Architecture,
06 — Project Anatomy,
12 — Configs, and
10 — Roadmap and Scope.
Security · Contributing · Code of Conduct
- Security policy — how to report a vulnerability.
- Contributing — dev setup, PR process, how to add a connector.
- Code of Conduct.
- Changelog.
License
Apache License 2.0 — see LICENSE.
File details
Details for the file dtex-0.1.3.tar.gz.
File metadata
- Download URL: dtex-0.1.3.tar.gz
- Upload date:
- Size: 447.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 29dfd7c388a8936f0b1f4dfe25a2a8b48497c1e018789f48ed5f5ff5643c96e4 |
| MD5 | 9859c4f964f147bed5ddd0ff41daf383 |
| BLAKE2b-256 | 1c514f4848f70677f8e7ac5bd8077e222e25a2373ef3b34801175744d171a634 |
Provenance
The following attestation bundles were made for dtex-0.1.3.tar.gz:
Publisher: publish.yml on vej-ai/dtex
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dtex-0.1.3.tar.gz
- Subject digest: 29dfd7c388a8936f0b1f4dfe25a2a8b48497c1e018789f48ed5f5ff5643c96e4
- Sigstore transparency entry: 1659978045
- Sigstore integration time:
- Permalink: vej-ai/dtex@0add0c5a466f839a06d6dc13ed8c0f641de689cd
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/vej-ai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0add0c5a466f839a06d6dc13ed8c0f641de689cd
- Trigger Event: push
File details
Details for the file dtex-0.1.3-py3-none-any.whl.
File metadata
- Download URL: dtex-0.1.3-py3-none-any.whl
- Upload date:
- Size: 284.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e92b8ad2f51467381db9a529d80a809664013ebb02c9d7099c76e5beef63e868 |
| MD5 | a3f9f6f7a736da83b3445ed3fb3bb7ca |
| BLAKE2b-256 | 0a8c8bd2c5cc26debe47491ecadafdb38452ee73532ee643c2890a90642d9895 |
Provenance
The following attestation bundles were made for dtex-0.1.3-py3-none-any.whl:
Publisher: publish.yml on vej-ai/dtex
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dtex-0.1.3-py3-none-any.whl
- Subject digest: e92b8ad2f51467381db9a529d80a809664013ebb02c9d7099c76e5beef63e868
- Sigstore transparency entry: 1659978099
- Sigstore integration time:
- Permalink: vej-ai/dtex@0add0c5a466f839a06d6dc13ed8c0f641de689cd
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/vej-ai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0add0c5a466f839a06d6dc13ed8c0f641de689cd
- Trigger Event: push