dtex (data extraction tool) — an open-source Python EL tool: pipelines as configs, connectors as folders, CLI-first.

dtex

dtex ("data extraction tool") is an open-source, self-hosted Python extract-load (EL) tool. It moves data from a source (an API, a database, a file drop) into a destination (a warehouse, a database, an object store) — and nothing more. Transformation is dbt's job.

The pitch in one line: a CLI-first, dbt-shaped extract-load tool. Pipelines are configs, connectors are folders, and there is no UI black box. The #1 principle is to keep it as simple as possible.

Install

pip install dtex                          # every baked connector, ready
pip install 'dtex[gcs,s3]'                # add gs:// / s3:// filesystem reads
pip install 'dtex[gcp-secrets]'           # add the GCP Secret Manager resolver
pip install 'dtex[aws-secrets]'           # add the AWS Secrets Manager resolver
pip install 'dtex[vault]'                 # add the HashiCorp Vault resolver

A plain pip install dtex ships the engine, the CLI, and every baked source and destination: DuckDB, BigQuery, the filesystem source's local + Parquet path, and the REST / Postgres / ShipHero / Stripe sources. Extras stay opt-in for the cloud-storage paths of the filesystem source (gs:// / s3://) and for secret managers (only relevant if your profiles.yml uses secret:// URLs).

dtex requires Python 3.11+. It installs both a CLI (dtex) and an importable library (import dtex).
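
A quick pre-flight check can confirm these prerequisites before the first run. The sketch below uses only the standard library. The helper name `check_environment` is hypothetical and not part of dtex; it only tests the Python version, whether the `dtex` package is importable, and whether a `dtex` executable is on the PATH.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 11)  # stated requirement: Python 3.11+


def check_environment(version_info=None):
    """Report whether the stated prerequisites appear to be met.

    Hypothetical helper for illustration only; not a dtex API.
    """
    if version_info is None:
        version_info = sys.version_info
    return {
        # Compare only (major, minor) against the minimum version.
        "python_ok": tuple(version_info[:2]) >= MIN_PYTHON,
        # find_spec returns None when the top-level package is not installed.
        "library_importable": importlib.util.find_spec("dtex") is not None,
        # shutil.which returns None when no `dtex` executable is on PATH.
        "cli_on_path": shutil.which("dtex") is not None,
    }


if __name__ == "__main__":
    for check, passed in check_environment().items():
        print(f"{check}: {'ok' if passed else 'missing'}")
```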

Usage

dtex init my_project                      # scaffold a project
cd my_project
dtex new source my_api                    # scaffold a source connector
dtex new config my_pipeline               # scaffold a pipeline config
dtex validate                             # check everything
dtex run -p my_pipeline                   # run the pipeline
dtex runs list -p my_pipeline             # show recent run history

A pipeline is one config file binding a source + a destination + a target + params. Run it with dtex run -p <config>. The library equivalent is dtex.run(config="my_pipeline") and returns a structured RunResult.
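
For schedulers that shell out rather than import the library, the dtex run -p <config> call can be wrapped with the standard library. This is a minimal sketch: `build_run_command` and `run_pipeline` are hypothetical helper names, and treating a non-zero exit code as failure is an assumption about the CLI, not documented behaviour.

```python
import subprocess


def build_run_command(pipeline: str) -> list[str]:
    """Build the argv for `dtex run -p <config>`."""
    # Reject empty names and anything that would be parsed as a flag.
    if not pipeline or pipeline.startswith("-"):
        raise ValueError(f"invalid pipeline name: {pipeline!r}")
    return ["dtex", "run", "-p", pipeline]


def run_pipeline(pipeline: str, runner=subprocess.run) -> bool:
    """Run one pipeline through the CLI and report success.

    Assumption: the CLI exits non-zero on failure. `runner` is injectable
    so the wrapper can be exercised without dtex installed.
    """
    completed = runner(build_run_command(pipeline), check=False)
    return completed.returncode == 0
```

In-process callers would skip the subprocess entirely and call dtex.run(config="my_pipeline"), inspecting the returned RunResult instead of an exit code.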

Pre-baked connectors

Sources: filesystem (CSV/JSONL/Parquet from local, GCS, or S3), rest (paginated REST APIs — 4 pagination strategies, 4 auth modes), postgres (keyset pagination, no OFFSET), shiphero (GraphQL), stripe (resource-as-stream over the REST API).
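
The "keyset pagination, no OFFSET" note on the postgres source refers to a general technique: each page is fetched by seeking past the last key already seen, so the database never scans and discards skipped rows. The sketch below illustrates the idea with the standard library's sqlite3 as a stand-in for Postgres. It is not dtex's implementation; the `customers` table, the `keyset_pages` helper, and the page size are all invented for the example.

```python
import sqlite3


def keyset_pages(conn, page_size=2):
    """Yield pages of (id, name) rows ordered by id, without OFFSET."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, name FROM customers WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the last key seen becomes the next cursor


# Hypothetical demo data: five rows, read two at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 6)],
)
pages = list(keyset_pages(conn))
```

Because the cursor is a key value rather than a row count, rows inserted or deleted behind the cursor do not shift later pages, which is the usual reason to prefer this over OFFSET for incremental extraction.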

Destinations: duckdb (zero-config dev default, all 5 capabilities) and bigquery (production warehouse — Parquet-staged via GCS + LOAD jobs, MERGE upserts, cursor-based partitioning).

Engine: per-stream commit + atomic transactions (rollback on failure), state in the destination's _dtex_state table, run records in _dtex_runs, structured JSON-lines logs per run, secret redaction, schema evolution (evolve default, strict opt-in), pipeline-level parallelism with per-destination caps.
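
The "per-stream commit + atomic transactions" guarantee can be pictured as writing a stream's rows and its new cursor in one transaction, so a failed load leaves both the data and the state untouched. The sketch below uses sqlite3 from the standard library as a stand-in destination. Only the `_dtex_state` table name comes from this README; its columns, the `orders` stream, and the `load_stream` helper are assumptions made for illustration and do not describe dtex's actual schema or code.

```python
import sqlite3


def load_stream(conn, stream, rows, cursor_value):
    """Write rows and the stream's cursor atomically.

    Returns True on commit, False if the load failed and was rolled back.
    """
    if not stream.isidentifier():
        raise ValueError(f"unsafe table name: {stream!r}")
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.executemany(
                f'INSERT INTO "{stream}" (id, payload) VALUES (?, ?)', rows
            )
            conn.execute(
                "INSERT INTO _dtex_state (stream, cursor_value) VALUES (?, ?) "
                "ON CONFLICT(stream) DO UPDATE "
                "SET cursor_value = excluded.cursor_value",
                (stream, cursor_value),
            )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
# Assumed state schema: one cursor value per stream.
conn.execute(
    "CREATE TABLE _dtex_state (stream TEXT PRIMARY KEY, cursor_value TEXT)"
)

ok = load_stream(conn, "orders", [(1, "a"), (2, "b")], "2")
# The duplicate id violates the primary key, so the whole batch is rolled back.
failed = load_stream(conn, "orders", [(3, "c"), (3, "dup")], "3")

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
state = conn.execute("SELECT stream, cursor_value FROM _dtex_state").fetchall()
```

Keeping state in the same database as the data is what makes this possible: the cursor can only advance if the rows it describes were committed with it.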

Secret managers: GCP Secret Manager, AWS Secrets Manager, HashiCorp Vault — each as an opt-in extra.

Documentation

The full design handbook lives in docs/. Start with 00 — Vision & Naming, 02 — Architecture, 06 — Project Anatomy, 12 — Configs, and 10 — Roadmap and Scope.

Security · Contributing · Code of Conduct

License

Apache License 2.0 — see LICENSE.
