Declarative ETL framework for YAML-driven data pipelines

dpone

dpone is a Python ETL framework for declarative, YAML-driven data pipelines. It helps data teams describe sources, sinks, load strategies, dependencies, conventions, and operational checks as reusable configuration instead of one-off scripts.

The public package name, import name, GitHub repository name, and CLI name are all intentionally short: dpone.

Repository: https://github.com/PaulKov/dpone

Public install:

python -m pip install dpone
dpone --help

Documentation map

Start here if you are evaluating or operating dpone: user and developer documentation lives under docs/ in the repository.

What dpone gives you

  • YAML manifests for single-process and batch ETL definitions.
  • Built-in DAG/dependency inspection for pipeline debugging.
  • Runtime abstractions for sources, sinks, connectors, state, reconciliation, and safe SQL logging.
  • Optional integrations for PostgreSQL, MSSQL/SQL Server, ClickHouse, BigQuery/GCS, Kafka, pandas, Google Ads, and HashiCorp Vault.
  • A CLI designed for self-service validation, rendering, explainability, and documentation checks.
  • Compatibility shims for older import paths while the canonical package layout continues to stabilize.

Installation

Install the core package from PyPI:

pip install dpone
dpone --version
dpone -v

Install common extras for local ETL development:

pip install "dpone[postgres,mssql,clickhouse,kafka,gcp,s3,azure,pandas,vault]"

Install everything currently published by the project:

pip install "dpone[full]"

With uv:

uv add dpone
uv add "dpone[full]"

Optional extras

Extra            Purpose
postgres         PostgreSQL connectivity via psycopg
mssql            Microsoft SQL Server connectivity via pyodbc; production bulk paths use external ODBC Driver 18 and bcp
clickhouse       ClickHouse connectivity
gcp              Google BigQuery and Google Cloud Storage support
s3               AWS S3 object storage staging support via boto3
azure            Azure Blob Storage staging support via azure-storage-blob
object_storage   S3, GCS, and Azure Blob object storage staging support
kafka            Kafka batch source/sink support via confluent-kafka, Schema Registry codecs, Avro, JSON Schema, and Protobuf helpers
pandas           DataFrame-based extract/load helpers
vault            HashiCorp Vault integration via public vault-kv-client
google_ads       Google Ads API support
full             All public extras above

Vault support uses the vault-kv-client package from PyPI. New code should import vault_kv_client; the historical vault_client import path remains supported by that package as a compatibility layer.

Credentials can come from environment variables, Airflow Connections, HashiCorp Vault-compatible KV, or inline params for smoke tests. See Connections and credentials for copy-paste examples for Postgres, MSSQL, ClickHouse, BigQuery, Kafka, and REST API.
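
To make the idea of layered credential sources concrete, here is a minimal sketch that resolves connection fields from environment variables and falls back to inline params. The resolve_credentials helper, the DEMO_CONN_<ID>_<FIELD> variable naming, and the precedence order are all assumptions made for illustration; they are not dpone's API or its documented behavior.

```python
import os
from typing import Mapping, Optional

# Hypothetical naming scheme for this sketch only; dpone's real
# environment variable names may differ.
ENV_PREFIX = "DEMO_CONN"


def resolve_credentials(
    connection_id: str,
    fields: tuple[str, ...] = ("host", "user", "password"),
    inline: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Return credentials, preferring environment variables over inline params."""
    env = os.environ if environ is None else environ
    inline = inline or {}
    # "postgres-demo" becomes "POSTGRES_DEMO" so it is a valid variable name.
    key_base = connection_id.upper().replace("-", "_")
    resolved: dict[str, str] = {}
    for field in fields:
        env_key = f"{ENV_PREFIX}_{key_base}_{field.upper()}"
        if env_key in env:
            resolved[field] = env[env_key]
        elif field in inline:
            # Assumed fallback: inline params, intended for smoke tests only.
            resolved[field] = inline[field]
        else:
            raise KeyError(
                f"missing credential field {field!r} for {connection_id!r}"
            )
    return resolved
```

Passing environ explicitly keeps the function easy to test without touching the real process environment.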

Quick start

Create a batch manifest, for example examples/batch/landing_postgres_to_bq.batch.yaml:

# yaml-language-server: $schema=../../src/dpone/schema/etl-batch-manifest.schema.json
kind: dpone.batch.v1
convention: landing_raw_v1
registry: ../registry/sources.yaml

vars:
  src_system: demo_source
  src_database: demo_db
  owner_team: data-platform
  owner_contact: data-platform@example.com
  sla: daily

defaults:
  source:
    type: postgres
    connection_type: vault
    connection_id: postgres-demo
    vault_path: postgres/demo-source
    options:
      batch_size: 100000
      export_format: csv

  sink:
    type: bigquery
    connection_type: vault
    connection_id: bigquery-demo
    vault_path: gcp/demo-project-prod/bq/service-account
    staging:
      schema: stg
    strategy:
      mode: full_refresh
      overwrite_type: exchange

schemas:
  public:
    tables:
      - core_city

Validate and render it:

dpone manifest validate examples/batch/landing_postgres_to_bq.batch.yaml \
  --profile landing_raw_v1 \
  --registry examples/registry/sources.yaml

dpone manifest render examples/batch/landing_postgres_to_bq.batch.yaml \
  --selector public.core_city \
  --registry examples/registry/sources.yaml
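
The selector public.core_city used above looks like a schema name joined to a table name. The sketch below expands a plain-dict mirror of the manifest's schemas section into selectors of that shape. It assumes the <schema>.<table> format and string-only table entries; iter_selectors is a hypothetical helper, not dpone's manifest parser.

```python
from typing import Any, Iterator

# Plain-dict mirror of the `schemas` section from the quick-start manifest.
manifest: dict[str, Any] = {
    "schemas": {
        "public": {"tables": ["core_city"]},
    },
}


def iter_selectors(doc: dict[str, Any]) -> Iterator[str]:
    """Yield one '<schema>.<table>' selector per table entry."""
    for schema_name, schema_body in doc.get("schemas", {}).items():
        for table in schema_body.get("tables", []):
            # Assumption: table entries are plain strings, as in the example.
            yield f"{schema_name}.{table}"


selectors = list(iter_selectors(manifest))
print(selectors)
```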

Inspect pipeline dependencies:

dpone dag report examples/batch/landing_postgres_to_bq.batch.yaml \
  --base-path . \
  --format json \
  --preset ci \
  --registry examples/registry/sources.yaml
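
For readers new to dependency graphs, the following sketch shows what ordering tasks by their dependencies means, using Python's standard graphlib module. The task names other than public.core_city are hypothetical, and this does not reflect how dpone computes or reports its DAG.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical dependency map: each task lists the tasks it depends on.
dependencies: dict[str, set[str]] = {
    "public.core_city": set(),
    "public.core_street": {"public.core_city"},  # hypothetical task
    "public.core_address": {"public.core_street", "public.core_city"},  # hypothetical task
}


def execution_order(deps: dict[str, set[str]]) -> list[str]:
    """Return tasks in an order where every dependency comes first."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # The second element of args holds the list of nodes forming the cycle.
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc


order = execution_order(dependencies)
print(order)
```

A cycle such as a depending on b and b depending on a raises ValueError here, which is the kind of problem a dependency report is meant to surface early.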

CLI overview

dpone --help
dpone manifest --help
dpone dag --help
dpone docs --help

Common commands:

dpone manifest list examples/batch/landing_postgres_to_bq.batch.yaml
dpone manifest validate examples/batch/landing_postgres_to_bq.batch.yaml --recursive
dpone manifest render examples/batch/landing_postgres_to_bq.batch.yaml --selector public.core_city
dpone manifest explain examples/batch/landing_postgres_to_bq.batch.yaml --selector public.core_city --why sink.table.schema
dpone dag list-edges examples/batch/landing_postgres_to_bq.batch.yaml --with-groups --with-refs
dpone dag explain-node examples/batch/landing_postgres_to_bq.batch.yaml --task public.core_city
dpone dag report examples/batch/landing_postgres_to_bq.batch.yaml --preset ci --format md
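
When these commands are scripted from Python, for example in a CI helper, building the argument list explicitly avoids shell quoting problems. The sketch below uses only the manifest validate flags that appear in this README; manifest_validate_cmd is a hypothetical wrapper, and whether every flag combination is accepted by the CLI is not confirmed here.

```python
import shlex
from typing import Optional


def manifest_validate_cmd(
    manifest_path: str,
    *,
    profile: Optional[str] = None,
    registry: Optional[str] = None,
    recursive: bool = False,
) -> list[str]:
    """Build argv for `dpone manifest validate` from flags shown in this README."""
    cmd = ["dpone", "manifest", "validate", manifest_path]
    if profile:
        cmd += ["--profile", profile]
    if registry:
        cmd += ["--registry", registry]
    if recursive:
        cmd.append("--recursive")
    return cmd


cmd = manifest_validate_cmd(
    "examples/batch/landing_postgres_to_bq.batch.yaml",
    profile="landing_raw_v1",
    registry="examples/registry/sources.yaml",
)
# Print a copy-pasteable shell form of the command.
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd, check=True).
```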

Repository layout

src/dpone/      Python package source code
docs/           User and developer documentation
examples/       Public example manifests and registries
tests/          Unit and integration tests
tools/          Local smoke and release helper scripts

Canonical imports live under:

  • dpone.manifest.*
  • dpone.dag.*
  • dpone.runtime.*
  • dpone.contracts.*
  • dpone.ports.*
  • dpone.adapters.*

Legacy paths such as dpone.core.*, dpone.lib.*, dpone.source.*, and dpone.sink.* are compatibility shims. Prefer canonical imports for new code.
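
As background on what a compatibility shim can look like, here is a generic sketch of a legacy module that forwards attribute access to a canonical module and emits a DeprecationWarning, using a module-level __getattr__ (PEP 562). The demo_pkg names are hypothetical, and nothing here is taken from dpone's source; in particular, whether dpone's shims emit warnings is not stated.

```python
import types
import warnings

# Stand-in for a canonical module (hypothetical name, not part of dpone).
canonical = types.ModuleType("demo_pkg.runtime")
canonical.run = lambda: "ok"

# Stand-in for a legacy module that only forwards to the canonical one.
legacy = types.ModuleType("demo_pkg.core")


def _forward(name: str):
    """Module-level __getattr__: warn, then delegate to the canonical module."""
    warnings.warn(
        f"demo_pkg.core.{name} is deprecated; use demo_pkg.runtime.{name}",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(canonical, name)


# Python calls this hook only when normal attribute lookup on the module fails.
legacy.__getattr__ = _forward

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = legacy.run()

print(result, [str(w.message) for w in caught])
```

In a real package the same function would simply be defined as __getattr__ at the top level of the legacy module file.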

Local development

uv sync --all-extras
uv run ruff check .
uv run ruff format --check .
uv run mypy --config-file mypy.ini
uv run pytest -m "not integration_live"

Build package artifacts:

uv build

Run the package smoke script from an installed environment:

python tools/package_smoke.py --project-root . --dpone-cmd dpone

CI and releases

The OSS repository uses GitHub Actions as the primary automation path. See CI/CD for the workflow map, detailed runbooks, artifacts, and developer guidance.

Key workflows:

  • .github/workflows/ci.yml runs linting, formatting checks, type checks, tests, coverage, package build, and PostgreSQL XMin integration.
  • .github/workflows/pages.yml builds and deploys the GitHub Pages documentation site from master.
  • .github/workflows/release.yml builds and publishes tagged releases to PyPI.
  • .github/workflows/integration-matrix.yml and .github/workflows/connector-certification.yml provide manual/scheduled production-confidence gates.

Release tags use the format vX.Y.Z, for example:

git tag -a vX.Y.Z -m "Release vX.Y.Z"
git push origin vX.Y.Z
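
A tag can be checked against the vX.Y.Z format before it is pushed. The sketch below assumes X, Y, and Z are plain non-negative integers with no pre-release or build suffix; parse_release_tag is a hypothetical helper and is not part of the project's release tooling.

```python
import re

# vX.Y.Z with purely numeric components (an assumption of this sketch).
TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) or raise ValueError for a malformed tag."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"tag {tag!r} does not match vX.Y.Z")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


version = parse_release_tag("v0.4.1")
print(version)
```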

Prefer PyPI Trusted Publishing for releases. Token-based publishing should only be used as a fallback with short-lived, scoped tokens.

Security

Never commit API tokens, PyPI tokens, GitHub tokens, Vault credentials, service-account JSON, or live vendor credentials. If a secret is ever pasted into an issue, chat, commit, or CI log, revoke it before publishing or pushing public history.

See Security policy for the vulnerability reporting process.

License

dpone is licensed under the Apache License 2.0. See LICENSE.
