YAML-driven data ingestion framework. Define collectors, dcf handles the rest.

dcf

YAML-driven data ingestion. Define a collector — dcf fetches, projects, and writes it to your warehouse.

uvx --from dcf-core dcf init

How it works

  1. Define a collector in YAML — source, schema, cadence
  2. Run it with dcf run
  3. Data lands in your local warehouse (Parquet + DuckDB) or a GCS-backed lake
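Behind step 2, the package structure describes the runner as an outer loop of iterate, fetch, project, write. A minimal sketch of that shape follows. The function name `run_collector` and its callable parameters are hypothetical and chosen for illustration only; this is not dcf's actual runner.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def run_collector(
    iterations: Iterable[Dict[str, Any]],
    fetch: Callable[[Dict[str, Any]], List[Dict[str, Any]]],
    project: Callable[[Dict[str, Any]], Row],
    write: Callable[[List[Row]], None],
) -> int:
    """Hypothetical outer loop: one fetch, project, write pass per iteration.

    `iterations` supplies the request parameters for each pass (for example
    one date window), and the return value is the number of rows written.
    """
    total = 0
    for params in iterations:
        raw_records = fetch(params)                         # pull one batch from the source
        rows = [project(record) for record in raw_records]  # keep only the schema columns
        write(rows)                                         # hand the batch to the writer
        total += len(rows)
    return total
```

Passing the three stages in as callables keeps the loop independent of any particular source or warehouse, so in-memory fakes are enough to exercise it.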

Sources: HTTP REST/CSV APIs · Python functions · Google Pub/Sub
Write strategies: incremental (upsert) · append · full_refresh
Deploy targets: local (Docker + Airflow) or GCP (Cloud Composer + Dataflow)
Claude integration: dcf mcp setup-desktop registers an MCP server so Claude can write and run collectors on your behalf.
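The three write strategies listed above differ in what happens to rows already in the table. The sketch below illustrates the usual meaning of each term on plain lists of dicts. It is an assumption about the semantics, not dcf's writer code, and `apply_write` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_write(
    existing: List[Row],
    incoming: List[Row],
    strategy: str,
    primary_key: Optional[str] = None,
) -> List[Row]:
    """Return the table contents after writing `incoming` with `strategy`."""
    if strategy == "full_refresh":
        # Discard whatever was there and keep only the new batch.
        return list(incoming)
    if strategy == "append":
        # Keep every existing row and add the new batch, duplicates included.
        return list(existing) + list(incoming)
    if strategy == "incremental":
        if primary_key is None:
            raise ValueError("incremental writes need a primary_key")
        # Upsert: index rows by key so an incoming row replaces a stored one.
        merged = {row[primary_key]: row for row in existing}
        for row in incoming:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown strategy: {strategy}")
```

Under this reading, re-running an incremental collector over an overlapping range is safe as long as the primary key is stable, while append would store the overlap twice.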


Example

name: dcf_commits
namespace: github

source:
  type: http
  url: https://api.github.com/repos/zephschafer/dcf/commits
  method: GET
  params:
    - name: per_page
      type: integer
      value: 100
    - name: since
      type: string
    - name: until
      type: string
  schema:
    columns:
      - {name: sha,          path: sha,                type: string}
      - {name: author,       path: commit.author.name, type: string}
      - {name: message,      path: commit.message,     type: string}
      - {name: committed_at, path: commit.author.date, type: timestamp}

cadence:
  strategy: incremental
  primary_key: sha
  iterate:
    - type: date_range
      params: [since, until]
      start: "2024-01-01"
      end: today
      step: 30 days

deployment:
  schedule: "0 8 * * *"

Run the collector, then query the warehouse:

uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits LIMIT 5'
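The `date_range` entry in the example asks for the range from `start` to `end` to be walked in 30-day steps, with each step filling the `since` and `until` parameters. A rough sketch of how such windows could be generated is shown here. `date_windows` is a hypothetical helper, and the choice of back-to-back windows with the last one clamped to `end` is an assumption rather than documented dcf behavior.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def date_windows(start: date, end: date, step_days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (since, until) windows that cover start..end."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    step = timedelta(days=step_days)
    since = start
    while since < end:
        until = min(since + step, end)  # clamp the final window to `end`
        yield since, until
        since = until                   # the next window starts where this one stopped


# Example: the windows a run on 2024-03-15 would cover under these assumptions.
for since, until in date_windows(date(2024, 1, 1), date(2024, 3, 15), 30):
    print(since.isoformat(), until.isoformat())
```

If the API treats both endpoints as inclusive, a record that falls exactly on a boundary may be returned by two adjacent windows. An upsert keyed on `sha` would absorb that duplicate.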

Install

pip install dcf-core

The package is published on PyPI as dcf-core; the CLI command it installs is dcf.


Quickstart

mkdir my-project && cd my-project
uvx --from dcf-core dcf init
uv sync
uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits'

dcf init creates pyproject.toml, project.yml, .gitignore, collectors/, and an example collector.


Contributing

git clone https://github.com/zephschafer/dcf && cd dcf && uv sync

Point a local project at your checkout by adding this to its pyproject.toml:

[tool.uv.sources]
dcf-core = { path = "../dcf", editable = true }

To verify changes:

uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits'

Releasing: bump version in pyproject.toml and push to main — GitHub Actions publishes to PyPI automatically.


Package structure

dcf/
├── cli.py              Entry point (Typer)
├── config/
│   ├── models.py       Pydantic models for collector YAML
│   └── loader.py       YAML loading + env var resolution
├── engine/
│   ├── runner.py       Outer loop (iterate → fetch → project → write)
│   ├── fetcher.py      HTTP and Python source fetchers
│   ├── iterator.py     Date range and categorical iteration
│   ├── projector.py    Schema projection and path extraction
│   └── transforms.py   Column transforms
├── writer/
│   └── iceberg.py      Write strategies (incremental / append / full_refresh)
└── gcp/                GCP auth, provisioning, Terraform wrappers
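The projector entry above mentions schema projection and path extraction, which matches the dotted `path` values in the example collector (such as `commit.author.name`). A small sketch of what that could look like is given below. `extract_path` and `project` are hypothetical names, returning `None` for a missing key is an assumption, and type conversion (for example to `timestamp`) is left out.

```python
from collections.abc import Mapping
from typing import Any, Dict, List


def extract_path(record: Mapping, path: str) -> Any:
    """Follow a dotted path through nested mappings; None if any part is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


def project(record: Mapping, columns: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build one flat row with a value per schema column."""
    return {col["name"]: extract_path(record, col["path"]) for col in columns}


# Columns copied from the example collector; the record is made-up sample data.
columns = [
    {"name": "sha", "path": "sha", "type": "string"},
    {"name": "author", "path": "commit.author.name", "type": "string"},
    {"name": "message", "path": "commit.message", "type": "string"},
    {"name": "committed_at", "path": "commit.author.date", "type": "timestamp"},
]
record = {
    "sha": "abc123",
    "commit": {
        "author": {"name": "Ada", "date": "2024-01-02T03:04:05Z"},
        "message": "fix typo",
    },
}
row = project(record, columns)
```

Fields in the response that no column references are simply dropped, so the resulting row has exactly the columns the schema declares.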
