
YAML-driven data ingestion framework. Define collectors, dcf handles the rest.


dcf


YAML-driven data ingestion. Define a collector — dcf fetches, projects, and writes it to your warehouse.

uvx --from dcf-core dcf init

How it works

  1. Define a collector in YAML — source, schema, cadence
  2. Run it with dcf run
  3. Data lands in your local warehouse (Parquet + DuckDB) or a GCS-backed lake

Sources: HTTP REST/CSV APIs · Python functions · Google Pub/Sub
Write strategies: incremental (upsert) · append · full_refresh
Deploy targets: local (Docker + Airflow) or GCP (Cloud Composer + Dataflow)
Claude integration: dcf mcp setup-desktop registers an MCP server so Claude can write and run collectors on your behalf.
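The three write strategies can be illustrated on plain Python lists. This is a minimal sketch of the usual semantics of upsert, append, and full refresh, not dcf's writer code; the function name `apply_write` and the in-memory row format are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_write(
    existing: List[Row],
    incoming: List[Row],
    strategy: str,
    primary_key: Optional[str] = None,
) -> List[Row]:
    """Return the table contents after writing `incoming` (hypothetical helper)."""
    if strategy == "append":
        # Keep everything already stored and add the new rows, duplicates included.
        return existing + incoming
    if strategy == "full_refresh":
        # Discard what was stored and keep only the new rows.
        return list(incoming)
    if strategy == "incremental":
        if primary_key is None:
            raise ValueError("incremental writes need a primary_key")
        # Upsert: a row whose key already exists is replaced, new keys are added.
        merged = {row[primary_key]: row for row in existing}
        for row in incoming:
            merged[row[primary_key]] = row
        return list(merged.values())
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    stored = [{"sha": "a", "message": "old"}, {"sha": "b", "message": "keep"}]
    new = [{"sha": "a", "message": "new"}, {"sha": "c", "message": "added"}]
    for name in ("incremental", "append", "full_refresh"):
        print(name, len(apply_write(stored, new, name, primary_key="sha")))
```

With a primary key such as sha, rerunning an incremental collector over the same data leaves the row count unchanged, whereas append would duplicate the rows.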


Example

name: dcf_commits
namespace: github

source:
  type: http
  url: https://api.github.com/repos/zephschafer/dcf/commits
  method: GET
  params:
    - name: per_page
      type: integer
      value: 100
  schema:
    columns:
      - {name: sha,          path: sha,                type: string}
      - {name: author,       path: commit.author.name, type: string}
      - {name: message,      path: commit.message,     type: string}
      - {name: committed_at, path: commit.author.date, type: timestamp}

cadence:
  strategy: incremental
  primary_key: sha

deployment:
  schedule: "0 8 * * *"

Run the collector, then query the loaded table:

uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits LIMIT 5'
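
The path entries in the schema above read as dotted lookups into each JSON record. The sketch below shows one plausible way such a projection could work; `extract_path`, `project`, and the sample record are hypothetical and are not taken from dcf's projector. Type casting (for example to timestamp) is left out.

```python
from typing import Any, Dict, List


def extract_path(record: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def project(record: Dict[str, Any], columns: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build one flat row with one entry per configured column."""
    return {col["name"]: extract_path(record, col["path"]) for col in columns}


# Column definitions mirroring the example collector (types omitted).
COLUMNS = [
    {"name": "sha", "path": "sha"},
    {"name": "author", "path": "commit.author.name"},
    {"name": "message", "path": "commit.message"},
    {"name": "committed_at", "path": "commit.author.date"},
]

if __name__ == "__main__":
    # Made-up record shaped like one item of a GitHub commits response.
    sample = {
        "sha": "abc123",
        "commit": {
            "author": {"name": "Example Author", "date": "2024-01-01T08:00:00Z"},
            "message": "Initial commit",
        },
    }
    print(project(sample, COLUMNS))
```

Returning None for a missing key is a design choice of this sketch; a stricter projector could raise an error instead.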

Install

pip install dcf-core

The PyPI package is dcf-core; the CLI command it installs is dcf.


Quickstart

mkdir my-project && cd my-project
uvx --from dcf-core dcf init
uv sync
uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits'

dcf init creates pyproject.toml, project.yml, .gitignore, collectors/, and an example collector.


Contributing

git clone https://github.com/zephschafer/dcf && cd dcf && uv sync

Point a local project at your checkout:

[tool.uv.sources]
dcf-core = { path = "../dcf", editable = true }

To verify changes:

uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits'

Releasing: bump the version in pyproject.toml and push to main; GitHub Actions then publishes to PyPI automatically.


Package structure

dcf/
├── cli.py              Entry point (Typer)
├── config/
│   ├── models.py       Pydantic models for collector YAML
│   └── loader.py       YAML loading + env var resolution
├── engine/
│   ├── runner.py       Outer loop (iterate → fetch → project → write)
│   ├── fetcher.py      HTTP and Python source fetchers
│   ├── iterator.py     Date range and categorical iteration
│   ├── projector.py    Schema projection and path extraction
│   └── transforms.py   Column transforms
├── writer/
│   └── iceberg.py      Write strategies (incremental / append / full_refresh)
└── gcp/                GCP auth, provisioning, Terraform wrappers
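
The outer loop described for runner.py (iterate → fetch → project → write) can be sketched roughly as below, using a daily date range as the iteration axis. Every name here (`iterate_dates`, `run`, and the three callables) is hypothetical; the real module layout and signatures may differ.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def iterate_dates(start: date, end: date) -> Iterator[date]:
    """Yield each day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def run(
    start: date,
    end: date,
    fetch: Callable[[date], List[Row]],
    project: Callable[[Row], Row],
    write: Callable[[List[Row]], None],
) -> int:
    """Fetch, project, and write one batch per day; return the number of rows written."""
    total = 0
    for day in iterate_dates(start, end):
        records = fetch(day)                   # raw records from the source
        rows = [project(r) for r in records]   # apply the schema to each record
        write(rows)                            # hand the batch to the writer
        total += len(rows)
    return total


if __name__ == "__main__":
    sink: List[Row] = []
    written = run(
        date(2024, 1, 1),
        date(2024, 1, 3),
        fetch=lambda day: [{"day": day.isoformat(), "value": 1}],
        project=lambda record: {"day": record["day"]},
        write=sink.extend,
    )
    print(written, sink)
```

Passing fetch, project, and write as callables keeps the loop independent of the source type and the write strategy, which is one way to separate these concerns.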

