YAML-driven data ingestion framework. Define collectors, dcf handles the rest.

dcf

D.ata C.ollection F.ramework

uvx --from dcf-core dcf init

How it works

  1. Define a data collector in YAML — source, schema, cadence
  2. Run it with dcf run
  3. Query data from your data lake

Example

name: dcf_commits
namespace: github

source:
  type: http
  url: https://api.github.com/repos/zephschafer/dcf/commits
  method: GET
  params:
    - name: per_page
      type: integer
      value: 100
    - name: since
      type: string
    - name: until
      type: string
  schema:
    columns:
      - {name: sha,          path: sha,                type: string}
      - {name: author,       path: commit.author.name, type: string}
      - {name: message,      path: commit.message,     type: string}
      - {name: committed_at, path: commit.author.date, type: timestamp}

cadence:
  strategy: incremental
  primary_key: sha
  iterate:
    - type: date_range
      params: [since, until]
      start: "2024-01-01"
      end: today
      step: 30 days

deployment:
  schedule: "0 8 * * *"

Run the collector, then query the result:

uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits LIMIT 5'
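
The date_range entry under cadence.iterate suggests that the collector walks the period from start to end in 30-day steps and fills the since and until params for each request. The sketch below illustrates one way such windows could be generated. It is not dcf's implementation: the function name date_windows, the half-open window convention, and the fixed end date (used here instead of today so the output is reproducible) are all assumptions.

```python
from datetime import date, timedelta
from typing import Iterator


def date_windows(start: date, end: date, step_days: int) -> Iterator[tuple[date, date]]:
    """Yield consecutive (since, until) windows covering start up to end.

    Hypothetical helper: the last window is clipped so it never passes `end`.
    """
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=step_days), end)
        yield cursor, upper
        cursor = upper


# Fixed end date for illustration; the YAML above uses `end: today`.
windows = list(date_windows(date(2024, 1, 1), date(2024, 3, 15), 30))

for since, until in windows:
    # Each window would become one set of request params.
    params = {"per_page": 100, "since": since.isoformat(), "until": until.isoformat()}
    print(params)
```

Each printed dict corresponds to one request against the source URL. Whether dcf treats window boundaries as inclusive or exclusive is not stated in this README.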

Install

pip install dcf-core

The package is published as dcf-core; the CLI command it installs is dcf.


Quickstart

mkdir my-project && cd my-project
uvx --from dcf-core dcf init
uv sync
uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits'

dcf init creates pyproject.toml, project.yml, .gitignore, collectors/, and an example collector.


Contributing

git clone https://github.com/zephschafer/dcf && cd dcf && uv sync

Point a local project at your checkout:

[tool.uv.sources]
dcf-core = { path = "../dcf", editable = true }

To verify changes:

uv run dcf run dcf_commits
uv run dcf query 'SELECT * FROM github.dcf_commits'

Releasing: bump the version in pyproject.toml and push to main. GitHub Actions then publishes to PyPI automatically.


Package structure

dcf/
├── cli.py              Entry point (Typer)
├── config/
│   ├── models.py       Pydantic models for collector YAML
│   └── loader.py       YAML loading + env var resolution
├── engine/
│   ├── runner.py       Outer loop (iterate → fetch → project → write)
│   ├── fetcher.py      HTTP and Python source fetchers
│   ├── iterator.py     Date range and categorical iteration
│   ├── projector.py    Schema projection and path extraction
│   └── transforms.py   Column transforms
├── writer/
│   └── iceberg.py      Write strategies (incremental / append / full_refresh)
└── gcp/                GCP auth, provisioning, Terraform wrappers
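
The project and write steps of the outer loop can be pictured with a small sketch. It uses the column definitions from the example collector to pull dotted paths out of a raw record, then merges rows by primary key. This is an illustration under stated assumptions, not the code in projector.py or iceberg.py: the function names, the choice to return None for a missing key, the upsert reading of the incremental strategy, and the sample record are all hypothetical, and type casting (for example to timestamp) is left out.

```python
from typing import Any

# Column definitions copied from the example collector's schema.
COLUMNS = [
    {"name": "sha", "path": "sha", "type": "string"},
    {"name": "author", "path": "commit.author.name", "type": "string"},
    {"name": "message", "path": "commit.message", "type": "string"},
    {"name": "committed_at", "path": "commit.author.date", "type": "timestamp"},
]


def extract_path(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'commit.author.name'.

    Assumption: a missing key yields None rather than raising.
    """
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def project(record: dict[str, Any], columns: list[dict[str, str]]) -> dict[str, Any]:
    """Build one flat row from a nested record (no type casting in this sketch)."""
    return {col["name"]: extract_path(record, col["path"]) for col in columns}


def merge_incremental(
    existing: list[dict[str, Any]], incoming: list[dict[str, Any]], primary_key: str
) -> list[dict[str, Any]]:
    """Upsert by primary key: an incoming row replaces an existing row with the same key."""
    merged = {row[primary_key]: row for row in existing}
    for row in incoming:
        merged[row[primary_key]] = row
    return list(merged.values())


# Made-up record shaped like one item of the GitHub commits response.
raw = {
    "sha": "abc123",
    "commit": {
        "author": {"name": "Ada", "date": "2024-01-05T12:00:00Z"},
        "message": "Initial commit",
    },
}

row = project(raw, COLUMNS)
table = merge_incremental([{"sha": "abc123", "author": "old"}], [row], "sha")
print(table)
```

Keying the merge on primary_key means that re-running an overlapping date window does not duplicate rows in this sketch, which is one plausible reason for pairing an incremental strategy with a primary key.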
