Python library and CLI to build DuckDB catalogs from declarative YAML/JSON configs

These details have not been verified by PyPI

Project links

Project description

Duckalog

Duckalog is a Python library and CLI for building DuckDB catalogs from declarative YAML/JSON configuration files. A single config file describes your DuckDB database, attachments (other DuckDB files, SQLite, Postgres), Iceberg catalogs, and views over Parquet/Delta/Iceberg or attached tables.

The goal is to make DuckDB catalogs reproducible, versionable, and easy to apply in local workflows and automated pipelines.

Features

Config-driven catalogs – Define DuckDB views in YAML/JSON instead of scattering CREATE VIEW statements across scripts.
Multiple sources – Views over S3 Parquet, Delta Lake, Iceberg tables, and attached DuckDB/SQLite/Postgres databases.
Attachments & catalogs – Configure attachments and Iceberg catalogs in the same config and reuse them across views.
Safe credentials – Use environment variables (e.g. ${env:AWS_ACCESS_KEY_ID}) instead of embedding secrets.
CLI + Python API – Build catalogs from the command line or from Python code with the same semantics.

For a full product and technical description, see docs/PRD_Spec.md.

Installation

Requirements: Python 3.9 or newer

Install from PyPI

pip install duckalog

This installs the Python package and provides the duckalog CLI command.

Verify Installation

duckalog --help
duckalog --version

Alternative Installation Methods

Development installation:

git clone https://github.com/legout/duckalog.git
cd duckalog
pip install -e .

Using uv (recommended for development):

uv pip install duckalog

Quickstart

1. Create a minimal config

Create a file catalog.yaml:

version: 1

duckdb:
  database: catalog.duckdb
  pragmas:
    - "SET memory_limit='1GB'"

views:
  - name: users
    source: parquet
    uri: "s3://my-bucket/data/users/*.parquet"

2. Build the catalog via CLI

duckalog build catalog.yaml

This will:

Read catalog.yaml.
Connect to catalog.duckdb (creating it if necessary).
Apply pragmas.
Create or replace the users view.

3. Generate SQL instead of touching the DB

duckalog generate-sql catalog.yaml --output create_views.sql

create_views.sql will contain CREATE OR REPLACE VIEW statements for all views defined in the config.

4. Validate a config

duckalog validate catalog.yaml

This parses and validates the config (including env interpolation), without connecting to DuckDB.

Python API

The duckalog package exposes the same functionality as the CLI with convenience functions:

from duckalog import build_catalog, generate_sql, validate_config


# Build or update a catalog file in place
build_catalog("catalog.yaml")


# Generate SQL without executing it
sql = generate_sql("catalog.yaml")
print(sql)


# Validate config (raises ConfigError on failure)
validate_config("catalog.yaml")

You can also work directly with the Pydantic model:

from duckalog import load_config

config = load_config("catalog.yaml")
for view in config.views:
    print(view.name, view.source)

Configuration Overview

At a high level, configs follow this structure:

version: 1

duckdb:
  database: catalog.duckdb
  install_extensions: []
  load_extensions: []
  pragmas: []

attachments:
  duckdb:
    - alias: refdata
      path: ./refdata.duckdb
      read_only: true

  sqlite:
    - alias: legacy
      path: ./legacy.db

  postgres:
    - alias: dw
      host: "${env:PG_HOST}"
      port: 5432
      database: dw
      user: "${env:PG_USER}"
      password: "${env:PG_PASSWORD}"

iceberg_catalogs:
  - name: main_ic
    catalog_type: rest
    uri: "https://iceberg-catalog.internal"
    warehouse: "s3://my-warehouse/"
    options:
      token: "${env:ICEBERG_TOKEN}"

views:
  # Parquet view
  - name: users
    source: parquet
    uri: "s3://my-bucket/data/users/*.parquet"

  # Delta view
  - name: events_delta
    source: delta
    uri: "s3://my-bucket/delta/events"

  # Iceberg catalog-based view
  - name: ic_orders
    source: iceberg
    catalog: main_ic
    table: analytics.orders

  # Attached DuckDB view
  - name: ref_countries
    source: duckdb
    database: refdata
    table: reference.countries

  # Raw SQL view
  - name: vip_users
    sql: |
      SELECT *
      FROM users
      WHERE is_vip = TRUE

Environment variable interpolation

Any string value may contain ${env:VAR_NAME} placeholders. During load_config, these are resolved using os.environ["VAR_NAME"]. Missing variables cause a ConfigError.

Examples:

duckdb:
  pragmas:
    - "SET s3_access_key_id='${env:AWS_ACCESS_KEY_ID}'"
    - "SET s3_secret_access_key='${env:AWS_SECRET_ACCESS_KEY}'"

Contributing

We welcome contributions to duckalog! This section provides guidelines and instructions for contributing to the project.

Development Setup

Requirements: Python 3.9 or newer

Automated Version Management

This project uses automated version tagging to streamline releases. When you update the version in pyproject.toml and push to the main branch, the system automatically:

Extracts the new version from pyproject.toml
Validates semantic versioning format (X.Y.Z)
Compares with existing tags to prevent duplicates
Creates a Git tag in format v{version} (e.g., v0.1.0)
Triggers the existing publish.yml workflow to publish to PyPI

Simple Release Process:

# 1. Update version in pyproject.toml
sed -i 's/version = "0.1.0"/version = "0.1.1"/' pyproject.toml

# 2. Commit and push
git add pyproject.toml
git commit -m "bump: Update version to 0.1.1"
git push origin main

# 3. Automated tagging creates tag and triggers publishing
# Tag v0.1.1 is created automatically
# publish.yml workflow runs and publishes to PyPI

For detailed examples and troubleshooting, see:

Continuous Integration

Duckalog uses a streamlined GitHub Actions setup to keep CI predictable:

Tests workflow runs Ruff + mypy on Python 3.11 and executes pytest on Ubuntu for Python 3.9–3.12. If tests fail, the workflow fails—no auto-generated smoke tests.
Security workflow focuses on a curated set of scans: TruffleHog and GitLeaks for secrets, Safety + pip-audit for dependency issues, and Bandit + Semgrep for code-level checks. Heavy container or supply-chain scans run only when explicitly needed.
publish.yml builds sdist + wheel once on Python 3.11, validates artifacts with twine check, smoke-tests the wheel, and then reuses the artifacts for Test PyPI, PyPI, or dry-run scenarios. Release jobs rely on the Tests workflow’s status rather than re-running the full test matrix.

For local development, we recommend:

uv run ruff check src/ tests/ to run lint checks (CI treats these as required).
uv run ruff format src/ tests/ to auto-format code (CI runs ruff format --check in advisory mode).
uv run mypy src/duckalog to run type checks.

Using uv (recommended for development)

# Clone the repository
git clone https://github.com/legout/duckalog.git
cd duckalog

# Install in development mode
uv pip install -e .

Using pip

# Clone the repository
git clone https://github.com/legout/duckalog.git
cd duckalog

# Install in development mode
pip install -e .

Install development dependencies

# Using uv
uv pip install -e ".[dev]"

# Using pip
pip install -e ".[dev]"

Coding Standards

We follow the conventions documented in openspec/project.md:

Python Style: Follow PEP 8 with type hints on public functions and classes
Module Structure: Prefer small, focused modules over large monoliths
Configuration: Use Pydantic models as the single source of truth for config schemas
Architecture: Separate concerns between config, SQL generation, and engine layers
Naming: Use descriptive, domain-aligned names (e.g., AttachmentConfig, ViewConfig)
Testing: Keep core logic pure and testable; isolate I/O operations

Testing

We use pytest for testing. The test suite includes both unit and integration tests:

# Run all tests
pytest

# Run with coverage
pytest --cov=duckalog

# Run specific test file
pytest tests/test_config.py

Testing Strategy:

Unit tests: Config parsing, validation, and SQL generation
Integration tests: End-to-end catalog building with temporary DuckDB files
Deterministic tests: Avoid network dependencies unless explicitly required
Test-driven development: Add tests for new behaviors before implementation

Change Proposal Process

For significant changes, we use OpenSpec to manage proposals and specifications:

Create a change proposal: Use the OpenSpec CLI to create a new change
```
openspec new "your-change-description"
```
Define requirements: Write specs with clear requirements and scenarios in changes/<id>/specs/
Plan implementation: Break down the work into tasks in changes/<id>/tasks.md
Validate your proposal: Ensure it meets project standards
```
openspec validate <change-id> --strict
```
Implement and test: Work through the tasks sequentially

See openspec/project.md for detailed project conventions and the OpenSpec workflow.

Pull Request Guidelines

When submitting pull requests:

Branch naming: Use small, focused branches with the OpenSpec change-id (e.g., add-s3-parquet-support)
Commit messages:
- Keep spec changes (openspec/, docs/) and implementation changes (src/, tests/) clear
- Reference relevant OpenSpec change IDs in PR titles or first commit messages
PR description: Include a clear description of the change and link to relevant OpenSpec proposals
Testing: Ensure all tests pass and add new tests for new functionality
Review process: Be responsive to review feedback and address all comments

We prefer incremental, reviewable PRs over large multi-feature changes.

Getting Help

Project Documentation: See plan/PRD_Spec.md for the full product and technical specification
Project Conventions: Refer to openspec/project.md for detailed development guidelines
Issues: Report bugs or request features on GitHub Issues
Discussions: Join project discussions on GitHub Discussions

Thank you for contributing to duckalog! 🚀

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.6.3

May 2, 2026

0.6.2

Apr 30, 2026

0.6.1

Apr 30, 2026

0.6.0

Apr 28, 2026

0.5.0

Dec 18, 2025

0.4.1

Dec 6, 2025

0.4.0

Dec 5, 2025

0.3.2

Dec 3, 2025

0.3.1

Dec 3, 2025

0.3.0

Dec 3, 2025

0.2.4

Nov 27, 2025

0.2.3

Nov 27, 2025

0.2.0

Nov 21, 2025

This version

0.1.2

Nov 17, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

duckalog-0.1.2.tar.gz (41.8 kB view details)

Uploaded Nov 17, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

duckalog-0.1.2-py3-none-any.whl (29.2 kB view details)

Uploaded Nov 17, 2025 Python 3

File details

Details for the file duckalog-0.1.2.tar.gz.

File metadata

Download URL: duckalog-0.1.2.tar.gz
Upload date: Nov 17, 2025
Size: 41.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for duckalog-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`04962304270effe4349b0a8ba047958bc3ccd33bf2715a38044f9c117f0dfb6a`
MD5	`add4a9455b4374a346ada50561b6fe09`
BLAKE2b-256	`68988336c0f05733332a964fc455a85f8318c26834713ade70cb2a0cfc07825f`

See more details on using hashes here.

File details

Details for the file duckalog-0.1.2-py3-none-any.whl.

File metadata

Download URL: duckalog-0.1.2-py3-none-any.whl
Upload date: Nov 17, 2025
Size: 29.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for duckalog-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7d58dafa94183017855c498dd7faa6d5c1f94a810ad0d1708351a21a6bdcbc45`
MD5	`26c9c23acfa51e6d1c136f6e090df934`
BLAKE2b-256	`0c7881877d44599a93e651533d58a781e87533c913cc5f82e170d38b8e75ce8b`

See more details on using hashes here.

duckalog 0.1.2

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Duckalog

Features

Installation

Install from PyPI

Verify Installation

Alternative Installation Methods

Quickstart

1. Create a minimal config

2. Build the catalog via CLI

3. Generate SQL instead of touching the DB

4. Validate a config

Python API

Configuration Overview

Environment variable interpolation

Contributing

Development Setup

Automated Version Management

Continuous Integration

Using uv (recommended for development)

Using pip

Install development dependencies

Coding Standards

Testing

Change Proposal Process

Pull Request Guidelines

Getting Help

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes