havn · PyPI

Self-hosted data platform. A Nordic alternative to Databricks/Snowflake. DuckDB + SQL transforms + Python ingest/export. Data in safe waters.

havn
Data in safe waters.
The open-source data platform that runs on your machine. DuckDB + SQL + Python. No cloud required.

havn (Danish/Norwegian for harbour) is a self-hosted data platform - a Nordic alternative to Databricks and Snowflake for teams that want analytics without the complexity, cost, or data leaving their infrastructure.

Your entire warehouse lives in a single DuckDB file. Transforms are plain SQL. Ingest and export scripts are Python. There's no Jinja, no compilation step, no profiles.yml, and no YAML spaghetti.

git clone https://github.com/chraltro/db.git && cd db \
  && pip install -e . \
  && cd frontend && npm install && npm run build && cd .. \
  && havn init my-project && cd my-project \
  && havn stream full-refresh \
  && havn serve

Why havn?

Most data tools force a choice: powerful but complex (Databricks, Snowflake, dbt + Airflow) or simple but limited (CSVs in a folder).

havn gives you the analytical power of a modern data stack in something you can install in one command and run on a laptop.

Pain point	havn's answer
Cloud costs spiraling	Runs locally. DuckDB on your machine. $0/month.
Data leaving your infrastructure	Self-hosted. Your data stays on your hardware. Full stop.
Jinja-templated SQL nobody understands	Plain SQL. Config is a comment. Dependencies are a comment. SQL is just SQL.
30-minute onboarding	30-second onboarding. Install havn and run `havn init` to get a working pipeline with sample data.
Separate tools for ingest, transform, orchestration, UI	One tool does it all. CLI, web UI, scheduler, connectors - included.
LLMs can't write your DSL	AI-native. Plain SQL + simple conventions = LLMs write correct transforms on the first try.

Features

SQL Transform Engine

Write plain SQL with comment-based config. havn resolves dependencies, builds a DAG, and executes in the right order - with change detection that only rebuilds what changed.

-- config: materialized=table, schema=gold
-- depends_on: silver.customers, silver.orders

SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount)     AS lifetime_value
FROM silver.customers c
LEFT JOIN silver.orders o ON c.customer_id = o.customer_id
GROUP BY 1, 2
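
The comment-based config and dependency ordering described above can be illustrated with a minimal Python sketch. This is not havn's actual parser: the helper names `parse_header` and `build_order` are hypothetical, and the sketch assumes only the two comment forms shown in the example (`-- config:` and `-- depends_on:`). It uses the standard library's `graphlib` to order models so that each one runs after its dependencies.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical patterns for the two header comments shown in the example above.
CONFIG_RE = re.compile(r"^--\s*config:\s*(.+)$", re.MULTILINE)
DEPENDS_RE = re.compile(r"^--\s*depends_on:\s*(.+)$", re.MULTILINE)


def parse_header(sql: str) -> tuple[dict[str, str], list[str]]:
    """Extract key=value config pairs and upstream table names from SQL comments."""
    config: dict[str, str] = {}
    match = CONFIG_RE.search(sql)
    if match:
        for pair in match.group(1).split(","):
            key, _, value = pair.partition("=")
            config[key.strip()] = value.strip()
    depends: list[str] = []
    match = DEPENDS_RE.search(sql)
    if match:
        depends = [name.strip() for name in match.group(1).split(",") if name.strip()]
    return config, depends


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names so that every model comes after its dependencies."""
    # TopologicalSorter expects a mapping of node -> nodes that must come first.
    graph = {name: parse_header(sql)[1] for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


# Illustrative models; the SQL bodies are placeholders.
models = {
    "gold.customer_value": (
        "-- config: materialized=table, schema=gold\n"
        "-- depends_on: silver.customers, silver.orders\n"
        "SELECT 1"
    ),
    "silver.customers": "-- config: materialized=table, schema=silver\nSELECT 1",
    "silver.orders": "-- config: materialized=table, schema=silver\nSELECT 1",
}
order = build_order(models)
print(order)  # both silver models appear before gold.customer_value
```

`TopologicalSorter` raises `graphlib.CycleError` if two models depend on each other, which is one simple way a tool could report a broken DAG.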

Web UI

Full-featured browser interface with Monaco code editor, interactive SQL runner, DAG visualization, data table browser, chart builder, and pipeline monitoring. Dark and light themes included.

havn serve          # http://localhost:3000
havn serve --auth   # with role-based access control

20+ Data Connectors

Connect to Postgres, MySQL, SQLite, Stripe, HubSpot, Google Sheets, S3, REST APIs, and more - from the CLI or the web UI.

havn connect postgres --host localhost --database mydb --user admin
havn connect stripe --api-key sk_live_xxx
havn connect csv --path /data/customers.csv

Notebooks

Interactive .dpnb notebooks with code cells, markdown, and inline results. Use them for exploration, or wire them into your pipeline as ingest/export steps.

Pipeline Orchestration

Define multi-step pipelines in project.yml. Schedule them with cron. Get webhook notifications on completion.

streams:
  daily-refresh:
    schedule: "0 6 * * *"
    steps:
      - ingest: [all]
      - transform: [all]
      - export: [all]
    webhook_url: https://hooks.slack.com/...
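
To make the `schedule` field concrete, here is a small Python sketch of how a five-field cron expression such as `"0 6 * * *"` can be checked against a point in time. It is a simplified illustration, not havn's scheduler: the function names are hypothetical, only `*`, single integers, and comma lists are supported (no ranges or steps), and all five fields are combined with AND, whereas standard cron treats day-of-month and day-of-week as alternatives when both are restricted.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports '*', single integers, and comma lists only."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, moment: datetime) -> bool:
    """Return True if `moment` falls on a minute selected by the cron expression."""
    minute, hour, day, month, weekday = expr.split()
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        field_matches(minute, moment.minute)
        and field_matches(hour, moment.hour)
        and field_matches(day, moment.day)
        and field_matches(month, moment.month)
        and field_matches(weekday, cron_weekday)
    )


schedule = "0 6 * * *"  # every day at 06:00, as in the example above
due = cron_matches(schedule, datetime(2026, 3, 25, 6, 0))
not_due = cron_matches(schedule, datetime(2026, 3, 25, 6, 1))
print(due, not_due)  # True False
```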

Git Integration & CI

Track changes with havn diff, create snapshots with havn snapshot, and generate GitHub Actions workflows with havn ci generate that post data diff comments on PRs.

havn diff                          # what would change?
havn diff --against main           # changes vs a branch
havn snapshot create before-deploy # save state
havn ci generate                   # create GitHub Actions workflow
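
One common way to answer "what would change?" is to fingerprint each SQL file and compare against the fingerprints recorded at the last run. The Python sketch below shows that idea only; how havn actually detects changes is not described here, and the names `fingerprint` and `changed_models` are hypothetical.

```python
import hashlib


def fingerprint(sql: str) -> str:
    """Hash the SQL text so that any edit produces a different fingerprint."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def changed_models(previous: dict[str, str], current_sql: dict[str, str]) -> list[str]:
    """Return names of models that are new or whose SQL differs from the recorded run."""
    return sorted(
        name
        for name, sql in current_sql.items()
        if previous.get(name) != fingerprint(sql)
    )


# Fingerprints recorded after a hypothetical earlier run.
last_run = {
    "silver.orders": fingerprint("SELECT * FROM bronze.orders"),
    "gold.revenue": fingerprint("SELECT SUM(amount) FROM silver.orders"),
}
# Current SQL: one model untouched, one edited, one newly added.
current = {
    "silver.orders": "SELECT * FROM bronze.orders",
    "gold.revenue": "SELECT SUM(amount) AS revenue FROM silver.orders",
    "gold.new_report": "SELECT 1",
}
to_rebuild = changed_models(last_run, current)
print(to_rebuild)  # ['gold.new_report', 'gold.revenue']
```

A fuller version would also mark every model downstream of a changed one for rebuilding, since its inputs may differ even if its own SQL does not.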

AI-Native Design

Every project scaffolded with havn init includes LLM context files. Plain SQL + simple conventions means AI assistants write correct code on the first try.

havn context   # generate project summary, paste into any AI chat

Tool	Config file	Auto-included
Claude Code	`CLAUDE.md`	Yes
Cursor	`.cursorrules`	Yes
GitHub Copilot	`.github/copilot-instructions.md`	Yes
Any LLM	`havn context`	Yes

Quick Start

Install

From PyPI:

pip install havn

From source (for development):

git clone https://github.com/chraltro/db.git
cd db
pip install -e ".[dev]"
cd frontend && npm install && npm run build && cd ..

Create a project

havn init my-project
cd my-project

This scaffolds a complete project with a sample pipeline that fetches earthquake data from the USGS API, transforms it through bronze/silver/gold layers, and exports a report.
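
As a rough idea of what the ingest step has to do, the Python sketch below flattens a GeoJSON feature, the format USGS earthquake feeds use, into a flat row suitable for a landing table. It is not the scaffolded script: the `flatten` helper and the output column names are assumptions, and the payload is a hand-written placeholder rather than real USGS data, so the sketch runs without network access.

```python
import json
from datetime import datetime, timezone

# A tiny hand-written payload in the GeoJSON shape used by USGS earthquake feeds.
# The values are illustrative, not real measurements.
SAMPLE = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "example1",
      "properties": {"mag": 4.5, "place": "Example Place", "time": 1700000000000},
      "geometry": {"type": "Point", "coordinates": [-122.5, 38.1, 10.0]}
    }
  ]
}
""")


def flatten(feature: dict) -> dict:
    """Turn one GeoJSON feature into a flat row (column names are hypothetical)."""
    props = feature["properties"]
    # GeoJSON orders coordinates as longitude, latitude, then depth.
    longitude, latitude, depth_km = feature["geometry"]["coordinates"]
    # The feed reports event time as milliseconds since the Unix epoch.
    occurred_at = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return {
        "event_id": feature["id"],
        "magnitude": props["mag"],
        "place": props["place"],
        "occurred_at": occurred_at.isoformat(),
        "longitude": longitude,
        "latitude": latitude,
        "depth_km": depth_km,
    }


rows = [flatten(feature) for feature in SAMPLE["features"]]
print(rows[0]["occurred_at"])  # 2023-11-14T22:13:20+00:00
```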

Run the pipeline

havn stream full-refresh

Explore your data

havn serve                              # open web UI at localhost:3000
havn query "SELECT * FROM gold.earthquake_summary"
havn tables                             # list all tables

Architecture

my-project/
├── ingest/              Python scripts + notebooks that load raw data
│   └── earthquakes.dpnb
├── transform/
│   ├── bronze/          Light cleanup (type casting, dedup)
│   ├── silver/          Business logic (joins, aggregations)
│   └── gold/            Consumption-ready tables
├── export/              Python scripts that push data out
├── notebooks/           Interactive .dpnb notebooks
├── project.yml          Pipelines, connections, schedules
├── .env                 Secrets (never committed)
└── warehouse.duckdb     Your entire database, one file

Data flows through four schemas:

landing/  →  bronze/  →  silver/  →  gold/
 (raw)      (cleaned)   (modeled)   (ready)
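
The layering can be tried out in a few lines of Python. To stay within the standard library, this sketch swaps DuckDB for SQLite and models each schema as an attached in-memory database; the table names, columns, and the magnitude threshold are made up for illustration and do not come from the scaffolded project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no CREATE SCHEMA, so each layer is an attached in-memory database.
for layer in ("landing", "bronze", "silver", "gold"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

# landing: raw rows exactly as ingested, including a duplicate and text-typed numbers.
conn.execute("CREATE TABLE landing.quakes (event_id TEXT, mag TEXT)")
conn.executemany(
    "INSERT INTO landing.quakes VALUES (?, ?)",
    [("a", "4.5"), ("a", "4.5"), ("b", "6.1"), ("c", "2.0")],
)

# bronze: light cleanup (type casting, dedup).
conn.execute(
    "CREATE TABLE bronze.quakes AS "
    "SELECT DISTINCT event_id, CAST(mag AS REAL) AS mag FROM landing.quakes"
)

# silver: business logic, here a hypothetical strength classification.
conn.execute(
    "CREATE TABLE silver.quakes AS "
    "SELECT event_id, mag, "
    "CASE WHEN mag >= 5 THEN 'strong' ELSE 'light' END AS strength "
    "FROM bronze.quakes"
)

# gold: a consumption-ready summary.
conn.execute(
    "CREATE TABLE gold.quake_summary AS "
    "SELECT strength, COUNT(*) AS events FROM silver.quakes GROUP BY strength"
)

summary = dict(conn.execute("SELECT strength, events FROM gold.quake_summary"))
print(summary)
```

Each layer reads only from the one before it, which is what lets a tool rebuild a layer without touching the raw data.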

The warehouse is a single DuckDB file. Copy it, back it up, version it - it's just a file.
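
Because the warehouse is one file, a backup can be as simple as a timestamped copy. The Python sketch below shows the idea using only the standard library; `backup_warehouse` is a hypothetical helper, not what `havn backup` does internally, and it assumes no process is writing to the file while it is copied. The demonstration uses a placeholder file so it runs anywhere.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_warehouse(warehouse: Path, backup_dir: Path) -> Path:
    """Copy the warehouse file into backup_dir under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{warehouse.stem}-{stamp}{warehouse.suffix}"
    shutil.copy2(warehouse, target)  # copy2 also preserves file metadata
    return target


# Demonstration with a stand-in file rather than a real DuckDB database.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "warehouse.duckdb"
    source.write_bytes(b"placeholder bytes, not a real DuckDB file")
    copy = backup_warehouse(source, Path(tmp) / "backups")
    copied_ok = copy.read_bytes() == source.read_bytes()
    copy_name = copy.name

print(copy_name, copied_ok)
```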

All Commands

Command	Description
`havn init <name>`	Scaffold a new project
`havn stream <name>`	Run a full pipeline (ingest → transform → export)
`havn transform`	Build SQL models in dependency order
`havn run <script>`	Run a single ingest/export script or notebook
`havn query "<sql>"`	Run ad-hoc SQL queries
`havn tables`	List warehouse tables and views
`havn serve`	Start the web UI
`havn diff`	Preview what would change before running transforms
`havn lint`	Lint SQL files with SQLFluff
`havn history`	Show pipeline run log
`havn status`	Project health: git info, warehouse stats, last run
`havn validate`	Check project structure, config, and DAG for errors
`havn snapshot create`	Save a named snapshot of project + data state
`havn backup`	Back up the warehouse database
`havn connect <type>`	Set up a data connector
`havn watch`	Watch files and auto-rebuild on change
`havn schedule`	Start the cron scheduler
`havn checkpoint`	Smart git commit with auto-generated messages
`havn docs`	Generate markdown documentation from warehouse schema
`havn context`	Generate project summary for AI assistants
`havn ci generate`	Generate GitHub Actions workflow
`havn secrets list/set/delete`	Manage .env secrets
`havn users create/list/delete`	Manage platform users and roles

Comparison

Feature	havn	dbt + Airflow	Databricks	Snowflake
Self-hosted	Yes	Partial	No	No
Setup time	1 min	Hours	Hours	Hours
Monthly cost	$0	$100s+	$1000s+	$1000s+
SQL dialect	Plain SQL	Jinja SQL	Spark SQL	Snowflake SQL
Ingest built-in	Yes	No (need Airbyte etc.)	Yes	Yes
Web UI	Yes	Separate (Airflow UI)	Yes	Yes
Single-file database	Yes	No	No	No
AI-native	Yes	No	Partial	No
Data stays on your machine	Yes	Depends	No	No

havn is the right choice when you want a complete data platform without the infrastructure overhead. It is not trying to replace Snowflake at 10 TB scale; it is built for teams whose data fits on a single machine, which covers many teams.

Documentation

CLAUDE.md - Full technical reference (architecture, conventions, development workflow)
CONTRIBUTING.md - How to contribute
havn docs - Auto-generate documentation from your warehouse schema
havn context - Generate a project summary to paste into any AI assistant

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

# Development setup
git clone https://github.com/chraltro/db.git
cd db
pip install -e ".[dev]"
cd frontend && npm install && npm run build && cd ..
pytest tests/

License

MIT - use it however you want.

Release history

Version	Released
0.2.19	May 15, 2026
0.2.18	May 3, 2026
0.2.17	Apr 29, 2026
0.2.16	Apr 29, 2026
0.2.15	Apr 29, 2026
0.2.14	Apr 29, 2026
0.2.13	Apr 29, 2026
0.2.12	Apr 29, 2026
0.2.10	Apr 29, 2026
0.2.9	Apr 28, 2026
0.2.8	Apr 28, 2026
0.2.7	Apr 25, 2026
0.2.6	Apr 25, 2026
0.2.5	Mar 29, 2026
0.2.4	Mar 29, 2026
0.2.3 (this version)	Mar 25, 2026
0.2.2	Mar 23, 2026
0.2.1	Mar 23, 2026
0.2.0	Mar 23, 2026
0.1.2	Mar 19, 2026
0.1.1	Mar 19, 2026
0.1.0	Mar 19, 2026

Download files

Source Distribution

havn-0.2.3.tar.gz (15.4 MB)

Uploaded Mar 25, 2026 Source

Built Distribution

havn-0.2.3-py3-none-any.whl (15.5 MB)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file havn-0.2.3.tar.gz.

File metadata

Download URL: havn-0.2.3.tar.gz
Upload date: Mar 25, 2026
Size: 15.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for havn-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`a177b756bbc8beaf8cc479a18fd2d868f43d4fdad4efa21508ca267b7c72b6c0`
MD5	`7d8906989a798b8af73739ad6b955932`
BLAKE2b-256	`4376418499ffc170c982a5b3db3d957dccbad3032530431a3427a066a7872fa2`

File details

Details for the file havn-0.2.3-py3-none-any.whl.

File metadata

Download URL: havn-0.2.3-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 15.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.6

File hashes

Hashes for havn-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`755b5a47674c6789cc7ad5c2228cdf70f766dfdf4f5c601532cb91470002e45a`
MD5	`181058a30f4712542edec00afd0f9d65`
BLAKE2b-256	`a826b10a2af9006a4a8f00608b6fac11abf7d531750d942de27e348a647f8690`
