Self-hosted data platform. A Nordic alternative to Databricks/Snowflake. DuckDB + SQL transforms + Python ingest/export. Data in safe waters.
Data in safe waters.
The open-source data platform that runs on your machine. DuckDB + SQL + Python. No cloud required.
License notice: havn is source-available under the Business Source License 1.1. You can read, run, modify, and use it for any internal or commercial purpose — including in production at your company or at client sites. The one restriction is that you may not offer havn to third parties as a competing hosted or managed service. Each release automatically converts to Apache 2.0 four years after its release date (the current release converts on 2030-04-05). See the License FAQ below for details.
havn (Danish/Norwegian for harbour) is a self-hosted data platform - a Nordic alternative to Databricks and Snowflake for teams that want analytics without the complexity, cost, or data leaving their infrastructure.
Your entire warehouse lives in a single DuckDB file. Transforms are plain SQL. Ingest and export scripts are Python. There's no Jinja, no compilation step, no profiles.yml, and no YAML spaghetti.
git clone https://github.com/chraltro/db.git && cd db && pip install -e . && cd frontend && npm install && npm run build && cd .. && havn init my-project && cd my-project && havn jobs run full-refresh && havn serve
Why havn?
Most data tools force a choice: powerful but complex (Databricks, Snowflake, dbt + Airflow) or simple but limited (CSVs in a folder).
havn gives you the analytical power of a modern data stack in something you can install in one command and run on a laptop.
| Pain point | havn's answer |
|---|---|
| Cloud costs spiraling | Runs locally. DuckDB on your machine. $0/month. |
| Data leaving your infrastructure | Self-hosted. Your data stays on your hardware. Full stop. |
| Jinja-templated SQL nobody understands | Plain SQL. Config is a comment. Dependencies are a comment. SQL is just SQL. |
| 30-minute onboarding | 30-second onboarding. Install from source, run havn init, and you have a working pipeline with sample data. |
| Separate tools for ingest, transform, orchestration, UI | One tool does it all. CLI, web UI, scheduler, connectors - included. |
| LLMs can't write your DSL | AI-native. Plain SQL + simple conventions = LLMs write correct transforms on the first try. |
Features
SQL Transform Engine
Write plain SQL with comment-based config. havn resolves dependencies, builds a DAG, and executes in the right order - with change detection that only rebuilds what changed.
-- config: materialized=table, schema=gold
-- depends_on: silver.customers, silver.orders
SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS lifetime_value
FROM silver.customers c
LEFT JOIN silver.orders o ON c.customer_id = o.customer_id
GROUP BY 1, 2
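The comment-based convention above can be approximated in a few lines of Python. This is a minimal sketch, not havn's actual parser: it assumes that `-- config:` holds comma-separated key=value pairs, that `-- depends_on:` holds comma-separated model names, and that models are identified as schema.name. The function names `parse_model` and `build_order` are hypothetical, and the build order comes from the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter


def parse_model(sql: str) -> tuple[dict[str, str], list[str]]:
    """Read `-- config:` and `-- depends_on:` comment lines from a SQL model."""
    config: dict[str, str] = {}
    depends_on: list[str] = []
    for line in sql.splitlines():
        line = line.strip()
        if line.startswith("-- config:"):
            for pair in line[len("-- config:"):].split(","):
                key, _, value = pair.partition("=")
                if key.strip():
                    config[key.strip()] = value.strip()
        elif line.startswith("-- depends_on:"):
            names = line[len("-- depends_on:"):].split(",")
            depends_on.extend(n.strip() for n in names if n.strip())
    return config, depends_on


def build_order(models: dict[str, str]) -> list[str]:
    """Return model names so that every model comes after its dependencies."""
    graph = {name: parse_model(sql)[1] for name, sql in models.items()}
    # static_order() also yields dependencies that are not models themselves
    # (for example raw landing tables), so keep only the known models.
    order = TopologicalSorter(graph).static_order()
    return [name for name in order if name in models]


# Hypothetical project: model name -> SQL text (bodies shortened to SELECT 1).
models = {
    "gold.customer_value": (
        "-- config: materialized=table, schema=gold\n"
        "-- depends_on: silver.customers, silver.orders\n"
        "SELECT 1"
    ),
    "silver.customers": "-- depends_on: bronze.customers\nSELECT 1",
    "silver.orders": "-- depends_on: bronze.orders\nSELECT 1",
    "bronze.customers": "SELECT 1",
    "bronze.orders": "SELECT 1",
}
order = build_order(models)
print(order)
```

In this sketch a circular dependency surfaces as `graphlib.CycleError`, which is one plausible place to report a broken DAG to the user.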
Web UI
Full-featured browser interface with Monaco code editor, interactive SQL runner, DAG visualization, data table browser, chart builder, and pipeline monitoring. Dark and light themes included.
havn serve # http://localhost:3000
havn serve --auth # with role-based access control
20+ Data Connectors
Connect to Postgres, MySQL, SQLite, Stripe, HubSpot, Google Sheets, S3, REST APIs, and more - from the CLI or the web UI.
havn connect postgres --host localhost --database mydb --user admin
havn connect stripe --api-key sk_live_xxx
havn connect csv --path /data/customers.csv
Notebooks
Interactive .dpnb notebooks with code cells, markdown, and inline results. Use them for exploration, or wire them into your pipeline as ingest/export steps.
Pipeline Orchestration
Define multi-step pipelines in project.yml. Schedule them with cron. Get webhook notifications on completion.
streams:
  daily-refresh:
    schedule: "0 6 * * *"
    steps:
      - ingest: [all]
      - transform: [all]
      - export: [all]
    webhook_url: https://hooks.slack.com/...
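The schedule value is a standard five-field cron expression (minute, hour, day of month, month, day of week), so "0 6 * * *" means 06:00 every day. The sketch below shows how such an expression can be matched against a timestamp. It is an illustration only and says nothing about how havn's scheduler is implemented: the helper names `cron_matches` and `next_run` are hypothetical, and only `*` and plain integers are supported.

```python
from datetime import datetime, timedelta


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a datetime against a five-field cron expression.

    Simplifications in this sketch: only `*` and plain integers are accepted
    (no ranges, lists, or steps), Sunday must be written as 0, and all five
    fields must match, whereas real cron treats day-of-month and day-of-week
    as alternatives when both are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    # Cron counts Sunday as 0; isoweekday() counts Sunday as 7.
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


def next_run(expr: str, after: datetime) -> datetime:
    """Step forward minute by minute until the expression matches."""
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # give up after roughly one year
        if cron_matches(expr, candidate):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no match for {expr!r} within a year")


print(next_run("0 6 * * *", datetime(2026, 4, 5, 7, 30)))  # 2026-04-06 06:00:00
```

Stepping minute by minute is slow for sparse schedules but keeps the logic easy to verify; a production scheduler would more likely rely on an established cron library.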
Git Integration & CI
Track changes with havn diff, create snapshots with havn snapshot, and generate GitHub Actions workflows with havn ci generate that post data diff comments on PRs.
havn diff # what would change?
havn diff --against main # changes vs a branch
havn snapshot create before-deploy # save state
havn ci generate # create GitHub Actions workflow
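To make the idea of a data diff concrete, the sketch below compares two versions of a table by a key column and reports which keys were added, removed, or changed. It is a hypothetical illustration: the function `diff_rows`, the sample rows, and the output shape are assumptions, and the source does not describe how havn diff computes or presents its results.

```python
from typing import Any


def diff_rows(
    before: list[dict[str, Any]],
    after: list[dict[str, Any]],
    key: str,
) -> dict[str, list[Any]]:
    """Compare two lists of row dicts by the value of a key column."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
    }


# Made-up rows standing in for a table before and after a transform run.
before = [
    {"customer_id": 1, "name": "Ada", "order_count": 2},
    {"customer_id": 2, "name": "Bo", "order_count": 1},
    {"customer_id": 3, "name": "Cy", "order_count": 0},
]
after = [
    {"customer_id": 1, "name": "Ada", "order_count": 3},  # changed
    {"customer_id": 2, "name": "Bo", "order_count": 1},   # unchanged
    {"customer_id": 4, "name": "Di", "order_count": 1},   # added
]
result = diff_rows(before, after, key="customer_id")
print(result)  # {'added': [4], 'removed': [3], 'changed': [1]}
```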
AI-Native Design
Every project scaffolded with havn init includes LLM context files. Plain SQL + simple conventions means AI assistants write correct code on the first try.
havn context # generate project summary, paste into any AI chat
| Tool | Config file | Auto-included |
|---|---|---|
| Claude Code | CLAUDE.md | Yes |
| Cursor | .cursorrules | Yes |
| GitHub Copilot | .github/copilot-instructions.md | Yes |
| Any LLM | havn context | Yes |
Quick Start
Install
From PyPI:
pip install havn
From source (for development):
git clone https://github.com/chraltro/db.git
cd db
pip install -e ".[dev]"
cd frontend && npm install && npm run build && cd ..
Create a project
havn init my-project
cd my-project
This scaffolds a complete project with a sample pipeline that fetches earthquake data from the USGS API, transforms it through bronze/silver/gold layers, and exports a report.
Run the pipeline
havn jobs run full-refresh
Explore your data
havn serve # open web UI at localhost:3000
havn query "SELECT * FROM gold.earthquake_summary"
havn tables # list all tables
Architecture
my-project/
├── ingest/ Python scripts + notebooks that load raw data
│ └── earthquakes.dpnb
├── transform/
│ ├── bronze/ Light cleanup (type casting, dedup)
│ ├── silver/ Business logic (joins, aggregations)
│ └── gold/ Consumption-ready tables
├── export/ Python scripts that push data out
├── notebooks/ Interactive .dpnb notebooks
├── project.yml Pipelines, connections, schedules
├── .env Secrets (never committed)
└── warehouse.duckdb Your entire database, one file
Data flows through four schemas:
landing/ (raw) → bronze/ (cleaned) → silver/ (modeled) → gold/ (ready)
The warehouse is a single DuckDB file. Copy it, back it up, version it - it's just a file.
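Because the warehouse is one file, a timestamped copy is enough for a simple backup. The sketch below uses only the Python standard library. It is not how the havn backup command is implemented, and the function name `backup_warehouse` and the naming scheme are assumptions. It is safest to take the copy while nothing is writing to the database, since a copy made mid-write may be inconsistent.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def backup_warehouse(warehouse: Path, backup_dir: Path) -> Path:
    """Copy the warehouse file into backup_dir under a timestamped name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{warehouse.stem}-{stamp}{warehouse.suffix}"
    shutil.copy2(warehouse, target)  # copy2 also preserves file timestamps
    return target


# Demo against a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "warehouse.duckdb"
    source.write_bytes(b"stand-in for a real DuckDB file")
    copy = backup_warehouse(source, Path(tmp) / "backups")
    print(copy.name)
```

In a real project the call would look like `backup_warehouse(Path("warehouse.duckdb"), Path("backups"))`, run from the project root.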
All Commands
| Command | Description |
|---|---|
| havn init <name> | Scaffold a new project |
| havn jobs run <name> | Run a full pipeline (ingest → transform → export) |
| havn transform | Build SQL models in dependency order |
| havn run <script> | Run a single ingest/export script or notebook |
| havn query "<sql>" | Run ad-hoc SQL queries |
| havn tables | List warehouse tables and views |
| havn serve | Start the web UI |
| havn diff | Preview what would change before running transforms |
| havn lint | Lint SQL files with SQLFluff |
| havn history | Show pipeline run log |
| havn status | Project health: git info, warehouse stats, last run |
| havn validate | Check project structure, config, and DAG for errors |
| havn snapshot create | Save a named snapshot of project + data state |
| havn backup | Back up the warehouse database |
| havn connect <type> | Set up a data connector |
| havn watch | Watch files and auto-rebuild on change |
| havn schedule | Start the cron scheduler |
| havn checkpoint | Smart git commit with auto-generated messages |
| havn docs | Generate markdown documentation from warehouse schema |
| havn context | Generate project summary for AI assistants |
| havn ci generate | Generate GitHub Actions workflow |
| havn secrets list/set/delete | Manage .env secrets |
| havn users create/list/delete | Manage platform users and roles |
Comparison
| | havn | dbt + Airflow | Databricks | Snowflake |
|---|---|---|---|---|
| Self-hosted | Yes | Partial | No | No |
| Setup time | 1 min | Hours | Hours | Hours |
| Monthly cost | $0 | $100s+ | $1000s+ | $1000s+ |
| SQL dialect | Plain SQL | Jinja SQL | Spark SQL | Snowflake SQL |
| Ingest built-in | Yes | No (need Airbyte etc.) | Yes | Yes |
| Web UI | Yes | Separate (Airflow UI) | Yes | Yes |
| Single-file database | Yes | No | No | No |
| AI-native | Yes | No | Partial | No |
| Data stays on your machine | Yes | Depends | No | No |
havn is the right choice when you want a complete data platform without the infrastructure overhead. It's not trying to replace Snowflake at 10TB scale - it's the best tool for teams working with data that fits on a single machine (which is most teams).
Documentation
- CLAUDE.md - Full technical reference (architecture, conventions, development workflow)
- CONTRIBUTING.md - How to contribute
- havn docs - Auto-generate documentation from your warehouse schema
- havn context - Generate a project summary to paste into any AI assistant
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Development setup
git clone https://github.com/chraltro/db.git
cd db
pip install -e ".[dev]"
cd frontend && npm install && npm run build && cd ..
pytest tests/
License
havn is licensed under the Business Source License 1.1. Each release automatically converts to the Apache License 2.0 four years after its release date — the current release converts on 2030-04-05.
BSL 1.1 is a source-available license created by MariaDB that has been used by projects like HashiCorp Terraform/Vault, Sentry, and CockroachDB. It keeps the full source public while protecting against commercial resale as a competing hosted service.
FAQ
- Can I use havn at my company for free? Yes. Install it, run it, use it in production. There are no restrictions on internal use — no user tiers, no seat counts, no "contact sales".
- Can I modify havn for my own needs? Yes. Fork it, change it, run your modified version internally. The only thing you can't do is sell the modified version as a hosted service.
- Can my consultancy deploy havn at client sites? Yes. Deploying and configuring havn for a client is a service, not hosting. The restriction is on offering havn itself as an ongoing hosted product.
- What exactly is forbidden? Taking havn and offering it to third parties as a paid hosted or managed service that competes with the licensor's commercial offerings.
- When does it become fully open source? Each release converts to Apache 2.0 four years after its release date. The current release converts on 2030-04-05.
- Can I contribute? Yes — see CONTRIBUTING.md. Contributions will require a Contributor License Agreement so they can be included in both the BSL core and any future commercial distribution.