# Haute

Open-source pricing engine for insurance teams on Databricks.

Build, visualise, and deploy pricing pipelines as Python code — with a browser-based GUI that stays in sync.

```bash
pip install haute
```
## What is Haute?
Haute gives insurance pricing teams a code-first, GUI-friendly way to build rating pipelines. Write standard Python with Polars, see it instantly in a visual editor, and deploy to a live API with one command.
- Build pipelines in code or the GUI — both stay in sync
- Run the same pipeline for 1-row live quotes and million-row batch jobs
- Deploy to Databricks MLflow Model Serving with `haute deploy`

Python code is always the source of truth. The GUI is a live, editable view.
## Quick Start

### 1. Install

```bash
pip install haute

# For deployment to Databricks:
pip install haute[databricks]
```
### 2. Create a project

```bash
mkdir my_project && cd my_project
haute init
```

This scaffolds everything you need in the current directory:

```
haute.toml      ← project & deploy config
.env.example    ← Databricks credentials template
main.py         ← starter pipeline
data/           ← your data files
test_quotes/    ← JSON payloads for pre-deploy testing
```
### 3. Write a pipeline

```python
# main.py
import polars as pl

import haute

pipeline = haute.Pipeline("motor_pricing", description="Motor premium calculation")

@pipeline.node(path="data/policies.parquet", deploy_input=True)
def policies() -> pl.DataFrame:
    """Read policy data — this is the live API input."""
    return pl.scan_parquet("data/policies.parquet")

@pipeline.node(external="models/freq.cbm", file_type="catboost", model_class="regressor")
def frequency_model(policies: pl.DataFrame) -> pl.DataFrame:
    """Predict claim frequency."""
    df = policies.with_columns(
        freq_pred=pl.Series(obj.predict(policies.select("Area", "VehPower", "DrivAge").to_numpy()))
    )
    return df

@pipeline.node
def calculate_premium(frequency_model: pl.DataFrame) -> pl.DataFrame:
    """Calculate the technical premium."""
    return frequency_model.with_columns(
        premium=(pl.col("freq_pred") * 500).round(2)
    )

@pipeline.node(output=True)
def output(calculate_premium: pl.DataFrame) -> pl.DataFrame:
    """Final output returned by the API."""
    return calculate_premium
```
### 4. Run it

```bash
haute run
```

### 5. Open the GUI

```bash
haute serve
```
This opens a browser-based visual editor where you can:
- Drag and drop nodes from a palette
- Connect them with edges to define data flow
- Write Polars code in each transform node
- Click any node to preview its output data
- Toggle API Input on a data source to mark it as the live input
- Hit Run to execute the full pipeline
- Hit Save to write back to `.py`
### 6. Deploy

```bash
cp .env.example .env   # fill in your Databricks credentials
haute deploy
```
That's it. Your pipeline is now a live API on Databricks Model Serving.
## Deployment
Haute deploys your pipeline as an MLflow model on Databricks Model Serving. One command, no DevOps.
### How it works

- **Marks** — you tag one data source as `deploy_input=True` (the live API input) and one node as `output=True` (the API response)
- **Prunes** — Haute traces backwards from the output node and deploys only the scoring path. Training-data branches, sinks, and exploratory nodes are automatically excluded (see the sketch below)
- **Bundles** — model files (`.cbm`, `.pkl`, etc.) and static data are packaged as MLflow artifacts
- **Validates** — every JSON file in `test_quotes/` is scored through the pruned pipeline before deployment. If anything fails, deployment is blocked
- **Deploys** — the pipeline is logged as an MLflow pyfunc model and registered in the Model Registry
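Conceptually, the pruning step is an ancestor trace over the DAG. A minimal sketch of the idea, with `output_ancestors` as a hypothetical helper rather than Haute's internals:

```python
# Pruning sketch: keep only the output node and everything it
# transitively depends on. Illustrative only, not Haute's code.
def output_ancestors(edges: list[tuple[str, str]], output: str) -> set[str]:
    """Return the output node plus every node it transitively depends on."""
    parents: dict[str, list[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    keep: set[str] = set()
    stack = [output]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, []))
    return keep

# A training branch ("claims" -> "training_set") never feeds "output",
# so it is excluded from the deployed scoring path:
edges = [
    ("policies", "frequency_model"),
    ("frequency_model", "calculate_premium"),
    ("calculate_premium", "output"),
    ("claims", "training_set"),
]
assert output_ancestors(edges, "output") == {
    "policies", "frequency_model", "calculate_premium", "output"
}
```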
### Configuration

All deploy settings live in `haute.toml` (committed to git):

```toml
[project]
name = "motor-pricing"
pipeline = "main.py"

[deploy]
target = "databricks"
model_name = "motor-pricing"
endpoint_name = "motor-pricing"

[deploy.databricks]
experiment_name = "/Shared/haute/motor-pricing"
catalog = "main"
schema = "pricing"
serving_workload_size = "Small"
serving_scale_to_zero = true

[test_quotes]
dir = "test_quotes"
```

Secrets go in `.env` (gitignored):

```
DATABRICKS_HOST=https://adb-xxxxx.azuredatabricks.net
DATABRICKS_TOKEN=your_token_here
```
### Test quotes

Put JSON files in `test_quotes/` with example requests. These are scored before every deploy:

```json
[
  {"IDpol": 99001, "VehPower": 7, "DrivAge": 42, "Area": "C", "VehBrand": "B12"}
]
```
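A test file can also hold several records to exercise batch scoring; the dry run below scores one such file. A hypothetical `batch_policies.json` with illustrative values might look like:

```json
[
  {"IDpol": 99002, "VehPower": 5, "DrivAge": 35, "Area": "A", "VehBrand": "B1"},
  {"IDpol": 99003, "VehPower": 9, "DrivAge": 58, "Area": "E", "VehBrand": "B3"}
]
```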
### Dry run

Validate everything without actually deploying:

```bash
haute deploy --dry-run
```

```
✓ Loaded config from haute.toml
✓ Parsed pipeline (12 nodes, 14 edges)
✓ Pruned to output ancestors (5 nodes)
✓ Collected 2 artifacts (freq.cbm, sev.cbm)
✓ Inferred input schema (10 columns)
✓ Test quotes: single_policy.json 1 rows ok (18ms)
✓ Test quotes: batch_policies.json 5 rows ok (24ms)
✓ Validation passed

Dry run complete — no model was deployed.
```
### Calling the deployed API

```bash
curl -X POST https://<workspace>.databricks.net/serving-endpoints/motor-pricing/invocations \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"dataframe_records": [{"Area": "A", "VehPower": 5, "DrivAge": 35}]}'
```
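The same call from Python, assuming the `requests` package is installed and `DATABRICKS_TOKEN` is set (the placeholder workspace URL is yours to fill in):

```python
# Same request as the curl above, from Python.
import os

import requests

url = "https://<workspace>.databricks.net/serving-endpoints/motor-pricing/invocations"
headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}
payload = {"dataframe_records": [{"Area": "A", "VehPower": 5, "DrivAge": 35}]}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```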
## Key Concepts

### Pipelines

A pipeline is a DAG of decorated Python functions. Each function is a node that takes DataFrames in and returns a DataFrame out.

```python
pipeline = haute.Pipeline("my_pipeline")

@pipeline.node
def transform(read_data: pl.DataFrame) -> pl.DataFrame:
    return read_data.filter(pl.col("age") > 25)
```
### Edges

Edges define data flow. Function parameter names match upstream node names:

```python
pipeline.connect("read_data", "transform")
pipeline.connect("transform", "output")
```
### Fan-out / Fan-in

One node can feed multiple downstream nodes, and a node can receive multiple inputs:

```python
@pipeline.node
def joined(claims: pl.DataFrame, exposure: pl.DataFrame) -> pl.DataFrame:
    return claims.join(exposure, on="IDpol", how="left")

pipeline.connect("claims", "joined")
pipeline.connect("exposure", "joined")
```
### Scoring

The same pipeline code works for batch and live scoring:

```python
# Batch: run the full pipeline
result = pipeline.run()

# Live: score a single row (same code path as the deployed API)
row = pl.DataFrame({"Area": ["A"], "DrivAge": [35]})
prediction = pipeline.score(row)
```
## Node Types

### Data Source

Reads data from a file. No code needed — just configure the path.

```python
@pipeline.node(path="data/policies.parquet", deploy_input=True)
def policies() -> pl.DataFrame:
    return pl.scan_parquet("data/policies.parquet")
```

- `deploy_input=True` — marks this source as the live API input for deployment
- Supported formats: Parquet, CSV, JSON
### Transform

The workhorse node. Write Polars code to filter, join, aggregate, or reshape data.

```python
@pipeline.node
def frequency_set(policies: pl.DataFrame, claims: pl.DataFrame) -> pl.DataFrame:
    return policies.join(claims, on="IDpol", how="left")
```

In the GUI, two shorthand syntaxes are available:

- Chain syntax — start with `.` to chain off the first input: `.filter(pl.col("Area") == "A").select("IDpol", "premium")`
- Expression syntax — reference multiple inputs by name: `policies.join(claims, on="IDpol", how="left")`
### External File

Load a model or config file, then use it in your code. The loaded object is available as `obj`.

```python
@pipeline.node(external="models/freq.cbm", file_type="catboost", model_class="regressor")
def frequency_model(policies: pl.DataFrame) -> pl.DataFrame:
    df = policies.with_columns(
        freq_pred=pl.Series(obj.predict(policies.select("Area", "VehAge").to_numpy()))
    )
    return df
```

| Type | Extension | How it loads |
|---|---|---|
| Pickle | `.pkl` | `pickle.load()` |
| JSON | `.json` | `json.load()` |
| Joblib | `.joblib` | `joblib.load()` |
| CatBoost | `.cbm` | `CatBoostClassifier` / `CatBoostRegressor` |
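For intuition, the table amounts to a dispatch on `file_type`. A minimal sketch of such a loader, as an illustration rather than Haute's actual implementation:

```python
# Loader sketch: dispatch on file_type as the table above implies.
# load_external is a hypothetical helper, not Haute's code.
import json
import pickle

def load_external(path: str, file_type: str, model_class: str | None = None):
    if file_type == "pickle":
        with open(path, "rb") as f:
            return pickle.load(f)
    if file_type == "json":
        with open(path) as f:
            return json.load(f)
    if file_type == "joblib":
        import joblib
        return joblib.load(path)
    if file_type == "catboost":
        from catboost import CatBoostClassifier, CatBoostRegressor
        model = CatBoostRegressor() if model_class == "regressor" else CatBoostClassifier()
        return model.load_model(path)
    raise ValueError(f"Unsupported file_type: {file_type}")
```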
### Output

Marks the final node whose result becomes the API response:

```python
@pipeline.node(output=True)
def output(calculate_premium: pl.DataFrame) -> pl.DataFrame:
    return calculate_premium
```
### Data Sink

Writes data to disk. Sinks are pass-through during normal runs — writing only happens when you click Write in the GUI.

```python
@pipeline.node(sink="output/frequency.parquet", format="parquet")
def frequency_write(frequency_set: pl.DataFrame) -> pl.DataFrame:
    return frequency_set
```
## GUI

The visual editor runs in your browser at `http://localhost:5173`.
| Area | Description |
|---|---|
| Left palette | Drag node types onto the canvas |
| Center canvas | Visual DAG with drag, zoom, connect |
| Right panel | Configure the selected node |
| Bottom panel | Data preview — click any node to see its output |
Nodes marked `deploy_input=True` show a green API badge. Toggle it on/off in the node's config panel.
### Code ↔ GUI sync

Everything round-trips:

- Edit in the GUI → saves back to `.py`
- Edit the `.py` in your text editor → the GUI picks it up on next load
- Custom imports, helper functions, and constants are preserved in both directions
## Pipeline Imports & Helpers

Every pipeline starts with `import polars as pl` and `import haute`. Add extra imports or helper functions via the Imports button (⚙) in the GUI toolbar, or write them directly in the `.py` file between the standard imports and the first `@pipeline.node`.

```python
import numpy as np
from catboost import CatBoostClassifier

DISCOUNT_RATE = 0.95

def apply_discount(df, col):
    return df.with_columns(pl.col(col) * DISCOUNT_RATE)
```
## CLI Reference

| Command | Description |
|---|---|
| `haute init` | Scaffold a new project in the current directory |
| `haute run [file]` | Execute a pipeline and print results |
| `haute serve` | Start the visual editor |
| `haute deploy [file]` | Deploy the pipeline as a live API |
| `haute deploy --dry-run` | Validate and score test quotes without deploying |
| `haute status [model]` | Check the status of a deployed model |
### haute serve options

| Flag | Default | Description |
|---|---|---|
| `--host` | `127.0.0.1` | Host to bind to |
| `--port` | `8000` | Backend API port |
| `--no-browser` | off | Don't auto-open the browser |
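For example, to serve on a different port without opening a browser:

```bash
haute serve --port 9000 --no-browser
```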
### haute deploy options

| Flag | Description |
|---|---|
| `--model-name` | Override the model name from `haute.toml` |
| `--dry-run` | Validate and score test quotes without deploying |
## Project Structure

After `haute init`, your project looks like:

```
haute.toml         ← project & deploy config (committed)
.env.example       ← Databricks credentials template (committed)
.env               ← actual credentials (gitignored)
.gitignore
main.py            ← pipeline code (source of truth)
main.haute.json    ← GUI layout state (node positions)
data/              ← data files (.parquet, .csv)
test_quotes/       ← JSON payloads for pre-deploy validation
  example.json
```

- `.py` files are the source of truth — diffable, reviewable, testable
- `.haute.json` files store GUI layout (node positions) — not execution logic
- `haute.toml` is the single config file for project settings and deployment
## Design Principles

- **Code is the source of truth** — the `.py` file is the pipeline. The GUI is a view.
- **Same pipeline, every context** — the same code runs for 1-row live quotes and million-row batch jobs.
- **Real Python, real Polars** — no proprietary formula language. Your skills transfer.
- **Git-native** — pipelines are plain files. Diff, review, branch, merge.
- **One-command deploy** — `haute deploy` handles pruning, bundling, validation, and MLflow registration.
- **Testable** — every node is a plain function. `pytest` just works (see the sketch below).
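As an illustration of the last point, a node such as `calculate_premium` from the Quick Start can be unit-tested directly. A minimal sketch, assuming the `@pipeline.node` decorator leaves the function callable as a plain Python function:

```python
# test_pipeline.py: a minimal sketch; assumes decorated nodes stay
# callable as plain functions, as the "Testable" principle suggests.
import polars as pl

from main import calculate_premium

def test_premium_is_frequency_times_500_rounded():
    frequency_model = pl.DataFrame({"freq_pred": [0.1234]})
    result = calculate_premium(frequency_model)
    assert result["premium"][0] == round(0.1234 * 500, 2)  # 61.7
```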
## Requirements

- Python >= 3.11
- For deployment: `pip install haute[databricks]` (adds MLflow + Databricks SDK)
- Works on Linux, macOS, and Windows
## License

MIT