dlt destination for Firebolt (staged Parquet + COPY INTO)
dlt-firebolt
Prototype dlt destination for Firebolt.
Loads dlt pipelines into Firebolt using filesystem staging (Parquet on S3) + COPY INTO, the same pattern as dlt's Snowflake and Redshift destinations.
Status
Spike complete. Hardening done; packaging and upstream prep in progress.
| Phase | What it proved |
|---|---|
| 1 | dlt → S3 Parquet → manual COPY INTO |
| 2 | Generic sqlalchemy destination is not viable on Firebolt |
| 3 | Native destination="firebolt" end-to-end |
| 4 | Append / merge / replace disposition scripts |
See SPIKE.md for spike notes.
License
Apache License 2.0 — see LICENSE.
Install
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env # fill in Firebolt + S3 creds
Or install dependencies only (no editable package):
pip install -r requirements.txt
pip install -r requirements-dev.txt
Quick start (Phase 3 demo)
Requires:
- Firebolt CREATE LOCATION for your S3 bucket; set FIREBOLT_S3_LOCATION_NAME to the location name (e.g. sprinto_s3)
- HubSpot private app token in .env (demo only)
- AWS credentials for S3 staging
export AWS_PROFILE=your-profile
python phase3_hubspot_to_firebolt.py
Optional: copy .dlt/secrets.toml.example to .dlt/secrets.toml and run with DLT_USE_SECRETS=1.
Before running demos, validate credentials:
python check_firebolt_env.py
Disposition checks (Phase 4)
Run each command separately (do not paste inline comments):
python phase4_dispositions.py --mode merge
python phase4_dispositions.py --mode append
python phase4_dispositions.py --mode append
python phase4_dispositions.py --mode replace
For append, run the command twice and confirm the row count grows.
Verify in Firebolt (default dataset demo):
SELECT COUNT(*) FROM demo_hubspot_contacts;
Usage in a dlt pipeline
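import sys
from pathlib import Path

# Make the local firebolt_dest package importable when running from a source checkout.
sys.path.insert(0, str(Path(__file__).resolve().parent))

import dlt
from firebolt_dest.configuration import make_firebolt_pipeline

# Placeholder resource for illustration; replace with your own dlt resource or source.
@dlt.resource(name="orders")
def my_resource():
    yield {"id": 1, "status": "new"}

pipeline = make_firebolt_pipeline(
    pipeline_name="my_pipeline",
    dataset_name="my_dataset",
)
pipeline.run(my_resource(), loader_file_format="parquet")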
Or with .dlt/secrets.toml:
pipeline = make_firebolt_pipeline(
    pipeline_name="my_pipeline",
    dataset_name="my_dataset",
    from_secrets=True,
)
Tables land as {dataset}_{table} (e.g. my_dataset_orders).
Connection details come from environment variables (see .env.example) or from .dlt/secrets.toml (see .dlt/secrets.toml.example).
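As a rough illustration of the {dataset}_{table} naming rule above, the sketch below joins the two names with an underscore. The helper name landed_table_name is hypothetical and is not part of firebolt_dest; the real destination may also normalize or quote identifiers, which this sketch does not attempt.

```python
def landed_table_name(dataset_name: str, table_name: str) -> str:
    """Return the flat table name implied by the {dataset}_{table} rule."""
    # Hypothetical helper for illustration only; not the project's API.
    if not dataset_name or not table_name:
        raise ValueError("dataset_name and table_name must be non-empty")
    return f"{dataset_name}_{table_name}"


print(landed_table_name("my_dataset", "orders"))  # my_dataset_orders
```

With the default demo dataset, the same rule gives demo_hubspot_contacts for a hubspot_contacts table, which matches the verification query in the disposition checks.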
Layout
firebolt_dest/ # destination implementation (fork Redshift COPY pattern)
    factory.py # registers destination="firebolt"
    client.py # COPY load jobs
    sql_client.py # Firebolt SQLAlchemy client
    copy_sql.py # COPY INTO SQL generation
    configuration.py # credentials + S3 location config
phase1_*.py # spike: dlt → S3 only
phase2_*.py # spike: dialect smoke test
phase3_*.py # spike: full native destination demo
phase4_*.py # append / merge / replace disposition checks
.dlt/config.toml # non-sensitive dlt defaults (parquet loader)
.dlt/secrets.toml.example
tests/ # unit tests (no Firebolt connection)
Configuration
| Variable | Required | Description |
|---|---|---|
| FIREBOLT_CLIENT_ID | yes | Service account client ID |
| FIREBOLT_CLIENT_SECRET | yes | Service account secret |
| FIREBOLT_ACCOUNT_NAME | yes | Firebolt account name |
| FIREBOLT_DATABASE | yes | Target database |
| FIREBOLT_ENGINE | yes | Engine name |
| FIREBOLT_S3_LOCATION_NAME | yes* | Firebolt external location name (must match CREATE LOCATION; e.g. sprinto_s3) |
| S3_BUCKET | yes | Staging bucket |
| S3_PREFIX | no | Key prefix (default: dlt-landing) |
| DLT_DATASET_NAME | no | Demo dataset (default: demo) |
Credentials belong in .env (gitignored) or .dlt/secrets.toml (gitignored). See .dlt/secrets.toml.example.
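The table above can be read as a small validation routine. The following is a minimal sketch of that idea and is not the project's check_firebolt_env.py: the names REQUIRED_VARS, OPTIONAL_DEFAULTS, and load_settings are hypothetical, and treating FIREBOLT_S3_LOCATION_NAME (marked yes*) as strictly required is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

# Variable names and defaults are taken from the configuration table.
# Assumption: FIREBOLT_S3_LOCATION_NAME ("yes*") is treated as required here.
REQUIRED_VARS = (
    "FIREBOLT_CLIENT_ID",
    "FIREBOLT_CLIENT_SECRET",
    "FIREBOLT_ACCOUNT_NAME",
    "FIREBOLT_DATABASE",
    "FIREBOLT_ENGINE",
    "FIREBOLT_S3_LOCATION_NAME",
    "S3_BUCKET",
)
OPTIONAL_DEFAULTS = {"S3_PREFIX": "dlt-landing", "DLT_DATASET_NAME": "demo"}


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect settings from a mapping (os.environ by default) and validate them."""
    env = os.environ if environ is None else environ
    # Empty strings count as missing, so a blank .env entry is reported too.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    for name, default in OPTIONAL_DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings


# Usage with placeholder values instead of real credentials.
example_env = dict.fromkeys(REQUIRED_VARS, "placeholder")
print(load_settings(example_env)["S3_PREFIX"])  # dlt-landing
```

Passing the mapping explicitly keeps the check easy to unit-test without touching the real environment.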
Tests
pip install -r requirements-dev.txt
pytest
# Optional: live Firebolt + S3 (requires .env and AWS creds)
FIREBOLT_RUN_INTEGRATION=1 pytest -m integration -v
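The FIREBOLT_RUN_INTEGRATION=1 switch suggests an environment gate around the live tests. The sketch below shows one way such a gate could be expressed; the function name integration_enabled is hypothetical, and the project's actual gating (for example in a conftest.py) may work differently.

```python
import os
from typing import Mapping, Optional


def integration_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when FIREBOLT_RUN_INTEGRATION is set to "1"."""
    # Hypothetical helper: any other value, or no value, keeps live tests off.
    env = os.environ if environ is None else environ
    return env.get("FIREBOLT_RUN_INTEGRATION", "") == "1"


print(integration_enabled({"FIREBOLT_RUN_INTEGRATION": "1"}))  # True
print(integration_enabled({}))  # False
```

In a pytest suite, the result could feed pytest.mark.skipif(not integration_enabled(), reason="live Firebolt tests disabled") on tests that also carry the integration marker.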
Roadmap
- Package as installable module (pip install -e ./dlt-firebolt)
- Config via env vars and .dlt/secrets.toml (both live-tested)
- Merge/append/replace dispositions (merge via delete-insert; replace via truncate-and-insert or insert-from-staging)
- Unit tests for COPY and merge SQL generation
- Integration test harness (env-gated)
- Destination README (dlt-style setup doc)
- PyPI publish (pip install dlt-firebolt from PyPI)
- Upstream PR to dlt or community listing
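To make the "merge via delete-insert" item concrete, here is a minimal sketch of how such statements might be generated. It uses generic SQL and has not been checked against Firebolt or against the SQL that firebolt_dest actually emits; the function name delete_insert_statements and the staging-table naming are hypothetical, and identifier quoting is deliberately left out.

```python
from typing import List, Sequence


def delete_insert_statements(
    target: str, staging: str, key_columns: Sequence[str]
) -> List[str]:
    """Build a two-step merge: delete matching rows, then insert staged rows."""
    # Hypothetical helper; assumes trusted, already-normalized identifiers.
    if not key_columns:
        raise ValueError("at least one key column is required")
    match = " AND ".join(
        f"{target}.{col} = {staging}.{col}" for col in key_columns
    )
    delete_sql = (
        f"DELETE FROM {target} "
        f"WHERE EXISTS (SELECT 1 FROM {staging} WHERE {match})"
    )
    insert_sql = f"INSERT INTO {target} SELECT * FROM {staging}"
    return [delete_sql, insert_sql]


for statement in delete_insert_statements(
    "my_dataset_orders", "my_dataset_orders_staging", ["id"]
):
    print(statement)
```

Running both statements in order replaces rows whose keys appear in staging and appends the rest, which is the behaviour a merge disposition is expected to give.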
Related
Customer connector demos that consume this pattern live separately in sprinto-connectors (private).