A Python package for working with Parquet data lakes.
bearhouse
A toolkit for working with date-partitioned Parquet data lakes.
Data Organization
Bearhouse expects data organized as date-partitioned Parquet files following this convention:
- File format: {type}_{YYYYMMDD}.parquet
- Auto-added column: fn_date (type: Date) is automatically derived from the filename and added to every row
Example:
data/
├── events_20240101.parquet
├── events_20240102.parquet
├── metrics_20240101.parquet
└── metrics_20240102.parquet
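As an illustration of the naming convention above, here is a minimal sketch of how a table type and date could be recovered from a file name. It is not bearhouse's own code: the names FILENAME_RE and parse_partition_name are hypothetical, and treating everything before the last underscore as the type is an assumption.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Hypothetical pattern for the {type}_{YYYYMMDD}.parquet convention.
# Assumption: the type is everything before the final underscore.
FILENAME_RE = re.compile(r"^(?P<type>.+)_(?P<ymd>\d{8})\.parquet$")


def parse_partition_name(path):
    """Return (table_type, file_date) for a conforming name, else None."""
    match = FILENAME_RE.match(Path(path).name)
    if match is None:
        return None
    try:
        # The parsed value plays the role of the fn_date column.
        file_date = datetime.strptime(match.group("ymd"), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that do not form a real calendar date
    return match.group("type"), file_date


print(parse_partition_name("data/events_20240101.parquet"))
```

Names that do not follow the convention, or whose eight digits are not a valid calendar date, yield None in this sketch rather than raising.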
Usage
Use bearhouse.execute() to run SQL queries directly against your Parquet files. The query's WHERE clause on the date column determines which files are loaded, so only files within the requested date range are read from disk.
import bearhouse
df = bearhouse.execute(
    sql="SELECT * FROM events WHERE date >= '2024-01-01' AND date <= '2024-01-31'",
    date_directory="/path/to/data",
)
It supports standard SQL features. The auto-added fn_date column is useful for joining tables on rows that come from files with the same date:
SELECT e.id AS event_id, e.event_type, m.value_int, m.value_float, e.fn_date
FROM events e
JOIN metrics m ON e.id = m.id AND e.fn_date = m.fn_date
WHERE e.date BETWEEN '2026-03-01' AND '2026-03-02'
ORDER BY e.id
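The idea that only files inside the requested date range are read can be pictured with a small sketch. This is an assumption about the general technique, not bearhouse's implementation; select_files and its parameters are hypothetical names.

```python
from datetime import date, datetime
from pathlib import Path


def select_files(data_dir, table, start, end):
    """List {table}_{YYYYMMDD}.parquet files dated within [start, end]."""
    selected = []
    for path in sorted(Path(data_dir).glob(f"{table}_*.parquet")):
        # Text after "{table}_" should be exactly eight digits.
        stamp = path.stem[len(table) + 1:]
        if len(stamp) != 8 or not stamp.isdigit():
            continue  # e.g. a different table that shares the prefix
        try:
            file_date = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # digits that are not a real calendar date
        if start <= file_date <= end:
            selected.append(path)
    return selected
```

For the join query above, a sketch like this would be called once per table (events and metrics) with the same two dates, so each side of the join only touches the files for 2026-03-01 and 2026-03-02.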
Supported date filter syntax
| Syntax | Example |
|---|---|
| Range (>=, <=) | WHERE date >= '2024-01-01' AND date <= '2024-03-31' |
| Greater/less than (>, <) | WHERE date > '2024-06-01' |
| Exact date (=) | WHERE date = '2024-12-25' |
| BETWEEN | WHERE date BETWEEN '2024-01-01' AND '2024-12-31' |
When no date bounds are specified, bearhouse defaults to 2000-01-01 through today.
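To make the filter forms and the default range concrete, the following sketch extracts an inclusive (start, end) pair from a query string using regular expressions. It is a simplified illustration, not how bearhouse parses SQL: date_bounds, COMPARE_RE, and BETWEEN_RE are hypothetical names, and turning a strict > or < into the next or previous day is an assumption.

```python
import re
from datetime import date, timedelta

DEFAULT_START = date(2000, 1, 1)  # default lower bound stated in the README

ISO = r"'(\d{4}-\d{2}-\d{2})'"
# \bdate matches `date` and `e.date` but not `fn_date`.
COMPARE_RE = re.compile(r"\bdate\s*(>=|<=|>|<|=)\s*" + ISO, re.IGNORECASE)
BETWEEN_RE = re.compile(
    r"\bdate\s+BETWEEN\s+" + ISO + r"\s+AND\s+" + ISO, re.IGNORECASE
)


def date_bounds(sql, today=None):
    """Return an inclusive (start, end) date range implied by the query."""
    start, end = DEFAULT_START, today or date.today()
    between = BETWEEN_RE.search(sql)
    if between:
        return (date.fromisoformat(between.group(1)),
                date.fromisoformat(between.group(2)))
    for op, literal in COMPARE_RE.findall(sql):
        day = date.fromisoformat(literal)
        if op == "=":
            start = end = day
        elif op == ">=":
            start = day
        elif op == ">":
            start = day + timedelta(days=1)  # assumption: strict bound
        elif op == "<=":
            end = day
        elif op == "<":
            end = day - timedelta(days=1)  # assumption: strict bound
    return start, end
```

The optional today argument exists only so the default upper bound can be pinned in tests. A regex approach like this ignores OR conditions, subqueries, and comments, which is why a real implementation would more likely rely on a SQL parser.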
Installation
pip install bearhouse
Requirements