A Python package for working with Parquet data lakes.
bearhouse
A toolkit for working with date-partitioned Parquet data lakes.
Data Organization
Bearhouse expects data organized as date-partitioned Parquet files following this convention:
- File format: {type}_{YYYYMMDD}.parquet
- Required column: each file must contain a date column of datetime type
Example:
data/
├── events_20240101.parquet
├── events_20240102.parquet
├── metrics_20240101.parquet
└── metrics_20240102.parquet
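The naming convention above can be sketched with two small helpers. Both `partition_filename` and `parse_partition_filename` are hypothetical names introduced for illustration and are not part of the bearhouse API; they only show how a type and a date map to a file name and back.

```python
import re
from datetime import date, datetime

# Hypothetical helpers for illustration; not part of the bearhouse API.
_PATTERN = re.compile(r"^(?P<type>.+)_(?P<day>\d{8})\.parquet$")


def partition_filename(type_name: str, day: date) -> str:
    """Build a file name of the form {type}_{YYYYMMDD}.parquet."""
    return f"{type_name}_{day:%Y%m%d}.parquet"


def parse_partition_filename(name: str) -> tuple[str, date]:
    """Split a file name into (type, date); raise ValueError if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a date-partitioned Parquet file name: {name!r}")
    # strptime also rejects impossible dates such as 20241340.
    day = datetime.strptime(match["day"], "%Y%m%d").date()
    return match["type"], day
```

For example, `partition_filename("events", date(2024, 1, 1))` produces `events_20240101.parquet`, which matches the first file in the listing above.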
Usage
Use bearhouse.execute() to run SQL queries directly against your Parquet files. The query's WHERE clause on the date column determines which files are loaded: only files within the relevant date range are read from disk.
import bearhouse
df = bearhouse.execute(
sql="SELECT * FROM events WHERE date >= '2024-01-01' AND date <= '2024-01-31'",
date_directory="/path/to/data"
)
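To make the file-pruning idea concrete, here is a minimal sketch of how a date range could be turned into a list of files to read. It is an illustration under assumptions, not bearhouse's actual implementation: the function name `files_in_range` and its parameters are hypothetical, and it relies only on the {type}_{YYYYMMDD}.parquet naming convention.

```python
from datetime import date, datetime
from pathlib import Path


def files_in_range(directory, table, start, end):
    """Return paths of {table}_{YYYYMMDD}.parquet files with start <= date <= end.

    Hypothetical helper for illustration; not part of the bearhouse API.
    """
    selected = []
    for path in sorted(Path(directory).glob(f"{table}_*.parquet")):
        stamp = path.stem[len(table) + 1:]  # text after "{table}_"
        if len(stamp) != 8 or not stamp.isdigit():
            continue  # skip files that do not follow the naming convention
        try:
            day = datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real calendar date
        if start <= day <= end:
            selected.append(path)
    return selected
```

Only the file names are inspected here, so files outside the requested range are never opened.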
bearhouse supports all standard SQL functionality. Below is an example query with a join:
SELECT e._index0_ as idx, e.id AS event_id, e.event_type, m.value_int, m.value_float, e.date
FROM events e
JOIN metrics m ON e._index0_ = m._index0_ AND e.date = m.date
WHERE e.date BETWEEN '2026-03-01' AND '2026-03-02'
ORDER BY e._index0_
Supported date filter syntax
| Syntax | Example |
|---|---|
| Range (>=, <=) | WHERE date >= '2024-01-01' AND date <= '2024-03-31' |
| Greater/less than (>, <) | WHERE date > '2024-06-01' |
| Exact date (=) | WHERE date = '2024-12-25' |
| BETWEEN | WHERE date BETWEEN '2024-01-01' AND '2024-12-31' |
When no date bounds are specified, bearhouse defaults to 2000-01-01 through today.
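The filter forms and the default range described above can be illustrated with a small regex-based sketch. This is an assumption about how such bounds might be derived, not bearhouse's actual parser; `date_bounds` and `DEFAULT_START` are hypothetical names. Strict comparisons (>, <) are treated as inclusive here so that no file is skipped by mistake, on the assumption that the SQL filter itself still removes non-matching rows.

```python
import re
from datetime import date

# Hypothetical sketch; not bearhouse's actual parsing logic.
DEFAULT_START = date(2000, 1, 1)

_LITERAL = r"'(\d{4}-\d{2}-\d{2})'"
_BETWEEN = re.compile(rf"\bdate\s+BETWEEN\s+{_LITERAL}\s+AND\s+{_LITERAL}", re.IGNORECASE)
_COMPARISON = re.compile(rf"\bdate\s*(>=|<=|>|<|=)\s*{_LITERAL}", re.IGNORECASE)


def date_bounds(sql, today=None):
    """Return an inclusive (start, end) date range implied by the query's date filters."""
    start = DEFAULT_START
    end = today or date.today()

    between = _BETWEEN.search(sql)
    if between:
        return date.fromisoformat(between[1]), date.fromisoformat(between[2])

    for op, literal in _COMPARISON.findall(sql):
        day = date.fromisoformat(literal)
        if op in (">=", ">"):
            start = day  # lower bound (strict bound kept inclusive)
        elif op in ("<=", "<"):
            end = day  # upper bound (strict bound kept inclusive)
        else:
            start = end = day  # exact date
    return start, end
```

A query with no date filter falls back to the default range, 2000-01-01 through today, as described above.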
Installation
pip install bearhouse
Requirements