A Python package for working with Parquet data lakes.

bearhouse

A toolkit for working with date-partitioned Parquet data lakes.

Data Organization

Bearhouse expects data organized as date-partitioned Parquet files following this convention:

  • File format: {type}_{YYYYMMDD}.parquet
  • Required column: each file must contain a date column of datetime type

Example:

data/
├── events_20240101.parquet
├── events_20240102.parquet
├── metrics_20240101.parquet
└── metrics_20240102.parquet
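
The naming convention above can be produced and parsed with the standard library alone. The following is a minimal sketch: `partition_filename` and `parse_partition_filename` are hypothetical helpers written for illustration and are not part of bearhouse. Allowing underscores inside the type name is an assumption.

```python
import re
from datetime import date

# Matches {type}_{YYYYMMDD}.parquet. Assumption: the type may contain underscores.
_PARTITION_RE = re.compile(r"^(?P<type>.+)_(?P<day>\d{8})\.parquet$")


def partition_filename(data_type: str, day: date) -> str:
    """Build a file name such as events_20240101.parquet."""
    return f"{data_type}_{day:%Y%m%d}.parquet"


def parse_partition_filename(name: str) -> tuple[str, date]:
    """Split a file name into (type, date); raise ValueError if it does not match."""
    match = _PARTITION_RE.match(name)
    if match is None:
        raise ValueError(f"not a date-partitioned Parquet file name: {name!r}")
    day = match.group("day")
    return match.group("type"), date(int(day[:4]), int(day[4:6]), int(day[6:]))
```

For example, `partition_filename("events", date(2024, 1, 1))` yields `events_20240101.parquet`, the first file in the listing above.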

Usage

Use bearhouse.execute() to run SQL queries directly against your Parquet files. The query's WHERE clause on the date column determines which files are loaded — only the relevant date range is read from disk.

import bearhouse

df = bearhouse.execute(
    sql="SELECT * FROM events WHERE date >= '2024-01-01' AND date <= '2024-01-31'",
    date_directory="/path/to/data"
)
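
To picture the file pruning described above: given a table name and an inclusive date range, only the files whose names carry a date inside that range need to be opened. The sketch below illustrates the idea and is not bearhouse's actual implementation; `files_for_range` is a hypothetical helper.

```python
from datetime import date, timedelta
from pathlib import Path


def files_for_range(directory: str, table: str, start: date, end: date) -> list[Path]:
    """Return existing {table}_{YYYYMMDD}.parquet paths with start <= date <= end."""
    root = Path(directory)
    selected = []
    day = start
    while day <= end:
        candidate = root / f"{table}_{day:%Y%m%d}.parquet"
        if candidate.is_file():  # days without a file are simply skipped
            selected.append(candidate)
        day += timedelta(days=1)
    return selected
```

With the example directory shown earlier, a range of 2024-01-01 through 2024-01-31 on `events` would select only `events_20240101.parquet` and `events_20240102.parquet`; the `metrics` files are never touched.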

It supports standard SQL functionality. Below is an example query with a join:

SELECT e._index0_ as idx, e.id AS event_id, e.event_type, m.value_int, m.value_float, e.date
FROM events e
JOIN metrics m ON e._index0_ = m._index0_ AND e.date = m.date
WHERE e.date BETWEEN '2026-03-01' AND '2026-03-02'
ORDER BY e._index0_

Supported date filter syntax

Syntax                      Example
Range (>=, <=)              WHERE date >= '2024-01-01' AND date <= '2024-03-31'
Greater/less than (>, <)    WHERE date > '2024-06-01'
Exact date (=)              WHERE date = '2024-12-25'
BETWEEN                     WHERE date BETWEEN '2024-01-01' AND '2024-12-31'

When no date bounds are specified, bearhouse defaults to 2000-01-01 through today.
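
As a rough illustration of how the filter forms above could map to a date range with these defaults, here is a regex-based sketch. It is not bearhouse's parser: `date_bounds` is a hypothetical function, it assumes all date conditions are combined with AND, and treating `>` and `<` as exclusive by whole days is also an assumption.

```python
import re
from datetime import date, timedelta
from typing import Optional

_DATE = r"'(\d{4}-\d{2}-\d{2})'"
# Optional table alias (e.g. e.date), then an operator, then a quoted ISO date.
_COMPARE_RE = re.compile(r"\b(?:\w+\.)?date\s*(>=|<=|>|<|=)\s*" + _DATE, re.IGNORECASE)
_BETWEEN_RE = re.compile(
    r"\b(?:\w+\.)?date\s+BETWEEN\s+" + _DATE + r"\s+AND\s+" + _DATE, re.IGNORECASE
)


def date_bounds(sql: str, today: Optional[date] = None) -> tuple[date, date]:
    """Return the inclusive (start, end) range implied by simple date filters."""
    lows, highs = [], []
    for low, high in _BETWEEN_RE.findall(sql):
        lows.append(date.fromisoformat(low))
        highs.append(date.fromisoformat(high))
    for op, value in _COMPARE_RE.findall(sql):
        day = date.fromisoformat(value)
        if op in (">=", "="):
            lows.append(day)
        if op in ("<=", "="):
            highs.append(day)
        if op == ">":
            lows.append(day + timedelta(days=1))  # assumed exclusive bound
        if op == "<":
            highs.append(day - timedelta(days=1))  # assumed exclusive bound
    # Fall back to the documented defaults when a bound is missing.
    start = max(lows) if lows else date(2000, 1, 1)
    end = min(highs) if highs else (today or date.today())
    return start, end
```

A query with no date filter falls back to 2000-01-01 through the current day, matching the default described above.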

Installation

pip install bearhouse

Requirements
