A Python package for working with Parquet data lakes.

bearhouse

A toolkit for working with date-partitioned Parquet data lakes.

Data Organization

Bearhouse expects data organized as date-partitioned Parquet files following this convention:

  • File format: {type}_{YYYYMMDD}.parquet
  • Auto-added column: fn_date (type: Date) is automatically derived from the filename and added to every row

Example:

data/
├── events_20240101.parquet
├── events_20240102.parquet
├── metrics_20240101.parquet
└── metrics_20240102.parquet
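
The naming convention above can be illustrated with a short sketch. This is not bearhouse's own code: `FILENAME_RE` and `parse_data_filename` are hypothetical names, and the regular expression is only one plausible reading of the `{type}_{YYYYMMDD}.parquet` pattern. It shows how a table type and an fn_date value could be derived from a file name.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Hypothetical pattern for the {type}_{YYYYMMDD}.parquet convention.
FILENAME_RE = re.compile(r"^(?P<type>.+)_(?P<ymd>\d{8})\.parquet$")


def parse_data_filename(path):
    """Return (table_type, fn_date) for a conforming file name, else None."""
    match = FILENAME_RE.match(Path(path).name)
    if match is None:
        return None
    try:
        fn_date = datetime.strptime(match.group("ymd"), "%Y%m%d").date()
    except ValueError:
        return None  # eight digits that do not form a real calendar date
    return match.group("type"), fn_date


print(parse_data_filename("data/events_20240101.parquet"))
```

Under this reading, a file whose name does not match the pattern, or whose digits are not a valid date, is simply skipped rather than raising an error. Whether bearhouse behaves the same way is not stated in this README.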

Usage

Use bearhouse.execute() to run SQL queries directly against your Parquet files. The query's WHERE clause on the date column determines which files are loaded — only the relevant date range is read from disk.

import bearhouse

df = bearhouse.execute(
    sql="SELECT * FROM events WHERE date >= '2024-01-01' AND date <= '2024-01-31'",
    date_directory="/path/to/data"
)
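
To make the "only the relevant date range is read" idea concrete, here is a minimal sketch of how a date range could be mapped to file paths. It is an assumption about the approach, not the library's implementation; `candidate_files` is a hypothetical helper, and a real implementation might instead list the directory and filter existing files.

```python
from datetime import date, timedelta
from pathlib import Path


def candidate_files(data_dir, table, start, end):
    """List the paths {table}_{YYYYMMDD}.parquet for each day in [start, end]."""
    paths = []
    day = start
    while day <= end:
        paths.append(Path(data_dir) / f"{table}_{day:%Y%m%d}.parquet")
        day += timedelta(days=1)
    return paths


files = candidate_files("/path/to/data", "events", date(2024, 1, 1), date(2024, 1, 31))
print(len(files), files[0].name, files[-1].name)
```

For the January query above, this yields one candidate path per day of the month; files outside that range are never considered.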

bearhouse supports standard SQL. The auto-added fn_date column is useful for joining tables on rows that come from files with the same date:

SELECT e.id AS event_id, e.event_type, m.value_int, m.value_float, e.fn_date
FROM events e
JOIN metrics m ON e.id = m.id AND e.fn_date = m.fn_date
WHERE e.date BETWEEN '2026-03-01' AND '2026-03-02'
ORDER BY e.id
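
The effect of including fn_date in the join condition can be seen in a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for bearhouse, with made-up rows, and assumes that the same id can appear in files from different days.

```python
import sqlite3

# In-memory stand-in tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (id INTEGER, event_type TEXT, fn_date TEXT);
    CREATE TABLE metrics (id INTEGER, value_int INTEGER, fn_date TEXT);
    INSERT INTO events  VALUES (1, 'click', '2026-03-01'), (1, 'view', '2026-03-02');
    INSERT INTO metrics VALUES (1, 10, '2026-03-01'), (1, 20, '2026-03-02');
""")

# Joining on id alone pairs every event with every metric that shares the id.
by_id_only = conn.execute(
    "SELECT e.event_type, m.value_int FROM events e JOIN metrics m ON e.id = m.id"
).fetchall()

# Adding fn_date keeps only pairs that come from the same day.
by_id_and_date = conn.execute(
    "SELECT e.event_type, m.value_int, e.fn_date FROM events e "
    "JOIN metrics m ON e.id = m.id AND e.fn_date = m.fn_date "
    "ORDER BY e.fn_date"
).fetchall()

print(len(by_id_only), by_id_and_date)
```

Without the fn_date condition, rows from different days are cross-matched; with it, each event is paired only with the metric recorded on the same date.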

Supported date filter syntax

Syntax                      Example
Range (>=, <=)              WHERE date >= '2024-01-01' AND date <= '2024-03-31'
Greater/less than (>, <)    WHERE date > '2024-06-01'
Exact date (=)              WHERE date = '2024-12-25'
BETWEEN                     WHERE date BETWEEN '2024-01-01' AND '2024-12-31'

When no date bounds are specified, bearhouse defaults to 2000-01-01 through today.
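
As a rough illustration of how these filter forms could translate into a date range, the sketch below extracts inclusive bounds from a query string with regular expressions. This is a simplified assumption, not how bearhouse parses SQL: `date_bounds`, `DEFAULT_START`, and the patterns are hypothetical, and a real implementation would more likely use a proper SQL parser.

```python
import re
from datetime import date, timedelta

DEFAULT_START = date(2000, 1, 1)  # default lower bound stated above

_DATE = r"'(\d{4}-\d{2}-\d{2})'"
# \bdate matches "date" and "e.date" but not "fn_date" (underscore is a word character).
_BETWEEN_RE = re.compile(rf"\bdate\s+BETWEEN\s+{_DATE}\s+AND\s+{_DATE}", re.IGNORECASE)
_COMPARE_RE = re.compile(rf"\bdate\s*(>=|<=|>|<|=)\s*{_DATE}", re.IGNORECASE)


def date_bounds(sql, today=None):
    """Return an inclusive (start, end) pair inferred from filters on `date`."""
    start = DEFAULT_START
    end = today or date.today()
    between = _BETWEEN_RE.search(sql)
    if between:
        return date.fromisoformat(between.group(1)), date.fromisoformat(between.group(2))
    for op, text in _COMPARE_RE.findall(sql):
        value = date.fromisoformat(text)
        if op == ">=":
            start = value
        elif op == ">":
            start = value + timedelta(days=1)  # strict bound excludes the day itself
        elif op == "<=":
            end = value
        elif op == "<":
            end = value - timedelta(days=1)
        else:  # "=" pins both ends to a single day
            start = end = value
    return start, end


print(date_bounds("SELECT * FROM events WHERE date > '2024-06-01'", today=date(2024, 12, 31)))
```

The `today` parameter exists only so the default upper bound can be fixed in tests; when omitted, the current date is used, matching the default described above.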

Installation

pip install bearhouse

Requirements
