Bear Lake

A lightweight, file-based database built on Polars and Parquet, designed for fast analytics and easy data management.

Bear Lake provides a simple API for creating partitioned tables, inserting data, and running efficient queries using Polars' lazy evaluation. All data is stored as Parquet files with automatic partitioning support.

Installation

Install bear-lake with pip:

pip install bear-lake

Usage

Quick Start

import polars as pl
import bear_lake as bl

# Connect to database
db = bl.connect("my_database")

# Create a table with schema and partitioning
schema = {
    "date": pl.Date,
    "ticker": pl.String,
    "price": pl.Float64
}

db.create(
    name="stocks",
    schema=schema,
    partition_keys=["ticker"],
    primary_keys=["date", "ticker"],
    mode="error"
)

# Insert data
data = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "ticker": ["AAPL", "AAPL"],
    "price": [150.0, 152.5]
}).with_columns(pl.col("date").cast(pl.Date))  # parse the ISO date strings to match the pl.Date column in the schema

db.insert("stocks", data, mode="append")

# Query data using Polars lazy evaluation
result = db.query(
    bl.table("stocks")
    .filter(pl.col("ticker") == "AAPL")
    .select(["date", "price"])
)

print(result)

S3 Storage

Bear Lake supports storing your database on S3-compatible storage (AWS S3, MinIO, etc.) by providing storage options when connecting.

Configuration

First, set up your S3 credentials as environment variables:

export ACCESS_KEY_ID="your-access-key"
export SECRET_ACCESS_KEY="your-secret-key"
export REGION="us-east-1"
export ENDPOINT="https://s3.amazonaws.com"  # Optional
export BUCKET="your-bucket-name"

Connecting to S3

import polars as pl
import bear_lake as bl
import os

# Configure storage options
storage_options = {
    'aws_access_key_id': os.getenv("ACCESS_KEY_ID"),
    'aws_secret_access_key': os.getenv("SECRET_ACCESS_KEY"),
    'region': os.getenv("REGION"),
    'endpoint_url': os.getenv("ENDPOINT")  # Optional
}

# Connect to S3 database
url = f"s3://{os.getenv('BUCKET')}"
db = bl.connect(path=url, storage_options=storage_options)

Usage with S3

Once connected, all database operations work exactly as they do with local storage:

# Create table
schema = {
    "date": pl.Date,
    "ticker": pl.String,
    "close": pl.Float64
}

db.create(
    name="stock_prices",
    schema=schema,
    partition_keys=["ticker"],
    primary_keys=["date", "ticker"],
    mode="replace"
)

# Insert data
data = pl.DataFrame({
    "date": ["2024-01-01", "2024-01-02"],
    "ticker": ["AAPL", "AAPL"],
    "close": [150.0, 152.5]
}).with_columns(pl.col("date").cast(pl.Date))  # parse the ISO date strings to match the pl.Date column in the schema

db.insert("stock_prices", data, mode="append")

# Query data (works the same as local storage)
result = db.query(
    bl.table("stock_prices")
    .filter(pl.col("ticker") == "AAPL")
    .select(["date", "close"])
)

All operations, including insert, query, delete, optimize, and the metadata helpers, work the same way against S3 storage.
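
For example, maintenance calls run against the same S3-backed connection (a minimal sketch using the stock_prices table created above; every call is documented in the API Reference below):

# Delete a ticker's rows across all partitions
db.delete("stock_prices", pl.col("ticker") == "AAPL")

# Deduplicate on the primary keys and re-sort what remains
db.optimize("stock_prices")

# Metadata helpers work the same way
print(db.list_tables())
print(db.get_partition_keys("stock_prices"))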

API Reference

Database Connection

db = bl.connect(path: str) -> Database

Connect to a database at the specified path, creating the directory if it doesn't exist. For an S3-backed database, pass the storage_options argument as shown in the S3 Storage section above.
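
A minimal sketch (the empty-list result for a brand-new database is an assumption based on list_tables returning the table names):

db = bl.connect("my_database")   # creates ./my_database if it doesn't exist
print(db.list_tables())          # expected: [] for a fresh database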

Creating Tables

db.create(
    name: str,
    schema: dict[str, pl.DataType],
    partition_keys: list[str],
    primary_keys: list[str],
    mode: str = "error"
)

Parameters:

  • name: Table name
  • schema: Dictionary mapping column names to Polars data types
  • partition_keys: Columns to partition data by (creates hierarchical folder structure)
  • primary_keys: Columns that form a unique identifier (used for deduplication)
  • mode: How to handle existing tables - "error" (default), "replace", or "skip"
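
For example, a definition that is safe to re-run, relying on mode="skip" to leave an existing table untouched (a sketch reusing the Quick Start schema):

db.create(
    name="stocks",
    schema={"date": pl.Date, "ticker": pl.String, "price": pl.Float64},
    partition_keys=["ticker"],
    primary_keys=["date", "ticker"],
    mode="skip"  # do nothing if the table already exists
)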

Inserting Data

db.insert(name: str, data: pl.DataFrame, mode: str = "append")

Parameters:

  • name: Table name
  • data: Polars DataFrame to insert
  • mode: How to handle existing partitions - "append" (default), "overwrite", or "error"
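
For example, replacing an existing partition with corrected values via mode="overwrite" (a sketch assuming the stocks table from the Quick Start):

corrected = pl.DataFrame({
    "date": ["2024-01-02"],
    "ticker": ["AAPL"],
    "price": [151.9]
}).with_columns(pl.col("date").cast(pl.Date))  # match the pl.Date schema

# Overwrite the existing AAPL partition instead of appending to it
db.insert("stocks", corrected, mode="overwrite")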

Querying Data

result = db.query(expression: pl.LazyFrame) -> pl.DataFrame

Execute a lazy Polars query and return results. Use bl.table(name) to get a LazyFrame for a table.

# Get a LazyFrame for querying
lazy_df = bl.table("stocks")

# Build query with Polars operations
result = db.query(
    lazy_df
    .filter(pl.col("date") > "2024-01-01")
    .group_by("ticker")
    .agg(pl.col("price").mean())
)

Deleting Data

db.delete(name: str, expression: pl.Expr)

Delete rows matching the given expression from all partitions.

# Delete all rows where ticker is AAPL
db.delete("stocks", pl.col("ticker") == "AAPL")

Dropping Tables

db.drop(name: str)

Remove a table and all its data.
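
For example:

# Remove the stocks table and everything stored under it
db.drop("stocks")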

Table Metadata

# List all tables
tables = db.list_tables() -> list[str]

# Get table schema
schema = db.get_schema(name: str) -> dict[str, pl.DataType]

# Get partition keys
partition_keys = db.get_partition_keys(name: str) -> list[str]

# Get primary keys
primary_keys = db.get_primary_keys(name: str) -> list[str]

Optimizing Tables

db.optimize(name: str)

Deduplicate rows based on primary keys (keeping the last occurrence) and sort data. This compacts storage and improves query performance.
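
As a sketch (assuming the stocks table from the Quick Start), appending a row that shares a primary key with an existing one leaves both copies on disk until the table is optimized:

# Same ("date", "ticker") as a row inserted earlier, but a newer price
update = pl.DataFrame({
    "date": ["2024-01-02"],
    "ticker": ["AAPL"],
    "price": [153.0]
}).with_columns(pl.col("date").cast(pl.Date))

db.insert("stocks", update, mode="append")

# Keep only the last occurrence per ("date", "ticker") and sort the data
db.optimize("stocks")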
