Bundlebase

Like Docker, but for data.

Documentation | PyPI | Issues

Features

  • Multiple Formats: Support for Parquet, CSV, JSON, and more
  • Version Control: Built-in commit system for data pipeline versioning
  • Python Native: Seamless async/sync Python API with type hints
  • High Performance: Rust-powered core with Apache Arrow columnar format
  • Fluent API: Chain operations with intuitive, readable syntax

Installation

pip install bundlebase

Quick Start

Async API

import bundlebase

# Create a new bundle and chain operations
c = await (bundlebase.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .remove_column("ssn")
    .rename_column("fname", "first_name"))

# Convert to pandas
df = await c.to_pandas()

# Commit changes
await c.commit("Cleaned customer data")
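
The snippet above uses top-level `await`, which works in environments such as Jupyter or the `python -m asyncio` REPL. In a regular script, the same calls would need to run inside a coroutine driven by `asyncio.run`. The sketch below shows only that wrapping pattern; `load_and_clean` and its sample rows are hypothetical stand-ins for the Bundlebase calls, not part of the library.

```python
import asyncio


async def load_and_clean() -> list[dict]:
    """Hypothetical stand-in for the awaited Bundlebase chain above."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    rows = [{"first_name": "Ada", "age": 36}, {"first_name": "Tim", "age": 12}]
    # Mirrors the "age >= 18" filter from the example
    return [row for row in rows if row["age"] >= 18]


async def main() -> list[dict]:
    # In a real script, the bundlebase.create()... chain would be awaited here.
    return await load_and_clean()


# asyncio.run starts an event loop, runs main() to completion, and closes the loop
result = asyncio.run(main())
print(result)
```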

Sync API

import bundlebase.sync as dc

# Same operations, no await needed
c = (dc.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .remove_column("ssn")
    .rename_column("fname", "first_name"))

df = c.to_pandas()
c.commit("Cleaned customer data")

Streaming Large Datasets

Process datasets larger than available memory by iterating over batches instead of loading everything at once:

import bundlebase

# Stream batches instead of loading everything
c = await bundlebase.open("huge_dataset.parquet")

total_rows = 0
async for batch in bundlebase.stream_batches(c):
    # Each batch is ~100MB, not the entire dataset
    total_rows += batch.num_rows
    # Memory is freed after each iteration

print(f"Processed {total_rows} rows")
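
The same loop shape works for any aggregate that can be updated one batch at a time. As a sketch of the pattern only, the example below computes a running mean over batches produced by `fake_batches`, a hypothetical stand-in for `bundlebase.stream_batches` that yields plain Python lists. Only the running totals are kept between iterations.

```python
import asyncio
from typing import AsyncIterator


async def fake_batches(values: list[float], batch_size: int) -> AsyncIterator[list[float]]:
    """Hypothetical stand-in that yields fixed-size slices of `values`."""
    for start in range(0, len(values), batch_size):
        await asyncio.sleep(0)  # simulate an asynchronous read
        yield values[start:start + batch_size]


async def streaming_mean(values: list[float], batch_size: int) -> float:
    total = 0.0
    count = 0
    async for batch in fake_batches(values, batch_size):
        # Update the running totals; the batch itself is not retained
        total += sum(batch)
        count += len(batch)
    return total / count if count else 0.0


mean = asyncio.run(streaming_mean([1.0, 2.0, 3.0, 4.0, 5.0], batch_size=2))
print(mean)
```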

Core Operations

Data Loading

c = await bundlebase.create()
c = c.attach("data.parquet")      # Parquet files
c = c.attach("data.csv")          # CSV files
c = c.attach("data.json")         # JSON files

Data Transformation

c = c.filter("active = true")                 # Filter rows
c = c.select(["id", "name", "email"])         # Select columns
c = c.remove_column("temp_field")             # Remove columns
c = c.rename_column("old", "new")             # Rename columns
c = c.select("SELECT * FROM self WHERE ...")  # SQL queries

Data Export

df = await c.to_pandas()    # → pandas DataFrame
df = await c.to_polars()    # → polars DataFrame
arr = await c.to_numpy()    # → NumPy array
data = await c.to_dict()    # → Python dict

Indexing

c = c.create_index("email")        # Create index for fast lookups
c = c.rebuild_index("email")       # Rebuild existing index
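
As a conceptual illustration only (it says nothing about how Bundlebase builds or stores its indexes), an index trades a one-time build cost for lookups that avoid scanning every row. A plain-Python sketch with hypothetical data and a hypothetical `build_index` helper:

```python
from collections import defaultdict

# Hypothetical sample rows
rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "tim@example.com"},
    {"id": 3, "email": "ada@example.com"},
]


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


email_index = build_index(rows, "email")

# Lookup touches only the matching positions instead of scanning all rows
matches = [rows[i] for i in email_index.get("ada@example.com", [])]
print([row["id"] for row in matches])
```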

Joining

c = await bundlebase.create()
c = c.attach("customers.parquet")
c = c.join(
    "orders.parquet",
    left_on="customer_id",
    right_on="id",
    join_type="inner"
)
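
To make the `left_on`, `right_on`, and `join_type="inner"` parameters concrete, here is a plain-Python sketch of inner-join semantics. The data and the `inner_join` helper are hypothetical and are not Bundlebase code; they only illustrate that an inner join keeps row pairs whose key values match and drops unmatched rows from both sides.

```python
# Hypothetical sample data using the same key columns as the example above
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Tim"},
]
orders = [
    {"id": 1, "total": 30},
    {"id": 1, "total": 12},
    {"id": 3, "total": 99},
]


def inner_join(left, right, left_on, right_on):
    """Return merged rows for every left/right pair with equal key values."""
    # Group right-hand rows by key so each left row needs only one lookup
    by_key = {}
    for row in right:
        by_key.setdefault(row[right_on], []).append(row)

    joined = []
    for left_row in left:
        for right_row in by_key.get(left_row[left_on], []):
            joined.append({**left_row, **right_row})
    return joined


joined = inner_join(customers, orders, left_on="customer_id", right_on="id")
print(joined)
```

In this sample, the customer with `customer_id` 2 and the order with `id` 3 have no match, so neither appears in the result.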

Development

Prerequisites

  • Rust (latest stable)
  • Python 3.9+
  • Poetry

Setup

# Install Python dependencies
poetry install

# Build Rust extension
maturin develop

# Run tests
cargo test              # Rust tests
poetry run pytest       # Python tests

Contributing

Contributions are welcome!

License

Distributed under the Apache 2.0 license.
