Bundlebase

Like Docker, but for data.

Documentation | PyPI | Issues

Features

  • Multiple Formats: Support for Parquet, CSV, JSON, and more
  • Version Control: Built-in commit system for data pipeline versioning
  • Python Native: Seamless async/sync Python API with type hints
  • High Performance: Rust-powered core with Apache Arrow columnar format
  • Fluent API: Chain operations with intuitive, readable syntax

Installation

pip install bundlebase

Quick Start

Async API

import bundlebase

# Create a new bundle and chain operations
c = await (bundlebase.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .remove_column("ssn")
    .rename_column("fname", "first_name"))

# Convert to pandas
df = await c.to_pandas()

# Commit changes
await c.commit("Cleaned customer data")
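
The snippet above uses top-level `await`, which works in a notebook or in the asyncio REPL (`python -m asyncio`) but is a `SyntaxError` in a regular script. A minimal sketch of the usual workaround follows: put the awaits inside a coroutine and start it with `asyncio.run`. The `load_rows` coroutine is a hypothetical stand-in for any awaitable data call and is not part of Bundlebase.

```python
import asyncio


async def load_rows():
    # Hypothetical stand-in for an awaitable data call; not a Bundlebase function.
    await asyncio.sleep(0)
    return [{"first_name": "Ada", "age": 36}, {"first_name": "Tim", "age": 12}]


async def main():
    rows = await load_rows()
    # Keep adults only, mirroring the "age >= 18" filter in the example above.
    return [row for row in rows if row["age"] >= 18]


if __name__ == "__main__":
    # asyncio.run creates an event loop, runs main() to completion, and closes the loop.
    adults = asyncio.run(main())
    print(adults)
```

In a script, the Bundlebase calls from the Async API example would presumably go inside `main()` in the same way.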

Sync API

import bundlebase.sync as dc

# Same operations, no await needed
c = (dc.create()
    .attach("data.parquet")
    .filter("age >= 18")
    .remove_column("ssn")
    .rename_column("fname", "first_name"))

df = c.to_pandas()
c.commit("Cleaned customer data")

Streaming Large Datasets

Process data larger than RAM efficiently:

import bundlebase

# Stream batches instead of loading everything
c = await bundlebase.open("huge_dataset.parquet")

total_rows = 0
async for batch in bundlebase.stream_batches(c):
    # Each batch is ~100MB, not the entire dataset
    total_rows += batch.num_rows
    # Memory is freed after each iteration

print(f"Processed {total_rows} rows")
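
To show why batch iteration keeps memory bounded, here is a rough, library-free sketch of the same pattern using a plain async generator. `iter_batches` and `count_rows` are hypothetical names invented for this illustration; they are not Bundlebase functions and say nothing about how `stream_batches` is implemented internally.

```python
import asyncio


async def iter_batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows (illustrative only)."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list so the previous batch can be collected
    if batch:
        yield batch  # final, possibly shorter, batch


async def count_rows(rows, batch_size):
    total_rows = 0
    async for batch in iter_batches(rows, batch_size):
        # Only one batch is referenced here at any time.
        total_rows += len(batch)
    return total_rows


total = asyncio.run(count_rows(range(10), batch_size=4))
print(f"Processed {total} rows")
```

The consumer only ever holds one batch, so peak memory depends on the batch size rather than on the size of the dataset.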

Core Operations

Data Loading

c = await bundlebase.create()
c = c.attach("data.parquet")      # Parquet files
c = c.attach("data.csv")          # CSV files
c = c.attach("data.json")         # JSON files

Data Transformation

c = c.filter("active = true")              # Filter rows
c = c.select(["id", "name", "email"])      # Select columns
c = c.remove_column("temp_field")          # Remove columns
c = c.rename_column("old", "new")          # Rename columns
c = c.select("SELECT * FROM self WHERE ...") # SQL queries
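
As a reading aid, the sketch below spells out the usual row-level meaning of these operations in plain Python on a list of dicts. It assumes the operations follow ordinary relational semantics. The sample rows and the `full_name` column are made up for the illustration, and none of this reflects how Bundlebase executes the operations in its Rust core.

```python
rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "active": True, "temp_field": 0},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "active": False, "temp_field": 0},
]

# filter("active = true"): keep the rows where the predicate holds
kept = [row for row in rows if row["active"]]

# remove_column("temp_field"): drop one key from every row
kept = [{k: v for k, v in row.items() if k != "temp_field"} for row in kept]

# rename_column("name", "full_name"): same values under a new key
kept = [{("full_name" if k == "name" else k): v for k, v in row.items()} for row in kept]

# select(["id", "full_name"]): keep only the listed columns, in that order
result = [{col: row[col] for col in ["id", "full_name"]} for row in kept]
print(result)
```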

Data Export

df = await c.to_pandas()    # → pandas DataFrame
df = await c.to_polars()    # → polars DataFrame
arr = await c.to_numpy()    # → NumPy array
data = await c.to_dict()    # → Python dict

Indexing

c = c.create_index("email")        # Create index for fast lookups
c = c.rebuild_index("email")       # Rebuild existing index
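
The README does not describe how Bundlebase stores its indexes, so the following is only a generic illustration of why an index speeds up lookups: a mapping from column value to row positions replaces a full scan with a single dictionary probe. `build_index` and the sample rows are hypothetical.

```python
from collections import defaultdict

rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": "cy@example.com"},
]


def build_index(rows, column):
    """Map each value of `column` to the positions of the rows that hold it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


email_index = build_index(rows, "email")

# One dictionary probe instead of scanning every row.
matches = [rows[i] for i in email_index.get("bob@example.com", [])]
print(matches)
```

An index like this has to be rebuilt or updated when the underlying rows change, which is presumably the role of `rebuild_index`.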

Joining

c = await bundlebase.create()
c = c.attach("customers.parquet")
c = c.join(
    "orders.parquet",
    left_on="customer_id",
    right_on="id",
    join_type="inner"
)
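
For readers unfamiliar with the parameters, this plain-Python sketch shows what an inner join on `left_on` and `right_on` conventionally returns: one output row per matching pair, with unmatched rows on either side dropped. The `inner_join` helper and the sample data are hypothetical, and Bundlebase's handling of details such as duplicate column names may differ.

```python
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bob"},
]
orders = [
    {"id": 1, "total": 30},
    {"id": 1, "total": 12},
    {"id": 3, "total": 99},
]


def inner_join(left, right, left_on, right_on):
    """Hash the right side by key, then probe it with each left row."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[right_on], []).append(row)
    joined = []
    for left_row in left:
        for right_row in by_key.get(left_row[left_on], []):
            # On a column-name clash the right-hand value wins in this sketch.
            joined.append({**left_row, **right_row})
    return joined


result = inner_join(customers, orders, left_on="customer_id", right_on="id")
for row in result:
    print(row)
```

Here Ada appears twice (two matching rows), while Bob and the row with `id` 3 are dropped because they have no partner on the other side.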

Development

Prerequisites

  • Rust (latest stable)
  • Python 3.9+
  • Poetry

Setup

# Install Python dependencies
poetry install

# Build Rust extension
maturin develop

# Run tests
cargo test              # Rust tests
poetry run pytest       # Python tests

Contributing

Contributions are welcome!

License

Distributed under the Apache 2.0 license.
