PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

Robin Sparkless

PySpark-style DataFrames in Rust—no JVM. A DataFrame library that mirrors PySpark’s API and semantics while using Polars as the execution engine.


Why Robin Sparkless?

  • Familiar API: SparkSession, DataFrame, Column, and PySpark-like functions, so you can reuse PySpark patterns without the JVM.
  • Polars under the hood: fast, native Rust execution, with Polars handling IO, expressions, and aggregations.
  • Rust-first, Python optional: use it as a Rust library, or build the Python extension via PyO3 for a drop-in style API.
  • Sparkless backend target: designed as the engine behind Sparkless (the Python PySpark replacement), which can call into it via PyO3.

Features

Area             What’s included
Core             SparkSession, DataFrame, Column; filter, select, with_column, order_by, group_by, joins
IO               CSV, Parquet, JSON via SparkSession::read_*
Expressions      col(), lit(), when/then/otherwise, coalesce, cast, type/conditional helpers
Aggregates       count, sum, avg, min, max, and more; multi-column groupBy
Window           row_number, rank, dense_rank, lag, lead, first_value, last_value, and others with .over()
Arrays & maps    array_*, explode, create_map, map_keys, map_values, and related functions
Strings & JSON   String functions (upper, lower, substring, regexp_*, etc.), get_json_object, from_json, to_json
Datetime & math  Date/time extractors and arithmetic, year/month/day, math (sin, cos, sqrt, pow, …)
Optional SQL     spark.sql("SELECT ...") with temp views (createOrReplaceTempView, table); enable with --features sql
Optional Delta   read_delta, read_delta_with_version, write_delta; enable with --features delta

Known differences from PySpark are documented in docs/PYSPARK_DIFFERENCES.md. Parity status and roadmap are in docs/PARITY_STATUS.md and docs/ROADMAP.md.
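To make the window-function row of the table concrete, here is a minimal plain-Python sketch of what row_number, rank, and dense_rank compute within a partition, following the usual PySpark/SQL definitions. It does not call robin-sparkless; the helper name add_rankings and the sample rows are hypothetical, and null ordering is ignored.

```python
from itertools import groupby
from operator import itemgetter


def add_rankings(rows, partition_key, order_key):
    """Return copies of rows with row_number, rank, and dense_rank added.

    Hypothetical helper for illustration only; not part of robin-sparkless.
    Rows are ranked in ascending order of order_key within each partition.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        rank = dense_rank = 0
        previous = object()  # sentinel that compares unequal to any real value
        for row_number, row in enumerate(group, start=1):
            if row[order_key] != previous:
                rank = row_number   # rank skips ahead after a run of ties
                dense_rank += 1     # dense_rank never leaves gaps
                previous = row[order_key]
            out.append(
                {**row, "row_number": row_number, "rank": rank, "dense_rank": dense_rank}
            )
    return out


rows = [
    {"dept": "a", "salary": 100},
    {"dept": "a", "salary": 100},
    {"dept": "a", "salary": 200},
    {"dept": "b", "salary": 50},
]
ranked = add_rankings(rows, "dept", "salary")
for row in ranked:
    print(row)
```

The two tied salaries in partition "a" share rank 1 and dense_rank 1; the next row gets rank 3 but dense_rank 2, which is the distinction between the two functions.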


Installation

Rust

Add to your Cargo.toml:

[dependencies]
robin-sparkless = "0.1.0"

Optional features:

robin-sparkless = { version = "0.1.0", features = ["sql"] }   # spark.sql(), temp views
robin-sparkless = { version = "0.1.0", features = ["delta"] }  # Delta Lake read/write

Python (PyO3)

Build the Python extension with maturin (requires Rust and Python 3.8+):

pip install maturin
maturin develop --features pyo3
# With optional SQL and/or Delta:
maturin develop --features "pyo3,sql"
maturin develop --features "pyo3,delta"
maturin develop --features "pyo3,sql,delta"

Then use the robin_sparkless module; see docs/PYTHON_API.md.


Quick start

Rust

use robin_sparkless::{col, lit_i64, SparkSession};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let spark = SparkSession::builder().app_name("demo").get_or_create();

    // Create a DataFrame from rows (id, age, name)
    let df = spark.create_dataframe(
        vec![
            (1, 25, "Alice".to_string()),
            (2, 30, "Bob".to_string()),
            (3, 35, "Charlie".to_string()),
        ],
        vec!["id", "age", "name"],
    )?;

    // Filter and show
    let adults = df.filter(col("age").gt(lit_i64(26)))?;
    adults.show(Some(10))?;

    Ok(())
}

You can also wrap an existing Polars DataFrame with DataFrame::from_polars(polars_df). See docs/QUICKSTART.md for joins, window functions, and more.

Python

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe([(1, 25, "Alice"), (2, 30, "Bob")], ["id", "age", "name"])
filtered = df.filter(rs.col("age").gt(rs.lit(26)))
print(filtered.collect())  # [{"id": 2, "age": 30, "name": "Bob"}]
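Because the Python extension is an optional build, a script may want to guard the import. The sketch below uses only the calls shown in the quick start above and, when the module is missing, falls back to an equivalent plain-Python filter. The helper name older_than and the fallback path are assumptions for illustration, not part of the library.

```python
try:
    # Available only after building the extension (maturin develop --features pyo3).
    import robin_sparkless as rs
except ImportError:
    rs = None

COLUMNS = ["id", "age", "name"]


def older_than(rows, min_age):
    """Return the rows whose age exceeds min_age, as a list of dicts.

    Hypothetical helper: uses robin_sparkless when importable, otherwise
    applies the same filter in plain Python.
    """
    if rs is not None:
        spark = rs.SparkSession.builder().app_name("demo").get_or_create()
        df = spark.create_dataframe(rows, COLUMNS)
        return df.filter(rs.col("age").gt(rs.lit(min_age))).collect()
    # Fallback: the age column is at index 1 of each tuple.
    return [dict(zip(COLUMNS, row)) for row in rows if row[1] > min_age]


result = older_than([(1, 25, "Alice"), (2, 30, "Bob")], 26)
print(result)
```

Either path should yield the single row for Bob, matching the output noted in the quick start.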

Development

Prerequisites: Rust (see rust-toolchain.toml); for the Python tests, also Python 3.8+, maturin, and pytest.

Command                      Description
cargo build                  Build (Rust only)
cargo build --features pyo3  Build with Python extension
cargo test                   Run Rust tests
make test                    Run Rust + Python tests (creates venv, maturin develop, pytest)
make check                   Format, clippy, audit, deny, tests
cargo bench                  Benchmarks (robin-sparkless vs Polars)
cargo doc --open             Build and open API docs

CI runs the same checks on push/PR (see .github/workflows/ci.yml).


Documentation

See also CHANGELOG.md for version history.


License

MIT

Download files

Download the file for your platform.

Source Distribution

  • robin_sparkless-0.1.0.tar.gz (365.6 kB): Source

Built Distributions

  • robin_sparkless-0.1.0-cp38-abi3-win_arm64.whl (14.2 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.1.0-cp38-abi3-win_amd64.whl (15.8 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.1.0-cp38-abi3-musllinux_1_2_x86_64.whl (18.4 MB): CPython 3.8+, musllinux: musl 1.2+ x86-64
  • robin_sparkless-0.1.0-cp38-abi3-musllinux_1_2_aarch64.whl (16.9 MB): CPython 3.8+, musllinux: musl 1.2+ ARM64
  • robin_sparkless-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl (13.8 MB): CPython 3.8+, manylinux: glibc 2.28+ ARM64
  • robin_sparkless-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.8 MB): CPython 3.8+, manylinux: glibc 2.17+ x86-64
  • robin_sparkless-0.1.0-cp38-abi3-macosx_11_0_arm64.whl (13.3 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.1.0-cp38-abi3-macosx_10_12_x86_64.whl (14.3 MB): CPython 3.8+, macOS 10.12+ x86-64
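The wheel names above follow the standard wheel naming convention: distribution, version, an optional build tag, then Python, ABI, and platform tags, where a tag field may hold several dot-separated values. A small sketch of splitting such a name, assuming that convention; WheelName and parse_wheel_name are hypothetical names, and the packaging library offers a more thorough parser.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # Compressed tag sets such as "manylinux_2_17_x86_64.manylinux2014_x86_64"
    # list several compatible tags separated by dots.
    return WheelName(
        distribution,
        version,
        build,
        tuple(python.split(".")),
        tuple(abi.split(".")),
        tuple(platform.split(".")),
    )


parsed = parse_wheel_name(
    "robin_sparkless-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(parsed)
```

Here cp38 together with abi3 indicates a stable-ABI build for CPython 3.8 and later, which is why one wheel per platform covers all supported Python versions.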

File details

Details for the file robin_sparkless-0.1.0.tar.gz.

File metadata

  • Download URL: robin_sparkless-0.1.0.tar.gz
  • Upload date:
  • Size: 365.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.11.5

File hashes

Hashes for robin_sparkless-0.1.0.tar.gz
Algorithm Hash digest
SHA256 cf93903c9eb8b9a1a884c6e9c3a8aa7594283ea819b4304c6ca5924f9e7a9419
MD5 4381da59d7364faf8b802425cef549d4
BLAKE2b-256 315aa9da912366aec7e92a52ec32fb7343ea669a696e9e5fc7e5920a8f1aee8f
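A downloaded file can be checked against the digests listed here with Python's standard hashlib module. The sketch below streams the file in chunks; file_digest and matches are hypothetical helper names, and treating "BLAKE2b-256" as BLAKE2b with a 32-byte digest is an assumption about how that value was produced.

```python
import hashlib

# SHA256 listed above for robin_sparkless-0.1.0.tar.gz
EXPECTED_SDIST_SHA256 = "cf93903c9eb8b9a1a884c6e9c3a8aa7594283ea819b4304c6ca5924f9e7a9419"


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of the file at path, read in chunks."""
    if algorithm == "blake2b-256":
        # Assumption: BLAKE2b-256 means BLAKE2b with a 32-byte digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """True when the file's digest equals expected_hex (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

With the source distribution saved locally, matches("robin_sparkless-0.1.0.tar.gz", EXPECTED_SDIST_SHA256) should return True for an intact download.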

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-win_arm64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 036031245f545df746fe0c67e5daa15550e1a24b01251fc152e0e502264a3df2
MD5 18435a0bd9a9036905dbdabdc162f596
BLAKE2b-256 07e01a7fe75ab18d826333e9d345cc43e1b52ac9177819e141f8a271a57edc24

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-win_amd64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 1f7b5a0f05ef0eeb15942fc5f37129f77f5ba9fb9cc97a6291a22f023f67a90d
MD5 bfdb240f8610030cfec8fcbdc7a9da26
BLAKE2b-256 7efd8d15c1d82ce15a24c470133098d917a088fb5b7bd5b1dbe6be49bb8ac2ae

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 8bd7af41d94893cbe3352f24afbd1bd120dc60cabc82c8b240ff49a96f591c09
MD5 af6e52c2e1492f990ca0e85430fd2e28
BLAKE2b-256 c22cee81941e19099ea7b272870afd506a5e13f74c7180f4681010b3b1bb257e

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 94bc2ecc718c61bef99b89222632b4f02d5ee8201e714d96a87f04d8a1389f09
MD5 536de72e9afeda0abe257e6d0dcb2893
BLAKE2b-256 212bae282636259ec99b11fdc2fbdc04cf3a371ab76d87f150015ae0a549e554

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 3eeca6ce74692e95d1916ee9e6411b82088d1444c26b380ea0634f00a07f232e
MD5 74bc79518f27c686e87ff917255263a2
BLAKE2b-256 28129d2469239298932509e4bd5b764b2e784679583c6e2fcbefa6bbfa24f0f8

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 559db2f5d16b81c4fd9660b94cac57b0e369dedcd44f77f771fc038be5f4b0dd
MD5 c13f4875445bd60f7e787f2756b7bc52
BLAKE2b-256 c2db10bd37f1c0a305108ce14685f5eb03a95bd47bc170d8c58d96b5801d9ed3

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 865ed7418c2b61e052db1ad329d37c45642684bed6c238a61b77794838456f73
MD5 184701aa493a69bfb4a0eb9e1fe49a4b
BLAKE2b-256 d6ca11fcb75abccc90e0efa0cbd4424526b9a0d0bafa14ebd0ffa5609381dda2

File details

Details for the file robin_sparkless-0.1.0-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.1.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 132f680a2ea258c70eb75e9622a1fe4094ace8a21a21373e8712bdc5b9f475cc
MD5 2c19fc1d60e2f9bba9bd59ac73b91c45
BLAKE2b-256 3df0d301bf16ca7358b788f0b71ec397ab0dfe2f6eb77686508f9ff06761f933
