PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

robin-sparkless (Python)

PySpark-style DataFrames in Python—no JVM. Uses Polars under the hood for fast, native execution. 200+ operations validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
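As a rough illustration of the scalar UDF bullet above, the sketch below registers a plain Python function and applies it with with_column. The names register, call_udf, col, and with_column come from this README; the argument shapes are assumptions: that return_type accepts the string "double", that call_udf and col are exposed at the top level of the module, and that with_column takes (name, expression). Check the UDF guide before relying on any of them.

```python
def age_in_decades(age):
    """Plain Python function intended to be registered as a scalar UDF.

    Returns None for missing input so nulls pass through unchanged.
    """
    if age is None:
        return None
    return age / 10.0


def main():
    """Register age_in_decades and apply it, if robin_sparkless is installed."""
    try:
        import robin_sparkless as rs
    except ImportError:
        print("robin_sparkless is not installed; skipping the UDF demo")
        return

    spark = rs.SparkSession.builder().app_name("udf-demo").get_or_create()
    df = spark.create_dataframe(
        [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
        ["id", "age", "name"],
    )

    # Assumption: return_type accepts a type-name string such as "double".
    spark.udf().register("age_in_decades", age_in_decades, return_type="double")

    # Assumption: call_udf lives at module top level and
    # with_column takes (new_column_name, expression).
    out = df.with_column("decades", rs.call_udf("age_in_decades", rs.col("age")))
    print(out.collect())
```

Calling main() runs the demo. The function itself stays ordinary Python, so it can be unit-tested without a session.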

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. The tests do not require PySpark because the parity expectations are precomputed.
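To make the idea of predetermined parity expectations concrete, here is a minimal sketch of an order-insensitive row comparison. The helper names normalize_rows and assert_rows_equal are hypothetical and are not taken from the project's test suite. The sketch assumes rows are a list of dicts, as collect() returns in the Quick start, and uses the Quick start output as the stored expectation.

```python
# Expected rows recorded ahead of time (here: the Quick start output),
# so no PySpark installation is needed when the check runs.
EXPECTED = [
    {"id": 2, "age": 30, "name": "Bob"},
    {"id": 3, "age": 35, "name": "Charlie"},
]


def normalize_rows(rows):
    """Turn a list of dict rows into a canonical, order-independent form."""
    as_tuples = (tuple(sorted(row.items())) for row in rows)
    # Sort by repr so rows containing None or mixed types still sort.
    return sorted(as_tuples, key=repr)


def assert_rows_equal(actual, expected):
    """Raise AssertionError if the two row sets differ, ignoring row order."""
    left, right = normalize_rows(actual), normalize_rows(expected)
    assert left == right, f"rows differ:\n  actual={left}\n  expected={right}"


# Hypothetical usage inside a test, with `filtered` built as in the Quick start:
#     assert_rows_equal(filtered.collect(), EXPECTED)
```

Ignoring row order is a design choice for this sketch; a test that checks sorting would compare the lists directly.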

Links

Resource       URL
Documentation  robin-sparkless.readthedocs.io
User Guide     docs/USER_GUIDE.md
Python API     docs/PYTHON_API.md
UDF Guide      docs/UDF_GUIDE.md
Source         github.com/eddiethedean/robin-sparkless
Rust crate     crates.io/crates/robin-sparkless

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • robin_sparkless-0.8.0-cp38-abi3-win_arm64.whl (14.6 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.8.0-cp38-abi3-win_amd64.whl (16.2 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.8.0-cp38-abi3-musllinux_1_2_x86_64.whl (15.2 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • robin_sparkless-0.8.0-cp38-abi3-musllinux_1_2_aarch64.whl (13.9 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • robin_sparkless-0.8.0-cp38-abi3-manylinux_2_28_aarch64.whl (14.2 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • robin_sparkless-0.8.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • robin_sparkless-0.8.0-cp38-abi3-macosx_11_0_arm64.whl (16.1 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.8.0-cp38-abi3-macosx_10_12_x86_64.whl (16.9 MB): CPython 3.8+, macOS 10.12+ x86-64
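The wheel names above follow the standard wheel naming scheme: distribution, version, an optional build tag, then Python, ABI, and platform tags. The sketch below splits a filename into those parts, which can help when deciding which file matches a machine. The function name parse_wheel_filename is illustrative only; for real tooling, the packaging library is the better choice.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and tag components.

    Layout: {name}-{version}[-{build}]-{python}-{abi}-{platform}.whl
    Tag fields may hold several dot-separated alternatives.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")

    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")

    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag.split("."),
        "abi": abi_tag.split("."),
        "platform": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "robin_sparkless-0.8.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python"], info["abi"], info["platform"])
```

In these files, cp38 combined with abi3 means the wheel targets CPython's stable ABI with 3.8 as the minimum version, which is why a single wheel per platform covers CPython 3.8 and later.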

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-win_arm64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 03b2af913426191554126b3ec0a1af1cb723ec796b48c83bec561606de0ff160
MD5 840928cccb89c3237b856a63b018bf46
BLAKE2b-256 ccabe20eb029baefe519b4f24cd103cfab5eddce97b2f637426c794a3065f613

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-win_amd64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c9948557c7a2c9309294d202a2f6367d100b1351d24bb648f3f1cd5fe2f627f2
MD5 4baa1a42453260aa888bb73c4b8eb8b7
BLAKE2b-256 49e1e6b835022a8b79109477fd01c343b26ee98304695439306447ece7d1def9

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 5942de36a8fdf09228e4762c6b5f1467d3c286d3c837e79388c22c8b64bb9e28
MD5 06f46f083ab60a61dffe856985457fd4
BLAKE2b-256 a44093852a4226a640a0aecb698a14b132f4115073bf6e2944eac09a2a909ff1

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 2e65dfb8676f1c5879ccbc5caaa5e2ff4d98ce0a842d3f7b57a4fdd5ffa7e51c
MD5 a37ac3f56da8f94e03ba9683e85fd038
BLAKE2b-256 5ca259a0b8fb11c96ab3eab44285a48744010310acb0a90d01bc97e0d1f49dca

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 38ff877cbc9afce937dcf8c7f31ee3ecc47a4c0cbb97c4b56afcccc759e62ddb
MD5 95fbbb324b4777ae3c60660b59a730f5
BLAKE2b-256 65305a93b67feb37e85ba4bb469323a496d5dd709ac86550d2c21186ac240f60

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1417105fb0c3d84794da5bb1aab7972b2b9eec20d123373e60e8c7f28f7c651c
MD5 2240bfdfbae01d71c202e9830319c12e
BLAKE2b-256 6e5dfeb94ad3ff9673569eb857158a457569faf7d2826af23a9b1912686e2ce4

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 43934b5258c30eade5b5450c1fecde7519d5a749a5b33f6926ab87b600500816
MD5 0ac48ed1de65f61140495df13e14e9a9
BLAKE2b-256 9207a44b5f59f2bcb3c26f0abc0666269d4edecd31862126c7441cc875004630

File details

Details for the file robin_sparkless-0.8.0-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.8.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 c038fc727f3aa7506a0e7c5793e5f586c69107bbfb52302ca5cf411032440703
MD5 7bf2b41ec550375103a1c6b309796412
BLAKE2b-256 dda36248e31be1dcdd836fd05a5b7236c071e27bbb6d57a2fc484aa946198719
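The SHA256 digests listed for each wheel can be checked locally after download. The following is a minimal sketch using only the standard library; the helper names sha256_of_file and verify_sha256 are illustrative, and the file path in the usage comment assumes the wheel sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 matches the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, with the digest published above for the Windows ARM64 wheel:
#     verify_sha256(
#         "robin_sparkless-0.8.0-cp38-abi3-win_arm64.whl",
#         "03b2af913426191554126b3ec0a1af1cb723ec796b48c83bec561606de0ff160",
#     )
```

pip can enforce the same check at install time through its hash-checking mode when the digests are pinned in a requirements file.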
