PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM required. Uses Polars under the hood for fast, native execution. More than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
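To sanity-check the output above without the library installed, the same filter can be mirrored in plain Python. This is only a reference sketch of the expected semantics (keep rows where age > 26, return one dict per row keyed by column name). The helpers `rows_to_dicts` and `filter_gt` are hypothetical names for illustration and are not part of robin-sparkless.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_dicts(
    rows: Sequence[Tuple[Any, ...]], columns: Sequence[str]
) -> List[Dict[str, Any]]:
    """Pair each tuple with the column names, giving one dict per row."""
    return [dict(zip(columns, row)) for row in rows]


def filter_gt(
    records: List[Dict[str, Any]], column: str, threshold: Any
) -> List[Dict[str, Any]]:
    """Keep the records whose value in `column` is strictly greater than `threshold`."""
    return [r for r in records if r[column] > threshold]


rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
records = rows_to_dicts(rows, ["id", "age", "name"])
expected = filter_gt(records, "age", 26)
print(expected)
```

The printed list has the same shape as the quick-start output, so it can serve as a hand-written expectation in a test.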

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
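The three UDF modes listed above differ mainly in what the Python function receives and what it must return. The sketch below models those shapes in plain Python, without robin-sparkless, to make the contrast concrete. The helper names (`apply_scalar`, `apply_vectorized`, `apply_grouped_agg`) are hypothetical and only illustrate the calling conventions as described here; the UDF guide remains the authority on actual semantics.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


def apply_scalar(f: Callable[[float], float], column: Sequence[float]) -> List[float]:
    """Scalar UDF shape: f is called once per row with a single value."""
    return [f(x) for x in column]


def apply_vectorized(
    f: Callable[[List[float]], Sequence[float]], column: Sequence[float]
) -> List[float]:
    """Vectorized UDF shape: f sees the whole column and returns one output per input row."""
    out = list(f(list(column)))
    if len(out) != len(column):
        raise ValueError("vectorized UDF must return one value per input row")
    return out


def apply_grouped_agg(
    f: Callable[[List[float]], float],
    keys: Sequence[Hashable],
    values: Sequence[float],
) -> Dict[Hashable, float]:
    """Grouped-aggregate UDF shape: f sees each group's values and returns one value per group."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: f(group) for key, group in groups.items()}


ages = [25.0, 30.0, 35.0]
teams = ["a", "b", "a"]

scalar_out = apply_scalar(lambda x: x + 1.0, ages)
vector_out = apply_vectorized(lambda xs: [x * 2.0 for x in xs], ages)
grouped_out = apply_grouped_agg(lambda xs: sum(xs) / len(xs), teams, ages)
```

Here `scalar_out` and `vector_out` each have three entries (one per row), while `grouped_out` has two (one per team).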

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, pin mypy below 1.10 (for example 'mypy>=1.4,<1.10'); newer mypy releases no longer accept python_version = "3.8" in their configuration. The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] sections with python_version and target-version, respectively, set for Python 3.8.
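When scripting a check for this pin, compare version components as integers: as plain strings, "1.10" sorts before "1.4". The sketch below assumes versions of the form MAJOR.MINOR[.PATCH] and uses the bounds quoted in this README (at least 1.4, below 1.10). `satisfies_pin` is a hypothetical helper, not something shipped with the project.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) as integers from a 'MAJOR.MINOR[.PATCH]' string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def satisfies_pin(
    version: str,
    lower: Tuple[int, int] = (1, 4),
    upper: Tuple[int, int] = (1, 10),
) -> bool:
    """Return True when lower <= version < upper, comparing numerically."""
    return lower <= major_minor(version) < upper


# String comparison would get this wrong: "1.10" < "1.4" is True for str.
print(satisfies_pin("1.9.0"), satisfies_pin("1.10.0"))
```

The installed version string can be read with `importlib.metadata.version("mypy")` (standard library since Python 3.8) and passed to `satisfies_pin`.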

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark results used for parity checks are predetermined rather than computed at test time.

Links

| Resource | URL |
|---|---|
| Documentation | robin-sparkless.readthedocs.io |
| User Guide | docs/USER_GUIDE.md |
| Python API | docs/PYTHON_API.md |
| UDF Guide | docs/UDF_GUIDE.md |
| Source | github.com/eddiethedean/robin-sparkless |
| Rust crate | crates.io/crates/robin-sparkless |

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

All wheels for release 0.7.1 target CPython 3.8+.

| File | Size | Platform |
|---|---|---|
| robin_sparkless-0.7.1-cp38-abi3-win_arm64.whl | 14.6 MB | Windows ARM64 |
| robin_sparkless-0.7.1-cp38-abi3-win_amd64.whl | 16.2 MB | Windows x86-64 |
| robin_sparkless-0.7.1-cp38-abi3-musllinux_1_2_x86_64.whl | 15.1 MB | musllinux: musl 1.2+ x86-64 |
| robin_sparkless-0.7.1-cp38-abi3-musllinux_1_2_aarch64.whl | 13.9 MB | musllinux: musl 1.2+ ARM64 |
| robin_sparkless-0.7.1-cp38-abi3-manylinux_2_28_aarch64.whl | 14.2 MB | manylinux: glibc 2.28+ ARM64 |
| robin_sparkless-0.7.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 15.2 MB | manylinux: glibc 2.17+ x86-64 |
| robin_sparkless-0.7.1-cp38-abi3-macosx_11_0_arm64.whl | 16.0 MB | macOS 11.0+ ARM64 |
| robin_sparkless-0.7.1-cp38-abi3-macosx_10_12_x86_64.whl | 16.9 MB | macOS 10.12+ x86-64 |
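The wheel file names above follow the standard wheel naming scheme: distribution, version, an optional build tag, then Python tag, ABI tag, and platform tag, separated by hyphens. As a rough sketch of how to read them, the snippet below splits a name into those parts. `parse_wheel_filename` and `WheelName` are hypothetical names written for this illustration; real tooling should rely on an established packaging library instead.

```python
from typing import List, NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: List[str]
    abi_tags: List[str]
    platform_tags: List[str]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected number of components in: " + filename)
    # Each tag field may hold several dot-separated alternatives.
    return WheelName(dist, version, build, py.split("."), abi.split("."), plat.split("."))


info = parse_wheel_filename(
    "robin_sparkless-0.7.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info.python_tags, info.abi_tags, info.platform_tags)
```

For these files, `cp38` together with `abi3` indicates a build against CPython's stable ABI starting at 3.8, which is why a single wheel per platform is listed for CPython 3.8+.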
