PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

The maintainers of this project have marked this project as archived. No new releases are expected.

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM. Polars runs under the hood for fast, native execution. Evaluation is lazy by default: transformations only extend the query plan, and execution happens when an action (collect, show, count, write) is called. 200+ operations are validated against PySpark.
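
To make the lazy model concrete, here is a minimal pure-Python sketch of the idea. It does not use robin-sparkless or Polars: ToyLazyFrame, older_than_26, and predicate_calls are hypothetical names invented for this illustration, and the real engine builds and optimizes its plan very differently. The only point is that filter and select record steps, while collect and count run them.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


class ToyLazyFrame:
    """Hypothetical stand-in for a lazy DataFrame (not robin-sparkless code)."""

    def __init__(self, rows: List[Row], plan: Optional[List[Step]] = None) -> None:
        self._rows = rows
        self._plan: List[Step] = list(plan or [])

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyLazyFrame":
        # Transformation: record a step and return a new frame; nothing runs yet.
        def step(rows: List[Row]) -> List[Row]:
            return [row for row in rows if predicate(row)]

        return ToyLazyFrame(self._rows, self._plan + [step])

    def select(self, *names: str) -> "ToyLazyFrame":
        # Transformation: record a projection step.
        def step(rows: List[Row]) -> List[Row]:
            return [{name: row[name] for name in names} for row in rows]

        return ToyLazyFrame(self._rows, self._plan + [step])

    def collect(self) -> List[Row]:
        # Action: run every recorded step in order.
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows

    def count(self) -> int:
        # Action: executes the plan, then counts the resulting rows.
        return len(self.collect())


predicate_calls: List[str] = []  # records each row the predicate actually sees


def older_than_26(row: Row) -> bool:
    predicate_calls.append(row["name"])
    return row["age"] > 26


people = ToyLazyFrame(
    [
        {"id": 1, "age": 25, "name": "Alice"},
        {"id": 2, "age": 30, "name": "Bob"},
        {"id": 3, "age": 35, "name": "Charlie"},
    ]
)

planned = people.filter(older_than_26).select("name")
calls_before_action = len(predicate_calls)  # still 0: only the plan was extended
result = planned.collect()                  # the action runs the plan
calls_after_action = len(predicate_calls)   # the predicate has now seen all 3 rows

print(calls_before_action, calls_after_action, result)
```

In this toy version every action re-runs the whole plan from the source rows; whether and how the real library caches or optimizes work is not covered here.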

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.createDataFrame(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
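
To try the CSV reader without an existing file, a small sample can be written with the standard library first. The sketch below is an assumption-laden convenience, not part of the package: write_sample_csv and load_with_robin_sparkless are hypothetical helper names, and the loader only wraps the session and read_csv calls shown above. How read_csv treats the header row and infers types is not specified here, so check the documentation.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(path: Path) -> None:
    """Write the quick-start rows to a CSV file with a header line."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "age", "name"])
        writer.writerows([(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")])


def load_with_robin_sparkless(path: Path):
    """Hypothetical helper: read the file using the calls from the quick start."""
    # Imported lazily so the file-writing part works without the package installed.
    import robin_sparkless as rs

    spark = rs.SparkSession.builder().app_name("demo").get_or_create()
    return spark.read_csv(str(path))


sample_path = Path(tempfile.mkdtemp()) / "data.csv"
write_sample_csv(sample_path)
print(sample_path.read_text(encoding="utf-8"))
```

With the package installed, load_with_robin_sparkless(sample_path) should return a DataFrame for the sample file.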

Filter, select, group, join, and use window functions with a PySpark-like API. spark.createDataFrame(data, schema=None) accepts a list of dicts (schema inferred) or a list of tuples; the schema can be a list of column names, a DDL string (including nested struct<>, array<>, map<>), or an explicit list of (name, dtype_str) pairs. See the User Guide and full documentation for details.
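
The DDL-string and (name, dtype_str) schema forms describe the same information. As a rough illustration of how they relate, the sketch below splits a DDL string into pairs in plain Python. It is not the library's parser: split_ddl_fields is a hypothetical helper, the type names are ordinary Spark-style DDL names used as assumptions, and real DDL parsing handles more cases (quoting, comments, nullability).

```python
from typing import List, Tuple


def split_ddl_fields(ddl: str) -> List[Tuple[str, str]]:
    """Split 'name type, name type, ...' into (name, dtype_str) pairs.

    Commas inside <...> or (...) belong to a nested type, so only commas
    at nesting depth 0 separate top-level fields.
    """
    fields: List[str] = []
    current: List[str] = []
    depth = 0
    for char in ddl:
        if char in "<(":
            depth += 1
        elif char in ">)":
            depth -= 1
        if char == "," and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(char)
    fields.append("".join(current))

    pairs: List[Tuple[str, str]] = []
    for field in fields:
        field = field.strip()
        if not field:
            continue
        # The first whitespace separates the column name from its type.
        name, dtype = field.split(None, 1)
        pairs.append((name, dtype.strip()))
    return pairs


ddl = (
    "id bigint, tags array<string>, attrs map<string,int>, "
    "address struct<city:string,zip:string>"
)
schema_pairs = split_ddl_fields(ddl)
for name, dtype in schema_pairs:
    print(f"{name}: {dtype}")
```

A field without a type raises ValueError in this sketch, which is acceptable for an illustration but would need a clearer error message in real code.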

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
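
The three UDF kinds differ mainly in what the function receives and how many values it returns. The plain-Python sketch below mimics those shapes without calling robin-sparkless at all: the function names and sample rows are hypothetical, and Python lists stand in for whatever batch type the library actually passes to vectorized and grouped functions (see the UDF guide for that).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

rows = [
    {"team": "a", "x": 1.0},
    {"team": "a", "x": 3.0},
    {"team": "b", "x": 10.0},
]


def scalar_double(value: float) -> float:
    """Scalar shape: called once per row, one value in, one value out."""
    return value * 2


def vectorized_double(values: List[float]) -> List[float]:
    """Vectorized shape: called with a whole column, one output per input row."""
    return [value * 2 for value in values]


def grouped_mean(values: List[float]) -> float:
    """Grouped-aggregate shape: called per group, one value out per group."""
    return mean(values)


# Scalar: the engine loops over rows and calls the function each time.
scalar_out = [scalar_double(row["x"]) for row in rows]

# Vectorized: the engine hands over the column in one call.
vectorized_out = vectorized_double([row["x"] for row in rows])

# Grouped aggregate: the engine splits the column by key, then calls per group.
groups: Dict[str, List[float]] = defaultdict(list)
for row in rows:
    groups[row["team"]].append(row["x"])
grouped_out = {team: grouped_mean(values) for team, values in groups.items()}

print(scalar_out)
print(vectorized_out)
print(grouped_out)
```

The scalar and vectorized versions produce the same column; the difference is the number of Python calls, which is usually why a vectorized UDF is preferred for large inputs.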

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

# Run all examples (Rust + Python doc examples with real output)
make run-examples

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark results are predetermined rather than computed at test time.

Links

  • Documentation: robin-sparkless.readthedocs.io
  • User Guide: docs/USER_GUIDE.md
  • Python API: docs/PYTHON_API.md
  • UDF Guide: docs/UDF_GUIDE.md
  • Source: github.com/eddiethedean/robin-sparkless
  • Rust crate: crates.io/crates/robin-sparkless

License

MIT


