PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM. Polars runs under the hood for fast, native execution, and 200+ operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # equivalently: rs.col("age").gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
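To check the expected output without installing anything, the same filter can be written in plain Python over the list-of-dicts shape that `collect()` prints above. This is only a reference for the semantics (keep rows where `age > 26`, in the order shown); it is not how robin-sparkless evaluates the query, which happens in Polars.

```python
from typing import Any, Dict, List

# Rows and column names from the quick-start example.
rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
columns = ["id", "age", "name"]

# One dict per row, matching the shape printed by collect().
records: List[Dict[str, Any]] = [dict(zip(columns, row)) for row in rows]

# Plain-Python equivalent of df.filter(rs.col("age") > rs.lit(26)).
filtered = [r for r in records if r["age"] > 26]
print(filtered)
# [{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
```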

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
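To try the readers, small input files can be generated with the standard library. The sketch below writes the quick-start rows to `data.csv` with a header row and to `data.json` as newline-delimited JSON (one object per line). Both layouts are assumptions modeled on PySpark's usual inputs; whether `read_csv` treats the first row as a header by default, and which JSON layout `read_json` expects, should be checked in the User Guide. Parquet output needs a third-party writer such as pyarrow, so it is left out.

```python
import csv
import json
from pathlib import Path

rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
columns = ["id", "age", "name"]


def write_samples(directory: Path) -> None:
    """Write data.csv (with header) and data.json (one JSON object per line)."""
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "data.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row with the column names
        writer.writerows(rows)
    with open(directory / "data.json", "w", encoding="utf-8") as f:
        for row in rows:
            # Assumed layout: newline-delimited JSON, as PySpark reads by default.
            f.write(json.dumps(dict(zip(columns, row))) + "\n")


if __name__ == "__main__":
    write_samples(Path("."))
```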

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.
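Since the API aims to match PySpark, a ranking window such as `row_number()` should number rows from 1 within each partition, following the partition's ordering. The plain-Python reference below illustrates only that semantics; the `row_number_over` helper and the `dept`/`salary` data are hypothetical, and the actual way to build a window spec in robin-sparkless is described in the User Guide.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List


def row_number_over(
    rows: List[Dict[str, Any]], partition_by: str, order_by: str
) -> List[Dict[str, Any]]:
    """Hypothetical reference for row_number() OVER (PARTITION BY p ORDER BY o)."""
    # Sort by partition key, then by the ordering column inside each partition.
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    out: List[Dict[str, Any]] = []
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        # Numbering restarts at 1 in every partition.
        for n, row in enumerate(group, start=1):
            out.append({**row, "row_number": n})
    return out


rows = [
    {"dept": "a", "salary": 300},
    {"dept": "b", "salary": 100},
    {"dept": "a", "salary": 100},
    {"dept": "a", "salary": 200},
]
for r in row_number_over(rows, "dept", "salary"):
    print(r)
```

Spark does not guarantee the output row order of a window query; the sort here only makes the numbering easy to read.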

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: register with spark.udf().register("name", f, return_type=...) and call with call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
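The three UDF kinds differ mainly in how many values go in and come out: a scalar UDF maps one value to one value, a vectorized UDF maps a whole column batch to a batch of the same length, and a grouped-aggregate UDF reduces each group's values to one result. The plain-Python sketch below states those contracts without robin-sparkless. The helpers `apply_scalar`, `apply_vectorized`, and `apply_grouped_agg` are hypothetical and use plain lists where the real UDFs may receive pandas or other batch types; see the UDF guide for the actual calling conventions.

```python
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_scalar(f: Callable[[T], R], column: Sequence[T]) -> List[R]:
    """Scalar UDF contract: called once per row."""
    return [f(v) for v in column]


def apply_vectorized(
    f: Callable[[Sequence[T]], Sequence[R]], column: Sequence[T]
) -> List[R]:
    """Vectorized UDF contract: called once per batch, one output per input row."""
    result = list(f(column))
    if len(result) != len(column):
        raise ValueError("vectorized UDF must return one value per input row")
    return result


def apply_grouped_agg(
    f: Callable[[List[T]], R], keys: Sequence[Hashable], column: Sequence[T]
) -> Dict[Hashable, R]:
    """Grouped-aggregate UDF contract: called once per group, one value per group."""
    groups: Dict[Hashable, List[T]] = {}
    for k, v in zip(keys, column):
        groups.setdefault(k, []).append(v)
    return {k: f(vs) for k, vs in groups.items()}


ages = [25, 30, 35]
print(apply_scalar(lambda a: a + 1, ages))  # [26, 31, 36]
print(apply_vectorized(lambda col: [a * 2 for a in col], ages))  # [50, 60, 70]
print(apply_grouped_agg(lambda vs: sum(vs) / len(vs), ["x", "y", "x"], ages))
# {'x': 30.0, 'y': 30.0}
```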

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3              # default: DataFrame API
maturin develop --features "pyo3,sql"        # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"      # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta"  # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark-parity results are predetermined rather than computed at test time.
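Because the parity expectations exist ahead of time, a parity check can compare collected rows with a saved expected result instead of running PySpark. A minimal sketch, assuming a hypothetical fixture stored as a JSON list of row dicts; the real fixture format and test layout under tests/python/ may differ. Rows are compared without regard to order, since Spark does not guarantee row order unless the query sorts.

```python
import json
from typing import Any, Dict, List


def _row_key(row: Dict[str, Any]) -> str:
    # Canonical string form of a row, so rows sort regardless of value types.
    return json.dumps(row, sort_keys=True, default=str)


def assert_rows_match(actual: List[Dict[str, Any]], expected_json: str) -> None:
    """Compare collected rows with a stored expectation, ignoring row order."""
    expected = json.loads(expected_json)
    assert sorted(map(_row_key, actual)) == sorted(map(_row_key, expected)), (
        f"rows differ:\nactual={actual}\nexpected={expected}"
    )


# Hypothetical stored expectation for the quick-start filter.
EXPECTED = '[{"id": 3, "age": 35, "name": "Charlie"}, {"id": 2, "age": 30, "name": "Bob"}]'
collected = [{"id": 2, "age": 30, "name": "Bob"}, {"id": 3, "age": 35, "name": "Charlie"}]
assert_rows_match(collected, EXPECTED)
```

Note that this exact comparison treats `30` and `30.0` as different; a real parity suite would likely need type-aware or tolerance-based comparison for floats.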

Links

Documentation: robin-sparkless.readthedocs.io
User Guide: docs/USER_GUIDE.md
Python API: docs/PYTHON_API.md
UDF Guide: docs/UDF_GUIDE.md
Source: github.com/eddiethedean/robin-sparkless
Rust crate: crates.io/crates/robin-sparkless

License

MIT

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

robin_sparkless-0.8.5-cp38-abi3-win_arm64.whl (14.6 MB): CPython 3.8+, Windows ARM64
robin_sparkless-0.8.5-cp38-abi3-win_amd64.whl (16.3 MB): CPython 3.8+, Windows x86-64
robin_sparkless-0.8.5-cp38-abi3-musllinux_1_2_x86_64.whl (15.2 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
robin_sparkless-0.8.5-cp38-abi3-musllinux_1_2_aarch64.whl (13.9 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
robin_sparkless-0.8.5-cp38-abi3-manylinux_2_28_aarch64.whl (14.2 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
robin_sparkless-0.8.5-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
robin_sparkless-0.8.5-cp38-abi3-macosx_11_0_arm64.whl (16.1 MB): CPython 3.8+, macOS 11.0+ ARM64
robin_sparkless-0.8.5-cp38-abi3-macosx_10_12_x86_64.whl (17.0 MB): CPython 3.8+, macOS 10.12+ x86-64
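Each file name follows the wheel naming convention from PEP 427: `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The `cp38-abi3` tags mean one binary built against CPython's stable ABI serves CPython 3.8 and later, which is why one wheel per platform is enough. A minimal stdlib sketch that splits a name into its tags is below; the `parse_wheel_name` helper is illustrative, and the `packaging` library's `packaging.utils.parse_wheel_filename` is the more robust choice.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 6:
        build = parts.pop(2)  # optional build tag sits between version and python tag
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in {filename}")
    dist, version, py, abi, plat = parts
    # Each tag field may hold several dot-separated values (a compressed tag set).
    return WheelName(
        dist, version, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


name = parse_wheel_name(
    "robin_sparkless-0.8.5-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(name.python_tags, name.abi_tags, name.platform_tags)
# ('cp38',) ('abi3',) ('manylinux_2_17_x86_64', 'manylinux2014_x86_64')
```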

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 52b4573b1b609ff49c443e1c4562d87c4850bc59b4105fdd1c5bbb9783765161
MD5 14a2afb5ef98391e76aeece6e653c47b
BLAKE2b-256 de6bd79e66ba77e798f0612f34e8de9893abfb44d5a0f880320d995bc2a1a086

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 e9ec247142df5e71a23cd9a0ab905b95e2fabc15047df93a19a588753ab19d28
MD5 88cbcf60eeac0b5d52d87fab4afe7dd8
BLAKE2b-256 b8218a1d6b323e5663be0452c974d22ef7ce787c38c86a46aff24d645e02cf0d

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 adac521dbc94659817559703c60687771527c1d200b330117f6a8d33a4660da0
MD5 2084fe9ca2ad2b40704039d5bc1f16f5
BLAKE2b-256 b7a1515754c65c0fa324631fd5903f477a4543e9dee17958a7735b47486f19b5

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 5467f5161c7998da7991818255c8fa95eb3547637d50a0a75392785162e18001
MD5 5506efa84e82bab79783985850ff625b
BLAKE2b-256 1aad7e910748fbc59a102f4f72708f5f1ac9a8e8454dfaa47fd8bf13a0e25e35

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 636d252ea74115839f39f0213024cce7d9cab29636d38e444a11d28728d0312f
MD5 da7e3ab0d23c13b0cb95bd0b5f31923e
BLAKE2b-256 17d2a9caa8bfeca8d964f83a65a2706b8a245eae553afb8ce75195428bc27b90

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5d2b577ef037282abfa21e4c498ebf1c4f680948a450d10e18e5977992677435
MD5 745a95d62d894d86d5315449bb5c00cf
BLAKE2b-256 ad03cafc36a7c70bacef63e267669384e62d60bd4714a3d5df4bdc80688c7d3b

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 efc83406036308a8c6959720b9570f89715922f0c0e6975ef9506bf2bfbdd1e2
MD5 521c629f0f8cbe307165f9032bc83467
BLAKE2b-256 cd31d5dc012aa38a60688b70e790374af69b169eceae989c1cf40ec35959fdea

File details

Hashes for robin_sparkless-0.8.5-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 3c58ac7eb9588a7c0b63903baccdd9b09b488e2955f79f377c24179202c6c6bf
MD5 5873e117aa474bc13e0406ca8413f123
BLAKE2b-256 a61c133345b1776980bc9fc8abefc5fa2d515ad6ccfbfc017b12cb46bcd5a528
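To check a downloaded wheel against the SHA256 digests listed above, hash the file and compare. A minimal sketch using hashlib; the script name `verify_wheel.py` and its two command-line arguments are illustrative, and the expected digest should be copied from the entry for your platform.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_wheel.py <wheel file> <expected sha256>
    wheel, digest = Path(sys.argv[1]), sys.argv[2]
    print("OK" if verify(wheel, digest) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode: pin the requirement with its digest, e.g. `robin-sparkless==0.8.5 --hash=sha256:<digest>` in a requirements file, and install with `pip install --require-hashes -r requirements.txt`.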
