
PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived by its maintainers. No new releases are expected.


robin-sparkless (Python)


PySpark-style DataFrames in Python, with no JVM required. Uses Polars under the hood for fast, native execution. More than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
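For readers new to the PySpark style, the filter above can be read as a row-wise predicate. The following plain-Python sketch does not use robin_sparkless at all; it only mirrors the quick-start data to show why the result is a list of dicts with two rows. The variable names are illustrative.

```python
# Plain-Python illustration of the quick-start filter; not library code.
columns = ["id", "age", "name"]
data = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]

# Build one dict per row, mirroring the shape of the collect() output above.
rows = [dict(zip(columns, values)) for values in data]

# Keep only the rows whose age is strictly greater than 26.
filtered = [row for row in rows if row["age"] > 26]
print(filtered)
```

In the library itself the predicate is built from `rs.col` and `rs.lit` and evaluated by Polars rather than by a Python loop.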

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
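To try the CSV reader without existing data, a small sample file can be generated with the standard library. This is a minimal sketch: the helper name `write_sample_csv` is hypothetical, the rows reuse the quick-start data, and it assumes the reader treats the first line as a header, which is not confirmed here.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(directory):
    """Write data.csv with the quick-start rows and return its path."""
    path = Path(directory) / "data.csv"
    rows = [
        {"id": 1, "age": 25, "name": "Alice"},
        {"id": 2, "age": 30, "name": "Bob"},
        {"id": 3, "age": 35, "name": "Charlie"},
    ]
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "age", "name"])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Demo: write the file into a temporary directory and show its contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = write_sample_csv(tmp)
    print(sample.read_text(encoding="utf-8"))
```

The returned path (converted with `str(...)` if needed) can then be passed to `spark.read_csv`.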

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
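The three UDF kinds differ mainly in the shape of the function they expect. The sketch below illustrates those shapes in plain Python without calling robin_sparkless; the function names and sample data are hypothetical, and plain lists stand in for whatever column container the library actually passes to a vectorized or grouped function.

```python
from collections import defaultdict
from statistics import mean


def add_one(x):
    """Scalar shape: called once per row, one value in, one value out."""
    return x + 1


def add_one_batch(values):
    """Vectorized shape: a whole column in, a same-length column out."""
    return [v + 1 for v in values]


def group_mean(values):
    """Grouped-aggregate shape: all values of one group in, one value out."""
    return float(mean(values))


ages = [25, 30, 35, 40]
teams = ["a", "a", "b", "b"]

# Scalar: the engine conceptually applies the function row by row.
scalar_result = [add_one(a) for a in ages]

# Vectorized: the function sees the column as a batch.
vector_result = add_one_batch(ages)

# Grouped aggregate: values are split by key, then reduced per group.
groups = defaultdict(list)
for team, age in zip(teams, ages):
    groups[team].append(age)
grouped_result = {team: group_mean(vals) for team, vals in groups.items()}

print(scalar_result, vector_result, grouped_result)
```

A function of the first or second shape would be the `f` passed to `spark.udf().register(...)`, and one of the third shape would be decorated with `@rs.pandas_udf(..., function_type="grouped_agg")`, as described in the bullets above.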

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.
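One way to guard against an incompatible checker is to compare the installed mypy version with the `<1.10` bound mentioned above. The helper names below are hypothetical, and the sketch assumes a plain `major.minor` version prefix.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.9.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def mypy_ok_for_py38(version):
    """True when the version falls below the 1.10 bound used for Python 3.8."""
    return parse_major_minor(version) < (1, 10)


def installed_mypy_version():
    """Return the installed mypy version string, or None if mypy is absent."""
    try:
        return metadata.version("mypy")
    except metadata.PackageNotFoundError:
        return None


found = installed_mypy_version()
if found is None:
    print("mypy is not installed")
else:
    print(found, "ok for 3.8" if mypy_ok_for_py38(found) else "too new for 3.8")
```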

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark results are predetermined.

Links

Resource URL
Documentation robin-sparkless.readthedocs.io
User Guide docs/USER_GUIDE.md
Python API docs/PYTHON_API.md
UDF Guide docs/UDF_GUIDE.md
Source github.com/eddiethedean/robin-sparkless
Rust crate crates.io/crates/robin-sparkless

License

MIT

Built distributions (0.8.1)

No source distribution is available for this release. Wheels are published for CPython 3.8+ on the following platforms:

  • robin_sparkless-0.8.1-cp38-abi3-win_arm64.whl (14.6 MB): Windows ARM64
  • robin_sparkless-0.8.1-cp38-abi3-win_amd64.whl (16.2 MB): Windows x86-64
  • robin_sparkless-0.8.1-cp38-abi3-musllinux_1_2_x86_64.whl (15.2 MB): musllinux, musl 1.2+, x86-64
  • robin_sparkless-0.8.1-cp38-abi3-musllinux_1_2_aarch64.whl (13.9 MB): musllinux, musl 1.2+, ARM64
  • robin_sparkless-0.8.1-cp38-abi3-manylinux_2_28_aarch64.whl (14.2 MB): manylinux, glibc 2.28+, ARM64
  • robin_sparkless-0.8.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB): manylinux, glibc 2.17+, x86-64
  • robin_sparkless-0.8.1-cp38-abi3-macosx_11_0_arm64.whl (16.1 MB): macOS 11.0+ ARM64
  • robin_sparkless-0.8.1-cp38-abi3-macosx_10_12_x86_64.whl (16.9 MB): macOS 10.12+ x86-64
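The wheel file names listed above follow the standard wheel naming scheme: distribution, version, Python tag, ABI tag, and platform tag, separated by hyphens. The sketch below splits a name into those parts; the `parse_wheel_name` helper is hypothetical and assumes no optional build tag, which holds for the files listed here.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a wheel file name (without a build tag) into its five fields."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 hyphen-separated fields, got {len(parts)}")
    return WheelTags(*parts)


tags = parse_wheel_name("robin_sparkless-0.8.1-cp38-abi3-win_arm64.whl")
print(tags)
```

The `cp38` and `abi3` tags indicate a CPython stable-ABI build, which is why a single wheel per platform covers CPython 3.8 and later.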
