
PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

robin-sparkless (Python)


PySpark-style DataFrames in Python, without a JVM. Polars runs under the hood for fast, native execution, and more than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
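To sanity-check a result like this without Spark or Polars, you can apply the same predicate in plain Python to the input rows. This is only a reference sketch. It assumes, as the output above suggests, that `collect()` returns one dict per row keyed by column name.

```python
# Plain-Python reference for the quick-start filter (no robin_sparkless needed).
# Assumes collect() yields one dict per row, as in the output shown above.
rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
columns = ["id", "age", "name"]

# Build row dicts in column order, mirroring the collected output shape.
records = [dict(zip(columns, row)) for row in rows]

# Equivalent of df.filter(rs.col("age") > rs.lit(26)).
expected = [r for r in records if r["age"] > 26]
print(expected)
```

Comparing `filtered.collect()` with `expected` using `==` assumes rows come back in input order. If the order is not guaranteed, sort both sides first.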

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
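The three UDF kinds differ mainly in what the Python function receives and returns. The sketch below models those shapes in plain Python and does not use robin_sparkless. Lists stand in for pandas Series, and the helpers `apply_scalar`, `apply_vectorized`, and `apply_grouped_agg` are hypothetical stand-ins for the engine. The UDF guide remains the authority on the real calling convention.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical engine stand-ins; plain lists model column batches.


def apply_scalar(f: Callable[[float], float], column: List[float]) -> List[float]:
    # Scalar UDF: called once per row with a single value.
    return [f(v) for v in column]


def apply_vectorized(
    f: Callable[[List[float]], List[float]], column: List[float]
) -> List[float]:
    # Vectorized UDF: called once per batch; must return one output per input row.
    out = f(column)
    if len(out) != len(column):
        raise ValueError("vectorized UDF must return one value per input row")
    return out


def apply_grouped_agg(
    f: Callable[[List[float]], float], keys: List[str], column: List[float]
) -> Dict[str, float]:
    # Grouped-agg UDF: called once per group; returns a single value per group.
    groups: Dict[str, List[float]] = defaultdict(list)
    for k, v in zip(keys, column):
        groups[k].append(v)
    return {k: f(vs) for k, vs in groups.items()}


ages = [25.0, 30.0, 35.0]
depts = ["a", "b", "a"]
print(apply_scalar(lambda x: x + 1, ages))  # [26.0, 31.0, 36.0]
print(apply_vectorized(lambda xs: [x * 2 for x in xs], ages))  # [50.0, 60.0, 70.0]
print(apply_grouped_agg(lambda xs: sum(xs) / len(xs), depts, ages))  # {'a': 30.0, 'b': 30.0}
```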

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3               # default: DataFrame API
maturin develop --features "pyo3,sql"         # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"       # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta"   # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, pin mypy below 1.10, because newer mypy releases no longer accept python_version = "3.8" in the config. The project's pyproject.toml includes [tool.mypy] (python_version) and [tool.ruff] (target-version) sections, both targeting Python 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .
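You can check the mypy pin programmatically, for example in a pre-commit hook. This is a minimal sketch that assumes only the `>=1.4,<1.10` range stated above. The function names are hypothetical, and the parser reads only the leading major.minor numbers, so it is not a full PEP 440 implementation.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    # Keep only the leading "major.minor" numbers, e.g. "1.9.0" -> (1, 9).
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def mypy_ok_for_py38(text: str) -> bool:
    # Mirrors the range pinned in this README: mypy>=1.4,<1.10.
    return (1, 4) <= parse_version(text) < (1, 10)


def installed_mypy_version() -> Optional[str]:
    # importlib.metadata is in the standard library from Python 3.8.
    try:
        return version("mypy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_mypy_version()
    if v is None:
        print("mypy is not installed")
    else:
        print(f"mypy {v}: {'OK' if mypy_ok_for_py38(v) else 'outside >=1.4,<1.10'}")
```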

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required for tests (parity expectations are predetermined).
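Because PySpark is not needed at test time, a parity test reduces to comparing collected rows with stored expectations. This is a minimal sketch of such a comparison. It assumes rows are collected as dicts, as in the quick start. `assert_rows_equal` is a hypothetical helper, not part of robin_sparkless or its test suite.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def _canonical(row: Row) -> List[Tuple[str, str]]:
    # repr() makes mixed-type values (e.g. None vs int) sortable as strings.
    return sorted((k, repr(v)) for k, v in row.items())


def assert_rows_equal(actual: List[Row], expected: List[Row], ordered: bool = True) -> None:
    # Exact comparison: floats must match exactly, so a real suite may
    # want a tolerance for aggregates.
    if not ordered:
        actual = sorted(actual, key=_canonical)
        expected = sorted(expected, key=_canonical)
    if actual != expected:
        raise AssertionError(f"rows differ:\nactual:   {actual}\nexpected: {expected}")


expected_rows = [{"id": 2, "age": 30, "name": "Bob"}, {"id": 3, "age": 35, "name": "Charlie"}]
# In a real test, the actual rows would come from df.collect(); a literal stands in here.
actual_rows = [{"id": 3, "age": 35, "name": "Charlie"}, {"id": 2, "age": 30, "name": "Bob"}]
assert_rows_equal(actual_rows, expected_rows, ordered=False)
```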

Links

  • Documentation: robin-sparkless.readthedocs.io
  • User Guide: docs/USER_GUIDE.md
  • Python API: docs/PYTHON_API.md
  • UDF Guide: docs/UDF_GUIDE.md
  • Source: github.com/eddiethedean/robin-sparkless
  • Rust crate: crates.io/crates/robin-sparkless

License

MIT


Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


  • robin_sparkless-0.8.2-cp38-abi3-win_arm64.whl (14.6 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.8.2-cp38-abi3-win_amd64.whl (16.3 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.8.2-cp38-abi3-musllinux_1_2_x86_64.whl (15.2 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • robin_sparkless-0.8.2-cp38-abi3-musllinux_1_2_aarch64.whl (13.9 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • robin_sparkless-0.8.2-cp38-abi3-manylinux_2_28_aarch64.whl (14.2 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • robin_sparkless-0.8.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • robin_sparkless-0.8.2-cp38-abi3-macosx_11_0_arm64.whl (16.1 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.8.2-cp38-abi3-macosx_10_12_x86_64.whl (16.9 MB): CPython 3.8+, macOS 10.12+ x86-64
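Each file name follows the wheel naming convention from PEP 427: distribution, version, an optional build tag, then the Python, ABI, and platform tags (PEP 425). The `cp38-abi3` pair means the wheel targets CPython's stable ABI from 3.8 onward. That is why one wheel per platform covers every CPython 3.8+ release. The minimal parser below is an illustrative sketch. For real use, `packaging.utils.parse_wheel_filename` is the more robust choice.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    # PEP 427 layout: name-version[-build]-python-abi-platform.whl
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        name, ver, py, abi, plat = parts
    elif len(parts) == 6:
        name, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # Each tag field may be a dot-separated set of compressed tags.
    return WheelName(
        name, ver, build, tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split("."))
    )


w = parse_wheel_filename(
    "robin_sparkless-0.8.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(w.python_tags, w.abi_tags, w.platform_tags)
# ('cp38',) ('abi3',) ('manylinux_2_17_x86_64', 'manylinux2014_x86_64')
```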

File details

Hashes for robin_sparkless-0.8.2-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 dee8048231c6b7f3bfa5a639a40de1b46c65deac45e2dab5b083e346e3eca98f
MD5 c70ca6144d68dd30a1b49d4fd2835d6d
BLAKE2b-256 096d30df4c56880b7c5a7844ccf936606acc181de1e8e691aff40a413eb92031

Hashes for robin_sparkless-0.8.2-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c88880a176a1622e220548b3a560d1c61dd3250ce3de78b774a3699c62a41c5e
MD5 8a1a15f6a3e1de66f8b02d00d3e0e34b
BLAKE2b-256 c3069a30d9df9f7c504e1eaf5e422ef3787de609663d3ffb0af5ec05f6cdf94e

Hashes for robin_sparkless-0.8.2-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 234b285dc6aa2151890a9e2a68e06a052e765693c4e2ffdd013a07cb31655c2f
MD5 c609e01b168db50d2150eba73768e943
BLAKE2b-256 3a43a91f64033de0545f76f78b022c0d408832d873a4c5086774501e59c96a02

Hashes for robin_sparkless-0.8.2-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 7bf562e83e4a104d9002ee52e964cd8c6a0c58f245788e466937d028b659d5e8
MD5 aa5d46eedc8f80e24dce765c4b36f920
BLAKE2b-256 13484e18ada7121324a4b8340b2794c94bc34069be776768abee3415f106acce

Hashes for robin_sparkless-0.8.2-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 c3a227406db361dac438ba70a38a49385cf3bf689f0f0344ce8f46fe5f187aa3
MD5 57c7407cfc537ea1a948257829d7b2b9
BLAKE2b-256 7bc90e08d7b82d96ab2f5f9ad0acea3f8cff8cd5343ddf32aec992dcfe857a0d

Hashes for robin_sparkless-0.8.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3c2bf464c2ff7fd558cb3d6a818ddbc5073973c3e9be18a2b702e826774661db
MD5 8dd2e271f3f7c1a6481a8b4b74f367ad
BLAKE2b-256 4748c2f978ae1915f3ff00bc46224a5033c911f6bab5a6cefd736768115c6fe4

Hashes for robin_sparkless-0.8.2-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 90a93182509807efd39ed9089e10a34e79b51ad2cd058d099ecebc064fcb2cfb
MD5 44f08db72e66efb777d6286cf485f480
BLAKE2b-256 2de1ba3c9cfe43eb4858e2fd39aeebd7599bf9a17294aed8306dc4f31bc5bcf2

Hashes for robin_sparkless-0.8.2-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b64225767b52975aae517656e013c2ccad7382ff0ad944db7ea67c81b31e26bb
MD5 c75e39b1de6bf832f76560bebaf01abb
BLAKE2b-256 2a41263016c99dd20ca21213407454ca6d747f2df32ebe954f0d1d736afd2679
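You can check a downloaded wheel against the SHA256 digests above before installing it. This is a minimal standard-library sketch. The local file path is an assumption about where you saved the wheel. pip can also enforce hashes itself when a requirements file uses `--hash=sha256:...` entries together with `--require-hashes`.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large wheels are not read into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    # Assumed local path: the macOS x86-64 wheel saved in the current directory.
    wheel = Path("robin_sparkless-0.8.2-cp38-abi3-macosx_10_12_x86_64.whl")
    expected = "b64225767b52975aae517656e013c2ccad7382ff0ad944db7ea67c81b31e26bb"
    if wheel.exists():
        print("OK" if verify_wheel(wheel, expected) else "MISMATCH")
    else:
        print(f"wheel not found: {wheel}")
```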
