PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived by its maintainers. No new releases are expected.

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM required. robin-sparkless uses Polars under the hood for fast, native execution, and 200+ operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # equivalent: rs.col("age").gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
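
For readers new to the row format, the filter above behaves like the following plain-Python sketch. It does not use robin_sparkless at all, and its variable names are illustrative only. It simply mirrors the list-of-dicts shape shown in the output.

```python
# Plain-Python illustration of the quick-start filter (no robin_sparkless needed).
rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
columns = ["id", "age", "name"]

# Build one dict per row, matching the shape printed by collect() above.
records = [dict(zip(columns, row)) for row in rows]

# Keep rows where age > 26, like df.filter(rs.col("age") > rs.lit(26)).
filtered_records = [r for r in records if r["age"] > 26]
print(filtered_records)
# [{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
```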

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
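
The three UDF kinds listed above differ mainly in the shape of the Python callable. The sketch below shows those shapes with plain Python lists, so it runs without robin_sparkless. The function names are hypothetical. The batch type that robin-sparkless actually passes to vectorized and grouped UDFs (for example, a pandas Series) is not stated here, so treat the list-based signatures as an assumption and check the UDF guide for the real semantics.

```python
from typing import List


def double_it(x: float) -> float:
    """Scalar UDF shape: one input value in, one output value out."""
    return x * 2.0


def double_batch(xs: List[float]) -> List[float]:
    """Vectorized UDF shape: a column batch in, a same-length batch out."""
    return [x * 2.0 for x in xs]


def group_mean(xs: List[float]) -> float:
    """Grouped-aggregate UDF shape: all values of one group in, one value out."""
    return sum(xs) / len(xs)


values = [1.0, 2.0, 3.0]
scalar_out = [double_it(v) for v in values]  # called once per row
batch_out = double_batch(values)             # called once per batch
agg_out = group_mean(values)                 # called once per group
print(scalar_out, batch_out, agg_out)
# [2.0, 4.0, 6.0] [2.0, 4.0, 6.0] 2.0
```

Registering such callables would then follow the calls listed above, for example spark.udf().register("name", f, return_type=...).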

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required for tests (parity expectations are predetermined).

Links

Resource       URL
Documentation  robin-sparkless.readthedocs.io
User Guide     docs/USER_GUIDE.md
Python API     docs/PYTHON_API.md
UDF Guide      docs/UDF_GUIDE.md
Source         github.com/eddiethedean/robin-sparkless
Rust crate     crates.io/crates/robin-sparkless

License

MIT
