
PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived by its maintainers. No new releases are expected.


robin-sparkless (Python)


PySpark-style DataFrames in Python, with no JVM required. The package uses Polars under the hood for fast, native execution, and more than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
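
To try the readers above without an existing dataset, a small input file can be generated with the standard library. The following is a minimal sketch, not part of the package: the helper name write_sample_csv and the temporary path are hypothetical, and it assumes read_csv treats the first row as a header, which the README does not state.

```python
import csv
import tempfile
from pathlib import Path

# Same rows and column names as the quick-start example.
ROWS = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
COLUMNS = ["id", "age", "name"]


def write_sample_csv(path: Path) -> Path:
    """Write the quick-start rows to `path`, preceded by a header row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(ROWS)
    return path


# Hypothetical location: a fresh temporary directory.
sample_path = write_sample_csv(Path(tempfile.mkdtemp()) / "data.csv")
print(sample_path.read_text(encoding="utf-8"))

# With the package installed, the file could then be loaded as:
#   df = spark.read_csv(str(sample_path))
# (assumption: the first row is interpreted as the header)
```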

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
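
The three UDF kinds differ mainly in the shape of their input and output. The sketch below illustrates those shape contracts in plain Python, without calling the library: the function names and sample data are hypothetical, and plain lists stand in for whatever batch type the library actually passes to vectorized and grouped UDFs (the UDF guide is the authority on that).

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical sample data: one age and one team label per row.
ages = [25, 30, 35, 40]
teams = ["a", "a", "b", "b"]


def add_one(age: int) -> int:
    """Scalar shape: called once per row, one value in, one value out."""
    return age + 1


def add_one_batch(batch: List[int]) -> List[int]:
    """Vectorized shape: called with a whole column, returns one output per input row."""
    return [age + 1 for age in batch]


def mean_age(batch: List[int]) -> float:
    """Grouped-aggregate shape: called once per group, returns a single value."""
    return float(mean(batch))


# Scalar: the engine would apply the function row by row.
scalar_result = [add_one(age) for age in ages]

# Vectorized: one call, output length equals input length.
vectorized_result = add_one_batch(ages)

# Grouped aggregate: split rows by key, then reduce each group to one value.
groups: Dict[str, List[int]] = defaultdict(list)
for team, age in zip(teams, ages):
    groups[team].append(age)
grouped_result = {team: mean_age(values) for team, values in groups.items()}

print(scalar_result)
print(vectorized_result)
print(grouped_result)
```

A function with the scalar shape is what register("name", f, return_type=...) expects, the batch shape corresponds to vectorized=True, and the reducing shape corresponds to a grouped_agg pandas_udf used inside group_by().agg([...]).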

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required for tests (parity expectations are predetermined).
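
One way to read "predetermined parity expectations" is that each test compares the rows it produces with expected rows stored ahead of time, so PySpark never has to run. The following is a rough sketch of such a comparison under that assumption; the fixture format, the helper names, and the order-insensitive matching are all hypothetical and not taken from the project's test suite.

```python
import json
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical stored expectation, e.g. captured once from a PySpark run.
EXPECTED_JSON = (
    '[{"id": 2, "age": 30, "name": "Bob"},'
    ' {"id": 3, "age": 35, "name": "Charlie"}]'
)


def normalize(rows: List[Row]) -> List[str]:
    """Serialize each row with sorted keys, then sort the rows,
    so neither key order nor row order affects the comparison."""
    return sorted(json.dumps(row, sort_keys=True) for row in rows)


def rows_match(actual: List[Row], expected: List[Row]) -> bool:
    """Return True when both row lists contain the same rows."""
    return normalize(actual) == normalize(expected)


expected_rows = json.loads(EXPECTED_JSON)

# Stand-in for the output of filtered.collect() in the quick start,
# deliberately in a different row and key order.
actual_rows = [
    {"name": "Charlie", "id": 3, "age": 35},
    {"id": 2, "age": 30, "name": "Bob"},
]

print(rows_match(actual_rows, expected_rows))
```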

Links

Resource        URL
Documentation   robin-sparkless.readthedocs.io
User Guide      docs/USER_GUIDE.md
Python API      docs/PYTHON_API.md
UDF Guide       docs/UDF_GUIDE.md
Source          github.com/eddiethedean/robin-sparkless
Rust crate      crates.io/crates/robin-sparkless

License

MIT

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • robin_sparkless-0.8.4-cp38-abi3-win_arm64.whl (14.6 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.8.4-cp38-abi3-win_amd64.whl (16.3 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.8.4-cp38-abi3-musllinux_1_2_x86_64.whl (15.2 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • robin_sparkless-0.8.4-cp38-abi3-musllinux_1_2_aarch64.whl (13.9 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • robin_sparkless-0.8.4-cp38-abi3-manylinux_2_28_aarch64.whl (14.2 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • robin_sparkless-0.8.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • robin_sparkless-0.8.4-cp38-abi3-macosx_11_0_arm64.whl (16.1 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.8.4-cp38-abi3-macosx_10_12_x86_64.whl (16.9 MB): CPython 3.8+, macOS 10.12+ x86-64
