
PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived by its maintainers. No new releases are expected.


robin-sparkless (Python)


PySpark-style DataFrames in Python, with no JVM required. Polars runs underneath for fast, native execution, and more than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
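The filter keeps rows whose age exceeds 26, and collect() returns one dict per row. As a plain-Python cross-check of that expected output, the same predicate can be applied to the same rows without robin_sparkless at all. This is only a reference sketch; the helper name filter_rows is hypothetical and not part of the library.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def filter_rows(
    rows: Sequence[Tuple[Any, ...]],
    columns: Sequence[str],
    predicate: Callable[[Dict[str, Any]], bool],
) -> List[Dict[str, Any]]:
    """Hypothetical reference helper: pair each tuple with the column names
    and keep the rows (as dicts) for which the predicate holds."""
    records = [dict(zip(columns, row)) for row in rows]
    return [r for r in records if predicate(r)]


rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
expected = filter_rows(rows, ["id", "age", "name"], lambda r: r["age"] > 26)
print(expected)
# [{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
```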

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
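To try read_csv without an existing file, a small CSV mirroring the quick-start data can be generated with the standard library. This is a sketch: whether read_csv treats the first line as a header, and which options it accepts, is not stated here, so check the User Guide before relying on it.

```python
import csv
from pathlib import Path


def write_sample_csv(path: Path) -> Path:
    """Write a three-row CSV with a header line, mirroring the quick-start data."""
    rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "age", "name"])  # header row
        writer.writerows(rows)
    return path


csv_path = write_sample_csv(Path("data.csv"))
print(csv_path.read_text())
# Then, with robin_sparkless installed: df = spark.read_csv("data.csv")
```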

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.
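The three UDF kinds differ mainly in shape: a scalar UDF maps one value to one value, a vectorized UDF maps a batch of column values to a batch of the same length, and a grouped-aggregate UDF reduces each group to a single value. The pure-Python sketch below only illustrates those contracts. It does not call robin_sparkless, the driver functions (apply_scalar, apply_vectorized, apply_grouped_agg) are hypothetical, and plain lists stand in for whatever batch type the library actually passes (possibly pandas Series, as in PySpark; see the UDF guide).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_scalar(f: Callable[[T], R], column: Sequence[T]) -> List[R]:
    # Scalar UDF: called once per row.
    return [f(v) for v in column]


def apply_vectorized(f: Callable[[List[T]], List[R]], column: Sequence[T]) -> List[R]:
    # Vectorized UDF: called once per batch; must return one output per input row.
    out = f(list(column))
    if len(out) != len(column):
        raise ValueError("vectorized UDF must return one value per input row")
    return out


def apply_grouped_agg(
    f: Callable[[List[T]], R], keys: Sequence[Hashable], column: Sequence[T]
) -> Dict[Hashable, R]:
    # Grouped-aggregate UDF: called once per group; returns one value per group.
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for k, v in zip(keys, column):
        groups[k].append(v)
    return {k: f(vs) for k, vs in groups.items()}


ages = [25, 30, 35]
depts = ["a", "b", "a"]
print(apply_scalar(lambda x: x + 1, ages))                           # [26, 31, 36]
print(apply_vectorized(lambda xs: [x * 2 for x in xs], ages))        # [50, 60, 70]
print(apply_grouped_agg(lambda xs: sum(xs) / len(xs), depts, ages))  # {'a': 30.0, 'b': 30.0}
```

The length check in apply_vectorized reflects the "one output per input row" rule stated above; a grouped-aggregate function returning a "double" corresponds to the float means here.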

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.
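The mypy version constraint can also be checked from a script. This sketch tests a version string against the range the project pins for linting (mypy>=1.4,<1.10); it assumes a plain dotted release such as "1.9.0" and does not handle pre-release suffixes. The helper names are hypothetical.

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    # Assumes a plain dotted release like "1.9.0"; tags such as "rc1" are not handled.
    return tuple(int(part) for part in v.split("."))


def mypy_ok_for_py38(v: str) -> bool:
    # Mirrors the pin mypy>=1.4,<1.10 using element-wise tuple comparison.
    return (1, 4) <= parse_version(v) < (1, 10)


for candidate in ["1.4.1", "1.9.0", "1.10.0", "1.11.2"]:
    print(candidate, mypy_ok_for_py38(candidate))

# With mypy installed (importlib.metadata is available on Python 3.8+):
#   from importlib.metadata import version
#   mypy_ok_for_py38(version("mypy"))
```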

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .
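Where make is unavailable, the Python lint steps above can be run from a small script. This is a hypothetical helper, not part of the repository: it runs each command in order and stops at the first failure, while dry_run=True only returns the parsed commands without executing anything.

```python
import shlex
import subprocess
from typing import List

# The Python lint/type-check steps listed in this section, in order.
LINT_STEPS = [
    "ruff format --check .",
    "ruff check .",
    "mypy .",
]


def run_lint(dry_run: bool = False) -> List[List[str]]:
    """Run each step in order; subprocess.run(check=True) raises
    CalledProcessError on the first command that exits non-zero."""
    commands = [shlex.split(step) for step in LINT_STEPS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_lint(dry_run=True):
        print(" ".join(cmd))
```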

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark-parity results are fixed in advance.
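Because the expected results are fixed ahead of time, a parity check only has to compare collected rows with a recorded expectation. A minimal sketch of such a comparison, assuming rows come back as dicts as in the quick start; the JSON fixture format and the assert_rows_equal helper are hypothetical, not the project's test harness. Row order is ignored, since Spark does not guarantee order without an explicit sort.

```python
import json
from typing import Any, Dict, List

# A recorded expected result, e.g. loaded from a JSON fixture file (format assumed).
EXPECTED_JSON = '[{"id": 3, "age": 35, "name": "Charlie"}, {"id": 2, "age": 30, "name": "Bob"}]'


def _key(row: Dict[str, Any]) -> str:
    # Canonical string form so row dicts can be sorted independent of key order.
    return json.dumps(row, sort_keys=True)


def assert_rows_equal(actual: List[Dict[str, Any]], expected: List[Dict[str, Any]]) -> None:
    """Order-insensitive comparison of two lists of row dicts."""
    assert sorted(map(_key, actual)) == sorted(map(_key, expected)), (actual, expected)


def test_filter_parity() -> None:
    # In a real test, `actual` would come from df.filter(...).collect().
    actual = [{"id": 2, "age": 30, "name": "Bob"}, {"id": 3, "age": 35, "name": "Charlie"}]
    assert_rows_equal(actual, json.loads(EXPECTED_JSON))


# pytest discovers test_filter_parity; the direct call lets the file run standalone.
test_filter_parity()
```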

Links

  • Documentation: robin-sparkless.readthedocs.io
  • User Guide: docs/USER_GUIDE.md
  • Python API: docs/PYTHON_API.md
  • UDF Guide: docs/UDF_GUIDE.md
  • Source: github.com/eddiethedean/robin-sparkless
  • Rust crate: crates.io/crates/robin-sparkless

License

MIT


Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions


  • robin_sparkless-0.9.1-cp38-abi3-win_arm64.whl (24.3 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.9.1-cp38-abi3-win_amd64.whl (26.8 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.9.1-cp38-abi3-musllinux_1_2_x86_64.whl (25.5 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • robin_sparkless-0.9.1-cp38-abi3-musllinux_1_2_aarch64.whl (23.4 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • robin_sparkless-0.9.1-cp38-abi3-manylinux_2_28_aarch64.whl (23.9 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • robin_sparkless-0.9.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.5 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • robin_sparkless-0.9.1-cp38-abi3-macosx_11_0_arm64.whl (26.9 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.9.1-cp38-abi3-macosx_10_12_x86_64.whl (28.2 MB): CPython 3.8+, macOS 10.12+ x86-64
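Each wheel name follows the PEP 427 pattern {distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl. The cp38-abi3 tags indicate a single binary built against the CPython stable ABI (abi3) starting at 3.8, which is why one file per platform covers every supported Python version. A minimal parser for these names is sketched below; for production use, packaging.utils.parse_wheel_filename from the packaging library handles the edge cases more thoroughly.

```python
from typing import NamedTuple, Optional, Tuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: Tuple[str, ...]
    abi: Tuple[str, ...]
    platform: Tuple[str, ...]


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields.

    Compressed tag sets (e.g. 'manylinux_2_17_x86_64.manylinux2014_x86_64')
    are split on '.' into tuples.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    build: Optional[str] = None
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return WheelTags(
        dist, version, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


name = "robin_sparkless-0.9.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(parse_wheel_name(name))
```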

File hashes

robin_sparkless-0.9.1-cp38-abi3-win_arm64.whl
  SHA256: 5be5a6ad295d72b6e1a20ec876534a44ab9e4fb133d624af7cad3d8e7257cdc1
  MD5: 91bc576aebd17b8bd229ff7e25939e2e
  BLAKE2b-256: 8a7a9b4d8e9f7f122805d904d132607e0e1f794c080e87376d8192495ed9d125

robin_sparkless-0.9.1-cp38-abi3-win_amd64.whl
  SHA256: dded860b39026032b8fc02932dc688f65e5df7a171ca3fab4fed1780d27bf686
  MD5: 052aa919df552b76fbececb19f222345
  BLAKE2b-256: 112e65b782045835b9a26d4e4d71750b87b636ec7431b4251febad0cc61dc462

robin_sparkless-0.9.1-cp38-abi3-musllinux_1_2_x86_64.whl
  SHA256: 1f1afaa8f192de614801393d1550eff475e558b4224cda2c401d03f44c8e39d4
  MD5: 96c3b29eeea4e78e2acd0eaa141b753d
  BLAKE2b-256: 1197baf604f4dadf2b34c2fe55c94ae8d788f0c266b054cef7647a85f61b0b3f

robin_sparkless-0.9.1-cp38-abi3-musllinux_1_2_aarch64.whl
  SHA256: f07a8f9542e3884c226bec8773d72e92b0824f472d2b8967e78218a6c4d1c0ee
  MD5: 621ad70410a989ee3867e923a73c4313
  BLAKE2b-256: 9f52987eaf8a0456b15cd559bface0e938a7111de4c907f8c66eb07b8281f603

robin_sparkless-0.9.1-cp38-abi3-manylinux_2_28_aarch64.whl
  SHA256: f1c1dac3a912c78e34ed86139a5b6024ea0b11db1a577be76bc95d390e119fee
  MD5: e53b7f8962009cdda507ff781d029541
  BLAKE2b-256: ca9ca905cb824f066c89612b429fc11797f7d1ee076a4d2cc79959a4318ec8a4

robin_sparkless-0.9.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  SHA256: 6d4bfa371920e6f7c9b51ff42fe17ed2ddca6e00375f04c7857f9e453f6f7ed6
  MD5: 735239720a5247eed4ec29f69a49b4ef
  BLAKE2b-256: 2a211c4b3821dce0a797f90d3a4b5703d986e3adad3155151af1300b744017b4

robin_sparkless-0.9.1-cp38-abi3-macosx_11_0_arm64.whl
  SHA256: 6fa88297a916d2f6f26324e65bf2edb95be44b19c2ccfc27e1d3a54d66dee445
  MD5: 39999cfd7c248e3a04415fc5c49e52af
  BLAKE2b-256: 03af87b7ce3b0d7d40b4b7c5b86a0df2e195131d8989f754ef539f9e1180284b

robin_sparkless-0.9.1-cp38-abi3-macosx_10_12_x86_64.whl
  SHA256: 4349f64ea7e69f119247ed7d6c2020a90661cab38c11bfcf8bf41347c27685e4
  MD5: 6f0a3f8e3537096d005e22b737408255
  BLAKE2b-256: 3d80344c448b5506a0bb7ffcc4c09c277d9fb6f884e48536bb9b27497f736d27
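A downloaded wheel can be checked against its published SHA256 digest with hashlib. The sketch below streams the file in chunks so a 25 MB wheel is never held in memory at once; the helper names are illustrative. For installs, pip's hash-checking mode (pip install --require-hashes with a requirements line such as robin-sparkless==0.9.1 --hash=sha256:<digest>) performs the same check automatically.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it in chunk_size pieces."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected_sha256: str) -> bool:
    # Published digests are lowercase hex; normalize in case of pasted uppercase.
    return sha256_of(path) == expected_sha256.lower()


# Example with the published digest of the Windows x86-64 wheel (download it first):
# verify_wheel(Path("robin_sparkless-0.9.1-cp38-abi3-win_amd64.whl"),
#              "dded860b39026032b8fc02932dc688f65e5df7a171ca3fab4fed1780d27bf686")
```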
