PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

robin-sparkless (Python)

PySpark-style DataFrames in Python, no JVM required. Polars is used under the hood for fast execution.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age").gt(rs.lit(26)))
print(filtered.collect())
# [{"id": 2, "age": 30, "name": "Bob"}, {"id": 3, "age": 35, "name": "Charlie"}]
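The filter in the quick start keeps rows where `age > 26`. The plain-Python sketch below reproduces that result on the same rows without robin_sparkless installed. It assumes, as the comment above shows, that `collect()` returns a list of dicts keyed by column name. It also assumes Spark-style null handling (a comparison against null is not true, so such rows are dropped), which is not confirmed for this library. It illustrates the semantics only, not how the Polars-backed engine executes the query. `to_records` and `filter_gt` are hypothetical helpers.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def to_records(rows: Sequence[Tuple[Any, ...]], columns: Sequence[str]) -> List[Row]:
    """Pair each positional tuple with the column names (hypothetical helper)."""
    return [dict(zip(columns, row)) for row in rows]


def filter_gt(records: List[Row], column: str, value: Any) -> List[Row]:
    """Keep records whose `column` is strictly greater than `value`.

    Mirrors col(column).gt(lit(value)); None values are dropped on the
    assumption of Spark-style null semantics.
    """
    return [r for r in records if r[column] is not None and r[column] > value]


records = to_records(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = filter_gt(records, "age", 26)
print(filtered)
# [{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
```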

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
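If there is no input file to hand, a minimal way to produce a `data.csv` that matches the quick-start rows is to use the standard library's `csv` module. The sample data is hypothetical. This sketch assumes that `read_csv` treats the first line as a header and infers column types; check the documentation for the options `read_csv` actually accepts.

```python
import csv
from pathlib import Path

# Hypothetical sample data: the quick-start rows, preceded by a header line.
header = ["id", "age", "name"]
rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]

path = Path("data.csv")
with path.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# With robin_sparkless installed, the file can then be loaded with:
# df = spark.read_csv("data.csv")
```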

Filter, select, group, join, and use window functions with a PySpark-like API. See the full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.
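The three UDF kinds differ in how much data each call sees and how many values it returns. The plain-Python sketch below is a mental model of those shapes only. Plain lists stand in for whatever batch type the engine actually passes (for example, pandas Series for `pandas_udf`). The `apply_*` helpers are hypothetical and are not part of robin_sparkless.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def apply_scalar(f: Callable[[float], float], column: Sequence[float]) -> List[float]:
    # Scalar UDF: called once per row, one value in and one value out.
    return [f(x) for x in column]


def apply_vectorized(
    f: Callable[[List[float]], List[float]], column: Sequence[float]
) -> List[float]:
    # Vectorized UDF: called once on a whole batch; it must return
    # exactly one output per input row.
    out = f(list(column))
    if len(out) != len(column):
        raise ValueError("vectorized UDF must return one value per input row")
    return out


def apply_grouped_agg(
    f: Callable[[List[float]], float], keys: Sequence[str], column: Sequence[float]
) -> Dict[str, float]:
    # GROUPED_AGG: called once per group with all of that group's values;
    # returns a single value per group.
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, x in zip(keys, column):
        groups[key].append(x)
    return {key: f(values) for key, values in groups.items()}


ages = [25.0, 30.0, 35.0]
depts = ["a", "b", "a"]
print(apply_scalar(lambda x: x + 1, ages))  # [26.0, 31.0, 36.0]
print(apply_vectorized(lambda xs: [x * 2 for x in xs], ages))  # [50.0, 60.0, 70.0]
print(apply_grouped_agg(lambda xs: sum(xs) / len(xs), depts, ages))  # {'a': 30.0, 'b': 30.0}
```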

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy releases drop support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] sections, with python_version (mypy) and target-version (ruff) set for Python 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected results for PySpark-parity checks are precomputed.
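To confirm that the installed mypy falls inside the `>=1.4,<1.10` range pinned above, a small standard-library check like the following can be used. `major_minor`, `in_pinned_range` and `installed_mypy_ok` are hypothetical helpers. They compare only the numeric major and minor parts, so this is a rough sketch rather than a full PEP 440 specifier match of the kind `packaging.specifiers` performs.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

PINNED_MIN = (1, 4)   # inclusive lower bound from 'mypy>=1.4,<1.10'
PINNED_MAX = (1, 10)  # exclusive upper bound


def major_minor(v: str) -> Tuple[int, int]:
    # "1.9.0" -> (1, 9); "1.10rc1" -> (1, 10). Patch and suffixes are ignored.
    m = re.match(r"(\d+)\.(\d+)", v)
    if m is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    return int(m.group(1)), int(m.group(2))


def in_pinned_range(v: str) -> bool:
    return PINNED_MIN <= major_minor(v) < PINNED_MAX


def installed_mypy_ok() -> Optional[bool]:
    # Returns None when mypy is not installed in the current environment.
    try:
        return in_pinned_range(version("mypy"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_mypy_ok())
```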

License

MIT

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • robin_sparkless-0.5.0-cp38-abi3-win_arm64.whl (14.5 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.5.0-cp38-abi3-win_amd64.whl (16.1 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.5.0-cp38-abi3-musllinux_1_2_x86_64.whl (15.1 MB): CPython 3.8+, musllinux: musl 1.2+ x86-64
  • robin_sparkless-0.5.0-cp38-abi3-musllinux_1_2_aarch64.whl (13.8 MB): CPython 3.8+, musllinux: musl 1.2+ ARM64
  • robin_sparkless-0.5.0-cp38-abi3-manylinux_2_28_aarch64.whl (14.1 MB): CPython 3.8+, manylinux: glibc 2.28+ ARM64
  • robin_sparkless-0.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.1 MB): CPython 3.8+, manylinux: glibc 2.17+ x86-64
  • robin_sparkless-0.5.0-cp38-abi3-macosx_11_0_arm64.whl (16.0 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.5.0-cp38-abi3-macosx_10_12_x86_64.whl (16.8 MB): CPython 3.8+, macOS 10.12+ x86-64

File details

Hashes for robin_sparkless-0.5.0-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 8d548efda7a7a0f18815c819c1863a44b3605ad30007b6d4046f6cd46abfdc98
MD5 0b8975d182f26003b29bd762c8a2b115
BLAKE2b-256 85d6ffc31d0a5a2865d6349816eecbab0c63221027e3707f15f85cf5d90a3b5d

Hashes for robin_sparkless-0.5.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 7830336434e320e348ffff82be78709c17dcf00351c4311eca91ae3e5b8a93e5
MD5 7112a8f607ffa469673266a136852fa5
BLAKE2b-256 31489e9c0221a77bd77c2d99ff8efd9d12b8d5424b98cb51a9d38634ad6c531f

Hashes for robin_sparkless-0.5.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 d7481e8f53bfd8ad76e9a988267171621c6e999e2f84542b5f8bfb497195d168
MD5 f2de654739c93aaa26875c4420bcbd06
BLAKE2b-256 5645fc07761abcf4cf3470ae663041369030548c819153b2d792b91a7e1e2bbc

Hashes for robin_sparkless-0.5.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 9820d4c6bce93d7636570044239eb10860f7d38b20a859866985ac69a5d1a343
MD5 c36836ea6ed73813fe74188c5e7130b5
BLAKE2b-256 b55ab1e01a4a7eeed5de8d1a99b808b0a82d358854c3d20e207b9efc0366392a

Hashes for robin_sparkless-0.5.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2a8d7040f655ee1ed0b0b383d46bc91ae2a935d4542eef7cd02c6777fdc17b0f
MD5 e97984424b0b413d9286bcd7e04c6c65
BLAKE2b-256 9f1bb0760ab6d41dc2bd62d848cf1314042faa4291b6749d3b2cd805a0cf56e6

Hashes for robin_sparkless-0.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c41e865d81684547df1985ea2fce234a015611371c47cc67c0ce18b69d395262
MD5 3d413fa0a53a03bb46e9805278802557
BLAKE2b-256 952731a84a09a06c8d29e1e2036d95866f29c6554a5c979cb8a76c1329eb55b4

Hashes for robin_sparkless-0.5.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a48ba1664515875e6b56de9160430a9b2ff23e54e887bdecb2ff8c32ca237a71
MD5 782bd4f75a59cf17e466b6f65af92f4f
BLAKE2b-256 b27ad418b3060c0d1be539efe4859ca27508f7ac06eab400b71d2823a449e475

Hashes for robin_sparkless-0.5.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 8fe504cde225ed8424d344caf9ec291d30309d97bf674dbe54fe63eb66939f99
MD5 056231a7f720fce16fd006b03f8416ee
BLAKE2b-256 69803526097c5fc78dfa903afaa7120f012c3cbd742ac615665ceaf1921c70ac
