PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM required. It uses Polars under the hood for fast, native execution, and more than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
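
To make the semantics of the quick-start filter concrete, here is a plain-Python equivalent that does not use robin_sparkless at all. It is only a sketch of what the filter computes: the names records and filtered are illustrative, and the real library runs the comparison in Polars rather than in a Python loop.

```python
# Plain-Python equivalent of the quick-start filter (no robin_sparkless needed).
rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
columns = ["id", "age", "name"]

# Pair each tuple with the column names, giving one dict per row
# in the same shape as the collect() output shown above.
records = [dict(zip(columns, row)) for row in rows]

# Keep only rows where age > 26, mirroring rs.col("age") > rs.lit(26).
filtered = [rec for rec in records if rec["age"] > 26]

print(filtered)
```

Running this prints the same two rows (Bob and Charlie) as the output above.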

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
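
The three UDF kinds above differ mainly in what the Python function receives and what it must return. The following library-free sketch illustrates those shapes with plain lists standing in for columns. The function names and sample data are hypothetical, and the actual argument types that robin_sparkless passes to a vectorized or grouped UDF are not shown here; see the UDF guide for those.

```python
from statistics import mean
from typing import Dict, List, Optional


def to_upper(name: Optional[str]) -> Optional[str]:
    """Scalar shape: called once per row with a single value."""
    return None if name is None else name.upper()


def add_one_batch(values: List[float]) -> List[float]:
    """Vectorized shape: receives a whole column, returns one output per input row."""
    return [v + 1.0 for v in values]


def mean_per_group(values: List[float]) -> float:
    """Grouped-aggregate shape: receives one group's values, returns one value."""
    return mean(values)


# Hypothetical sample data; lists stand in for DataFrame columns.
names = ["Alice", "Bob", None]
ages = [25.0, 30.0, 35.0]
groups: Dict[str, List[float]] = {"a": [1.0, 3.0], "b": [10.0]}

scalar_out = [to_upper(n) for n in names]   # one call per row
batch_out = add_one_batch(ages)             # one call per column
grouped_out = {key: mean_per_group(vals) for key, vals in groups.items()}  # one call per group

print(scalar_out)
print(batch_out)
print(grouped_out)
```

A function with the scalar shape would be the one passed to spark.udf().register(...), the batch shape corresponds to vectorized=True, and the per-group shape corresponds to function_type="grouped_agg".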

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3              # default: DataFrame API
maturin develop --features "pyo3,sql"        # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"      # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta"  # all optional features
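
Because a source build needs both a Rust toolchain and maturin, it can help to confirm they are on PATH before running the commands above. This is a small preflight sketch using only the standard library. The helper name find_build_tools is hypothetical, and checking for the cargo and rustc executables is an assumption about how Rust is installed on your machine.

```python
import shutil
from typing import Dict, Iterable


def find_build_tools(
    tools: Iterable[str] = ("cargo", "rustc", "maturin"),
) -> Dict[str, bool]:
    """Report which of the given executables can be found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in find_build_tools().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

If maturin is reported missing after pip install maturin, the virtual environment it was installed into is probably not the active one.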

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.
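
As a quick way to check an environment against the mypy bound stated above, the sketch below compares a version string with 1.10. The helper names are hypothetical, and the only rule encoded is the "<1.10" bound from this README.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def major_minor(version_string: str) -> Tuple[int, int]:
    """Extract the leading (major, minor) pair from a version string."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def mypy_can_target_py38(version_string: str) -> bool:
    """True when the version is below 1.10, the bound given in this README."""
    return major_minor(version_string) < (1, 10)


if __name__ == "__main__":
    try:
        installed = version("mypy")
    except PackageNotFoundError:
        print("mypy is not installed")
    else:
        verdict = "satisfies" if mypy_can_target_py38(installed) else "does not satisfy"
        print(f"mypy {installed} {verdict} the <1.10 bound")
```

Comparing integer tuples rather than strings matters here: as plain strings, "1.9" would sort after "1.10".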

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark-parity results are predetermined.

Links

  • Documentation: robin-sparkless.readthedocs.io
  • User Guide: docs/USER_GUIDE.md
  • Python API: docs/PYTHON_API.md
  • UDF Guide: docs/UDF_GUIDE.md
  • Source: github.com/eddiethedean/robin-sparkless
  • Rust crate: crates.io/crates/robin-sparkless

License

MIT

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • robin_sparkless-0.9.0-cp38-abi3-win_arm64.whl (24.3 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.9.0-cp38-abi3-win_amd64.whl (26.8 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.9.0-cp38-abi3-musllinux_1_2_x86_64.whl (25.5 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • robin_sparkless-0.9.0-cp38-abi3-musllinux_1_2_aarch64.whl (23.4 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • robin_sparkless-0.9.0-cp38-abi3-manylinux_2_28_aarch64.whl (23.9 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • robin_sparkless-0.9.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25.5 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • robin_sparkless-0.9.0-cp38-abi3-macosx_11_0_arm64.whl (26.9 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.9.0-cp38-abi3-macosx_10_12_x86_64.whl (28.2 MB): CPython 3.8+, macOS 10.12+ x86-64
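
The wheel file names listed above follow the standard wheel naming convention: distribution, version, Python tag, ABI tag, and platform tag, separated by hyphens. The sketch below splits a name into those parts. The WheelInfo type and parse_wheel_filename helper are hypothetical names for illustration, and the parser ignores the optional build tag handling beyond skipping it.

```python
from typing import List, NamedTuple


class WheelInfo(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platforms: List[str]


def parse_wheel_filename(filename: str) -> WheelInfo:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when an optional build tag is present
        raise ValueError(f"unexpected number of components in {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    # A platform tag may bundle several platforms separated by dots.
    return WheelInfo(parts[0], parts[1], python_tag, abi_tag, platform_tag.split("."))


info = parse_wheel_filename(
    "robin_sparkless-0.9.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info)
```

The cp38 and abi3 tags together indicate a wheel built against CPython's stable ABI starting at 3.8, which is why a single file per platform is listed as CPython 3.8+.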
