PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM required. Uses Polars under the hood for fast, native execution. More than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
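
For readers comparing against plain Python, the filter above selects the same rows as the list comprehension below. This is only a sketch of the semantics (rows as dicts, strict greater-than comparison). It does not use robin_sparkless and says nothing about how the library executes the query.

```python
# Plain-Python equivalent of the quick-start filter (illustration only).
rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
columns = ["id", "age", "name"]

# Turn each tuple into a dict keyed by column name, matching the shape
# of the collect() output shown above.
records = [dict(zip(columns, row)) for row in rows]

# Keep rows whose age is strictly greater than 26.
filtered = [r for r in records if r["age"] > 26]
print(filtered)
```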

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
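
To try the CSV reader without existing data, a small input file can be generated with the standard library. The sketch below writes the quick-start rows to a temporary data.csv. The helper name write_sample_csv is hypothetical, and the header row is an assumption: the README does not say whether read_csv expects one.

```python
import csv
import os
import tempfile


def write_sample_csv(path):
    """Write a header line plus three rows to `path` and return the path."""
    rows = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "age", "name"])  # assumed header row
        writer.writerows(rows)
    return path


# Write into a fresh temporary directory so nothing is overwritten.
sample_path = write_sample_csv(os.path.join(tempfile.mkdtemp(), "data.csv"))
with open(sample_path, encoding="utf-8") as f:
    print(f.read())
```

The resulting path can then be passed to spark.read_csv in place of "data.csv".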

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
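
The three UDF kinds differ mainly in what the Python function receives and returns. The plain-Python sketch below illustrates those shapes only: one value in and one out, a whole column in and one output per row, and one group in and a single value out. It does not call robin_sparkless; the function names and sample data are hypothetical, and the UDF guide remains the reference for actual behavior.

```python
from collections import defaultdict

ages = [25, 30, 35]
teams = ["a", "b", "a"]


# Scalar shape: the function sees one value and returns one value.
def plus_one(x):
    return x + 1


scalar_out = [plus_one(x) for x in ages]


# Vectorized shape: the function sees the whole column as a batch and
# must return exactly one output per input row.
def plus_one_batch(values):
    return [v + 1 for v in values]


vector_out = plus_one_batch(ages)
assert len(vector_out) == len(ages)


# Grouped-aggregate shape: the function sees one group's values and
# returns a single value for that group.
def mean(values):
    return sum(values) / len(values)


groups = defaultdict(list)
for team, age in zip(teams, ages):
    groups[team].append(age)
grouped_out = {team: mean(vals) for team, vals in groups.items()}

print(scalar_out, vector_out, grouped_out)
```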

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required to run the tests, because the expected PySpark results are predetermined.
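
One way a test can check parity without PySpark installed is to compare collected rows against an expectation stored ahead of time. The sketch below shows that idea only. The stored JSON, the rows_match helper, and the choice to ignore row order are all assumptions made for illustration, not a description of this project's test suite.

```python
import json

# Hypothetical stored expectation, e.g. captured once from a PySpark run.
EXPECTED_JSON = '[{"id": 2, "age": 30}, {"id": 3, "age": 35}]'


def rows_match(actual, expected):
    """Compare two lists of row dicts, ignoring row order (an assumption)."""
    def key(row):
        # Serialize with sorted keys so equal rows sort identically.
        return json.dumps(row, sort_keys=True)

    return sorted(actual, key=key) == sorted(expected, key=key)


# Stand-in for the rows a DataFrame's collect() call would return.
actual = [{"id": 3, "age": 35}, {"id": 2, "age": 30}]
print(rows_match(actual, json.loads(EXPECTED_JSON)))
```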

Links

Resource | URL
Documentation | robin-sparkless.readthedocs.io
User Guide | docs/USER_GUIDE.md
Python API | docs/PYTHON_API.md
UDF Guide | docs/UDF_GUIDE.md
Source | github.com/eddiethedean/robin-sparkless
Rust crate | crates.io/crates/robin-sparkless

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • robin_sparkless-0.8.3-cp38-abi3-win_arm64.whl (14.6 MB): CPython 3.8+, Windows ARM64
  • robin_sparkless-0.8.3-cp38-abi3-win_amd64.whl (16.3 MB): CPython 3.8+, Windows x86-64
  • robin_sparkless-0.8.3-cp38-abi3-musllinux_1_2_x86_64.whl (15.2 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • robin_sparkless-0.8.3-cp38-abi3-musllinux_1_2_aarch64.whl (13.9 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • robin_sparkless-0.8.3-cp38-abi3-manylinux_2_28_aarch64.whl (14.2 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • robin_sparkless-0.8.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.2 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • robin_sparkless-0.8.3-cp38-abi3-macosx_11_0_arm64.whl (16.1 MB): CPython 3.8+, macOS 11.0+ ARM64
  • robin_sparkless-0.8.3-cp38-abi3-macosx_10_12_x86_64.whl (16.9 MB): CPython 3.8+, macOS 10.12+ x86-64
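
The wheel file names above follow the standard wheel naming layout: distribution, version, an optional build tag, Python tag, ABI tag, and platform tag, separated by hyphens. The sketch below splits a file name into those fields. The parse_wheel_filename helper is hypothetical and deliberately minimal; a real tool would typically rely on a packaging library instead.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its hyphen-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    # Five fields normally, six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        # A compressed platform tag lists several platforms joined by dots.
        "platforms": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "robin_sparkless-0.8.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info)
```

Here the cp38 Python tag combined with the abi3 ABI tag corresponds to the "CPython 3.8+" label in the list.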

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-win_arm64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 67b4008c7d1852bfdb78eb98b1e6a5f228bb7f8d952f05a586a7ec58fb35d66e
MD5 b036f2bf065cef762bc4420e07df5a7f
BLAKE2b-256 292e26be179a29505ca6197b2581667b7fa4bda82a7829ce8eb2a55ba4956e14

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-win_amd64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 88014e2aa979d8b426624b351d2e4755d36588dc51b332a38aa71bb70b6a8faf
MD5 d908419b4ec5497a599844b4df4ba72d
BLAKE2b-256 3c05ef80e381b29077e608e5fd0427ccc7c6c5f0226fb20daae62d1c9d63e6d7

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 e5e715b96ffea35d985a143e93a9f2a1496ef39ecbb628791c2279e95a92570d
MD5 0a700feaab81c5a443f7d393e52c6e2d
BLAKE2b-256 389c6dcf742df24ba5793741f6198d90374cf9c7c15838d378b8dc8fe23897d4

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 6823dc628a0b187fc5e55783ae86970386637ba9e71281f549ec5bb2ab0d8b70
MD5 5eb27a125e7639e96b92370881051b97
BLAKE2b-256 d2bc8e2b9e0fbe163887d3d8c675dffad4254ba01a54821ba3db3e0dfd255bb4

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 92da7fbba13facb15fb8e1692438c0a2e94d3dc7e31320772e3450be551ae65b
MD5 dadc14b16bae535e31316439dde89086
BLAKE2b-256 e307a0a1d414e0a8f7f6f73a62f060be3aeb05c7b3c4a74c64b768884c07122e

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1903e5b8274ba881bcd724f4892aca560d7a6ba3bca29c5d6e8fea0fcd30411a
MD5 f8b1097fd34870f2c809c40cbd109f38
BLAKE2b-256 766fbedf286eb72cc8731db533b0f1f5ad8b1021297e9c2094684402b54ae12e

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 cb8a29efa9706a57cdc1e192134f47277a3497e489af36d403614edd64493979
MD5 7abe60748f9acbb41393da95a0385b15
BLAKE2b-256 bed7d8f4e1f80f7a8b8edf8b73ce3f0effb4b107f6bd26d0b2db5683cb1bf2f9

File details

Details for the file robin_sparkless-0.8.3-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.8.3-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 cdfebca35f2417d56733c32725226d641266333af5e13d9a1a44ade54cff0ea9
MD5 0096f6354d9a7775e4483b12537968b9
BLAKE2b-256 ebd8af9679a4b62f74307409038784c29851189a0b2c9f3f84088b135a3de916
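
The SHA256 digests listed for each wheel can be checked locally after downloading a file. The sketch below computes a file's SHA256 with the standard library and compares it to an expected hex digest. The helper names sha256_of_file and verify_sha256 are hypothetical, and the usage lines are commented out because they assume the wheel has already been downloaded to the working directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the wheel was downloaded to the current directory:
# ok = verify_sha256(
#     "robin_sparkless-0.8.3-cp38-abi3-win_arm64.whl",
#     "67b4008c7d1852bfdb78eb98b1e6a5f228bb7f8d952f05a586a7ec58fb35d66e",
# )
# print("hash matches" if ok else "hash mismatch")
```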
