PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers have marked this project as archived. No new releases are expected.

Project description

robin-sparkless (Python)

PySpark-style DataFrames in Python, without a JVM. Uses Polars under the hood for fast, native execution. Lazy by default: transformations extend the query plan, and only actions (collect, show, count, write) trigger execution. More than 200 operations are validated against PySpark.
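As a conceptual sketch only, the toy class below models what "lazy by default" means: `filter` and `select` return a new object with a longer plan, and nothing runs until the `collect` action. `ToyLazyFrame` is a hypothetical name and is not robin-sparkless's implementation, which relies on Polars for execution.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


class ToyLazyFrame:
    """Toy model: transformations append steps to a plan; only collect() runs them."""

    def __init__(self, rows: List[Row], plan: Optional[List[Step]] = None) -> None:
        self._rows = rows
        self._plan: List[Step] = list(plan or [])
        self.executions = 0  # how many times this frame's plan has actually run

    def _extend(self, step: Step) -> "ToyLazyFrame":
        # Transformations never touch the data; they return a new frame with a longer plan.
        return ToyLazyFrame(self._rows, self._plan + [step])

    def filter(self, predicate: Callable[[Row], bool]) -> "ToyLazyFrame":
        return self._extend(lambda rows: [r for r in rows if predicate(r)])

    def select(self, *cols: str) -> "ToyLazyFrame":
        return self._extend(lambda rows: [{c: r[c] for c in cols} for r in rows])

    def collect(self) -> List[Row]:
        # Action: replay every recorded step, in order, over the source rows.
        self.executions += 1
        rows = self._rows
        for step in self._plan:
            rows = step(rows)
        return rows


people = ToyLazyFrame([{"id": 1, "age": 25}, {"id": 2, "age": 30}, {"id": 3, "age": 35}])
adults = people.filter(lambda r: r["age"] > 26).select("id")
print(adults.executions)  # 0: building the plan ran nothing
print(adults.collect())  # [{'id': 2}, {'id': 3}]
```

Keeping transformations as a plan lets an engine see the whole query before running it; in robin-sparkless the actions that trigger execution are collect, show, count, and write.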

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.createDataFrame(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. spark.createDataFrame(data, schema=None) accepts a list of dicts (schema inferred), a list of tuples together with a list of column names, or an explicit schema given as a list of (name, dtype_str) pairs. See the User Guide and the full documentation for details.
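The three input shapes accepted by spark.createDataFrame all reduce to the same thing: rows keyed by column name plus a (name, dtype_str) schema. The pure-Python sketch below illustrates that mapping only; `normalize_rows` and its `bigint`/`double`/`string`/`boolean` type names are hypothetical, and robin-sparkless's real inference rules and accepted dtype strings may differ (see the User Guide).

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple, Union

Row = Dict[str, Any]
Field = Tuple[str, str]  # (name, dtype_str)

# Hypothetical Python-type -> dtype_str mapping, for illustration only.
_INFERRED = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def _infer(value: Any) -> str:
    # Exact type lookup, so True maps to "boolean" rather than "bigint".
    return _INFERRED.get(type(value), "string")


def normalize_rows(
    data: Sequence[Any],
    schema: Optional[Union[Sequence[str], Sequence[Field]]] = None,
) -> Tuple[List[Row], List[Field]]:
    """Map the three documented input shapes to (rows, [(name, dtype_str), ...])."""
    if schema is None:
        # Shape 1: list of dicts; names and types come from the first row.
        rows = [dict(r) for r in data]
        first = rows[0] if rows else {}
        return rows, [(name, _infer(v)) for name, v in first.items()]
    names = [s if isinstance(s, str) else s[0] for s in schema]
    rows = [dict(zip(names, values)) for values in data]
    if all(isinstance(s, str) for s in schema):
        # Shape 2: tuples plus column names; types come from the first tuple.
        first = rows[0] if rows else {}
        return rows, [(n, _infer(first.get(n))) for n in names]
    # Shape 3: explicit (name, dtype_str) pairs are taken as given.
    return rows, [(s[0], s[1]) for s in schema]


print(normalize_rows([{"id": 1, "name": "Alice"}]))
print(normalize_rows([(1, 25, "Alice")], ["id", "age", "name"]))
print(normalize_rows([(2, "Bob")], [("id", "int"), ("name", "string")]))
```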

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.
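The three UDF kinds differ mainly in what the function receives and what it must return. The pure-Python model below illustrates only those contracts: `apply_scalar`, `apply_vectorized`, and `apply_grouped_agg` are hypothetical helpers, plain lists stand in for whatever column batches robin-sparkless actually passes, and this is not how the library executes UDFs.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence


def apply_scalar(f: Callable[[Any], Any], column: Sequence[Any]) -> List[Any]:
    # Scalar UDF: the function sees one value at a time.
    return [f(v) for v in column]


def apply_vectorized(
    f: Callable[[List[Any]], List[Any]], column: Sequence[Any], batch_size: int = 2
) -> List[Any]:
    # Vectorized UDF: the function sees a whole batch and must return
    # exactly one output per input row.
    out: List[Any] = []
    for start in range(0, len(column), batch_size):
        batch = list(column[start:start + batch_size])
        result = f(batch)
        if len(result) != len(batch):
            raise ValueError("vectorized UDF must return one value per input row")
        out.extend(result)
    return out


def apply_grouped_agg(
    f: Callable[[List[Any]], Any], keys: Sequence[Any], column: Sequence[Any]
) -> Dict[Any, Any]:
    # Grouped aggregate UDF: one call per group, one value per group.
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for k, v in zip(keys, column):
        groups[k].append(v)
    return {k: f(vals) for k, vals in groups.items()}


ages = [25, 30, 35]
print(apply_scalar(lambda a: a + 1, ages))  # [26, 31, 36]
print(apply_vectorized(lambda batch: [a * 2 for a in batch], ages))  # [50, 60, 70]
print(apply_grouped_agg(lambda vals: sum(vals) / len(vals), ["x", "x", "y"], ages))  # {'x': 27.5, 'y': 35.0}
```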

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy versions drop support for python_version = "3.8" in the config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] sections, with python_version (mypy) and target-version (ruff) set for Python 3.8.
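A CI step or pre-commit hook can fail fast when an incompatible mypy is installed. The stdlib-only sketch below (the helpers `release_tuple` and `mypy_ok_for_py38` are hypothetical) reads the installed version with `importlib.metadata`, available from Python 3.8, and applies the `mypy>=1.4,<1.10` bounds used in the Development section. It handles plain release numbers and simple suffixes, not every PEP 440 form.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release components: "1.9.0" -> (1, 9, 0), "1.10.0+dev" -> (1, 10, 0)."""
    parts: List[int] = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # stop at suffixes such as "rc1" or "+dev"
    return tuple(parts)


def mypy_ok_for_py38(ver: str) -> bool:
    # Mirrors the 'mypy>=1.4,<1.10' pin from this README.
    return (1, 4) <= release_tuple(ver) < (1, 10)


if __name__ == "__main__":
    try:
        installed = version("mypy")
    except PackageNotFoundError:
        print("mypy is not installed")
    else:
        print(installed, "OK" if mypy_ok_for_py38(installed) else "outside mypy>=1.4,<1.10")
```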

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy <1.10 (for Python 3.8), and pytest. PySpark is not required to run the tests, because the parity expectations are predetermined.
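Testing against predetermined expectations means a test only needs the collected rows and a stored result, not a running PySpark. The sketch below shows that pattern in plain Python; it is not the project's actual test harness, and `assert_rows_match` and `EXPECTED_AGE_GT_26` are hypothetical names. The stored rows are the Quick start output.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def _row_key(row: Row) -> str:
    # Column names are unique strings, so sorting items never compares values.
    return repr(sorted(row.items()))


def assert_rows_match(actual: List[Row], expected: List[Row], ordered: bool = False) -> None:
    """Raise AssertionError unless the collected rows equal the stored expectation."""
    if not ordered:
        # Unless the query sorts, row order is not something to rely on,
        # so compare both sides as sorted lists.
        actual = sorted(actual, key=_row_key)
        expected = sorted(expected, key=_row_key)
    if actual != expected:
        raise AssertionError(f"parity mismatch:\n  actual:   {actual}\n  expected: {expected}")


# Expectation recorded ahead of time (here: the Quick start output).
EXPECTED_AGE_GT_26 = [
    {"id": 2, "age": 30, "name": "Bob"},
    {"id": 3, "age": 35, "name": "Charlie"},
]

# In a real test this would be df.filter(rs.col("age") > rs.lit(26)).collect().
actual_rows = [
    {"id": 3, "age": 35, "name": "Charlie"},
    {"id": 2, "age": 30, "name": "Bob"},
]
assert_rows_match(actual_rows, EXPECTED_AGE_GT_26)  # passes: same rows, different order
```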

Links

Documentation: robin-sparkless.readthedocs.io
User Guide: docs/USER_GUIDE.md
Python API: docs/PYTHON_API.md
UDF Guide: docs/UDF_GUIDE.md
Source: github.com/eddiethedean/robin-sparkless
Rust crate: crates.io/crates/robin-sparkless

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

robin_sparkless-0.10.0-cp38-abi3-win_arm64.whl (24.8 MB)
CPython 3.8+, Windows ARM64

robin_sparkless-0.10.0-cp38-abi3-win_amd64.whl (27.3 MB)
CPython 3.8+, Windows x86-64

robin_sparkless-0.10.0-cp38-abi3-musllinux_1_2_x86_64.whl (26.1 MB)
CPython 3.8+, musllinux: musl 1.2+ x86-64

robin_sparkless-0.10.0-cp38-abi3-musllinux_1_2_aarch64.whl (23.9 MB)
CPython 3.8+, musllinux: musl 1.2+ ARM64

robin_sparkless-0.10.0-cp38-abi3-manylinux_2_28_aarch64.whl (24.4 MB)
CPython 3.8+, manylinux: glibc 2.28+ ARM64

robin_sparkless-0.10.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.1 MB)
CPython 3.8+, manylinux: glibc 2.17+ x86-64

robin_sparkless-0.10.0-cp38-abi3-macosx_11_0_arm64.whl (27.4 MB)
CPython 3.8+, macOS 11.0+ ARM64

robin_sparkless-0.10.0-cp38-abi3-macosx_10_12_x86_64.whl (28.8 MB)
CPython 3.8+, macOS 10.12+ x86-64
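Each file name follows the wheel naming convention, so the tags can be read mechanically. The stdlib-only sketch below (`parse_wheel_filename` and `WheelTags` are hypothetical helpers, not part of robin-sparkless or pip) splits a name into distribution, version, optional build tag, and the Python, ABI, and platform tags; dotted tag sets such as `manylinux_2_17_x86_64.manylinux2014_x86_64` expand into several platforms.

```python
from typing import List, NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: List[str]
    abi: List[str]
    platforms: List[str]


def parse_wheel_filename(filename: str) -> WheelTags:
    # Wheel names are {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl;
    # each tag field may hold several values joined by "." (compressed tag sets).
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    return WheelTags(name, version, build, py.split("."), abi.split("."), plat.split("."))


for fn in [
    "robin_sparkless-0.10.0-cp38-abi3-win_arm64.whl",
    "robin_sparkless-0.10.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]:
    tags = parse_wheel_filename(fn)
    print(tags.python, tags.abi, tags.platforms)
```

The `cp38` plus `abi3` pair marks a build against CPython's stable ABI with 3.8 as the minimum, which is why a single wheel per platform covers every CPython 3.8+ interpreter.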

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-win_arm64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 da561076f37f269c0b452c7853ed3a58d5e0634a1abfa7ecd726039a0d3030d0
MD5 cc117472724598287cc83deaa53aceb8
BLAKE2b-256 5c3db0b882237c6a5eea266f446a24f0fb1620ef88089620a9c2528b986ff475

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-win_amd64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 82862bef80eabb0096928e541ab6a4340dbd2602544639e20c7ef463d16d03a3
MD5 9cd7cbefda3d4229a7ca7a40b41c7420
BLAKE2b-256 88a9a96754016ef9bd5f9b17c1dda096cbb17c4fab88300942cb265c633e1ef9

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 84a9624b6d1ad638db3dbf8a6a4d0f317fac3e9b9555522e6dc5f4c2183ed24a
MD5 7b086457862d8ceed2aa0254241d60dc
BLAKE2b-256 b1acba432142b24786471420f9f84f6919423fd12dba380b1a0b68feff13c055

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 c9573f3aa84f268e27f095451f6a78a8ede94106788e470ea161d5b64a2f9004
MD5 d84951ff6517d0b86277a6143f87fbc4
BLAKE2b-256 353c07d155947094f8a34519880fa3c33ea828dfdfe5205576636e2f5ff66ff9

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2f442f11bace011eee2e160ffd78123a9c1c1ac0c6fa5ae98de9a950d84705e6
MD5 33ad24eaddd36b8cb7f2d91f05d384f6
BLAKE2b-256 3dd1aa8975da6b77cf84d45dc77058d9f6e2bf568d93e5e853fd37f490bd3b1d

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2e7da7eb484025820308bfa6e058f5c1233686dde866eba225ec634b6f9dd0b7
MD5 22c625cd2e78336a5d6cf81344020fd4
BLAKE2b-256 865896c4ac85ba66555487abe5ac692ac2eb76b23e13a7aef78af973a3d5ba7b

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b896c79b45cc878ee0551c922c0baa424f5ee5b24ab72a233ab28c3de7f3ae2b
MD5 c85cb47b44436813dac15e69804ff185
BLAKE2b-256 888067e2da9a14531a36f7f2356533990efb79ffa2d07feb6cd6794acd4b8d71

File details

Details for the file robin_sparkless-0.10.0-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.10.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 7c87380204bfc02849e6d8f4e66d961c3a7c5892fca66d47dc34b381f53977a8
MD5 4ed5bab1f3ab702bc8dd1f3586e2fe8b
BLAKE2b-256 f7b2429010b9b6a87062a5de6cc6d9ce1bfae7e971a32d620acdad905b704588
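To check a downloaded wheel against the digests listed above, hash it locally and compare. This is a minimal stdlib sketch, assuming the wheel sits in the current directory; `wheel_digests` and `verify_sha256` are hypothetical helper names. PyPI's BLAKE2b-256 is BLAKE2b with a 32-byte digest, which `hashlib.blake2b(digest_size=32)` reproduces; MD5 is left out because it is not collision-resistant.

```python
import hashlib
from pathlib import Path
from typing import Dict


def wheel_digests(path: Path) -> Dict[str, str]:
    # Stream the file once, feeding each 1 MiB chunk to both hash functions.
    sha256 = hashlib.sha256()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # "BLAKE2b-256" = 32-byte BLAKE2b digest
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha256.update(chunk)
            blake2b_256.update(chunk)
    return {"SHA256": sha256.hexdigest(), "BLAKE2b-256": blake2b_256.hexdigest()}


def verify_sha256(path: Path, expected_hex: str) -> bool:
    # Published digests are lowercase hex; normalize before comparing.
    return wheel_digests(path)["SHA256"] == expected_hex.strip().lower()


if __name__ == "__main__":
    wheel = Path("robin_sparkless-0.10.0-cp38-abi3-win_amd64.whl")
    # SHA256 published above for the win_amd64 wheel.
    expected = "82862bef80eabb0096928e541ab6a4340dbd2602544639e20c7ef463d16d03a3"
    if wheel.exists():
        print("SHA256 OK" if verify_sha256(wheel, expected) else "SHA256 MISMATCH")
```

pip can enforce the SHA256 check itself in hash-checking mode: list the requirement with a `--hash=sha256:<digest>` option in a requirements file and install with `--require-hashes`.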
