PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

robin-sparkless (Python)

PySpark-style DataFrames in Python—no JVM. Uses Polars under the hood for fast, native execution. 200+ operations validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
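The filter keeps the rows whose age exceeds 26, and collect() returns one dict per row, keyed by column name. The plain-Python sketch below reproduces those semantics by hand, which can help when checking output. It does not use robin_sparkless, and both helper names are hypothetical. Skipping None mirrors SQL, where a comparison with null is not true.

```python
from typing import Any, Dict, List, Sequence, Tuple

def rows_to_dicts(data: Sequence[Tuple[Any, ...]], columns: Sequence[str]) -> List[Dict[str, Any]]:
    """Pair each tuple with the column names, matching the shape collect() returns."""
    return [dict(zip(columns, row)) for row in data]

def filter_gt(rows: List[Dict[str, Any]], column: str, threshold: Any) -> List[Dict[str, Any]]:
    """Keep rows whose value in `column` is strictly greater than `threshold`.

    None values are dropped, as a null comparison is not true in SQL.
    """
    return [r for r in rows if r[column] is not None and r[column] > threshold]

data = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
rows = rows_to_dicts(data, ["id", "age", "name"])
print(filter_gt(rows, "age", 26))
# [{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
```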

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")
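To try the readers without an existing file, you can generate a sample CSV with the standard library. The sketch below is minimal, and the helper name is hypothetical. This page does not say whether read_csv treats the first row as a header or infers column types, so check the Python API docs for its options.

```python
import csv
from pathlib import Path
from typing import Union

def write_sample_csv(path: Union[str, Path]) -> Path:
    """Write the quick-start rows to `path` as CSV, with a header row first."""
    out = Path(path)
    with out.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "age", "name"])  # header row
        writer.writerows([(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")])
    return out

# Usage (writes ./data.csv):
#   write_sample_csv("data.csv")
#   df = spark.read_csv("data.csv")  # needs robin_sparkless; reader options not covered here
```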

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.
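The three UDF kinds differ in how many values each call receives and returns. The plain-Python sketch below illustrates those contracts only. Lists stand in for pandas Series, and the apply_* helpers are hypothetical, not part of robin_sparkless:

- A scalar UDF is called once per value.
- A vectorized UDF is called on a whole column batch and must return a batch of the same length.
- A grouped-aggregate UDF reduces each group to a single value.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence

def apply_scalar(f: Callable[[Any], Any], column: Sequence[Any]) -> List[Any]:
    """Scalar UDF: one call per row."""
    return [f(v) for v in column]

def apply_vectorized(f: Callable[[List[Any]], List[Any]], column: Sequence[Any]) -> List[Any]:
    """Vectorized UDF: one call for the whole batch; output length must equal input length."""
    out = list(f(list(column)))
    if len(out) != len(column):
        raise ValueError(f"vectorized UDF returned {len(out)} values for {len(column)} rows")
    return out

def apply_grouped_agg(
    f: Callable[[List[Any]], Any], keys: Sequence[Hashable], values: Sequence[Any]
) -> Dict[Hashable, Any]:
    """Grouped-agg UDF: one call per group, producing exactly one value per group."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: f(vs) for k, vs in groups.items()}

ages = [25, 30, 35]
print(apply_scalar(lambda a: a + 1, ages))                       # [26, 31, 36]
print(apply_vectorized(lambda col: [a * 2 for a in col], ages))  # [50, 60, 70]
print(apply_grouped_agg(lambda vs: sum(vs) / len(vs), ["a", "b", "a"], [1.0, 2.0, 3.0]))
# {'a': 2.0, 'b': 2.0}
```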

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3             # default: DataFrame API
maturin develop --features "pyo3,sql"       # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"     # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features
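Every build command above enables pyo3 plus zero or more optional features. The sketch below shows a hypothetical helper, not part of the project, that assembles the matching maturin argument list. The feature names come from the commands above. The helper only builds the arguments and does not run anything.

```python
from typing import Iterable, List

OPTIONAL_FEATURES = ("sql", "delta")  # optional features listed in the build commands

def maturin_develop_args(extra: Iterable[str] = ()) -> List[str]:
    """Return argv for `maturin develop`, always enabling pyo3 first."""
    requested = set(extra)
    unknown = requested - set(OPTIONAL_FEATURES)
    if unknown:
        raise ValueError(f"unknown feature(s): {sorted(unknown)}")
    # Use a fixed order so the same request always yields the same command.
    features = ["pyo3"] + [name for name in OPTIONAL_FEATURES if name in requested]
    return ["maturin", "develop", "--features", ",".join(features)]

print(" ".join(maturin_develop_args({"delta", "sql"})))
# maturin develop --features pyo3,sql,delta
```

To run the command, pass the list to subprocess.run(args, check=True) from a clone of the repository.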

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy releases no longer accept python_version = "3.8" in the config). The project's pyproject.toml includes [tool.mypy] and [tool.ruff] sections, with python_version (mypy) and target-version (ruff) both set for Python 3.8.
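Because the checker targets Python 3.8, annotate code that consumes results with typing generics such as List and Dict. Built-in generics such as list[dict] are not subscriptable at runtime on Python 3.8. The sketch below is minimal: it assumes collected rows are plain dicts, as in the quick start, and the helper name is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # one collected row: column name -> value

def column_values(rows: List[Row], column: str) -> List[Any]:
    """Extract one column from collected rows, preserving row order."""
    return [row[column] for row in rows]

rows: List[Row] = [{"id": 2, "age": 30, "name": "Bob"}, {"id": 3, "age": 35, "name": "Charlie"}]
print(column_values(rows, "name"))  # ['Bob', 'Charlie']
```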

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required for tests (parity expectations are predetermined).
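Since PySpark is not needed at test time, a parity test mostly compares collected rows against a fixed expected result. The sketch below shows one such comparison. It assumes rows are dicts as returned by collect(), and the helpers are hypothetical. By default it ignores row order, because PySpark does not guarantee ordering without an explicit sort.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def _row_key(row: Row) -> Tuple[Tuple[str, str], ...]:
    """Sort key built from repr() so rows with mixed value types can still be ordered."""
    return tuple(sorted((name, repr(value)) for name, value in row.items()))

def assert_rows_equal(actual: List[Row], expected: List[Row], ignore_order: bool = True) -> None:
    """Raise AssertionError showing both row lists if they differ."""
    if ignore_order:
        actual = sorted(actual, key=_row_key)
        expected = sorted(expected, key=_row_key)
    if actual != expected:
        raise AssertionError(f"rows differ:\n  actual:   {actual}\n  expected: {expected}")

def test_filter_parity() -> None:
    # In a real test, `actual` would come from df.filter(...).collect();
    # here it is a literal so the sketch runs on its own.
    actual = [{"id": 3, "age": 35, "name": "Charlie"}, {"id": 2, "age": 30, "name": "Bob"}]
    expected = [{"id": 2, "age": 30, "name": "Bob"}, {"id": 3, "age": 35, "name": "Charlie"}]
    assert_rows_equal(actual, expected)
```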

Links

Documentation: robin-sparkless.readthedocs.io
User Guide: docs/USER_GUIDE.md
Python API: docs/PYTHON_API.md
UDF Guide: docs/UDF_GUIDE.md
Source: github.com/eddiethedean/robin-sparkless
Rust crate: crates.io/crates/robin-sparkless

License

MIT

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

robin_sparkless-0.6.0-cp38-abi3-win_arm64.whl (14.6 MB): CPython 3.8+, Windows ARM64
robin_sparkless-0.6.0-cp38-abi3-win_amd64.whl (16.2 MB): CPython 3.8+, Windows x86-64
robin_sparkless-0.6.0-cp38-abi3-musllinux_1_2_x86_64.whl (15.1 MB): CPython 3.8+, musllinux (musl 1.2+), x86-64
robin_sparkless-0.6.0-cp38-abi3-musllinux_1_2_aarch64.whl (13.8 MB): CPython 3.8+, musllinux (musl 1.2+), ARM64
robin_sparkless-0.6.0-cp38-abi3-manylinux_2_28_aarch64.whl (14.1 MB): CPython 3.8+, manylinux (glibc 2.28+), ARM64
robin_sparkless-0.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.1 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
robin_sparkless-0.6.0-cp38-abi3-macosx_11_0_arm64.whl (16.0 MB): CPython 3.8+, macOS 11.0+, ARM64
robin_sparkless-0.6.0-cp38-abi3-macosx_10_12_x86_64.whl (16.9 MB): CPython 3.8+, macOS 10.12+, x86-64

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-win_arm64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 8562a0c05d1c25dc72e042a9cbeacb2b81ebb565b71cf496977c2c550c52641a
MD5 9c17c77fd2d1bfceca64bed37a99b58e
BLAKE2b-256 b585de6d8c13a136ec7e63115bc665106caa3d892bf7109fa81661c3a4706a0a

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-win_amd64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b73140927dfe42e79b08489d0cef6b4a8fbfae48c63a64505f40fac54c4a7dfb
MD5 f1cbd479b54d626117eeee332e946dc4
BLAKE2b-256 e9906ebf9b540fce2cc620d2d20dfbea8e73d5ed12db107dd48de38ca6988f7a

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 a428e9916b830abfdc7086d968758dbb2c7d66943b80b9bc6b4bcff720705850
MD5 1eb29bed89400291e55bf0bfdb947484
BLAKE2b-256 0deee4e37ee9c332d8a5d28f364036aea6579814295787cdbdb65306bad52642

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 99c95b1e9bf6a8cf436f016f647636adecf3a3f1210e12bc9edc9692ddc397a1
MD5 d36934487ddb199b4b7c8abbdc024985
BLAKE2b-256 bcc75bbb7c5a9a6bd5db2f42a6631b33d9d8175a3c547b414dfa7ecddaf4a755

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8c11d38aa70cbfaf7cf67ffa3c08133d83fb1b62d56c6070141f34e309b40b9f
MD5 18184f4465d67a62ff281fb915507065
BLAKE2b-256 3e8fdfe97c7eb13b5bc13f5a56e8dab0ca4de1ff9f9cac17f1203ccc0e8eb4f4

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 5c662433759177c8962d2f26464441ea3e7e69e2367d7141311810f081c17bc4
MD5 f620e03f3394b1e3d46f5dee27ad1f36
BLAKE2b-256 9e0e4c7c697ba38dffff81f79aceb51dd8d4d8332994046fce799bfb02fe02ad

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6af940ba6b564fe6584d5d0c464ffd9131ad955362ea4975bd551f5839fd3105
MD5 5df28be8e51e91bdea8c7eea58969383
BLAKE2b-256 ae49b3f68ea9561fc35e8aa71e0445d27aca928fd1fef3268f8a48fbcf4456a9

File details

Details for the file robin_sparkless-0.6.0-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for robin_sparkless-0.6.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 142713b024971b404ee3d4d8918ca81673a3cbe3dfe74c5fda32dda2b99b71d5
MD5 c47a88517eee52eb3b8d511dd23413f5
BLAKE2b-256 47c2a283fad672c23d19ad2dfdf94610f2791f11af1348d2c4bb6fa194225f91
