PySpark-like DataFrame API in Rust (Polars backend), with Python bindings via PyO3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

robin-sparkless (Python)

PySpark-style DataFrames in Python, with no JVM. Uses Polars under the hood for fast, native execution. More than 200 operations are validated against PySpark.

Install

pip install robin-sparkless

Requirements: Python 3.8+

Quick start

import robin_sparkless as rs

spark = rs.SparkSession.builder().app_name("demo").get_or_create()
df = spark.create_dataframe(
    [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")],
    ["id", "age", "name"],
)
filtered = df.filter(rs.col("age") > rs.lit(26))  # or .gt(rs.lit(26))
print(filtered.collect())

Output:

[{'id': 2, 'age': 30, 'name': 'Bob'}, {'id': 3, 'age': 35, 'name': 'Charlie'}]
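
To make the filter semantics concrete, the sketch below reproduces the quick start result with a plain list comprehension. Only if robin_sparkless happens to be installed does it also run the same filter through the library, using just the calls shown above. The helper names plain_filter and library_filter are illustrative assumptions and are not part of the package.

```python
from typing import Any, Dict, List, Optional

ROWS = [(1, 25, "Alice"), (2, 30, "Bob"), (3, 35, "Charlie")]
COLUMNS = ["id", "age", "name"]


def plain_filter(min_age: int) -> List[Dict[str, Any]]:
    """Plain-Python equivalent of df.filter(col("age") > lit(min_age))."""
    records = [dict(zip(COLUMNS, row)) for row in ROWS]
    return [r for r in records if r["age"] > min_age]


def library_filter(min_age: int) -> Optional[Any]:
    """Run the same filter through robin_sparkless.

    Returns None when the package is not installed, so the sketch
    still runs without it.
    """
    try:
        import robin_sparkless as rs
    except ImportError:
        return None
    spark = rs.SparkSession.builder().app_name("demo").get_or_create()
    df = spark.create_dataframe(ROWS, COLUMNS)
    return df.filter(rs.col("age") > rs.lit(min_age)).collect()


if __name__ == "__main__":
    print(plain_filter(26))
    print(library_filter(26))
```

With the package installed, both calls should print the same rows as the output above.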

Read from files:

df = spark.read_csv("data.csv")
df = spark.read_parquet("data.parquet")
df = spark.read_json("data.json")

Filter, select, group, join, and use window functions with a PySpark-like API. For arbitrary schemas, use spark._create_dataframe_from_rows(rows, schema). See the User Guide and full documentation for details.

UDFs and pandas_udf (Python)

  • Scalar Python UDFs: spark.udf().register("name", f, return_type=...) and call_udf("name", col("x")), or use the returned UserDefinedFunction directly in with_column / select.
  • Vectorized Python UDFs: spark.udf().register("name", f, return_type=..., vectorized=True) for column-wise batch UDFs (one output per input row) in with_column / select.
  • Grouped vectorized UDFs (GROUPED_AGG): @rs.pandas_udf("double", function_type="grouped_agg") for per-group aggregations in group_by().agg([...]), returning one value per group.
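
The three UDF kinds differ mainly in what the function receives and what it must return. The library-free sketch below illustrates those shapes only. It is not how robin-sparkless implements them, and apply_scalar, apply_vectorized, and apply_grouped_agg are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence


def apply_scalar(f: Callable[[Any], Any], column: Sequence[Any]) -> List[Any]:
    """Scalar UDF shape: f is called once per row value."""
    return [f(value) for value in column]


def apply_vectorized(
    f: Callable[[List[Any]], Sequence[Any]], column: Sequence[Any]
) -> List[Any]:
    """Vectorized UDF shape: f sees the whole column and returns one output per input row."""
    out = list(f(list(column)))
    if len(out) != len(column):
        raise ValueError("vectorized UDF must return one output per input row")
    return out


def apply_grouped_agg(
    f: Callable[[List[Any]], Any],
    keys: Sequence[Hashable],
    values: Sequence[Any],
) -> Dict[Hashable, Any]:
    """Grouped-agg UDF shape: f sees each group's values and returns one value per group."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: f(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    ages = [25, 30, 35]
    print(apply_scalar(lambda x: x + 1, ages))
    print(apply_vectorized(lambda xs: [x * 2 for x in xs], ages))
    print(apply_grouped_agg(lambda xs: sum(xs) / len(xs), ["a", "b", "a"], [25.0, 30.0, 35.0]))
```

The length check in apply_vectorized mirrors the "one output per input row" requirement stated above. The grouped variant collapses each group to a single value.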

See docs/UDF_GUIDE.md (or the “UDF guide” section in the online docs) for full details, semantics, and limitations.

Optional features (install from source)

Building from source requires Rust and maturin. Clone the repo, then:

pip install maturin
maturin develop --features pyo3           # default: DataFrame API
maturin develop --features "pyo3,sql"      # spark.sql(), temp views, saveAsTable (in-memory tables), catalog.listTables/dropTable, read_delta(name)
maturin develop --features "pyo3,delta"   # read_delta / write_delta (path I/O)
maturin develop --features "pyo3,sql,delta" # all optional features

Type checking

The package ships with PEP 561 type stubs (robin_sparkless.pyi). Use mypy, pyright, or another checker:

pip install robin-sparkless mypy
mypy your_script.py

For Python 3.8 compatibility, use mypy <1.10 (newer mypy drops support for python_version = "3.8" in config). The project’s pyproject.toml includes [tool.mypy] and [tool.ruff] with target-version / python_version set for 3.8.

Development

From a clone of the repo:

# Full CI-like check (Rust + Python lint + Python tests)
make check-full

Or step by step:

python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install maturin pytest
maturin develop --features "pyo3,sql,delta"
pytest tests/python/ -v

Python lint and type-check (run by make check-full):

pip install ruff 'mypy>=1.4,<1.10'
ruff format --check .
ruff check .
mypy .

CI uses the same tooling: ruff, mypy<1.10 (Python 3.8), and pytest. PySpark is not required for tests (parity expectations are predetermined).
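
Predetermined parity expectations mean the expected results are recorded ahead of time and the tests compare against them, so no Spark runtime is needed when the tests run. A minimal sketch of that pattern follows. The fixture layout, the run_case helper, and the stand-in engine are all hypothetical and do not reflect the repository's actual test files.

```python
import json
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Hypothetical fixture: the input rows plus the output recorded in advance.
# The values reuse the quick start example.
FIXTURE_JSON = """
{
  "name": "filter_age_gt_26",
  "columns": ["id", "age", "name"],
  "rows": [[1, 25, "Alice"], [2, 30, "Bob"], [3, 35, "Charlie"]],
  "expected": [
    {"id": 2, "age": 30, "name": "Bob"},
    {"id": 3, "age": 35, "name": "Charlie"}
  ]
}
"""


def stand_in_engine(columns: List[str], rows: List[List[Any]]) -> List[Record]:
    """Stand-in for the engine under test: keep rows where age > 26."""
    records = [dict(zip(columns, row)) for row in rows]
    return [r for r in records if r["age"] > 26]


def run_case(
    fixture: Dict[str, Any],
    engine: Callable[[List[str], List[List[Any]]], List[Record]],
) -> bool:
    """Return True when the engine output matches the recorded expectation."""
    actual = engine(fixture["columns"], fixture["rows"])
    return actual == fixture["expected"]


if __name__ == "__main__":
    fixture = json.loads(FIXTURE_JSON)
    print(fixture["name"], "passed" if run_case(fixture, stand_in_engine) else "failed")
```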

Links

Documentation: robin-sparkless.readthedocs.io
User Guide: docs/USER_GUIDE.md
Python API: docs/PYTHON_API.md
UDF Guide: docs/UDF_GUIDE.md
Source: github.com/eddiethedean/robin-sparkless
Rust crate: crates.io/crates/robin-sparkless

License

MIT
