PySpark-like DataFrame API in Python—no JVM. Uses robin-sparkless (Rust/Polars) as the execution engine.

Robin Sparkless

PySpark-style DataFrames in Python—no JVM. Install the sparkless package for a drop-in local PySpark replacement (fast unit tests + CI, no Java/Spark). Under the hood it’s powered by the robin-sparkless Rust engine using Polars for execution.


Quick start (Python)

```python
# Swap the import—everything else stays the same.
from sparkless.sql import SparkSession, functions as F

spark = SparkSession.builder.app_name("demo").get_or_create()
df = spark.createDataFrame([{"x": 1}, {"x": 2}])
df.filter(F.col("x") > 1).show()
```

Install from PyPI:

```bash
pip install "sparkless>=4,<5"
```
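
The specifier `>=4,<5` accepts any 4.x release (for example 4.5.7, the version listed under Download files) and rejects 3.x and 5.x. The sketch below checks an installed copy against that range using only the standard library. The helper names are hypothetical, and the major-version check is a simplification of full PEP 440 matching, which `packaging.specifiers.SpecifierSet` implements.

```python
from importlib import metadata
from typing import Optional


def installed_sparkless_version() -> Optional[str]:
    """Return the installed sparkless version string, or None if it is not installed."""
    try:
        return metadata.version("sparkless")
    except metadata.PackageNotFoundError:
        return None


def in_v4_range(version: str) -> bool:
    """Approximate ">=4,<5" by inspecting only the major (leading) component.

    Simplification: pre-release and local-version details are ignored.
    """
    major = int(version.split(".", 1)[0])
    return 4 <= major < 5


print(in_v4_range("4.5.7"))  # True
print(installed_sparkless_version())  # a version string, or None if not installed
```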

More Python docs (SQL/temp views, Delta, JDBC, testing plugin): see python/README.md.


Why Sparkless (Python)?

  • Familiar API: SparkSession, DataFrame, Column, and PySpark-like functions, so you can reuse PySpark patterns without the JVM.
  • Fast local execution: runs natively (no JVM) and uses Polars for IO, expressions, and aggregations.
  • Test the same suite two ways: use sparkless.testing to run tests against Sparkless (fast) or real PySpark (parity checks).
  • Optional “Spark-like” features: SQL, temp/global temp views, saveAsTable, Delta, and JDBC (see python/README.md).

Features (Python surface)

| Area | What’s included |
|---|---|
| Core | SparkSession, DataFrame, Column, functions |
| IO | CSV, Parquet, JSON, Delta |
| Expressions | col, lit, when/otherwise, casts, null handling |
| Aggregates | count, sum, avg, min, max, groupBy().agg() |
| Window | row_number, rank, dense_rank, lag, lead, first_value, last_value via .over() |
| Arrays, strings, JSON | Common PySpark functions (explode, regexp_*, get_json_object, from_json, to_json, …) |
| SQL + views | spark.sql, temp/global temp views, saveAsTable, catalog().listTables() |
| JDBC | Read/write via spark.read.jdbc(...) / df.write.jdbc(...) |
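
The three ranking functions in the Window row differ only in how they treat ties. The plain-Python sketch below reproduces PySpark's documented semantics for a single, already-sorted partition so the difference is easy to see. It is an illustration, not sparkless code; the function names simply mirror the Spark functions, and ties for `row_number` are kept in input order here (Spark gives no such guarantee).

```python
from typing import List, Sequence


def row_number(values: Sequence[int]) -> List[int]:
    # Consecutive positions; tied values still get distinct numbers (input order here).
    return list(range(1, len(values) + 1))


def rank(values: Sequence[int]) -> List[int]:
    # Tied values share a rank; the next distinct value skips ahead, leaving gaps.
    ranks: List[int] = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def dense_rank(values: Sequence[int]) -> List[int]:
    # Tied values share a rank; the next distinct value gets the previous rank + 1 (no gaps).
    ranks: List[int] = []
    for i, v in enumerate(values):
        if i == 0:
            ranks.append(1)
        elif v == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


scores = sorted([30, 10, 20, 20])  # the window's ascending ORDER BY: [10, 20, 20, 30]
print(row_number(scores))  # [1, 2, 3, 4]
print(rank(scores))        # [1, 2, 2, 4]
print(dense_rank(scores))  # [1, 2, 2, 3]
```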

Parity: 200+ fixtures validated against PySpark. Known differences: docs/PYSPARK_DIFFERENCES.md. Full parity status: docs/PARITY_STATUS.md. Out-of-scope items: docs/DEFERRED_SCOPE.md.
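
Parity testing comes down to running the same operation on both engines and comparing the collected rows. A minimal, hypothetical comparison helper is sketched below, assuming rows are collected as plain dicts of hashable scalar values (for example via `Row.asDict()` in PySpark). It ignores row order but not duplicates. This is not how the project's fixtures are implemented; a real check would also compare schemas and allow a tolerance for floating-point columns.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def _freeze(row: Dict[str, Any]) -> Tuple[Tuple[str, Any], ...]:
    # Sort by column name so equal dicts become equal, hashable tuples.
    return tuple(sorted(row.items()))


def rows_match(expected: Iterable[Dict[str, Any]], actual: Iterable[Dict[str, Any]]) -> bool:
    """Order-insensitive comparison of two result sets; duplicate rows must match too."""
    return Counter(_freeze(r) for r in expected) == Counter(_freeze(r) for r in actual)


print(rows_match([{"x": 1}, {"x": 2}], [{"x": 2}, {"x": 1}]))  # True
print(rows_match([{"x": 1}], [{"x": 1}, {"x": 1}]))            # False
```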


Installation

Python (sparkless v4)

Install from PyPI:

```bash
pip install "sparkless>=4,<5"
```

Or from this repo:

```bash
pip install ./python
```

See python/README.md for usage and development (including maturin develop).

Rust engine (optional)

Most users should use the Python package above. If you want to embed the engine directly in Rust, depend on robin-sparkless.

Add to your Cargo.toml:

```toml
[dependencies]
robin-sparkless = "4"
```

Optional features:

```toml
# In [dependencies], replace the plain entry with one that enables the
# features you need (several can be combined in one list), e.g.:
robin-sparkless = { version = "4", features = ["sql", "delta"] }

# Available features:
#   "sql"         spark.sql(), temp views
#   "delta"       Delta Lake read/write
#   "jdbc"        PostgreSQL JDBC
#   "sqlite"      SQLite JDBC
#   "jdbc_mysql"  MySQL/MariaDB JDBC
```

Development

If you’re working on the Python package, start in python/README.md (it covers maturin develop, pytest, and dual-mode testing).

If you’re working on the Rust engine crates, see docs/QUICKSTART.md and the crate READMEs in crates/.

This repository contains:

  • python/: the sparkless Python package (v4)
  • Rust crates: the robin-sparkless engine (Cargo workspace)

| Command | Description |
|---|---|
| `pip install ./python` | Install the Python package from this repo |
| `cd python && maturin develop` | Editable install for Python development |
| `pytest tests/ -v` | Run Python tests (sparkless backend) |
| `SPARKLESS_TEST_MODE=pyspark pytest tests/ -v` | Run the same tests against real PySpark |
| `scripts/typecheck_strict.sh` | Strict mypy for the Python sparkless package |
| `make check` | Rust engine: format, clippy, audit/deny, Rust tests |
| `make test-parity-phases` | Run PySpark parity fixtures (Rust engine) |
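
The two `pytest` rows in the table differ only in the `SPARKLESS_TEST_MODE` environment variable. If you want the same switch in your own test helpers, a sketch might look like the following. `backend_module` is a hypothetical name, and treating an unset variable as the sparkless backend is an assumption taken from the `pytest tests/ -v` row; the `sparkless.testing` plugin is the supported way to do this and may resolve the mode differently.

```python
import os
from typing import Mapping, Optional


def backend_module(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick the module that provides SparkSession.

    "pyspark" (case-insensitive) selects real PySpark; anything else,
    including an unset variable, falls back to sparkless.
    """
    env = os.environ if env is None else env
    mode = env.get("SPARKLESS_TEST_MODE", "sparkless").strip().lower()
    return "pyspark.sql" if mode == "pyspark" else "sparkless.sql"


print(backend_module({"SPARKLESS_TEST_MODE": "pyspark"}))  # pyspark.sql
print(backend_module({}))                                  # sparkless.sql
```

A test could then call `importlib.import_module(backend_module()).SparkSession`, so the import is the only thing that changes between modes.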

CI runs format, clippy, audit, deny, Rust tests, and parity tests on push/PR (see .github/workflows/ci.yml).


Documentation

| Resource | Description |
|---|---|
| Python package | Sparkless v4: install from PyPI (pip install sparkless) or pip install ./python, quick start, Sparkless 3 vs 4.x, API overview |
| Read the Docs | Full docs: quickstart, Rust usage, Python getting started, Sparkless integration (MkDocs) |
| docs.rs | Rust API reference |
| QUICKSTART | Build, usage, optional features, benchmarks |
| User Guide | Everyday usage (Rust) |
| Persistence Guide | Global temp views, disk-backed saveAsTable |
| UDF Guide | Scalar, vectorized, and grouped UDFs |
| Testing Guide | Dual-mode testing with sparkless.testing |
| PySpark Differences | Known divergences |
| Roadmap | Development phases, Sparkless integration |
| RELEASING | Publishing to crates.io and PyPI |

See CHANGELOG.md for version history.


License

MIT


Download files

Download the file for your platform.

Source Distribution

| File | Size |
|---|---|
| sparkless-4.5.7.tar.gz | 20.6 MB |

Built Distributions

| File | Size | Python | Platform |
|---|---|---|---|
| sparkless-4.5.7-cp38-abi3-win_arm64.whl | 33.6 MB | CPython 3.8+ | Windows ARM64 |
| sparkless-4.5.7-cp38-abi3-win_amd64.whl | 36.6 MB | CPython 3.8+ | Windows x86-64 |
| sparkless-4.5.7-cp38-abi3-musllinux_1_2_x86_64.whl | 34.6 MB | CPython 3.8+ | musllinux: musl 1.2+ x86-64 |
| sparkless-4.5.7-cp38-abi3-musllinux_1_2_aarch64.whl | 32.6 MB | CPython 3.8+ | musllinux: musl 1.2+ ARM64 |
| sparkless-4.5.7-cp38-abi3-manylinux_2_28_aarch64.whl | 32.5 MB | CPython 3.8+ | manylinux: glibc 2.28+ ARM64 |
| sparkless-4.5.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 34.7 MB | CPython 3.8+ | manylinux: glibc 2.17+ x86-64 |
| sparkless-4.5.7-cp38-abi3-macosx_11_0_arm64.whl | 31.1 MB | CPython 3.8+ | macOS 11.0+ ARM64 |
| sparkless-4.5.7-cp38-abi3-macosx_10_12_x86_64.whl | 33.1 MB | CPython 3.8+ | macOS 10.12+ x86-64 |
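
Wheel file names encode compatibility tags as `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. Here `cp38-abi3` means each wheel targets CPython's stable ABI with 3.8 as the minimum, which is why one wheel per platform covers every CPython 3.8+ release, and a dotted platform tag such as `manylinux_2_17_x86_64.manylinux2014_x86_64` lists two equivalent tags. The standard-library sketch below splits a name into those parts. `parse_wheel_filename` and `WheelName` are illustrative names here; the `packaging` library's `packaging.utils.parse_wheel_filename` is the more complete option, since it also normalizes names and parses the tags.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its components (the optional build tag is dropped)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 with one; the last three are always the tags.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


name = parse_wheel_filename(
    "sparkless-4.5.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(name.python_tag, name.abi_tag)   # cp38 abi3
print(name.platform_tag.split("."))    # ['manylinux_2_17_x86_64', 'manylinux2014_x86_64']
```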

File details

Details for the file sparkless-4.5.7.tar.gz.

File metadata

  • Download URL: sparkless-4.5.7.tar.gz
  • Upload date:
  • Size: 20.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for sparkless-4.5.7.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | b55590d13480a6e4dfcc87367edc6e9248463c90ff00f17aba19098ec8aa913b |
| MD5 | 94a26e83f210a9c4e422e9670861843b |
| BLAKE2b-256 | cc4cd103e46088bcd7375794d6ed0566036b6155fa84fc4d6989f55993133267 |
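
Each file's SHA256 digest can be checked locally before installing. The standard-library sketch below streams the file through `hashlib.sha256` and compares the hex digest with the published value; the local filename is an assumption about where you saved the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    archive = Path("sparkless-4.5.7.tar.gz")  # assumed download location
    expected = "b55590d13480a6e4dfcc87367edc6e9248463c90ff00f17aba19098ec8aa913b"
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
```

pip can enforce the same check itself: put `sparkless==4.5.7 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt` (hash-checking mode then requires hashes for every dependency as well).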

File details

Details for the file sparkless-4.5.7-cp38-abi3-win_arm64.whl.

File metadata

  • Download URL: sparkless-4.5.7-cp38-abi3-win_arm64.whl
  • Upload date:
  • Size: 33.6 MB
  • Tags: CPython 3.8+, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-win_arm64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c30b3d8639800bb92c67ebef268ff133a9c6ed5f133a39dbc8cf0f35422b7a8 |
| MD5 | fc26fcd3ae8641e795be3a80ef7f5977 |
| BLAKE2b-256 | 9020b0831a7a9acfde48761b204ab778bc6a2d99be5cecc1b62deb6a33be33d7 |

File details

Details for the file sparkless-4.5.7-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: sparkless-4.5.7-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 36.6 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.12.6

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-win_amd64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a8345b85fbe1ed6dc9f9c72d56c0b287bf6c764538767a7c04356383eb7668e |
| MD5 | c337deaef2bb39c36a8d4cfbafc178ce |
| BLAKE2b-256 | 8264a785303d89137ec4b01f73d06560effcc93d3593c2940011fb155a9aa73c |

File details

Details for the file sparkless-4.5.7-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-musllinux_1_2_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b372797279022ba5c213c9a69e8d0719fd20e5e5365607f6f7c15e0845d23f04 |
| MD5 | fde868f474777963b0c9bf04631a4487 |
| BLAKE2b-256 | cac3205dbae2728919a8058eaabe407b519a99586d776daaed113873affe2982 |

File details

Details for the file sparkless-4.5.7-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-musllinux_1_2_aarch64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ead9ab2c80b8409147b293766c19a8766993a758d20a32945289b60cea747a37 |
| MD5 | 87b8d16aa52d713e31cdcf15ed8aa3b1 |
| BLAKE2b-256 | f8eceb8b1e90928840e91b15157d2c96dd5a291cf36cd70e2140174a5b2168e2 |

File details

Details for the file sparkless-4.5.7-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-manylinux_2_28_aarch64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9db745339521d1cdf83dcef0ab1dbdc5ce170f517e850481f87e758fa1f6e9b9 |
| MD5 | 6b63ab94622288a5d85b0f591b018518 |
| BLAKE2b-256 | 39f993b75fa1ee5bd028669985c46ceff745599b8d869cbad154a940886378d7 |

File details

Details for the file sparkless-4.5.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 56f0cd64b21edae11dc521c99144c912dc88169f1592241d8b5f86a53d68332e |
| MD5 | 9c36274df453d3ac8bdd01137c2515e3 |
| BLAKE2b-256 | 34f3775ea324b79c88860d3c1ca13e61c4336ee39af85dcff5c2fe599123cfc9 |

File details

Details for the file sparkless-4.5.7-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-macosx_11_0_arm64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | c422862cc6b02919ed278ba467dfd83825a504654d6b08cd764ed11584835923 |
| MD5 | 97e3ddaa306f42740a95487df93a6f72 |
| BLAKE2b-256 | 4863b4b2330fcba0336f6970bbb94b3c9964fde6a7e43ae0fed33a64914b0438 |

File details

Details for the file sparkless-4.5.7-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for sparkless-4.5.7-cp38-abi3-macosx_10_12_x86_64.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0cd13a6e89fa5caf6f0f0bd737920fed70f342d53ea8afd7a5e2710aac719edc |
| MD5 | e6511b61f27cc3c45df8a26905d39f52 |
| BLAKE2b-256 | b7ac78a1d784325079e716a7ca2101f1dfecbbe3a3d7fa5d0b7d89e6ef8b2462 |
