PySpark-like DataFrame API in Python—no JVM. Uses robin-sparkless (Rust/Polars) as the execution engine.

Robin Sparkless

PySpark-style DataFrames in Python—no JVM. Install the sparkless package for a drop-in local PySpark replacement (fast unit tests + CI, no Java/Spark). Under the hood it’s powered by the robin-sparkless Rust engine using Polars for execution.



Quick start (Python)

# Swap the import; everything else stays the same.
from sparkless.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()
df = spark.createDataFrame([{"x": 1}, {"x": 2}])
df.filter(F.col("x") > 1).show()

Install from PyPI:

pip install "sparkless>=4,<5"

More Python docs (SQL/temp views, Delta, JDBC, testing plugin): see python/README.md.


Why Sparkless (Python)?

  • Familiar API: SparkSession, DataFrame, Column, and PySpark-like functions so you can reuse patterns without the JVM.
  • Fast local execution: Runs natively (no JVM) and uses Polars for IO, expressions, and aggregations.
  • Test the same suite two ways: Use sparkless.testing to run tests with Sparkless (fast) or real PySpark (parity checks).
  • Optional “Spark-like” features: SQL, temp/global temp views, saveAsTable, Delta, and JDBC (see python/README.md).

Features (Python surface)

| Area | What’s included |
| --- | --- |
| Core | SparkSession, DataFrame, Column, functions |
| IO | CSV, Parquet, JSON, Delta |
| Expressions | col, lit, when/otherwise, casts, null handling |
| Aggregates | count, sum, avg, min, max, groupBy().agg() |
| Window | row_number, rank, dense_rank, lag, lead, first_value, last_value via .over() |
| Arrays, strings, JSON | Common PySpark functions (explode, regexp_*, get_json_object, from_json, to_json, …) |
| SQL + views | spark.sql, temp/global temp views, saveAsTable, catalog().listTables() |
| JDBC | Read/write via spark.read.jdbc(...) / df.write.jdbc(...) |

Parity: 200+ fixtures validated against PySpark. Known differences: docs/PYSPARK_DIFFERENCES.md. Full parity status: docs/PARITY_STATUS.md. Out-of-scope items: docs/DEFERRED_SCOPE.md.
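
As a rough illustration of the Aggregates row in the table above, the sketch below sums a column per group. It assumes that groupBy().agg(), F.sum, Column.alias, collect(), and key access on the returned rows behave as they do in PySpark; this README lists the function names but does not confirm those details. The helper names expected_totals and sparkless_totals are hypothetical, and expected_totals exists only to state in plain Python what the grouped sums should be.

```python
from collections import defaultdict

ROWS = [
    {"team": "a", "points": 3},
    {"team": "a", "points": 4},
    {"team": "b", "points": 5},
]


def expected_totals(rows):
    """Pure-Python reference: sum of points per team."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["team"]] += row["points"]
    return dict(totals)


def sparkless_totals(spark, rows):
    """Same aggregation through the DataFrame API (needs sparkless installed)."""
    # Imported lazily so this module still loads where sparkless is absent.
    from sparkless.sql import functions as F

    df = spark.createDataFrame(rows)
    grouped = df.groupBy("team").agg(F.sum("points").alias("total"))
    # Assumption: collect() returns rows that support key access, as in PySpark.
    return {r["team"]: r["total"] for r in grouped.collect()}


if __name__ == "__main__":
    print(expected_totals(ROWS))
```

With a session created as in the quick start, sparkless_totals(spark, ROWS) is expected to return the same dictionary as expected_totals(ROWS) if the assumptions above hold.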


Installation

Python (sparkless v4)

Install from PyPI:

pip install "sparkless>=4,<5"

Or from this repo:

pip install ./python

See python/README.md for usage and development (including maturin develop).
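
The ">=4,<5" pin keeps installs on the 4.x line. If a CI job should also assert that at runtime, a minimal standard-library sketch could look like the following. The helper names is_supported_version and installed_sparkless_version are hypothetical, and the check only looks at the leading major number, so pre-release and local version tags are not handled.

```python
from importlib import metadata


def is_supported_version(version):
    """True when the leading major number of the version string is 4."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) == 4


def installed_sparkless_version():
    """Return the installed sparkless version, or None if it is not installed."""
    try:
        return metadata.version("sparkless")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_sparkless_version()
    if found is None:
        print("sparkless is not installed")
    else:
        print(found, "supported" if is_supported_version(found) else "outside 4.x")
```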

Rust engine (optional)

Most users should use the Python package above. If you want to embed the engine directly in Rust, depend on robin-sparkless.

Add to your Cargo.toml:

[dependencies]
robin-sparkless = "4"

Optional features (use the line you need; Cargo also accepts several feature names in a single features list):

robin-sparkless = { version = "4", features = ["sql"] }      # spark.sql(), temp views
robin-sparkless = { version = "4", features = ["delta"] }    # Delta Lake read/write
robin-sparkless = { version = "4", features = ["jdbc"] }     # PostgreSQL JDBC
robin-sparkless = { version = "4", features = ["sqlite"] }   # SQLite JDBC
robin-sparkless = { version = "4", features = ["jdbc_mysql"] } # MySQL/MariaDB JDBC

Development

If you’re working on the Python package, start in python/README.md (it covers maturin develop, pytest, and dual-mode testing).

If you’re working on the Rust engine crates, see docs/QUICKSTART.md and the crate READMEs in crates/.

This repository contains:

  • python/: the sparkless Python package (v4)
  • Rust crates: the robin-sparkless engine (Cargo workspace)

| Command | Description |
| --- | --- |
| pip install ./python | Install the Python package from this repo |
| cd python && maturin develop | Editable install for Python development |
| pytest tests/ -v | Run Python tests (sparkless backend) |
| SPARKLESS_TEST_MODE=pyspark pytest tests/ -v | Run the same tests against real PySpark |
| scripts/typecheck_strict.sh | Strict mypy for the Python sparkless package |
| make check | Rust engine: format, clippy, audit/deny, Rust tests |
| make test-parity-phases | Run PySpark parity fixtures (Rust engine) |

CI runs format, clippy, audit, deny, Rust tests, and parity tests on push/PR (see .github/workflows/ci.yml).
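
The two pytest commands listed above differ only in the SPARKLESS_TEST_MODE environment variable. As a rough sketch of how a backend switch like that can work, the snippet below maps the variable to a module name and imports it lazily. This is not how sparkless.testing is implemented (that plugin is covered in python/README.md). The helper names backend_module_name and load_sql_module are hypothetical, as is the assumption that any value other than "pyspark" selects the sparkless backend.

```python
import importlib
import os


def backend_module_name(env=None):
    """Return the SQL module to import for the selected test backend."""
    env = os.environ if env is None else env
    mode = env.get("SPARKLESS_TEST_MODE", "sparkless").strip().lower()
    # Assumption: only "pyspark" selects real PySpark; anything else uses sparkless.
    return "pyspark.sql" if mode == "pyspark" else "sparkless.sql"


def load_sql_module(env=None):
    """Import and return the selected module (needs that package installed)."""
    return importlib.import_module(backend_module_name(env))


if __name__ == "__main__":
    print(backend_module_name({}))
    print(backend_module_name({"SPARKLESS_TEST_MODE": "pyspark"}))
```

Passing the environment as a plain mapping keeps the selection logic testable without touching the real process environment.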


Documentation

| Resource | Description |
| --- | --- |
| Python package | Sparkless v4: install from PyPI (pip install sparkless) or pip install ./python, quick start, Sparkless 3 vs 4.x, API overview |
| Read the Docs | Full docs: quickstart, Rust usage, Python getting started, Sparkless integration (MkDocs) |
| docs.rs | Rust API reference |
| QUICKSTART | Build, usage, optional features, benchmarks |
| User Guide | Everyday usage (Rust) |
| Persistence Guide | Global temp views, disk-backed saveAsTable |
| UDF Guide | Scalar, vectorized, and grouped UDFs |
| Testing Guide | Dual-mode testing with sparkless.testing |
| PySpark Differences | Known divergences |
| Roadmap | Development phases, Sparkless integration |
| RELEASING | Publishing to crates.io and PyPI |

See CHANGELOG.md for version history.


License

MIT
