
PySpark-like DataFrame API in Python—no JVM. Uses robin-sparkless (Rust/Polars) as the execution engine.


Robin Sparkless

PySpark-style DataFrames in Python—no JVM. Install the sparkless package for a drop-in local PySpark replacement (fast unit tests + CI, no Java/Spark). Under the hood it’s powered by the robin-sparkless Rust engine using Polars for execution.



Quick start (Python)

# Only the import changes (pyspark.sql becomes sparkless.sql); the rest is standard PySpark code.
from sparkless.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("demo").getOrCreate()
df = spark.createDataFrame([{"x": 1}, {"x": 2}])
df.filter(F.col("x") > 1).show()

Install from PyPI:

pip install "sparkless>=4,<5"

More Python docs (SQL/temp views, Delta, JDBC, testing plugin): see python/README.md.


Why Sparkless (Python)?

  • Familiar API: SparkSession, DataFrame, Column, and PySpark-like functions, so you can reuse existing PySpark patterns without the JVM.
  • Fast local execution: runs natively (no JVM) and uses Polars for IO, expressions, and aggregations.
  • Test the same suite two ways: use sparkless.testing to run tests with Sparkless (fast) or real PySpark (parity checks).
  • Optional “Spark-like” features: SQL, temp/global temp views, saveAsTable, Delta, and JDBC (see python/README.md).

Features (Python surface)

| Area | What’s included |
|---|---|
| Core | SparkSession, DataFrame, Column, functions |
| IO | CSV, Parquet, JSON, Delta |
| Expressions | col, lit, when/otherwise, casts, null handling |
| Aggregates | count, sum, avg, min, max, groupBy().agg() |
| Window | row_number, rank, dense_rank, lag, lead, first_value, last_value via .over() |
| Arrays, strings, JSON | Common PySpark functions (explode, regexp_*, get_json_object, from_json, to_json, …) |
| SQL + views | spark.sql, temp/global temp views, saveAsTable, catalog().listTables() |
| JDBC | Read/write via spark.read.jdbc(...) / df.write.jdbc(...) |
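
The three ranking functions in the Window row differ only in how they number tied rows. As a plain-Python illustration of PySpark's documented semantics (not sparkless code), the hypothetical helper `rank_variants` below computes all three over one partition that is assumed to be sorted already, as the window's orderBy would leave it:

```python
from typing import List, Tuple


def rank_variants(sorted_values: List[int]) -> List[Tuple[int, int, int, int]]:
    """Return (value, row_number, rank, dense_rank) for one ordered partition.

    Hypothetical helper: `sorted_values` is assumed to be in window order
    already; it only mirrors PySpark's ranking semantics for illustration.
    """
    out = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real value
    for position, value in enumerate(sorted_values, start=1):
        if value != prev:
            rank = position  # rank jumps to the row position, leaving gaps after ties
            dense += 1       # dense_rank advances by exactly one, so no gaps
            prev = value
        out.append((value, position, rank, dense))  # row_number is the position
    return out


for row in rank_variants([10, 20, 20, 30]):
    print(row)
```

Tied rows share a rank; rank then skips ahead (1, 2, 2, 4) while dense_rank does not (1, 2, 2, 3). Within a tie, PySpark does not guarantee which row receives which row_number.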

Parity: 200+ fixtures validated against PySpark. Known differences: docs/PYSPARK_DIFFERENCES.md. Full parity status: docs/PARITY_STATUS.md. Out-of-scope items: docs/DEFERRED_SCOPE.md.
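
Spark does not guarantee row order unless a query ends with orderBy, so a parity check between two engines usually compares results as multisets of rows. A minimal sketch of that idea, assuming rows are dicts of hashable scalar values; `rows_match` is a hypothetical helper, not the fixture runner this repository uses:

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def rows_match(expected: Iterable[Dict[str, Any]], actual: Iterable[Dict[str, Any]]) -> bool:
    """Compare two result sets as multisets of rows, ignoring row order.

    Duplicates still count: two identical rows must appear twice on both sides.
    """
    def key(row: Dict[str, Any]) -> Tuple[Tuple[str, Any], ...]:
        # Sort by column name so dict insertion order does not matter either.
        return tuple(sorted(row.items()))

    return Counter(map(key, expected)) == Counter(map(key, actual))
```

Floating-point columns would need a tolerance rather than exact equality; the sketch leaves that out.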


Installation

Python (sparkless v4)

Install from PyPI:

pip install "sparkless>=4,<5"

Or from this repo:

pip install ./python

See python/README.md for usage and development (including maturin develop).
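
Because Sparkless 3 and 4.x differ, a test suite can fail fast when the wrong major version is installed. A minimal, standard-library-only sketch (all helper names are hypothetical) that treats the ">=4,<5" pin as "major version 4", which holds for plain final releases such as 4.6.0:

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Major component of a plain release string such as "4.6.0"."""
    return int(version.split(".", 1)[0])


def satisfies_v4_pin(version: str) -> bool:
    """Simplified check for the ">=4,<5" pin on final releases.

    Pre-releases and epochs are not handled; this is not full PEP 440.
    """
    return major_version(version) == 4


def installed_sparkless_version() -> Optional[str]:
    """Installed sparkless version, or None if the package is not installed."""
    try:
        return metadata.version("sparkless")
    except metadata.PackageNotFoundError:
        return None
```

A conftest.py could call `installed_sparkless_version()` once and stop the run when `satisfies_v4_pin` returns False. For exact PEP 440 behaviour, `packaging.specifiers.SpecifierSet` is the usual tool.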

Rust engine (optional)

Most users should use the Python package above. If you want to embed the engine directly in Rust, depend on robin-sparkless.

Add to your Cargo.toml:

[dependencies]
robin-sparkless = "4"

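Optional features (Cargo accepts only one robin-sparkless entry, so list every feature you need in a single features array, e.g. features = ["sql", "delta"]):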

robin-sparkless = { version = "4", features = ["sql"] }      # spark.sql(), temp views
robin-sparkless = { version = "4", features = ["delta"] }    # Delta Lake read/write
robin-sparkless = { version = "4", features = ["jdbc"] }     # PostgreSQL JDBC
robin-sparkless = { version = "4", features = ["sqlite"] }   # SQLite JDBC
robin-sparkless = { version = "4", features = ["jdbc_mysql"] } # MySQL/MariaDB JDBC

Development

If you’re working on the Python package, start in python/README.md (it covers maturin develop, pytest, and dual-mode testing).

If you’re working on the Rust engine crates, see docs/QUICKSTART.md and the crate READMEs in crates/.

This repository contains:

  • python/: the sparkless Python package (v4)
  • Rust crates: the robin-sparkless engine (Cargo workspace)

| Command | Description |
|---|---|
| pip install ./python | Install the Python package from this repo |
| cd python && maturin develop | Editable install for Python development |
| pytest tests/ -v | Run Python tests (sparkless backend) |
| SPARKLESS_TEST_MODE=pyspark pytest tests/ -v | Run the same tests against real PySpark |
| scripts/typecheck_strict.sh | Strict mypy for the Python sparkless package |
| make check | Rust engine: format, clippy, audit/deny, Rust tests |
| make test-parity-phases | Run PySpark parity fixtures (Rust engine) |
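
The SPARKLESS_TEST_MODE=pyspark command shows the dual-mode idea: one test suite, with the backend chosen by an environment variable. sparkless.testing already provides this, so the following is only a hypothetical sketch of the mechanism, not its implementation. It assumes that any value other than pyspark (or no value) means the Sparkless backend:

```python
import importlib
import os
from types import ModuleType
from typing import Mapping, Optional


def sql_module_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the SQL module for the current test mode (hypothetical helper)."""
    env = os.environ if env is None else env
    mode = env.get("SPARKLESS_TEST_MODE", "sparkless").strip().lower()
    return "pyspark.sql" if mode == "pyspark" else "sparkless.sql"


def load_sql_module(env: Optional[Mapping[str, str]] = None) -> ModuleType:
    """Import the chosen module lazily, so the unused backend need not be installed."""
    return importlib.import_module(sql_module_name(env))
```

Importing through `importlib.import_module` at fixture time means PySpark (and a JVM) only has to be present when that mode is actually requested.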

CI runs format, clippy, audit, deny, Rust tests, and parity tests on push/PR (see .github/workflows/ci.yml).


Documentation

| Resource | Description |
|---|---|
| Python package | Sparkless v4: install from PyPI (pip install sparkless) or from this repo (pip install ./python); quick start, Sparkless 3 vs 4.x, API overview |
| Read the Docs | Full docs: quickstart, Rust usage, Python getting started, Sparkless integration (MkDocs) |
| docs.rs | Rust API reference |
| QUICKSTART | Build, usage, optional features, benchmarks |
| User Guide | Everyday usage (Rust) |
| Persistence Guide | Global temp views, disk-backed saveAsTable |
| UDF Guide | Scalar, vectorized, and grouped UDFs |
| Testing Guide | Dual-mode testing with sparkless.testing |
| PySpark Differences | Known divergences |
| Roadmap | Development phases, Sparkless integration |
| RELEASING | Publishing to crates.io and PyPI |

See CHANGELOG.md for version history.


License

MIT



Download files

Download the file for your platform.

Source Distribution

  • sparkless-4.6.0.tar.gz (20.7 MB): source

Built Distributions

  • sparkless-4.6.0-cp38-abi3-win_arm64.whl (33.5 MB): CPython 3.8+, Windows ARM64
  • sparkless-4.6.0-cp38-abi3-win_amd64.whl (36.6 MB): CPython 3.8+, Windows x86-64
  • sparkless-4.6.0-cp38-abi3-musllinux_1_2_x86_64.whl (34.6 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • sparkless-4.6.0-cp38-abi3-musllinux_1_2_aarch64.whl (32.5 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • sparkless-4.6.0-cp38-abi3-manylinux_2_28_aarch64.whl (32.5 MB): CPython 3.8+, manylinux (glibc 2.28+) ARM64
  • sparkless-4.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.7 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • sparkless-4.6.0-cp38-abi3-macosx_11_0_arm64.whl (31.1 MB): CPython 3.8+, macOS 11.0+ ARM64
  • sparkless-4.6.0-cp38-abi3-macosx_10_12_x86_64.whl (33.1 MB): CPython 3.8+, macOS 10.12+ x86-64
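
Every wheel in this release carries the tags cp38-abi3: built for CPython 3.8 against the stable ABI (abi3), so one wheel serves Python 3.8 and later on its platform. Only the platform tag differs, and one field can hold several dot-separated tags, as in manylinux_2_17_x86_64.manylinux2014_x86_64. A small sketch that splits a wheel filename into its PEP 427 fields; `parse_wheel_filename` here is a hypothetical helper (for production use, the packaging library offers `packaging.utils.parse_wheel_filename`):

```python
from typing import NamedTuple, Optional, Tuple


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: Tuple[str, ...]
    abi: Tuple[str, ...]
    platform: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl into fields.

    Each tag field may contain several dot-separated tags (a compressed tag set).
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel name: {filename}")
    build = parts[2] if len(parts) == 6 else None
    py_tags, abi_tags, plat_tags = (tuple(field.split(".")) for field in parts[-3:])
    return WheelTags(parts[0], parts[1], build, py_tags, abi_tags, plat_tags)


print(parse_wheel_filename(
    "sparkless-4.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
))
```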

File details

Details for the file sparkless-4.6.0.tar.gz.

File metadata

  • Download URL: sparkless-4.6.0.tar.gz
  • Upload date:
  • Size: 20.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for sparkless-4.6.0.tar.gz
Algorithm Hash digest
SHA256 9b97c6481b136ec1741ad6ada73eaa4e80c5c35f2b545066246bf497f3ca2907
MD5 38ba3537c221d4c5cfc6d786e476a3af
BLAKE2b-256 48acde968c4a0663274dac72a0f2bd2e18ebb757694eb75d4a665fda3d0b8df2

See more details on using hashes here.
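
The digests let you confirm that a downloaded file is the one PyPI published. A minimal standard-library sketch (function names are hypothetical) that streams a file through SHA256 and compares the result with a listed digest:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in 1 MiB chunks so large wheels are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """True if the file matches the published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For installs, pip's hash-checking mode does the same automatically: pin `sparkless==4.6.0 --hash=sha256:<digest>` in a requirements file, listing the digest of every file pip might choose, and run `pip install --require-hashes -r requirements.txt`.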

File details

Details for the file sparkless-4.6.0-cp38-abi3-win_arm64.whl.

File metadata

  • Download URL: sparkless-4.6.0-cp38-abi3-win_arm64.whl
  • Upload date:
  • Size: 33.5 MB
  • Tags: CPython 3.8+, Windows ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 3a86b22e8ece7991693ee1d835201d8fd8510355ed39c915303893bd6c9d333f
MD5 fdcd843f3fef2aeec6ec84a4bbc18c8b
BLAKE2b-256 1bca9d73dd9f8941bd8eeccde4dd49d5dd5f5eb1dfb317fb4642fbf482f53524


File details

Details for the file sparkless-4.6.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: sparkless-4.6.0-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 36.6 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.13.1

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 87999629df7030ad81c65fdc71efc6eb620ed7bd562a851dcac08ea963ad0adc
MD5 da04c9243389ea72ff3aaafacce4f9a2
BLAKE2b-256 199daff6bab58dc597c042b9a7af1a04873d4996b194fa15ce63d43e8848fa99


File details

Details for the file sparkless-4.6.0-cp38-abi3-musllinux_1_2_x86_64.whl.

File metadata

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 be4d243a159f21b58bee85c2bb7c8e081024e41182d1502172866fa594fb952e
MD5 5f273a4b15fa85536361ba12f4fdaf1c
BLAKE2b-256 53ab9afa7fcc0c214e5486aa71fa0916dfd6c32cd03f3f2ab5bacd09a17f0c17


File details

Details for the file sparkless-4.6.0-cp38-abi3-musllinux_1_2_aarch64.whl.

File metadata

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 788d685795007b7adfc74d63b761c7e91b3605881edc076f6a1ee4f117b6ea68
MD5 eacc3a416a44b3bc1691ad6e937e517e
BLAKE2b-256 2493b3b664273b1fb96b2b1a030e4aa614b2aa81f106126104e4a2bc458d3e55


File details

Details for the file sparkless-4.6.0-cp38-abi3-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d0633b89098bce0e255599ee50686a4088527b9b925b0b269463d378798c9145
MD5 fafd37fe35a3a504aa4f4817651e1f2b
BLAKE2b-256 7558b0efde7170a961e8dfb16ac5a56f259a8eb4c85d1f76f487fc639d89d6c4


File details

Details for the file sparkless-4.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0ede1dde7883a63c8422fa905cc560c7df4f3612dc47b6cf598d5a92c6b6d7e8
MD5 a2cd1963ce8f8308ef7caf17915bb8b6
BLAKE2b-256 b00f7f204401415f7036946e3976d3d2a6c05c98d86ea9be07a885426072e412


File details

Details for the file sparkless-4.6.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 927378c7ff84a702682747ca619b9c0b7686e570feeeb9946c82dacd72ac8a30
MD5 1d22bc870bac9f3538823105d78aa4e7
BLAKE2b-256 31b426de4724e9e8025b4cc217f49a9a6091df480392c830f1de7f15532d9f2f


File details

Details for the file sparkless-4.6.0-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for sparkless-4.6.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9068251024b9d71366a5006f45f72083cc02eca37ddeba34229da338c8f9b53f
MD5 6dad2648c92d5be6acf7fe2690590045
BLAKE2b-256 c20732c9ea9ba84f4b8e73957870eff0d16877306d84af9ac4ed26e60e2660b8

