PySpark Dubber

A compatibility layer for the pyspark API, allowing you to run pyspark code on backends such as DuckDB and Polars without porting your code.

Why

SQL engines and DataFrame libraries such as DuckDB and Polars have recently become popular, and they offer great performance for non-distributed analytical workflows, even on relatively large datasets (tens of GBs). At these sizes and below, Spark adds considerable overhead and is slow to start, which makes it inefficient in both cost and time.

However, Spark is still the most mature and widely used data processing framework, meaning that many people and organizations have large codebases relying on its APIs.

pyspark-dubber is a library that lets you run pyspark code on many backends, such as DuckDB and Polars (currently, any backend supported by ibis), so you can migrate existing code to a new backend with minimal changes.
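To make the idea of a compatibility layer concrete, here is a minimal, self-contained sketch of the general technique: expose pyspark-style names (col, filter, withColumn, select, collect) and route them to a different engine. Everything in it is hypothetical and written only for illustration. The Column and DataFrame classes below are not pyspark-dubber's code or API, and the "backend" is a plain list of dicts rather than ibis, DuckDB, or Polars.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a pyspark Column: wraps a function of one row."""

    evaluate: Callable[[Row], Any]

    def __gt__(self, other: Any) -> "Column":
        # Build a new lazy expression instead of comparing immediately.
        return Column(lambda row: self.evaluate(row) > other)

    def __mul__(self, other: Any) -> "Column":
        return Column(lambda row: self.evaluate(row) * other)


def col(name: str) -> Column:
    """Same name as pyspark.sql.functions.col, resolved against a dict row."""
    return Column(lambda row: row[name])


class DataFrame:
    """Hypothetical shim: pyspark-style method names over a list-of-dicts backend."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows

    def filter(self, condition: Column) -> "DataFrame":
        return DataFrame([r for r in self._rows if condition.evaluate(r)])

    def withColumn(self, name: str, column: Column) -> "DataFrame":
        return DataFrame([{**r, name: column.evaluate(r)} for r in self._rows])

    def select(self, *names: str) -> "DataFrame":
        return DataFrame([{n: r[n] for n in names} for r in self._rows])

    def count(self) -> int:
        return len(self._rows)

    def collect(self) -> list[Row]:
        return list(self._rows)


# Calling code keeps the shape of ordinary pyspark code.
people = DataFrame([
    {"name": "Ada", "age": 36},
    {"name": "Linus", "age": 28},
])
result = (
    people.filter(col("age") > 30)
    .withColumn("age_months", col("age") * 12)
    .select("name", "age_months")
    .collect()
)
```

The point of the sketch is that the calling code at the bottom does not need to know which engine runs underneath; only the shim does. A real layer would presumably translate each call into the backend's own expressions instead of evaluating rows in Python.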

The aspiration of pyspark-dubber is to be bug-for-bug compatible with pyspark.

Documentation

You can find the API documentation and more information about the project on our documentation page.

Download files

Source Distribution

pyspark_dubber-0.2.3.tar.gz (7.1 MB view details)

Uploaded Source

Built Distribution

pyspark_dubber-0.2.3-py3-none-any.whl (31.4 kB view details)

Uploaded Python 3

File details

Details for the file pyspark_dubber-0.2.3.tar.gz.

File metadata

  • Download URL: pyspark_dubber-0.2.3.tar.gz
  • Upload date:
  • Size: 7.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyspark_dubber-0.2.3.tar.gz:

  • SHA256: 68d052208bc413fb6c269a46b74de973fe8203cf8bbe4a8f51ae0532c4296956
  • MD5: c073164f54620cd6806577099091b79b
  • BLAKE2b-256: a4a7bd0419b342c240e5105d281c51eb1442a3c654ecf97bf2a3f5780af6a4b0

Provenance

The following attestation bundles were made for pyspark_dubber-0.2.3.tar.gz:

Publisher: tests-and-pypi.yml on frapa/pyspark-dubber

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyspark_dubber-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: pyspark_dubber-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 31.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyspark_dubber-0.2.3-py3-none-any.whl:

  • SHA256: f3c70c8f7243351b2fda14dad931945914828c86ac8f397a99e68eadd2fcdfa3
  • MD5: 299f3c8988fc4d34ca6e4cc56736961f
  • BLAKE2b-256: b7af4376087b49f1bedc6426ecda1d71f4f8bac49024948386503979488306b8

Provenance

The following attestation bundles were made for pyspark_dubber-0.2.3-py3-none-any.whl:

Publisher: tests-and-pypi.yml on frapa/pyspark-dubber

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
