A compatibility layer for the pyspark API, allowing you to run pyspark code on backends such as DuckDB and Polars without porting your code.

PySpark Dubber

Why

SQL engines and DataFrame libraries such as DuckDB and Polars have recently become popular, offering strong performance for non-distributed analytical workloads on datasets up to tens of GBs. At these sizes and below, Spark adds considerable overhead and is slow to start, which makes it inefficient in both cost and time.

However, Spark is still the most mature and widely used data processing framework, meaning that many people and organizations have large codebases relying on its APIs.

pyspark-dubber lets you run pyspark code on many backends, such as DuckDB and Polars (currently, any backend supported by ibis), so existing code can move to a new backend with minimal changes.
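To illustrate the general idea of a compatibility layer, here is a minimal sketch that is not pyspark-dubber's actual API or implementation. The `MiniFrame` class and its methods are hypothetical names invented for this example, and the standard-library `sqlite3` module stands in for a backend such as DuckDB so the snippet runs without extra dependencies. It shows how pyspark-style calls (`filter`, `select`, `collect`) could be translated lazily into queries for a different engine.

```python
import sqlite3


class MiniFrame:
    """Hypothetical pyspark-like facade over a SQL backend (illustration only)."""

    def __init__(self, conn, query):
        self._conn = conn
        self._query = query  # SQL text built up lazily; nothing runs yet

    def filter(self, condition):
        # Like pyspark's df.filter("age > 30"): accept a SQL expression string
        # and wrap the current query as a subquery.
        return MiniFrame(
            self._conn, f"SELECT * FROM ({self._query}) WHERE {condition}"
        )

    def select(self, *cols):
        # Project a subset of columns from the current query.
        col_list = ", ".join(cols)
        return MiniFrame(self._conn, f"SELECT {col_list} FROM ({self._query})")

    def collect(self):
        # Only here is the accumulated query sent to the backend.
        return self._conn.execute(self._query).fetchall()


# Set up a small in-memory table to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", 36), ("Linus", 28), ("Grace", 45)],
)

df = MiniFrame(conn, "SELECT * FROM people")
rows = df.filter("age > 30").select("name").collect()
print(sorted(rows))
```

The calling code keeps its pyspark-like shape while the engine underneath is swapped out. A real layer also has to reproduce pyspark's types, column expressions, and edge-case behaviour, which this sketch does not attempt.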

The aspiration of pyspark-dubber is to be bug-for-bug compatible with pyspark.

Documentation

You can find API documentation and more information about the project on our documentation page.

Download files

Source Distribution

pyspark_dubber-0.2.4.tar.gz (7.1 MB)

Uploaded Source

Built Distribution

pyspark_dubber-0.2.4-py3-none-any.whl (36.3 kB)

Uploaded Python 3

File details

Details for the file pyspark_dubber-0.2.4.tar.gz.

File metadata

  • Download URL: pyspark_dubber-0.2.4.tar.gz
  • Size: 7.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyspark_dubber-0.2.4.tar.gz
Algorithm    Hash digest
SHA256       c07f2672a9e8263eead08dec77eb4336a8860b755f0adc0fc7713258de2d73cc
MD5          a56654a91ac796a509b261b7c4b29cc4
BLAKE2b-256  773fe8d2d190c3e2479807926292e34ab4edf0b12d4a574d11dd2486f8bb74ea
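
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of` and `verify` and the local file path are assumptions made for this example, not part of pyspark-dubber or PyPI tooling.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for pyspark_dubber-0.2.4.tar.gz
EXPECTED_SHA256 = "c07f2672a9e8263eead08dec77eb4336a8860b755f0adc0fc7713258de2d73cc"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the archive was saved.
    archive = Path("pyspark_dubber-0.2.4.tar.gz")
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SHA256) else "MISMATCH")
```

The same function works for the wheel by substituting its file name and published SHA256 digest.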

Provenance

The following attestation bundles were made for pyspark_dubber-0.2.4.tar.gz:

Publisher: tests-and-pypi.yml on frapa/pyspark-dubber

File details

Details for the file pyspark_dubber-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: pyspark_dubber-0.2.4-py3-none-any.whl
  • Size: 36.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyspark_dubber-0.2.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       6dd4ad61c6d69b0432cde73c49b4bf0da26cd74dcd27f647c4e73a35d539e3f7
MD5          46065e1bc5cab949924ae649318788ef
BLAKE2b-256  e2c76fd585cc639ea32de786c951faa8ddea8ba721480aa7d17a3746ec7cbad1

Provenance

The following attestation bundles were made for pyspark_dubber-0.2.4-py3-none-any.whl:

Publisher: tests-and-pypi.yml on frapa/pyspark-dubber
