sqlframe

Turning PySpark Into a Universal DataFrame API

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

eakmanrq

These details have not been verified by PyPI

Project description

SQLFrame implements the PySpark DataFrame API so that transformation pipelines can run directly on database engines, with no Spark cluster or Spark dependencies required.

SQLFrame currently supports the following engines (many more in development):

BigQuery
DuckDB
Postgres
Snowflake
Spark

Two more engines are in development. They lack test coverage and robust documentation, but are available for early testing:

Redshift
Databricks

SQLFrame also provides a "Standalone" session that can be used to generate SQL without any connection to a database engine.
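The core idea is that DataFrame calls compile to SQL, and an engine then executes that SQL. A deliberately tiny sketch can illustrate this. `ToyFrame` below is a hypothetical class invented for this illustration and is not SQLFrame's API or implementation. Each method returns a new immutable frame. `sql()` renders the accumulated clauses without any database connection, in the spirit of a standalone session. `collect()` is the only step that sends the query to an engine; here, Python's built-in `sqlite3` stands in for a real database.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyFrame:
    """Hypothetical lazy frame: each method returns a new frame with one more clause."""

    table: str
    columns: Tuple[str, ...] = ("*",)
    predicates: Tuple[str, ...] = ()
    limit_n: Optional[int] = None

    def where(self, predicate: str) -> "ToyFrame":
        # Predicates accumulate and are AND-ed together when SQL is rendered.
        return replace(self, predicates=self.predicates + (predicate,))

    def select(self, *cols: str) -> "ToyFrame":
        return replace(self, columns=tuple(cols))

    def limit(self, n: int) -> "ToyFrame":
        return replace(self, limit_n=n)

    def sql(self) -> str:
        # Rendering needs no database connection at all.
        parts = [f"SELECT {', '.join(self.columns)}", f"FROM {self.table}"]
        if self.predicates:
            parts.append("WHERE " + " AND ".join(f"({p})" for p in self.predicates))
        if self.limit_n is not None:
            parts.append(f"LIMIT {self.limit_n}")
        return "\n".join(parts)

    def collect(self, conn: sqlite3.Connection) -> List[tuple]:
        # Only here does work reach the engine: the whole chain runs as one query.
        return conn.execute(self.sql()).fetchall()


# Usage with an in-memory SQLite database and a few invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (year INTEGER, ever_born INTEGER)")
conn.executemany("INSERT INTO natality VALUES (?, ?)", [(2000, 1), (2000, 2), (2001, 1)])

df = ToyFrame("natality").where("ever_born = 1").select("year").limit(5)
print(df.sql())
print(df.collect(conn))
```

Because the whole chain is rendered into one statement, the engine receives a single query and can plan it as a whole. A real implementation also has to handle expressions, joins, window functions, and dialect differences, all of which this sketch ignores.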

SQLFrame is great for:

Users who want a DataFrame API that leverages the full power of their engine to do the processing
Users who want to run PySpark code quickly locally without the overhead of starting a Spark session
Users who want a SQL representation of their DataFrame code for debugging or sharing with others
Users who want to run PySpark DataFrame code without the complexity of using Spark for processing

Installation

# BigQuery
pip install "sqlframe[bigquery]"
# DuckDB
pip install "sqlframe[duckdb]"
# Postgres
pip install "sqlframe[postgres]"
# Snowflake
pip install "sqlframe[snowflake]"
# Spark
pip install "sqlframe[spark]"
# Redshift (in development)
pip install "sqlframe[redshift]"
# Databricks (in development)
pip install "sqlframe[databricks]"
# Standalone
pip install sqlframe
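After installing, you can confirm which SQLFrame release is present with the standard library's `importlib.metadata`. This is a generic Python sketch, not a SQLFrame feature, and the helper name `installed_version` is hypothetical. Extras such as `[duckdb]` install additional driver packages, and you can check each of those by its own distribution name in the same way.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "sqlframe" is the distribution name used in the pip commands above.
print("sqlframe:", installed_version("sqlframe") or "not installed")
```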

See the documentation for each engine for additional setup instructions.

Configuration

SQLFrame generates accurate but complex SQL for the engine to execute. Calling df.sql(optimize=True) produces more human-readable SQL. For details on configuring this output, including using OpenAI to enhance the SQL, see Generated SQL Configuration.

By default, SQLFrame uses the Spark dialect for both input and output. You can change this so that SQLFrame feels more like a native DataFrame API for the engine you are using. See Input and Output Dialect Configuration.
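Identifier quoting is one visible effect of the output dialect. Spark, Databricks, and BigQuery quote identifiers with backticks, as in the BigQuery SQL under Example Usage, while Postgres, Redshift, DuckDB, and Snowflake use double quotes. The hypothetical helpers below illustrate why the same DataFrame code can yield different SQL text per dialect. They are not SQLFrame's configuration API, and they do not escape quote characters embedded in names.

```python
from typing import List

# Hypothetical mapping from dialect name to its identifier quote character.
IDENTIFIER_QUOTES = {
    "spark": "`",
    "databricks": "`",
    "bigquery": "`",
    "duckdb": '"',
    "postgres": '"',
    "redshift": '"',
    "snowflake": '"',
}


def quote_identifier(name: str, dialect: str) -> str:
    """Quote a single identifier for the given dialect (no escaping of embedded quotes)."""
    quote = IDENTIFIER_QUOTES[dialect.lower()]
    return f"{quote}{name}{quote}"


def qualified(parts: List[str], dialect: str) -> str:
    """Quote each part of a dotted name such as project.dataset.table."""
    return ".".join(quote_identifier(p, dialect) for p in parts)


print(qualified(["bigquery-public-data", "samples", "natality"], "bigquery"))
print(qualified(["samples", "natality"], "duckdb"))
```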

Activating SQLFrame

SQLFrame can either replace pyspark imports or be used alongside them. To replace pyspark imports, call the activate function with the engine you want to use.

from sqlframe import activate

# Activate SQLFrame to run directly on DuckDB
activate(engine="duckdb")

from pyspark.sql import SparkSession
session = SparkSession.builder.getOrCreate()
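How can a call like activate change what `from pyspark.sql import SparkSession` returns? One general Python mechanism is to register module objects in `sys.modules`, which the import system consults before it searches the filesystem. The sketch below demonstrates that mechanism with the hypothetical package name `fakespark` and a stand-in session class. It illustrates the technique only and does not describe how SQLFrame's activate is actually implemented.

```python
import sys
import types


class StandInSession:
    """Hypothetical replacement for a SparkSession-like class."""

    engine = "duckdb"


def toy_activate(package: str = "fakespark") -> None:
    # Build a package module and its "sql" submodule, then expose the stand-in
    # class under the name that user code expects to import.
    pkg = types.ModuleType(package)
    sql_mod = types.ModuleType(f"{package}.sql")
    sql_mod.SparkSession = StandInSession
    pkg.sql = sql_mod
    pkg.__path__ = []  # mark the parent as a package
    # Later imports find these entries in sys.modules and never touch the filesystem.
    sys.modules[package] = pkg
    sys.modules[f"{package}.sql"] = sql_mod


toy_activate()
from fakespark.sql import SparkSession  # resolved from sys.modules

print(SparkSession.engine)
```

Because the swap happens at import time, any code imported after the call sees the replacement. This is consistent with the example above, which calls activate before importing from pyspark.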

SQLFrame can also be imported directly. This leaves pyspark imports untouched and provides a more engine-native DataFrame API:

from sqlframe.duckdb import DuckDBSession

session = DuckDBSession.builder.getOrCreate()

Example Usage

from sqlframe import activate

# Activate SQLFrame to run directly on BigQuery
activate(engine="bigquery")

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql import Window

session = SparkSession.builder.getOrCreate()
table_path = '"bigquery-public-data".samples.natality'
# Top 5 years with the greatest year-over-year % change in new families with single child
df = (
  session.table(table_path)
  .where(F.col("ever_born") == 1)
  .groupBy("year")
  .agg(F.count("*").alias("num_single_child_families"))
  .withColumn(
    "last_year_num_single_child_families",
    F.lag(F.col("num_single_child_families"), 1).over(Window.orderBy("year"))
  )
  .withColumn(
    "percent_change",
    (F.col("num_single_child_families") - F.col("last_year_num_single_child_families"))
    / F.col("last_year_num_single_child_families")
  )
  .orderBy(F.abs(F.col("percent_change")).desc())
  .select(
    F.col("year").alias("year"),
    F.format_number("num_single_child_families", 0).alias("new families single child"),
    F.format_number(F.col("percent_change") * 100, 2).alias("percent change"),
  )
  .limit(5)
)

>>> df.sql(optimize=True)
WITH `t94228` AS (
  SELECT
    `natality`.`year` AS `year`,
    COUNT(*) AS `num_single_child_families`
  FROM `bigquery-public-data`.`samples`.`natality` AS `natality`
  WHERE
    `natality`.`ever_born` = 1
  GROUP BY
    `natality`.`year`
), `t39093` AS (
  SELECT
    `t94228`.`year` AS `year`,
    `t94228`.`num_single_child_families` AS `num_single_child_families`,
    LAG(`t94228`.`num_single_child_families`, 1) OVER (ORDER BY `t94228`.`year`) AS `last_year_num_single_child_families`
  FROM `t94228` AS `t94228`
)
SELECT
  `t39093`.`year` AS `year`,
  FORMAT('%\'.0f', ROUND(CAST(`t39093`.`num_single_child_families` AS FLOAT64), 0)) AS `new families single child`,
  FORMAT('%\'.2f', ROUND(CAST((((`t39093`.`num_single_child_families` - `t39093`.`last_year_num_single_child_families`) / `t39093`.`last_year_num_single_child_families`) * 100) AS FLOAT64), 2)) AS `percent change`
FROM `t39093` AS `t39093`
ORDER BY
  ABS(`percent_change`) DESC
LIMIT 5

>>> df.show()
+------+---------------------------+----------------+
| year | new families single child | percent change |
+------+---------------------------+----------------+
| 1989 |         1,650,246         |     25.02      |
| 1974 |          783,448          |     14.49      |
| 1977 |         1,057,379         |     11.38      |
| 1985 |         1,308,476         |     11.15      |
| 1975 |          868,985          |     10.92      |
+------+---------------------------+----------------+
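To show what the engine actually computes, here is the same logic as the optimized SQL above, hand-translated for SQLite (through Python's built-in `sqlite3`, which needs SQLite 3.25 or later for window functions) and run on a few invented rows. The table contents are hypothetical toy data, not the BigQuery natality sample, and the query is a hand-written analogue rather than SQLFrame output. SQLite needs one adjustment: the numerator is cast to `REAL`, because SQLite divides integers using integer division, whereas BigQuery's `/` returns FLOAT64. Python format specs stand in for `FORMAT('%\'.2f', ...)`.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hand-written SQLite analogue of the generated BigQuery SQL shown above.
QUERY = """
WITH counts AS (
  SELECT year, COUNT(*) AS num_single_child_families
  FROM natality
  WHERE ever_born = 1
  GROUP BY year
), lagged AS (
  SELECT
    year,
    num_single_child_families,
    LAG(num_single_child_families, 1) OVER (ORDER BY year)
      AS last_year_num_single_child_families
  FROM counts
), changes AS (
  SELECT
    year,
    num_single_child_families,
    -- CAST avoids integer division in SQLite
    CAST(num_single_child_families - last_year_num_single_child_families AS REAL)
      / last_year_num_single_child_families AS percent_change
  FROM lagged
)
SELECT year, num_single_child_families, percent_change
FROM changes
ORDER BY ABS(percent_change) DESC  -- NULLs sort last in DESC order in SQLite
LIMIT 5
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE natality (year INTEGER, ever_born INTEGER)")
# Invented toy rows: 2 single-child families in 2000, 3 in 2001, 6 in 2002.
toy_rows = [(2000, 1)] * 2 + [(2000, 2)] + [(2001, 1)] * 3 + [(2002, 1)] * 6
conn.executemany("INSERT INTO natality VALUES (?, ?)", toy_rows)

rows: List[Tuple[int, int, Optional[float]]] = conn.execute(QUERY).fetchall()

# Thousands separators plus a fixed number of decimals, like the FORMAT calls above.
formatted = [
    (year, f"{families:,}", None if pct is None else f"{pct * 100:,.2f}")
    for year, families, pct in rows
]
for line in formatted:
    print(line)
```

The first year has no previous year, so `LAG` yields NULL and its percent change is NULL. That is why it sorts last here.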

Release history

4.2.0 (May 20, 2026)
4.1.0 (Mar 18, 2026)
4.0.0 (Mar 17, 2026)
3.49.0 (Mar 15, 2026)
3.48.0 (Mar 7, 2026)
3.47.0 (Feb 28, 2026)
3.46.2 (Feb 4, 2026)
3.46.1 (Feb 2, 2026)
3.46.0 (Feb 1, 2026)
3.45.0 (Jan 21, 2026)
3.44.1 (Jan 10, 2026)
3.44.0 (Jan 1, 2026)
3.43.8 (Nov 1, 2025)
3.43.7 (Oct 24, 2025)
3.43.6 (Oct 12, 2025)
3.43.5 (Oct 10, 2025)
3.43.4 (Oct 8, 2025)
3.43.3 (Sep 30, 2025)
3.43.2 (Sep 24, 2025)
3.43.1 (Sep 19, 2025)
3.43.0 (Sep 18, 2025)
3.42.0 (Sep 16, 2025)
3.41.0 (Sep 15, 2025)
3.40.2 (Sep 10, 2025)
3.40.1 (Sep 8, 2025)
3.40.0 (Sep 7, 2025)
3.39.4 (Aug 28, 2025)
3.39.3 (Aug 27, 2025)
3.39.2 (Aug 21, 2025)
3.39.1 (Aug 16, 2025)
3.39.0 (Aug 16, 2025)
3.38.2 (Aug 7, 2025)
3.38.1 (Aug 7, 2025)
3.38.0 (Jul 26, 2025)
3.37.0 (Jul 19, 2025)
3.36.3 (Jul 6, 2025)
3.36.2 (Jul 5, 2025)
3.36.1 (Jun 30, 2025)
3.36.0 (Jun 29, 2025)
3.35.1 (Jun 6, 2025)
3.35.0 (May 31, 2025)
3.34.0 (May 27, 2025)
3.33.1 (May 25, 2025)
3.33.0 (May 16, 2025)
3.32.1 (May 12, 2025)
3.32.0 (May 11, 2025)
3.31.4 (May 10, 2025)
3.31.3 (Apr 30, 2025)
3.31.2 (Apr 26, 2025)
3.31.1 (Apr 25, 2025)
3.31.0 (Apr 23, 2025)
3.30.0 (Apr 21, 2025)
3.29.1 (Apr 16, 2025)
3.29.0 (Apr 6, 2025)
3.28.2 (Apr 5, 2025)
3.28.1 (Apr 4, 2025)
3.28.0 (Mar 28, 2025)
3.27.1 (Mar 27, 2025)
3.27.0 (Mar 26, 2025)
3.26.0 (Mar 22, 2025)
3.25.0 (Mar 22, 2025)
3.24.1 (Mar 12, 2025)
3.24.0 (Mar 11, 2025)
3.23.0 (Mar 6, 2025)
3.22.1 (Feb 28, 2025)
3.22.0 (Feb 18, 2025)
3.21.1 (Feb 16, 2025)
3.21.0 (Feb 15, 2025)
3.20.0 (Feb 15, 2025)
3.19.0 (Feb 11, 2025)
3.18.1 (Feb 7, 2025)
3.18.0 (Feb 6, 2025)
3.17.1 (Feb 5, 2025)
3.17.0 (Feb 2, 2025)
3.16.0 (Feb 1, 2025)
3.15.1 (Jan 30, 2025)
3.15.0 (Jan 29, 2025)
3.14.2 (Jan 25, 2025)
3.14.1 (Jan 24, 2025)
3.14.0 (Jan 22, 2025)
3.13.4 (Jan 19, 2025)
3.13.3 (Jan 18, 2025)
3.13.2 (Jan 18, 2025)
3.13.1 (Jan 11, 2025)
3.13.0 (Dec 29, 2024)
3.12.0 (Dec 27, 2024), this version
3.11.0 (Dec 24, 2024)
3.10.1 (Dec 18, 2024)
3.10.0 (Dec 16, 2024)
3.9.3 (Dec 15, 2024)
3.9.2 (Nov 30, 2024)
3.9.1 (Nov 29, 2024)
3.9.0 (Nov 27, 2024)
3.8.2 (Nov 20, 2024)
3.8.1 (Nov 20, 2024)
3.8.0 (Nov 19, 2024)
3.7.0 (Nov 4, 2024)
3.6.0 (Oct 29, 2024)
3.5.0 (Oct 18, 2024)
3.4.1 (Oct 13, 2024)
3.4.0 (Oct 5, 2024)
3.3.1 (Sep 22, 2024)
3.3.0 (Sep 15, 2024)
3.2.0 (Aug 31, 2024)
3.1.1 (Aug 27, 2024)
3.1.0 (Aug 27, 2024)
3.0.0 (Aug 25, 2024)
2.4.0 (Aug 23, 2024)
2.3.0 (Aug 21, 2024)
2.2.0 (Aug 12, 2024)
2.1.0 (Aug 10, 2024)
2.0.0 (Jul 30, 2024)
1.14.0 (Jun 29, 2024)
1.13.0 (Jun 28, 2024)
1.12.0 (Jun 27, 2024)
1.11.0 (Jun 26, 2024)
1.10.0 (Jun 25, 2024)
1.9.0 (Jun 21, 2024)
1.8.0 (Jun 12, 2024)
1.7.1 (Jun 11, 2024)
1.7.0 (Jun 8, 2024)
1.6.3 (Jun 7, 2024)
1.6.2 (Jun 6, 2024)
1.6.1 (Jun 5, 2024)
1.6.0 (Jun 4, 2024)
1.5.5 (Jun 3, 2024)
1.5.4 (Jun 3, 2024)
1.5.3 (Jun 2, 2024)
1.5.2 (Jun 2, 2024)
1.5.1 (Jun 2, 2024)
1.5.0 (Jun 2, 2024)
1.4.0 (May 30, 2024)
1.3.0 (May 28, 2024)
1.2.0 (May 25, 2024)
1.1.3 (May 24, 2024)
1.1.2 (May 23, 2024)
1.1.1 (May 23, 2024)
1.1.0 (May 22, 2024)
1.0.0 (May 21, 2024)
0.1.dev3, pre-release (May 18, 2024)
0.0.3 (May 19, 2024)
0.0.2 (May 18, 2024)

Download files

Source Distribution

sqlframe-3.12.0.tar.gz (29.0 MB)

Uploaded Dec 27, 2024 Source

Built Distribution

sqlframe-3.12.0-py3-none-any.whl (182.0 kB)

Uploaded Dec 27, 2024 Python 3

File details

Details for the file sqlframe-3.12.0.tar.gz.

File metadata

Download URL: sqlframe-3.12.0.tar.gz
Upload date: Dec 27, 2024
Size: 29.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for sqlframe-3.12.0.tar.gz
Algorithm	Hash digest
SHA256	`42d0c42e282c2d572eb4d53eaae8512a1a70a157b29554342f4440d0074891fe`
MD5	`213a38a6c0c89e0226a94d234e81e798`
BLAKE2b-256	`1eb3bbfc3bb16e07f3de5ef94a2c96f29afbdedfaacdd578b8ba596120eba72b`
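To check a downloaded file against the SHA256 digest listed above, you can hash it locally with Python's `hashlib`. The following is a minimal sketch. The local file path is an assumption (use wherever you saved the download), and the function names are hypothetical. Reading the file in chunks keeps memory use flat even for the 29.0 MB source archive.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for sqlframe-3.12.0.tar.gz.
EXPECTED_SDIST_SHA256 = "42d0c42e282c2d572eb4d53eaae8512a1a70a157b29554342f4440d0074891fe"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_hex.lower()


if __name__ == "__main__":
    archive = Path("sqlframe-3.12.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```

pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).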

Provenance

The following attestation bundles were made for sqlframe-3.12.0.tar.gz:

Publisher: publish.workflow.yaml on eakmanrq/sqlframe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sqlframe-3.12.0.tar.gz
- Subject digest: 42d0c42e282c2d572eb4d53eaae8512a1a70a157b29554342f4440d0074891fe
- Sigstore transparency entry: 157822954
- Sigstore integration time: Dec 27, 2024
Source repository:
- Permalink: eakmanrq/sqlframe@6137f553ead679b7d908a14dbde8e1f00f2f21de
- Branch / Tag: refs/tags/v3.12.0
- Owner: https://github.com/eakmanrq
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.workflow.yaml@6137f553ead679b7d908a14dbde8e1f00f2f21de
- Trigger Event: push

File details

Details for the file sqlframe-3.12.0-py3-none-any.whl.

File metadata

Download URL: sqlframe-3.12.0-py3-none-any.whl
Upload date: Dec 27, 2024
Size: 182.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for sqlframe-3.12.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`25af0bf67a906ef6be38f2ad7004001ac8d45cffd5b788a15dca17a609bd5d26`
MD5	`f19aae41e6337fe35227d79ade158d20`
BLAKE2b-256	`e26a45092e5a28aa12a3340944b43d7a831d5d99de811eb51319ec2d4c4ab7b1`

Provenance

The following attestation bundles were made for sqlframe-3.12.0-py3-none-any.whl:

Publisher: publish.workflow.yaml on eakmanrq/sqlframe

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sqlframe-3.12.0-py3-none-any.whl
- Subject digest: 25af0bf67a906ef6be38f2ad7004001ac8d45cffd5b788a15dca17a609bd5d26
- Sigstore transparency entry: 157822959
- Sigstore integration time: Dec 27, 2024
Source repository:
- Permalink: eakmanrq/sqlframe@6137f553ead679b7d908a14dbde8e1f00f2f21de
- Branch / Tag: refs/tags/v3.12.0
- Owner: https://github.com/eakmanrq
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.workflow.yaml@6137f553ead679b7d908a14dbde8e1f00f2f21de
- Trigger Event: push
