Sail Python library

Project description

Sail

Sail is a drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads on a distributed, multimodal compute engine.

  • Compatible with the Spark Connect protocol, supporting the Spark SQL and DataFrame API with no code rewrites required.
  • 100% Rust-native with no JVM overhead, delivering memory safety, instant startup, and predictable performance.
  • ~4× faster than Spark (up to 8× on specific workloads) with 94% lower infrastructure cost in derived TPC-H benchmarks.
  • Proven on ClickBench, outperforming Spark, popular Spark accelerators, Databricks, and Snowflake.

Documentation

The documentation of the latest Sail version can be found here.

Installation

Quick Start

Sail is available as a Python package on PyPI. You can install it along with PySpark in your Python environment.

pip install pysail
pip install "pyspark-client"

Advanced Use Cases

You can install Sail from source to optimize performance for your specific hardware architecture. The detailed Installation Guide walks you through this process step-by-step.

If you need to deploy Sail in production environments, the Deployment Guide provides comprehensive instructions for deploying Sail on Kubernetes clusters and other infrastructure configurations.

Getting Started

Starting the Sail Server

Option 1: Command Line Interface. You can start the local Sail server using the sail command.

sail spark server --port 50051

Option 2: Python API. You can start the local Sail server using the Python API.

from pysail.spark import SparkConnectServer

server = SparkConnectServer(port=50051)
server.start(background=False)

Option 3: Kubernetes. You can deploy Sail on Kubernetes and run Sail in cluster mode for distributed processing. Please refer to the Kubernetes Deployment Guide for instructions on building the Docker image and writing the Kubernetes manifest YAML file.

kubectl apply -f sail.yaml
kubectl -n sail port-forward service/sail-spark-server 50051:50051
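After starting the port-forward, it can help to confirm that the local port accepts TCP connections before you create a Spark session. Below is a minimal sketch that uses only the Python standard library. The helper `wait_for_port` and its default timeouts are hypothetical and are not part of Sail or kubectl. The helper only checks TCP reachability. It does not check whether the listener speaks the Spark Connect protocol.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, or False when `timeout` expires.

    Hypothetical helper for illustration only; it tests TCP reachability,
    not Spark Connect protocol support.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError or a timeout)
            # when nothing is accepting connections on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 50051)` returns `True` once the forwarded Sail service accepts connections. It returns `False` if the timeout expires first.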

Connecting to the Sail Server

Once you have a running Sail server, you can connect to it in PySpark. No changes are needed in your PySpark code!

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
spark.sql("SELECT 1 + 1").show()
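The argument to `remote()` is a Spark Connect connection string. It starts with the `sc://` scheme, followed by the host and port. Optional `key=value` parameters may follow after `/`, each one introduced by `;`. The sketch below builds such a string in plain Python. The helper name `spark_connect_url` is hypothetical. Which connection parameters Sail honors (for example `user_id`) is an assumption that you should check against the Sail documentation.

```python
def spark_connect_url(host: str, port: int, **params: str) -> str:
    """Build a Spark Connect connection string such as sc://localhost:50051.

    Hypothetical helper: optional parameters are appended as
    "/;key=value;key=value", following the Spark Connect connection-string format.
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    url = f"sc://{host}:{port}"
    if params:
        # Parameters follow a "/" and each one is prefixed by ";".
        url += "/" + "".join(f";{key}={value}" for key, value in params.items())
    return url
```

With no extra parameters, `SparkSession.builder.remote(spark_connect_url("localhost", 50051)).getOrCreate()` connects to the same address as the snippet above.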

Please refer to the Getting Started guide for further details.

Spark Compatibility

Sail is designed to be compatible with Spark 3.5.x, Spark 4.x, and later versions. Existing PySpark code works out of the box once you connect your Spark client session to Sail over the Spark Connect protocol.

As a starting point, Sail ships with an experimental PySpark function compatibility check script that scans your codebase for PySpark functions and reports their Sail support status.

python -m pysail.examples.spark.compatibility_check <directory>

Experimental: Use the script as a rough first pass only. It checks whether the PySpark functions your code references are implemented in Sail, but it does not verify behavioral parity. It also looks only at functions used in DataFrame operations and does not cover Spark SQL strings.

See the Migration Guide for recommended migration practices.

Feature Highlights

Lakehouse Formats and Catalog Providers

Sail provides native support for the Delta Lake and Apache Iceberg table formats. It integrates with catalog providers including Apache Iceberg REST Catalog, AWS Glue, Unity Catalog, Hive Metastore, and Microsoft OneLake.

For more details on usage and best practices, see the Data Sources Guide and Catalog Guide.

Storage

Sail supports a variety of storage backends for reading and writing data, including:

  • AWS S3
  • Azure
  • Hugging Face
  • Cloudflare R2
  • Google Cloud Storage
  • HDFS
  • File systems
  • HTTP/HTTPS
  • In-memory storage

See the Storage Guide for more details.

Why Choose Sail?

For over 15 years, Spark has been the default engine for distributed data processing, powering ETL, machine learning, and analytics pipelines across nearly every industry.

But the JVM foundation that made Spark possible is now what holds it back. Sail is built to be a familiar, performant alternative without the JVM tax.

Sail is Spark-compatible

Sail offers a drop-in replacement for Spark SQL and the Spark DataFrame API. Existing PySpark code works out of the box once you connect your Spark client session to Sail over the Spark Connect protocol.

  • Spark SQL Dialect Support. A custom Rust parser (built with parser combinators and Rust procedural macros) covers Spark SQL syntax with production-grade accuracy.
  • DataFrame API Support. Spark DataFrame operations run on Sail with identical semantics.
  • Python UDF, UDAF, UDWF, and UDTF Support. Python, Pandas, and Arrow UDFs all follow the same conventions as Spark.

Sail’s Advantages over Spark

  • Rust-Native Engine. No garbage collection pauses, no JVM memory tuning, and low memory footprint.
  • Columnar Format and Vectorized Execution. Built on top of Apache Arrow and Apache DataFusion, Sail uses a columnar in-memory format and SIMD instructions to unlock blazing-fast query execution.
  • Lightning-Fast Python UDFs. Python code runs inside Sail with zero serialization overhead, because Arrow array pointers enable zero-copy data sharing.
  • Performant Data Shuffling. Workers exchange Arrow columnar data directly, minimizing shuffle costs for joins and aggregations.
  • Lightweight, Stateless Workers. Workers start in seconds, consume only a few megabytes of memory at idle, and scale elastically to cut cloud costs and simplify operations.
  • Concurrency and Memory Safety You Can Trust. Rust's ownership model and type system prevent null pointer dereferences, data races, and unsafe memory access in safe code, for unmatched reliability.
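One way to picture the shuffle step described above is hash-partitioning a columnar batch by its key column. This sends all rows that share a key to the same worker for the join or aggregation. The sketch below is a conceptual illustration in plain Python, not Sail's implementation. Sail exchanges Arrow columnar data between workers, whereas here a batch is simply a dict of equal-length Python lists. The function name `hash_partition` is hypothetical. The sketch uses `zlib.crc32` because it is stable across processes, so every worker would assign a given key to the same partition.

```python
import zlib
from typing import Dict, List

Batch = Dict[str, List[object]]  # column name -> column values; all columns have equal length


def hash_partition(batch: Batch, key: str, num_partitions: int) -> List[Batch]:
    """Split a columnar batch into `num_partitions` batches by hashing the `key` column.

    Rows with equal keys always map to the same partition, which is the
    property a shuffle needs before a join or grouped aggregation.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    # Compute each row's target partition from the key column alone.
    targets = [zlib.crc32(str(value).encode("utf-8")) % num_partitions for value in batch[key]]

    partitions: List[Batch] = [{name: [] for name in batch} for _ in range(num_partitions)]
    # Scatter the batch column by column, keeping rows aligned across columns.
    for name, values in batch.items():
        for row, value in enumerate(values):
            partitions[targets[row]][name].append(value)
    return partitions
```

Because the scatter proceeds column by column, the layout stays columnar throughout. This loosely mirrors how columnar data can be exchanged without first being converted to rows.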

Ready to bring your existing workloads over? Our Migration Guide shows you how.

Benchmark Results

Derived TPC-H results show that Sail outperforms Apache Spark in every query:

  • Execution Time: ~4× faster across diverse SQL workloads.
  • Hardware Cost: 94% lower with significantly lower peak memory usage and zero shuffle spill.

Metric                        Spark       Sail
Total Query Time              387.36 s    102.75 s
Query Speed-Up                Baseline    43% – 727%
Peak Memory Usage             54 GB       22 GB (1 s)
Disk Write (Shuffle Spill)    > 110 GB    0 GB

These results come from a derived TPC-H benchmark (22 queries, scale factor 100, Parquet format) on AWS r8g.4xlarge instances.
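You can cross-check the headline numbers against the table with a few lines of arithmetic. This only recomputes ratios from the figures reported above. It does not reproduce the benchmark.

```python
# Figures copied from the benchmark table above.
spark_total_s = 387.36
sail_total_s = 102.75
spark_peak_gb = 54
sail_peak_gb = 22

speedup = spark_total_s / sail_total_s               # overall speed-up on total query time
memory_reduction = 1 - sail_peak_gb / spark_peak_gb  # fraction of peak memory saved

print(f"Overall speed-up: {speedup:.2f}x")                 # Overall speed-up: 3.77x
print(f"Peak memory reduction: {memory_reduction:.0%}")    # Peak memory reduction: 59%
```

An overall ratio of about 3.8× is consistent with the "~4× faster" summary. The per-query range (43% to 727%) depends on per-query timings, and the 94% hardware-cost figure depends on pricing assumptions. The table includes neither, so those figures cannot be recomputed from it.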

Query Time Comparison

See the full analysis and graphs on our Benchmark Results page.

Further Reading

  • Architecture – Overview of Sail’s design for both local and cluster modes, and how it transitions seamlessly between them.
  • Query Planning – Detailed explanation of how Sail parses SQL and Spark relations, builds logical and physical plans, and handles execution for local and cluster modes.
  • SQL and DataFrame Features – Complete reference for Spark SQL and DataFrame API compatibility.
  • LakeSail Blog – Updates on Sail releases, benchmarks, and technical insights.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysail-0.6.4.tar.gz (2.6 MB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pysail-0.6.4-cp38-abi3-win_amd64.whl (58.5 MB)

Uploaded CPython 3.8+, Windows x86-64

pysail-0.6.4-cp38-abi3-manylinux_2_24_aarch64.whl (50.2 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.24+ ARM64

pysail-0.6.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53.4 MB)

Uploaded CPython 3.8+, manylinux: glibc 2.17+ x86-64

pysail-0.6.4-cp38-abi3-macosx_11_0_arm64.whl (48.5 MB)

Uploaded CPython 3.8+, macOS 11.0+ ARM64

pysail-0.6.4-cp38-abi3-macosx_10_12_x86_64.whl (52.2 MB)

Uploaded CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file pysail-0.6.4.tar.gz.

File metadata

  • Download URL: pysail-0.6.4.tar.gz
  • Upload date:
  • Size: 2.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pysail-0.6.4.tar.gz
Algorithm Hash digest
SHA256 af8b59219ab70fe4567863d7b4afb54dacbe237fa6f32f5356588a0876848d93
MD5 5bd19b70a42d60c8bf5364534bc78a1f
BLAKE2b-256 fb016196cbe376963790e72660779616ad7f9168e7e9ab1b4fa0e1449ffe5893

See more details on using hashes here.

Provenance

The following attestation bundles were made for pysail-0.6.4.tar.gz:

Publisher: release.yml on lakehq/sail

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pysail-0.6.4-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: pysail-0.6.4-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 58.5 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pysail-0.6.4-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 7d62d799b4904ae4e05e809b02a110c65e9beaf62856e3f821917473b1053920
MD5 1c18b007bd4af5c68e5b844292e155bf
BLAKE2b-256 eaf8cd6f7db55ec9d491c4a1b9c215e33b687241722a87130d6fc682270dbff1

Provenance

The following attestation bundles were made for pysail-0.6.4-cp38-abi3-win_amd64.whl:

Publisher: release.yml on lakehq/sail

File details

Details for the file pysail-0.6.4-cp38-abi3-manylinux_2_24_aarch64.whl.

File hashes

Hashes for pysail-0.6.4-cp38-abi3-manylinux_2_24_aarch64.whl
Algorithm Hash digest
SHA256 d985eb531745335f5b7d34dd31e35cbf77ebbf5509cfda5dcf8906d820251398
MD5 33cfe4f98bfae3e713f2393ac68b5198
BLAKE2b-256 6a15d13d6c26a1f7dddfc90f4e153d907d361fc2854fb128edd0d649a97f30c7

Provenance

The following attestation bundles were made for pysail-0.6.4-cp38-abi3-manylinux_2_24_aarch64.whl:

Publisher: release.yml on lakehq/sail

File details

Details for the file pysail-0.6.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pysail-0.6.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8635ec269f20fe23043f3fa8998467c4f381a38e2f2f3d2aaa521035c03394af
MD5 bfab846dfb8316f9637319ed1d71c2a2
BLAKE2b-256 9e7e1a00c3e1dda515e2c4dab423902c97854d2d9fb86fbd048c109e22dbc6e2

Provenance

The following attestation bundles were made for pysail-0.6.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on lakehq/sail

File details

Details for the file pysail-0.6.4-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for pysail-0.6.4-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2b58d5f9b934ca7d7e394d31890a614f47a9c7cf0994d2f36ff84fda2ba4a652
MD5 a811ac22b564608a26d6353d069350cc
BLAKE2b-256 eefc55d99b112121f29499e88bf9e1c2b0391458add0ee611bd2d373cb3c9fe6

Provenance

The following attestation bundles were made for pysail-0.6.4-cp38-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on lakehq/sail

File details

Details for the file pysail-0.6.4-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for pysail-0.6.4-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 983a23a38a54666f84882739884579ca4cf7fdd98b30c00b7712568e21501def
MD5 cefbf21836c8577fb8cdd1bb39c66e79
BLAKE2b-256 dc7d271eb5b3ac8e24e282d91ead20e732e47a17a664d3f5ab3dde6e2f1f918e

Provenance

The following attestation bundles were made for pysail-0.6.4-cp38-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on lakehq/sail
