Sail Python library

Maintainers

lakesail

Project description

Sail

Sail is an open-source unified and distributed multimodal computation framework created by LakeSail.

Our mission is to unify batch processing, stream processing, and compute-intensive AI workloads. Sail is a compute engine that is:

Compatible with the Spark Connect protocol, supporting the Spark SQL and DataFrame API with no code rewrites required.
~4× faster than Spark in benchmarks (up to 8× in specific workloads).
94% lower infrastructure costs.
100% Rust-native with no JVM overhead, delivering memory safety, instant startup, and predictable performance.

🚀 Sail outperforms Spark, popular Spark accelerators, Databricks, and Snowflake on ClickBench.

Documentation

The documentation of the latest Sail version can be found here.

Installation

Quick Start

Sail is available as a Python package on PyPI. You can install it along with PySpark in your Python environment.

pip install pysail
pip install "pyspark[connect]"

Alternatively, as of Spark 4.0, you can install the lightweight client package pyspark-client instead. The pyspark-connect package, which is equivalent to pyspark[connect], is also available as of Spark 4.0.
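After installing, it can be useful to confirm which of these packages ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper names `installed_version` and `report` are illustrative and not part of Sail or PySpark.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("pysail", "pyspark", "pyspark-client", "pyspark-connect")):
    """Map each package name to its installed version (None if not installed)."""
    return {name: installed_version(name) for name in packages}


# Print one line per package so a missing client package is easy to spot.
for name, found in report().items():
    print(f"{name}: {found or 'not installed'}")
```

Only one of the PySpark client packages needs to be present alongside pysail.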

Advanced Use Cases

You can install Sail from source to optimize performance for your specific hardware architecture. The detailed Installation Guide walks you through this process step-by-step.

If you need to deploy Sail in production environments, the Deployment Guide provides comprehensive instructions for deploying Sail on Kubernetes clusters and other infrastructure configurations.

Getting Started

Starting the Sail Server

Option 1: Command Line Interface. You can start the local Sail server using the sail command.

sail spark server --port 50051

Option 2: Python API. You can start the local Sail server using the Python API.

from pysail.spark import SparkConnectServer

server = SparkConnectServer(port=50051)
server.start(background=False)

Option 3: Kubernetes. You can deploy Sail on Kubernetes and run Sail in cluster mode for distributed processing. Please refer to the Kubernetes Deployment Guide for instructions on building the Docker image and writing the Kubernetes manifest YAML file.

kubectl apply -f sail.yaml
kubectl -n sail port-forward service/sail-spark-server 50051:50051
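Whichever option you use, a client script may start before the server is ready to accept connections. The sketch below polls the port with plain TCP until it opens. The helper `wait_for_port` is hypothetical (it is not part of pysail), and a successful TCP connection only shows that something is listening, not that the Spark Connect service is fully initialized.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means a listener exists.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 50051)` could be called right after launching `sail spark server --port 50051` in another process, or after starting the `kubectl port-forward` command.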

Connecting to the Sail Server

Once you have a running Sail server, you can connect to it in PySpark. No changes are needed in your PySpark code!

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
spark.sql("SELECT 1 + 1").show()

Please refer to the Getting Started guide for further details.
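As a slightly larger illustration of the connection step, here is a sketch that builds the Spark Connect connection string and runs a small DataFrame query. The helpers `spark_connect_url` and `run_demo` are hypothetical names introduced for this example. The `sc://host:port/;key=value` form follows the Spark Connect client convention; whether a given Sail deployment accepts a particular parameter (such as `use_ssl`) is an assumption to verify against your setup.

```python
def spark_connect_url(host="localhost", port=50051, **params):
    """Build a Spark Connect connection string such as sc://localhost:50051."""
    url = f"sc://{host}:{port}"
    if params:
        # Optional parameters are appended as ;key=value pairs after "/".
        url += "/;" + ";".join(f"{key}={value}" for key, value in params.items())
    return url


def run_demo(url):
    """Connect to a running server at `url` and run a small DataFrame query."""
    # Imported lazily so the URL helper works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(url).getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
    df.filter(df.id > 1).show()
    spark.stop()
```

With a local server running on port 50051, `run_demo(spark_connect_url())` uses the same PySpark calls you would use against a regular Spark Connect endpoint.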

Feature Highlights

Storage

Sail supports a variety of storage backends for reading and writing data. You can read more details in our Storage Guide.

Here are the storage options supported:

AWS S3
Cloudflare R2
Azure
Google Cloud Storage
Hugging Face
HDFS
File systems
HTTP/HTTPS
In-memory storage

Lakehouse Formats

Sail provides native support for modern lakehouse table formats, offering reliable storage layers with strong data management guarantees and ensuring interoperability with existing datasets.

Please refer to the following guides for the supported formats:

Catalog Providers

Sail supports multiple catalog providers, such as the Apache Iceberg REST Catalog and Unity Catalog. You can manage datasets as external tables and integrate with broader data-platform ecosystems.

For more details on usage and best practices, see the Catalog Guide.

Benchmark Results

Derived TPC-H results show that Sail outperforms Apache Spark in every query:

Execution Time: ~4× faster across diverse SQL workloads.
Hardware Cost: 94% lower with significantly lower peak memory usage and zero shuffle spill.

Metric	Spark	Sail
Total Query Time	387.36 s	102.75 s
Query Speed-Up	Baseline	43% – 727%
Peak Memory Usage	54 GB	22 GB (1 s)
Disk Write (Shuffle Spill)	> 110 GB	0 GB

These results come from a derived TPC-H benchmark (22 queries, scale factor 100, Parquet format) on AWS r8g.4xlarge instances.
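The headline ratios can be reproduced from the table with a few lines of arithmetic. This sketch uses only the totals listed above; the 94% cost figure depends on pricing and sizing details that are not in the table, so it is not recomputed here.

```python
# Totals copied from the table above.
spark_total_s = 387.36
sail_total_s = 102.75
spark_peak_gb = 54
sail_peak_gb = 22

# Overall speed-up is the ratio of total query times.
speedup = spark_total_s / sail_total_s
# Fractions of time and peak memory saved relative to Spark.
time_saved = 1 - sail_total_s / spark_total_s
memory_saved = 1 - sail_peak_gb / spark_peak_gb

print(f"overall speed-up: {speedup:.2f}x")
print(f"total time saved: {time_saved:.1%}")
print(f"peak memory saved: {memory_saved:.1%}")
```

The ratio of total query times comes out a little under 4, which is consistent with the "~4×" summary.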

Query Time Comparison

See the full analysis and graphs on our Benchmark Results page.

Contributing

Contributions are more than welcome!

Please submit GitHub issues for bug reports and feature requests. You are also welcome to ask questions in GitHub discussions.

Feel free to create a pull request if you would like to make a code change. You can refer to the Development Guide to get started.

Additionally, please join our Slack Community if you haven’t already!

Why Choose Sail?

When Spark was invented over 15 years ago, it was revolutionary. It redefined distributed data processing and powered ETL, machine learning, and analytics pipelines across industries.

But Spark’s JVM-based architecture now struggles to meet modern demands for performance and cloud efficiency:

Garbage collection pauses introduce latency spikes.
Serialization overhead slows data exchange between JVM and Python.
Heavy executors drive up cloud costs and complicate scaling.
Row-based processing performs poorly on analytical workloads and leaves hardware efficiency untapped.
Slow startup delays workloads, hurting interactive and on-demand use cases.

Sail solves these problems with a modern, Rust-native design.

Sail is Spark-compatible

Sail offers a drop-in replacement for Spark SQL and the Spark DataFrame API. Existing PySpark code works out of the box once you connect your Spark client session to Sail over the Spark Connect protocol.

Spark SQL Dialect Support. A custom Rust parser (built with parser combinators and Rust procedural macros) covers Spark SQL syntax with production-grade accuracy.
DataFrame API Support. Spark DataFrame operations run on Sail with identical semantics.
Python UDF, UDAF, UDWF, and UDTF Support. Python, Pandas, and Arrow UDFs all follow the same conventions as Spark.
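To illustrate the UDF point, the sketch below wraps a plain Python function as a scalar UDF using the standard PySpark API. The function names `shout` and `apply_shout` are made up for this example, and `spark` is assumed to be a session already connected to a Sail server.

```python
def shout(text):
    """Plain Python logic; it becomes a UDF only when wrapped below."""
    return None if text is None else text.upper() + "!"


def apply_shout(spark):
    """Return a DataFrame with `shout` applied as a Python UDF."""
    # Imported lazily so `shout` can be unit-tested without PySpark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    shout_udf = udf(shout, StringType())
    df = spark.createDataFrame([("ahoy",), (None,)], ["greeting"])
    return df.select(shout_udf(col("greeting")).alias("shouted"))
```

Keeping the logic in an ordinary function makes it easy to test locally before running it through the engine.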

Sail’s Advantages over Spark

Rust-Native Engine. No garbage collection pauses, no JVM memory tuning, and low memory footprint.
Columnar Format and Vectorized Execution. Built on top of Apache Arrow and Apache DataFusion, the columnar in-memory format and SIMD instructions unlock blazing-fast query execution.
Lightning-Fast Python UDFs. Python code runs inside Sail with zero serialization overhead as Arrow array pointers enable zero-copy data sharing.
Performant Data Shuffling. Workers exchange Arrow columnar data directly, minimizing shuffle costs for joins and aggregations.
Lightweight, Stateless Workers. Workers start in seconds, consume only a few megabytes of memory at idle, and scale elastically to cut cloud costs and simplify operations.
Concurrency and Memory Safety You Can Trust. Rust’s ownership model prevents null pointer dereferences, data races, and unsafe memory access for unmatched reliability.

Further Reading

Curious about how Sail stacks up against Spark? Explore our Why Sail? page. Ready to bring your existing workloads over? Our Migration Guide shows you how.

Release history

0.6.2

May 6, 2026

0.6.1

Apr 28, 2026

0.6.0

Apr 14, 2026

0.5.3

Mar 21, 2026

0.5.2

Mar 2, 2026

0.5.1

Feb 15, 2026

This version

0.5.0

Feb 6, 2026

0.4.6

Jan 13, 2026

0.4.5

Dec 22, 2025

0.4.4

Dec 12, 2025

0.4.3

Nov 27, 2025

0.4.2

Nov 13, 2025

0.4.1

Nov 2, 2025

0.4.0

Oct 29, 2025

0.3.7

Oct 3, 2025

0.3.6

Sep 30, 2025

0.3.5

Sep 5, 2025

0.3.4

Sep 3, 2025

0.3.3

Aug 14, 2025

0.3.2

Aug 8, 2025

0.3.1

Jul 7, 2025

0.3.0

Jun 28, 2025

0.2.6

May 14, 2025

0.2.5

Apr 22, 2025

0.2.4

Apr 11, 2025

0.2.3

Mar 22, 2025

0.2.2

Mar 6, 2025

0.2.1

Jan 15, 2025

0.2.0

Dec 3, 2024

0.2.0.dev0 pre-release

Nov 19, 2024

0.1.7

Nov 2, 2024

0.1.6

Oct 23, 2024

0.1.5

Oct 18, 2024

0.1.4

Oct 5, 2024

0.1.3

Sep 18, 2024

0.1.2

Sep 10, 2024

0.1.1

Sep 2, 2024

0.1.0

Aug 29, 2024

0.1.0.dev0 pre-release

Aug 2, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pysail-0.5.0.tar.gz (1.6 MB)

Uploaded Feb 6, 2026. Tags: Source

Built Distributions

pysail-0.5.0-cp38-abi3-win_amd64.whl (58.2 MB)

Uploaded Feb 6, 2026. Tags: CPython 3.8+, Windows x86-64

pysail-0.5.0-cp38-abi3-manylinux_2_24_aarch64.whl (49.9 MB)

Uploaded Feb 6, 2026. Tags: CPython 3.8+, manylinux: glibc 2.24+ ARM64

pysail-0.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53.7 MB)

Uploaded Feb 6, 2026. Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64

pysail-0.5.0-cp38-abi3-macosx_11_0_arm64.whl (48.2 MB)

Uploaded Feb 6, 2026. Tags: CPython 3.8+, macOS 11.0+ ARM64

pysail-0.5.0-cp38-abi3-macosx_10_12_x86_64.whl (52.3 MB)

Uploaded Feb 6, 2026. Tags: CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file pysail-0.5.0.tar.gz.

File metadata

Download URL: pysail-0.5.0.tar.gz
Upload date: Feb 6, 2026
Size: 1.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.11.5

File hashes

Hashes for pysail-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`c7170b8a37d11ef363996b46fa93abbf443d8d59c0e02c7f4869653e720906dd`
MD5	`b691de06a2ffd6b480f12d9759cc5364`
BLAKE2b-256	`41ac3cae08a873fb2697709634a622ddcba5dece26d93aa7b3ecb1cdbe16c226`

See more details on using hashes here.

File details

Details for the file pysail-0.5.0-cp38-abi3-win_amd64.whl.

File metadata

Download URL: pysail-0.5.0-cp38-abi3-win_amd64.whl
Upload date: Feb 6, 2026
Size: 58.2 MB
Tags: CPython 3.8+, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.11.5

File hashes

Hashes for pysail-0.5.0-cp38-abi3-win_amd64.whl
Algorithm	Hash digest
SHA256	`eed5740cff0ec99745ac36e79f4ee8ca58e5a8694c034955d489b74898c32112`
MD5	`3bb526664a47ac10d2761068bb97cad2`
BLAKE2b-256	`820cbc31f3666c15b82b1dca1f803014cc0ab228d13cdd1add1e15f849deec9c`

File details

Details for the file pysail-0.5.0-cp38-abi3-manylinux_2_24_aarch64.whl.

File metadata

Download URL: pysail-0.5.0-cp38-abi3-manylinux_2_24_aarch64.whl
Upload date: Feb 6, 2026
Size: 49.9 MB
Tags: CPython 3.8+, manylinux: glibc 2.24+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.11.5

File hashes

Hashes for pysail-0.5.0-cp38-abi3-manylinux_2_24_aarch64.whl
Algorithm	Hash digest
SHA256	`438e07761149123097d27c2728662d7196ab7bff5f634154d97dcda1948e739b`
MD5	`10b4512238dd93989118f113428f53ce`
BLAKE2b-256	`0ad3234e38c54ed0beb264f3d477aea5095ca99838701ff7c9b167a447487388`

File details

Details for the file pysail-0.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: pysail-0.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Feb 6, 2026
Size: 53.7 MB
Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.11.5

File hashes

Hashes for pysail-0.5.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`c65b71716e3d251ba98bda85b2db83e6c455df5b0d71a0194b96cf340fc12fb2`
MD5	`4ae06b811e5e503dd3889adcb3cee4f3`
BLAKE2b-256	`209a2b3bb0e560bd33b1adfd1f091bf87f156b403e69327b9748ed88469da8f9`

File details

Details for the file pysail-0.5.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

Download URL: pysail-0.5.0-cp38-abi3-macosx_11_0_arm64.whl
Upload date: Feb 6, 2026
Size: 48.2 MB
Tags: CPython 3.8+, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.11.5

File hashes

Hashes for pysail-0.5.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`246d09a6c431b8d59c57744d609d00e4001eee92b1a061b8333261ee5d7177f7`
MD5	`f971dc9a163d58b8a09d85968e510d97`
BLAKE2b-256	`07761ed25eeced15b818a67286a8291a0fc8b94588593506c43f215c89e55cea`

File details

Details for the file pysail-0.5.0-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

Download URL: pysail-0.5.0-cp38-abi3-macosx_10_12_x86_64.whl
Upload date: Feb 6, 2026
Size: 52.3 MB
Tags: CPython 3.8+, macOS 10.12+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: maturin/1.11.5

File hashes

Hashes for pysail-0.5.0-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm	Hash digest
SHA256	`6a505a954617765b3a02ed51d8f9e719d6df84a6c7229bec9a44ab94fffc72e6`
MD5	`8442759003a5634ba9f6b829b952b218`
BLAKE2b-256	`9a31a937e5ed967d336286ec596aba1e77b83a8cde6f1af13468fbc2d9a86b59`
