Sail Python library

Sail

Sail is an open-source unified and distributed multimodal computation framework created by LakeSail.

Our mission is to unify batch processing, stream processing, and compute-intensive AI workloads. Sail is a compute engine that is:

  • Compatible with the Spark Connect protocol, supporting the Spark SQL and DataFrame API with no code rewrites required.
  • ~4x faster than Spark in benchmarks (up to 8x in specific workloads).
  • 94% lower hardware cost in the same benchmarks.
  • 100% Rust-native with no JVM overhead, delivering memory safety, instant startup, and predictable performance.

🚀 Sail outperforms Spark, popular Spark accelerators, Databricks, and Snowflake on ClickBench.

💬 Join our Slack community to ask questions, share feedback, and connect with other Sail users and contributors.

Documentation

The documentation of the latest Sail version can be found here.

Installation

Quick Start

Sail is available as a Python package on PyPI. You can install it along with PySpark in your Python environment.

pip install pysail
pip install "pyspark[connect]"

Alternatively, starting with Spark 4.0, you can install the lightweight client package pyspark-client instead. The pyspark-connect package, which is equivalent to pyspark[connect], is also available starting with Spark 4.0.
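To confirm which of the packages mentioned above ended up in your environment, a small check like the following can help. This is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of Sail or PySpark.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the installation instructions above.
    for name in ("pysail", "pyspark", "pyspark-client", "pyspark-connect"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

Only one of the PySpark distributions needs to be present alongside pysail.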

Advanced Use Cases

You can install Sail from source to optimize performance for your specific hardware architecture. The detailed Installation Guide walks you through this process step-by-step.

If you need to deploy Sail in production environments, the Deployment Guide provides comprehensive instructions for deploying Sail on Kubernetes clusters and other infrastructure configurations.

Getting Started

Starting the Sail Server

Option 1: Command Line Interface. You can start the local Sail server using the sail command.

sail spark server --port 50051

Option 2: Python API. You can start the local Sail server using the Python API.

from pysail.spark import SparkConnectServer

server = SparkConnectServer(port=50051)
server.start(background=False)

Option 3: Kubernetes. You can deploy Sail on Kubernetes and run Sail in cluster mode for distributed processing. Please refer to the Kubernetes Deployment Guide for instructions on building the Docker image and writing the Kubernetes manifest YAML file.

kubectl apply -f sail.yaml
kubectl -n sail port-forward service/sail-spark-server 50051:50051
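Whichever option you use, the server has to be accepting connections before a client can attach to it. The sketch below polls the TCP port until it is reachable. It uses only the standard library, and the helper name `wait_for_port` and its default timeouts are assumptions made for illustration. A successful TCP connection only shows that something is listening on the port. It does not prove that the Spark Connect service is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Opening and immediately closing a connection is enough to test reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 50051 matches the examples above; adjust it if you started the server elsewhere.
    ready = wait_for_port("localhost", 50051, timeout=10.0)
    print("Sail server port reachable" if ready else "Timed out waiting for the Sail server port")
```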

Connecting to the Sail Server

Once you have a running Sail server, you can connect to it in PySpark. No changes are needed in your PySpark code!

from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
spark.sql("SELECT 1 + 1").show()

Please refer to the Getting Started guide for further details.
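The connection snippet above uses a SQL query. As a complement, here is a sketch of the same connection running a small DataFrame aggregation. The helper names `spark_connect_url` and `run_demo` are hypothetical, and the sample data is made up for illustration. PySpark is imported inside the function so that the URL helper can be used on its own.

```python
def spark_connect_url(host: str = "localhost", port: int = 50051) -> str:
    """Build a Spark Connect URL in the sc://host:port form used above."""
    return f"sc://{host}:{port}"


def run_demo(url: str) -> list:
    """Connect to a running Sail server and run a small DataFrame aggregation."""
    # Imported lazily so the URL helper works even without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.remote(url).getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b"), (3, "a")], ["id", "label"])
    rows = (
        df.groupBy("label")
        .agg(F.sum("id").alias("total"))
        .orderBy("label")
        .collect()
    )
    spark.stop()
    return [(row["label"], row["total"]) for row in rows]


if __name__ == "__main__":
    # Requires a Sail server listening on the default port from the examples above.
    print(run_demo(spark_connect_url()))
```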

Spark Compatibility

Sail is designed to be compatible with Spark 3.5.x, Spark 4.x, and later versions. Existing PySpark code works out of the box once you connect your Spark client session to Sail over the Spark Connect protocol.

As a starting point, Sail ships with an experimental PySpark function compatibility check script that scans your codebase for PySpark functions and reports their Sail support status.

python -m pysail.examples.spark.compatibility_check <directory>

Experimental: Use the script as a rough first pass only. The script checks whether referenced PySpark functions are implemented in Sail. It does not verify behavioral parity. It looks for functions used in DataFrame operations but does not cover Spark SQL strings.
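To see why a source-level scan can find functions used in DataFrame operations but miss those inside Spark SQL strings, consider the following sketch. It is not the implementation of the compatibility check script. It is an illustrative scanner built on Python's `ast` module, and the function name `find_pyspark_function_calls` and the sample source are assumptions.

```python
import ast


def find_pyspark_function_calls(source: str) -> set:
    """Collect pyspark.sql.functions names that `source` calls from Python code."""
    tree = ast.parse(source)
    module_aliases = set()  # e.g. `F` in `from pyspark.sql import functions as F`
    direct_names = {}       # local name -> real name, e.g. {"up": "upper"}

    # First pass: record how pyspark.sql.functions was imported.
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            if node.module == "pyspark.sql":
                for alias in node.names:
                    if alias.name == "functions":
                        module_aliases.add(alias.asname or alias.name)
            elif node.module == "pyspark.sql.functions":
                for alias in node.names:
                    direct_names[alias.asname or alias.name] = alias.name
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pyspark.sql.functions" and alias.asname:
                    module_aliases.add(alias.asname)

    # Second pass: collect calls made through those imports.
    found = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            if func.value.id in module_aliases:
                found.add(func.attr)
        elif isinstance(func, ast.Name) and func.id in direct_names:
            found.add(direct_names[func.id])
    return found


SAMPLE = '''
from pyspark.sql import functions as F
from pyspark.sql.functions import col, upper as up

df = spark.table("t").select(F.lower(col("name")), up(col("city")))
spark.sql("SELECT initcap(name) FROM t")
'''

if __name__ == "__main__":
    print(sorted(find_pyspark_function_calls(SAMPLE)))
```

The scanner reports `col`, `lower`, and `upper` for the sample, but not `initcap`, because `initcap` appears only inside a SQL string, which is opaque to the Python syntax tree.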

See the Migration Guide for recommended migration practices.

Feature Highlights

Storage

Sail supports a variety of storage backends for reading and writing data. You can read more details in our Storage Guide.

Here are the storage options supported:

  • AWS S3
  • Cloudflare R2
  • Azure
  • Google Cloud Storage
  • Hugging Face
  • HDFS
  • File systems
  • HTTP/HTTPS
  • In-memory storage

Lakehouse Formats

Sail provides native support for modern lakehouse table formats, offering reliable storage layers with strong data management guarantees and ensuring interoperability with existing datasets.

Please refer to the documentation guides for the supported formats.

Catalog Providers

Sail supports multiple catalog providers, such as the Apache Iceberg REST Catalog and Unity Catalog. You can manage datasets as external tables and integrate with broader data-platform ecosystems.

For more details on usage and best practices, see the Catalog Guide.

Benchmark Results

Derived TPC-H results show that Sail outperforms Apache Spark in every query:

  • Execution Time: ~4× faster across diverse SQL workloads.
  • Hardware Cost: 94% lower with significantly lower peak memory usage and zero shuffle spill.

| Metric | Spark | Sail |
|---|---|---|
| Total Query Time | 387.36 s | 102.75 s |
| Query Speed-Up | Baseline | 43% – 727% |
| Peak Memory Usage | 54 GB | 22 GB (1 s) |
| Disk Write (Shuffle Spill) | > 110 GB | 0 GB |

These results come from a derived TPC-H benchmark (22 queries, scale factor 100, Parquet format) on AWS r8g.4xlarge instances.
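The headline figures can be cross-checked against the reported totals with a little arithmetic. The sketch below uses only numbers from the results above. It assumes that a "speed-up of p%" means Sail runs (1 + p/100) times as fast as Spark on that query. The source does not state this interpretation, but it is consistent with the "up to 8x" figure quoted earlier.

```python
# Figures copied from the benchmark results above.
spark_total_s = 387.36
sail_total_s = 102.75
spark_peak_gb = 54
sail_peak_gb = 22

# Overall speed-up from total query time.
overall_speedup = spark_total_s / sail_total_s

# Fractional reduction in peak memory usage.
memory_reduction = 1 - sail_peak_gb / spark_peak_gb


def speedup_percent_to_factor(percent: float) -> float:
    """Convert a speed-up percentage to a multiplier (assumed interpretation)."""
    return 1 + percent / 100


if __name__ == "__main__":
    print(f"Overall speed-up: {overall_speedup:.2f}x")
    print(f"Peak memory reduction: {memory_reduction:.1%}")
    low = speedup_percent_to_factor(43)
    high = speedup_percent_to_factor(727)
    print(f"Per-query speed-up range: {low:.2f}x to {high:.2f}x")
```

Under that assumption, the totals work out to roughly 3.77x overall, which matches the "~4×" claim, and the per-query range runs from about 1.43x to about 8.27x.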

[Figure: Query Time Comparison]

See the full analysis and graphs on our Benchmark Results page.

Why Choose Sail?

When Spark was invented over 15 years ago, it was revolutionary. It redefined distributed data processing and powered ETL, machine learning, and analytics pipelines across industries.

But Spark’s JVM-based architecture now struggles to meet modern demands for performance and cloud efficiency:

  • Garbage collection pauses introduce latency spikes.
  • Serialization overhead slows data exchange between JVM and Python.
  • Heavy executors drive up cloud costs and complicate scaling.
  • Row-based processing performs poorly on analytical workloads and leaves hardware efficiency untapped.
  • Slow startup delays workloads, hurting interactive and on-demand use cases.

Sail solves these problems with a modern, Rust-native design.

Sail is Spark-compatible

Sail offers a drop-in replacement for Spark SQL and the Spark DataFrame API. Existing PySpark code works out of the box once you connect your Spark client session to Sail over the Spark Connect protocol.

  • Spark SQL Dialect Support. A custom Rust parser (built with parser combinators and Rust procedural macros) covers Spark SQL syntax with production-grade accuracy.
  • DataFrame API Support. Spark DataFrame operations run on Sail with identical semantics.
  • Python UDF, UDAF, UDWF, and UDTF Support. Python, Pandas, and Arrow UDFs all follow the same conventions as Spark.
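Since Python UDFs follow the same conventions as Spark, the standard PySpark `udf` API should apply unchanged. The following sketch wraps a plain Python function as a UDF. The names `normalize_label` and `apply_udf` and the sample rows are hypothetical, and `spark` is assumed to be a session already connected to a Sail server.

```python
def normalize_label(value):
    """Plain Python logic to wrap as a UDF: trim and lower-case a string."""
    if value is None:
        return None
    return value.strip().lower()


def apply_udf(spark):
    """Wrap `normalize_label` as a Spark UDF and apply it to a small DataFrame."""
    # Imported lazily so the plain function above can be tested without PySpark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    normalize_udf = udf(normalize_label, StringType())
    df = spark.createDataFrame([("  Alpha ",), ("BETA",), (None,)], ["label"])
    return df.select(normalize_udf(col("label")).alias("normalized"))


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    # Assumes a Sail server on the default port used elsewhere in this document.
    spark = SparkSession.builder.remote("sc://localhost:50051").getOrCreate()
    apply_udf(spark).show()
    spark.stop()
```

Keeping the logic in a plain function makes it easy to unit test without a running server.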

Sail’s Advantages over Spark

  • Rust-Native Engine. No garbage collection pauses, no JVM memory tuning, and low memory footprint.
  • Columnar Format and Vectorized Execution. Built on top of Apache Arrow and Apache DataFusion, the columnar in-memory format and SIMD instructions unlock blazing-fast query execution.
  • Lightning-Fast Python UDFs. Python code runs inside Sail with zero serialization overhead as Arrow array pointers enable zero-copy data sharing.
  • Performant Data Shuffling. Workers exchange Arrow columnar data directly, minimizing shuffle costs for joins and aggregations.
  • Lightweight, Stateless Workers. Workers start in seconds, consume only a few megabytes of memory at idle, and scale elastically to cut cloud costs and simplify operations.
  • Concurrency and Memory Safety You Can Trust. Rust’s ownership model prevents null pointer dereferences, data races, and unsafe memory access in safe code.

Curious about how Sail stacks up against Spark? Explore our Why Sail? page. Ready to bring your existing workloads over? Our Migration Guide shows you how.

Further Reading

  • Architecture – Overview of Sail’s design for both local and cluster modes, and how it transitions seamlessly between them.
  • Query Planning – Detailed explanation of how Sail parses SQL and Spark relations, builds logical and physical plans, and handles execution for local and cluster modes.
  • SQL and DataFrame Features – Complete reference for Spark SQL and DataFrame API compatibility.
  • LakeSail Blog – Updates on Sail releases, benchmarks, and technical insights.

✨Using Sail? Tell us your story and get free merch!✨


