Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars
polars_readstat
Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.
The Python package wraps the Rust core in polars_readstat_rs and exposes a simple Polars-first API. I have tried to make sure this release has no errors or regressions: it was tested against 178 test files from pandas, pyreadstat, and other sources. If I missed something, the README for the prior version has the details, and you can install v0.11.1 from PyPI (pip install polars-readstat==0.11.1).
Why use this?
- In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
- It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
- API is Polars-first (scan_readstat, read_readstat, write_readstat).
Install
pip install polars-readstat
Core API
1) Lazy scan
import polars as pl
from polars_readstat import scan_readstat
lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()
2) Eager read
from polars_readstat import read_readstat
df = read_readstat("/path/file.dta")
3) Metadata + schema
from polars_readstat import ScanReadstat
reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema
metadata = reader.metadata
4) Write (Stata/SPSS)
from polars_readstat import write_readstat
write_readstat(df, "/path/out.dta", threads=8)
write_readstat(df, "/path/out.sav")
write_readstat supports Stata (dta) and SPSS (sav). SAS writing is not supported.
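Since only Stata and SPSS outputs are supported, it can help to reject other targets before calling write_readstat. The sketch below is a hypothetical helper (check_writable is not part of polars_readstat). It assumes the output format is chosen by file extension and that only .dta and .sav are accepted; this README does not say whether other extensions such as .zsav can be written, so they are treated as unsupported here.

```python
from pathlib import Path

# Assumption: only the two extensions named in this README are writable.
WRITABLE_SUFFIXES = {".dta", ".sav"}


def check_writable(path: str) -> str:
    """Return the lowercase suffix if `path` looks writable, else raise ValueError.

    Hypothetical helper for illustration; not part of polars_readstat.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in WRITABLE_SUFFIXES:
        raise ValueError(
            f"unsupported output format {suffix!r}; "
            f"expected one of {sorted(WRITABLE_SUFFIXES)}"
        )
    return suffix
```

With this helper, check_writable("/path/out.dta") returns ".dta", while check_writable("/path/out.sas7bdat") raises ValueError.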
Tests run
We’ve tried to test this thoroughly:
- Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
- Stata/SPSS read/write roundtrip tests.
- Large-file read/write benchmark runs on real-world data (results below).
If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.
Benchmark
For each file, I compared four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).
Benchmark context:
- Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
- Storage: external SSD
- Last run: August 31, 2025
- Version tested: polars-readstat 0.12 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
- Method: wall-clock timings via Python time.time()
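The four scenarios and the timing method can be sketched as a small harness. This is an illustration only, not the project's benchmark script: time_call, scenario_label, run_grid, and the load callable are hypothetical names, and load(subset, filter_rows) stands in for whatever reads the file with those options.

```python
import time
from itertools import product


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start


def scenario_label(subset: bool, filter_rows: bool) -> str:
    """Build the table column label for one scenario."""
    parts = []
    if subset:
        parts.append("Subset: True")
    if filter_rows:
        parts.append("Filter: True")
    return ", ".join(parts) or "Full File"


def run_grid(load):
    """Time load(subset, filter_rows) for all four scenario combinations."""
    timings = {}
    # Iterate filter_rows as the outer flag so the results follow the
    # column order used in the benchmark tables.
    for filter_rows, subset in product([False, True], repeat=2):
        _, elapsed = time_call(load, subset, filter_rows)
        timings[scenario_label(subset, filter_rows)] = elapsed
    return timings
```

A single time.time() measurement follows the system clock, so it includes any noise from other processes; time.perf_counter() is a monotonic alternative if clock adjustments are a concern.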
Compared to pandas and pyreadstat (using read_file_multiprocessing for parallel processing in pyreadstat)
SAS
All times in seconds (speedup relative to pandas in parentheses).
| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat (new Rust engine) | 0.90 (1.7×) | 0.07 (29.4×) | 1.23 (2.5×) | 0.07 (29.9×) |
| polars_readstat engine="cpp" (fastest for 0.11.1) | 1.31 (1.6×) | 0.09 (22.9×) | 1.56 (1.9×) | 0.09 (23.2×) |
| pandas | 2.07 | 2.06 | 3.03 | 2.09 |
| pyreadstat | 10.75 (0.2×) | 0.46 (4.5×) | 11.93 (0.3×) | 0.50 (4.2×) |
Stata
All times in seconds (speedup relative to pandas in parentheses).
| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat (new Rust engine) | 0.17 (6.7×) | 0.12 (9.8×) | 0.24 (4.1×) | 0.11 (8.7×) |
| polars_readstat engine="readstat" (the only option for 0.11.1) | 1.80 (0.6×) | 0.27 (4.4×) | 1.31 (0.8×) | 0.29 (3.3×) |
| pandas | 1.14 | 1.18 | 0.99 | 0.96 |
| pyreadstat | 7.46 (0.2×) | 2.18 (0.5×) | 7.66 (0.1×) | 2.24 (0.4×) |
Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.
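The speedup figures are the pandas time divided by the library's time for the same scenario, so values above 1× mean faster than pandas. As a quick check, the snippet below recomputes the Stata row for the new Rust engine from the timings in the table above; speedup is a helper name introduced here for illustration.

```python
def speedup(baseline_seconds: float, library_seconds: float) -> float:
    """Speedup relative to the baseline, rounded to one decimal place."""
    return round(baseline_seconds / library_seconds, 1)


# Stata timings from the table above, in column order:
# Full File, Subset, Filter, Subset + Filter.
pandas_times = [1.14, 1.18, 0.99, 0.96]
rust_times = [0.17, 0.12, 0.24, 0.11]

stata_speedups = [speedup(p, r) for p, r in zip(pandas_times, rust_times)]
print(stata_speedups)  # [6.7, 9.8, 4.1, 8.7]
```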