Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a simple Polars-first API. This update, which will become release 0.12.0, is in progress. For the version currently available on PyPI, see the prior README.

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • The API is Polars-first (scan_readstat, read_readstat, write_readstat).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Eager read

from polars_readstat import read_readstat

df = read_readstat("/path/file.dta")

3) Metadata + schema

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema
metadata = reader.metadata

4) Write (Stata/SPSS)

from polars_readstat import write_readstat

write_readstat(df, "/path/out.dta", threads=8)
write_readstat(df, "/path/out.sav")

write_readstat supports Stata (dta) and SPSS (sav). SAS writing is not supported.
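Because only some formats can be written, it can help to check the output extension before calling write_readstat. The following is a minimal sketch using only the standard library; the helper name check_writable and the WRITABLE set are hypothetical (not part of polars_readstat), and the set assumes only the two extensions named above are valid write targets.

```python
from pathlib import Path

# Assumption: only the extensions this README names as writable are accepted.
# This helper is illustrative and is not part of the polars_readstat API.
WRITABLE = {".dta", ".sav"}


def check_writable(path: str) -> str:
    """Return the lowercase suffix if it is a listed write target, else raise."""
    suffix = Path(path).suffix.lower()
    if suffix not in WRITABLE:
        raise ValueError(
            f"cannot write {suffix or 'a file with no extension'}; "
            f"expected one of {sorted(WRITABLE)}"
        )
    return suffix
```

A caller could run check_writable(out_path) first and only then pass out_path to write_readstat, so that an unsupported target such as .sas7bdat fails early with a clear message.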

Tests run

We’ve tried to test this thoroughly:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results below).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.

Benchmark

For each file, I compared four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), and 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • Last run: August 31, 2025
  • Versions tested: polars-readstat 0.12 (new Rust engine) compared with polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
  • Method: wall-clock timings via Python time.time()
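The method above (four subset/filter combinations, timed with time.time()) could be sketched roughly as follows. This is not the project's benchmark script; time_call, run_scenarios, and fake_load are hypothetical names, and fake_load is a stand-in workload so the sketch runs without any data files.

```python
import itertools
import time


def time_call(fn, *args, **kwargs):
    """Return (elapsed_seconds, result), measured with wall-clock time.time()."""
    start = time.time()
    result = fn(*args, **kwargs)
    return time.time() - start, result


def run_scenarios(load):
    """Time load(subset=..., filter_rows=...) for all four combinations."""
    timings = {}
    for subset, filter_rows in itertools.product([False, True], repeat=2):
        elapsed, _ = time_call(load, subset=subset, filter_rows=filter_rows)
        timings[(subset, filter_rows)] = elapsed
    return timings


def fake_load(subset, filter_rows):
    """Stand-in workload (assumption): fewer columns and rows mean less work."""
    rows = range(1_000 if filter_rows else 10_000)
    cols = 2 if subset else 20
    return sum(r * cols for r in rows)


timings = run_scenarios(fake_load)
for (subset, filter_rows), seconds in timings.items():
    print(f"Subset: {subset}, Filter: {filter_rows}: {seconds:.4f} s")
```

In a real run, fake_load would be replaced by a function that reads the file with the library under test, selecting columns when subset is true and filtering rows when filter_rows is true.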

Compared with pandas and pyreadstat (using read_file_multiprocessing for parallel processing in pyreadstat)

SAS

All times in seconds (speedup relative to pandas in parentheses after each)

Library                                              Full file      Subset: True   Filter: True   Subset: True, Filter: True
polars_readstat, new Rust engine (in progress)       0.90 (1.7×)    0.07 (29.4×)   1.23 (2.5×)    0.07 (29.9×)
polars_readstat, engine="cpp" (fastest for 0.11.1)   1.31 (1.6×)    0.09 (22.9×)   1.56 (1.9×)    0.09 (23.2×)
pandas                                               2.07           2.06           3.03           2.09
pyreadstat                                           10.75 (0.2×)   0.46 (4.5×)    11.93 (0.3×)   0.50 (4.2×)

Stata

All times in seconds (speedup relative to pandas in parentheses after each)

Library                                                           Full file     Subset: True   Filter: True   Subset: True, Filter: True
polars_readstat, new Rust engine (in progress)                    0.17 (6.7×)   0.12 (9.8×)    0.24 (4.1×)    0.11 (8.7×)
polars_readstat, engine="readstat" (the only option for 0.11.1)   1.80 (0.6×)   0.27 (4.4×)    1.31 (0.8×)    0.29 (3.3×)
pandas                                                            1.14          1.18           0.99           0.96
pyreadstat                                                        7.46 (0.2×)   2.18 (0.5×)    7.66 (0.1×)    2.24 (0.4×)
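The speedup figures are the pandas time divided by the library's time for the same scenario. As a small worked check, the sketch below recomputes the Stata speedups for the new Rust engine from the times reported above; the function name speedup is just an illustrative helper.

```python
def speedup(baseline_seconds: float, library_seconds: float) -> float:
    """Speedup relative to the baseline, rounded to one decimal place."""
    return round(baseline_seconds / library_seconds, 1)


# Stata times from the table above, in column order:
# full file, Subset: True, Filter: True, Subset: True + Filter: True.
pandas_stata = [1.14, 1.18, 0.99, 0.96]
rust_stata = [0.17, 0.12, 0.24, 0.11]

ratios = [speedup(p, r) for p, r in zip(pandas_stata, rust_stata)]
print(ratios)  # [6.7, 9.8, 4.1, 8.7]
```

A value above 1 means the library was faster than pandas in that scenario, and a value below 1 means it was slower.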

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.
