Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a simple Polars-first API. I have tried to make sure this release has no errors or regressions (it was tested against 178 test files from pandas, pyreadstat, and other sources). If I missed something, see the README for the prior version, which can still be installed from PyPI as v0.11.1.

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • The API is Polars-first (scan_readstat, read_readstat, write_readstat).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Eager read

from polars_readstat import read_readstat

df = read_readstat("/path/file.dta")

3) Metadata + schema

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema
metadata = reader.metadata

4) Write (Stata/SPSS)

from polars_readstat import write_readstat

write_readstat(df, "/path/out.dta", threads=8)
write_readstat(df, "/path/out.sav")

write_readstat supports Stata (dta) and SPSS (sav). SAS writing is not supported.
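
Because SAS output is not supported, a caller may want to reject an unsupported target before doing any work. The following is a minimal sketch, not part of the package: `check_writable` and `safe_write` are hypothetical helper names, and the extension check is only an assumption about how a caller might guard the call (write_readstat may well raise its own error).

```python
from pathlib import Path

# Output formats the text above lists as writable: Stata (.dta) and SPSS (.sav).
WRITABLE_SUFFIXES = {".dta", ".sav"}


def check_writable(path):
    """Return the lowercased suffix if the target is writable, else raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in WRITABLE_SUFFIXES:
        raise ValueError(
            f"cannot write {suffix or 'a file with no extension'}: "
            "only .dta and .sav are supported"
        )
    return suffix


def safe_write(df, path):
    """Hypothetical wrapper: validate the extension, then call write_readstat."""
    check_writable(path)
    # Imported lazily so the validation helper works without the package installed.
    from polars_readstat import write_readstat

    write_readstat(df, path)
```

With this guard, `safe_write(df, "/path/out.sas7bdat")` fails with a ValueError before any data is touched.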

Tests run

I have tried to test this thoroughly:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results below).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.

Benchmark

For each file, I compared four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), and 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).
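
The four scenarios are the combinations of two switches, column subset and row filter. A minimal sketch of how such a grid could be built on top of a lazy frame follows. `scenarios` and `build_query` are hypothetical names rather than the project's benchmark code, and `columns` and `predicate` stand in for whatever the real benchmark used.

```python
from itertools import product


def scenarios():
    """Yield the four (subset, filtered) combinations described above."""
    # product(...) yields (False, False), (False, True), (True, False), (True, True).
    for subset, filtered in product([False, True], repeat=2):
        yield {"subset": subset, "filtered": filtered}


def build_query(lf, columns, predicate, subset, filtered):
    """Apply the optional row filter and column subset to a lazy frame.

    `lf` is expected to behave like a Polars LazyFrame (for example the result
    of scan_readstat); only its .filter() and .select() methods are used.
    """
    if filtered:
        # Filter first so the predicate may use columns outside the subset.
        lf = lf.filter(predicate)
    if subset:
        lf = lf.select(columns)
    return lf
```

With a real file, each scenario could be run as `build_query(scan_readstat(path), ["SERIALNO", "AGEP"], pl.col("AGEP") >= 18, **scenario).collect()`.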

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • Last run: August 31, 2025
  • Versions tested: polars-readstat 0.12 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
  • Method: wall-clock timings via Python time.time()
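
Wall-clock timing with time.time() can be sketched as below. `time_call` is a hypothetical helper, not the project's benchmark script. Taking the minimum over a few repeats is an assumption made here to reduce noise, since the text does not say how many runs were used.

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        fn()
        elapsed = time.time() - start
        best = min(best, elapsed)
    return best


if __name__ == "__main__":
    # Stand-in workload; a real run would time e.g. read_readstat(path) instead.
    seconds = time_call(lambda: sum(range(1_000_000)))
    print(f"{seconds:.2f} s")
```

For measuring intervals, time.perf_counter() is generally preferred because it is monotonic; time.time() is used here only to match the method stated above.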

Compared to pandas and pyreadstat (using read_file_multiprocessing for parallel processing in pyreadstat)

SAS

All times in seconds; the speedup relative to pandas is shown in parentheses after each time.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.90 (1.7×) | 0.07 (29.4×) | 1.23 (2.5×) | 0.07 (29.9×) |
| polars_readstat, engine="cpp" (fastest for 0.11.1) | 1.31 (1.6×) | 0.09 (22.9×) | 1.56 (1.9×) | 0.09 (23.2×) |
| pandas | 2.07 | 2.06 | 3.03 | 2.09 |
| pyreadstat | 10.75 (0.2×) | 0.46 (4.5×) | 11.93 (0.3×) | 0.50 (4.2×) |

Stata

All times in seconds; the speedup relative to pandas is shown in parentheses after each time.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.17 (6.7×) | 0.12 (9.8×) | 0.24 (4.1×) | 0.11 (8.7×) |
| polars_readstat, engine="readstat" (the only option for 0.11.1) | 1.80 (0.6×) | 0.27 (4.4×) | 1.31 (0.8×) | 0.29 (3.3×) |
| pandas | 1.14 | 1.18 | 0.99 | 0.96 |
| pyreadstat | 7.46 (0.2×) | 2.18 (0.5×) | 7.66 (0.1×) | 2.24 (0.4×) |

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.
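
The speedup figures in the tables are the pandas time divided by the library's time, so values above 1 mean faster than pandas. A small worked example using the Stata full-file timings follows. The `speedup` helper is illustrative only, and because the published times are rounded to two decimals, recomputed ratios can differ slightly from the published ones.

```python
def speedup(pandas_seconds, library_seconds):
    """Speedup relative to pandas, rounded to one decimal like the tables."""
    return round(pandas_seconds / library_seconds, 1)


# Stata, full file: pandas 1.14 s, new Rust engine 0.17 s, pyreadstat 7.46 s.
print(speedup(1.14, 0.17))  # 6.7
print(speedup(1.14, 7.46))  # 0.2
```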

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • polars_readstat-0.12.2-cp39-abi3-win_amd64.whl (20.5 MB): CPython 3.9+, Windows x86-64
  • polars_readstat-0.12.2-cp39-abi3-manylinux_2_28_x86_64.whl (19.0 MB): CPython 3.9+, manylinux (glibc 2.28+) x86-64
  • polars_readstat-0.12.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.0 MB): CPython 3.9+, manylinux (glibc 2.17+) x86-64
  • polars_readstat-0.12.2-cp39-abi3-macosx_11_0_arm64.whl (16.7 MB): CPython 3.9+, macOS 11.0+ ARM64
  • polars_readstat-0.12.2-cp39-abi3-macosx_10_15_x86_64.whl (18.3 MB): CPython 3.9+, macOS 10.15+ x86-64
