
Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a Polars-first API. The project includes cross-library parity tests and roundtrip checks to reduce regressions.

The Rust engine is faster for many workloads, but performance varies by file shape and options. If you need the legacy C/C++ engine, install version 0.11.1 (pip install polars-readstat==0.11.1).

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • The API is Polars-first (scan_readstat, read_readstat, write_readstat).

Install

pip install polars-readstat

Core API

1) Lazy scan

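import polars as pl
from polars_readstat import scan_readstat

# Build a lazy query; nothing is materialized until .collect() is called
lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
# Keep two columns and only the rows where AGEP is at least 18
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()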

2) Getting metadata

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema      # polars.Schema
metadata = reader.metadata  # dict with file info and per-column details
lf = reader.df              # LazyFrame — same as calling scan_readstat(path)

metadata is a dict with a columns list. Each column entry includes:

  • "name" — column name
  • "label" — variable label (description), if present
  • "value_labels" — dict mapping coded values to label strings, if present
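
As a minimal sketch of working with that structure, the snippet below builds lookups from a hand-written dict shaped like the description above. The sample values and the helper names (column_labels, decode) are illustrative assumptions, not part of the polars_readstat API. In real use the dict would come from reader.metadata, and whether value_labels keys arrive as integers or strings is also an assumption here.

```python
# Hypothetical sample shaped like the metadata described above;
# real values would come from ScanReadstat(path=...).metadata.
sample_metadata = {
    "columns": [
        {"name": "SEX", "label": "Sex of respondent",
         "value_labels": {1: "Male", 2: "Female"}},
        {"name": "AGEP", "label": "Age"},   # no value labels
        {"name": "SERIALNO"},               # no label at all
    ]
}


def column_labels(metadata):
    """Map column name -> variable label, skipping unlabeled columns."""
    return {
        col["name"]: col["label"]
        for col in metadata.get("columns", [])
        if col.get("label")
    }


def decode(metadata, column, value):
    """Return the value label for a coded value, or the value itself if unlabeled."""
    for col in metadata.get("columns", []):
        if col["name"] == column:
            return (col.get("value_labels") or {}).get(value, value)
    return value


labels = column_labels(sample_metadata)
```

Using .get() for "label" and "value_labels" keeps the helpers safe for columns where those entries are absent, which the "if present" wording above suggests can happen.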

3) Write (Stata/SPSS) - EXPERIMENTAL

Writing support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. Please report issues.

from polars_readstat import write_readstat

# df is a Polars DataFrame, e.g. the result of .collect() in the scan example
write_readstat(df, "/path/out.dta")  # Stata
write_readstat(df, "/path/out.sav")  # SPSS

write_readstat supports Stata (dta) and SPSS (sav). SAS writing is not supported.

Docs

View the docs at https://jrothbaum.github.io/polars_readstat/ for more information on the options you can pass to the scan and write functions.

Benchmark

Benchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).
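
The four scenarios could be expressed with the lazy API roughly as follows. This is only a sketch: the column names and the AGEP >= 18 predicate are borrowed from the scan example earlier as placeholders, the function name run_scenarios is hypothetical, and the actual benchmark queries are the ones described in BENCHMARKS.md.

```python
def run_scenarios(path):
    """Collect the four benchmark-style reads for one file (illustrative only)."""
    # Imported here so the sketch can be loaded without the packages installed.
    import polars as pl
    from polars_readstat import scan_readstat

    columns = ["SERIALNO", "AGEP"]   # placeholder column subset
    adults = pl.col("AGEP") >= 18    # placeholder row filter

    return {
        # 1) full file
        "full": scan_readstat(path).collect(),
        # 2) Subset: True
        "subset": scan_readstat(path).select(columns).collect(),
        # 3) Filter: True
        "filter": scan_readstat(path).filter(adults).collect(),
        # 4) Subset: True, Filter: True
        "subset_filter": scan_readstat(path).select(columns).filter(adults).collect(),
    }
```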

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • Last run: polars-readstat Rust engine (v0.12.4) on February 24, 2026; comparison library timings for SAS/Stata (v0.11.1) on August 31, 2025
  • Versions tested: polars-readstat 0.12.4 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
  • Method: wall-clock timings via Python time.time()
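
A wall-clock measurement with time.time() can be sketched as below. The helper name time_call, the repeat count, and the best-of-N policy are assumptions made for illustration; the source states only that timings are wall-clock via time.time().

```python
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock duration in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        fn()
        best = min(best, time.time() - start)
    return best


# Stand-in workload; in a benchmark this would be a full read,
# a column subset, a row filter, or both.
elapsed = time_call(lambda: sum(range(100_000)))
```

Wall-clock timing includes disk and OS effects, so results depend on the machine and storage listed above.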

Compared to Pandas and Pyreadstat (using read_file_multiprocessing for parallel processing in Pyreadstat)

SAS

All times in seconds; speedup relative to pandas in parentheses.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.72 (2.9×) | 0.04 (51.5×) | 1.04 (2.9×) | 0.04 (52.5×) |
| polars_readstat, engine="cpp" (fastest for 0.11.1) | 1.31 (1.6×) | 0.09 (22.9×) | 1.56 (1.9×) | 0.09 (23.2×) |
| pandas | 2.07 | 2.06 | 3.03 | 2.09 |
| pyreadstat | 10.75 (0.2×) | 0.46 (4.5×) | 11.93 (0.3×) | 0.50 (4.2×) |
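
The speedup figures are the pandas time divided by each library's time for the same scenario. A short sketch using the SAS "Subset: True" timings above (the function name speedup is illustrative; because the published timings are rounded to two decimals, recomputing from them can differ from the published speedups in the last digit for some cells):

```python
def speedup(baseline_seconds, candidate_seconds):
    """Speedup of a candidate relative to a baseline; values above 1 mean faster."""
    return round(baseline_seconds / candidate_seconds, 1)


PANDAS_SUBSET = 2.06  # SAS, Subset: True, pandas time in seconds

subset_times = {
    "polars_readstat (Rust)": 0.04,
    'polars_readstat engine="cpp"': 0.09,
    "pyreadstat": 0.46,
}

speedups = {name: speedup(PANDAS_SUBSET, t) for name, t in subset_times.items()}
```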

Stata

All times in seconds; speedup relative to pandas in parentheses.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.17 (6.7×) | 0.12 (9.8×) | 0.24 (4.1×) | 0.11 (8.7×) |
| polars_readstat, engine="readstat" (the only option for 0.11.1) | 1.80 (0.6×) | 0.27 (4.4×) | 1.31 (0.8×) | 0.29 (3.3×) |
| pandas | 1.14 | 1.18 | 0.99 | 0.96 |
| pyreadstat | 7.46 (0.2×) | 2.18 (0.5×) | 7.66 (0.1×) | 2.24 (0.4×) |

SPSS

All times in seconds; speedup relative to pandas in parentheses.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.22 (6.6×) | 0.15 (9.1×) | 0.25 (6.0×) | 0.26 (4.5×) |
| pandas | 1.46 | 1.36 | 1.49 | 1.16 |
| pyreadstat | 9.25 (0.2×) | 4.85 (0.3×) | 9.39 (0.2×) | 4.75 (0.2×) |

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Tests run

Test coverage includes:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results in the Benchmark section above).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • polars_readstat-0.14.0-cp39-abi3-win_amd64.whl (20.6 MB): CPython 3.9+, Windows x86-64
  • polars_readstat-0.14.0-cp39-abi3-manylinux_2_28_x86_64.whl (19.0 MB): CPython 3.9+, manylinux (glibc 2.28+) x86-64
  • polars_readstat-0.14.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.0 MB): CPython 3.9+, manylinux (glibc 2.17+) x86-64
  • polars_readstat-0.14.0-cp39-abi3-macosx_11_0_arm64.whl (16.8 MB): CPython 3.9+, macOS 11.0+ ARM64
  • polars_readstat-0.14.0-cp39-abi3-macosx_10_15_x86_64.whl (18.4 MB): CPython 3.9+, macOS 10.15+ x86-64

