Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a Polars-first API. The project includes cross-library parity tests and roundtrip checks to reduce regressions.

The Rust engine is faster for many workloads, but performance varies by file shape and options. If you need the legacy C/C++ engine, use version 0.11.1 (see the prior version).

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • The API is Polars-first (scan_readstat, read_readstat, write_readstat, write_sas_csv_import).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Getting metadata

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema      # polars.Schema
metadata = reader.metadata  # dict with file info and per-column details
lf = reader.df              # LazyFrame — same as calling scan_readstat(path)

metadata is a dict with a columns list. Each column entry includes:

  • "name" — column name
  • "label" — variable label (description), if present
  • "value_labels" — dict mapping coded values to label strings, if present
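As a minimal sketch of how the structure described above might be used, the helper below collects variable labels and value labels by column name. The sample dict is written by hand for illustration (the column names and labels are hypothetical) and assumes only the three keys listed above; real metadata returned by ScanReadstat may carry additional keys.

```python
def collect_labels(metadata):
    """Return (variable_labels, value_labels) keyed by column name.

    Assumes only the documented shape: metadata["columns"] is a list of
    dicts with "name" and optional "label" / "value_labels" entries.
    """
    variable_labels = {}
    value_labels = {}
    for column in metadata.get("columns", []):
        name = column["name"]
        if column.get("label"):
            variable_labels[name] = column["label"]
        if column.get("value_labels"):
            value_labels[name] = dict(column["value_labels"])
    return variable_labels, value_labels


# Hypothetical metadata, written by hand to mirror the documented shape.
sample_metadata = {
    "columns": [
        {"name": "SEX", "label": "Sex of respondent",
         "value_labels": {1: "Male", 2: "Female"}},
        {"name": "AGEP", "label": "Age"},
        {"name": "SERIALNO"},
    ]
}

variable_labels, value_labels = collect_labels(sample_metadata)
print(variable_labels)
print(value_labels)
```

In practice the same function could be called on reader.metadata; columns without a label or value labels are simply left out of the returned dicts.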

3) Write (Experimental)

Writing support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. Please report issues.

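import polars as pl
from polars_readstat import write_readstat, write_sas_csv_import

df = pl.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})  # example data

write_readstat(df, "/path/out.dta")  # Stata
write_readstat(df, "/path/out.sav")  # SPSS
write_sas_csv_import(df, "/path/out/sas_bundle", dataset_name="my_data")  # .csv + .sas import script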

write_readstat supports Stata (dta) and SPSS (sav).
Use write_sas_csv_import for SAS-ingestible output (.csv + .sas import script). Binary .sas7bdat writing is not currently supported.
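The split between the two write functions can be sketched as a small extension check. The helper choose_writer below is hypothetical and not part of polars_readstat; it only inspects the file extension and returns the name of the documented function that applies, based on the support described above.

```python
from pathlib import Path

# Extensions that write_readstat is documented to handle.
BINARY_WRITE_EXTENSIONS = {".dta", ".sav"}


def choose_writer(path):
    """Return the name of the documented function suited to `path`.

    Hypothetical helper: it inspects the extension only and never calls
    polars_readstat itself.
    """
    suffix = Path(path).suffix.lower()
    if suffix in BINARY_WRITE_EXTENSIONS:
        return "write_readstat"
    if suffix == ".sas7bdat":
        raise ValueError(
            "binary .sas7bdat writing is not supported; "
            "use write_sas_csv_import to produce a .csv + .sas bundle"
        )
    raise ValueError(f"unsupported output extension: {suffix!r}")


print(choose_writer("/path/out.dta"))
print(choose_writer("/path/out.SAV"))
```

Failing early on a .sas7bdat target keeps the unsupported case explicit instead of letting it surface later as a write error.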

Docs

View the docs at https://jrothbaum.github.io/polars_readstat/ for more information on the options you can pass to the scan and write functions.

Benchmark

Benchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • Last run: May 14, 2026 — polars-readstat v0.17.0 vs pandas and pyreadstat
  • Method: wall-clock timings via Python time.time()
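A wall-clock harness along the lines described above might look like the following sketch. The function names, the repeat count, and the choice to keep the fastest run are assumptions made for illustration, not the project's actual benchmark script; the workloads are stand-ins so the block runs without any data files.

```python
import time


def time_call(func, repeats=3):
    """Return the fastest wall-clock duration of func() in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        func()
        best = min(best, time.time() - start)
    return best


def run_scenarios(scenarios, repeats=3):
    """Time each named zero-argument callable and return {name: seconds}."""
    return {name: time_call(func, repeats) for name, func in scenarios.items()}


# Stand-in workloads; in a real run each callable would read a file,
# for example with and without column selection or a row filter.
scenarios = {
    "full": lambda: sum(range(200_000)),
    "subset": lambda: sum(range(50_000)),
}
timings = run_scenarios(scenarios)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Each of the four scenarios would be one entry in the scenarios dict, so every library is timed through the same code path.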

Compared to pandas and pyreadstat (using pyreadstat's read_file_multiprocessing for parallel reads)

SAS

All times in seconds (speedup relative to pandas in parentheses).

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat | 0.55 (3.9×) | 0.07 (28.4×) | 1.46 (2.0×) | 0.08 (39.4×) |
| pandas | 2.16 | 1.99 | 2.93 | 3.15 |
| pyreadstat | 6.76 (0.3×) | 1.64 (1.2×) | 7.86 (0.4×) | 2.18 (1.4×) |
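The speedups in parentheses are consistent with dividing the pandas time by the library's time for the same scenario, so values above 1 mean faster than pandas and values below 1 mean slower. The short sketch below recomputes the polars_readstat figures from the SAS timings above; the function and dictionary names are illustrative only.

```python
def speedup(baseline_seconds, library_seconds):
    """Speedup relative to the baseline: values above 1 mean faster."""
    return baseline_seconds / library_seconds


# SAS timings in seconds, copied from the table above.
pandas_times = {
    "full": 2.16, "subset": 1.99, "filter": 2.93, "subset+filter": 3.15,
}
polars_readstat_times = {
    "full": 0.55, "subset": 0.07, "filter": 1.46, "subset+filter": 0.08,
}

for scenario, baseline in pandas_times.items():
    ratio = speedup(baseline, polars_readstat_times[scenario])
    print(f"{scenario}: {ratio:.1f}x")
```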

Stata

All times in seconds (speedup relative to pandas in parentheses).

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat | 0.16 (7.3×) | 0.10 (11.7×) | 0.18 (7.3×) | 0.09 (13.8×) |
| pandas | 1.17 | 1.17 | 1.31 | 1.24 |
| pyreadstat | 5.48 (0.2×) | 4.57 (0.3×) | 5.67 (0.2×) | 7.69 (0.2×) |

SPSS

All times in seconds (speedup relative to pandas in parentheses).

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat | 1.09 (62.5×) | 0.15 (3.9×) | 1.10 (62.4×) | 0.15 (3.9×) |
| pandas | 68.12 | 0.59 | 68.67 | 0.59 |
| pyreadstat | 3.06 (22.3×) | 1.15 (0.5×) | 7.09 (9.7×) | 1.23 (0.5×) |

zsav

All times in seconds (speedup relative to pandas in parentheses).

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat | 3.97 (5.9×) | 1.04 (2.1×) | 4.77 (4.7×) | 1.15 (2.0×) |
| pandas | 23.47 | 2.20 | 22.40 | 2.29 |

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Tests run

Test coverage includes:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results in the Benchmark section).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • polars_readstat-0.18.1-cp39-abi3-win_amd64.whl (20.8 MB): CPython 3.9+, Windows x86-64
  • polars_readstat-0.18.1-cp39-abi3-manylinux_2_28_x86_64.whl (19.2 MB): CPython 3.9+, manylinux (glibc 2.28+), x86-64
  • polars_readstat-0.18.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.2 MB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • polars_readstat-0.18.1-cp39-abi3-macosx_11_0_arm64.whl (17.0 MB): CPython 3.9+, macOS 11.0+ ARM64
  • polars_readstat-0.18.1-cp39-abi3-macosx_10_15_x86_64.whl (18.5 MB): CPython 3.9+, macOS 10.15+ x86-64
