
Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars


polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a Polars-first API. The project includes cross-library parity tests and roundtrip checks to reduce regressions.

In the project's benchmarks the Rust engine is faster than the legacy C/C++ engines, but performance varies with file shape and read options. If you need a legacy engine, install version 0.11.1 (see the prior version).

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • The API is Polars-first (scan_readstat, read_readstat, write_readstat, write_sas_csv_import).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Getting metadata

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema      # polars.Schema
metadata = reader.metadata  # dict with file info and per-column details
lf = reader.df              # LazyFrame — same as calling scan_readstat(path)

metadata is a dict with a columns list. Each column entry includes:

  • "name" — column name
  • "label" — variable label (description), if present
  • "value_labels" — dict mapping coded values to label strings, if present
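
As a rough illustration of working with that structure, the sketch below pulls variable labels and value labels out of a metadata-shaped dict. The sample dict is hand-written for the example (the "SEX" column and its labels are made up), the "columns" key is assumed from the description above, and the helper names variable_labels and value_labels are hypothetical, not part of the package. The key types inside "value_labels" may differ in real files.

```python
# Hand-written stand-in for reader.metadata; real metadata will contain
# more fields, and the exact key types are an assumption.
metadata = {
    "columns": [
        {"name": "SEX", "label": "Sex of respondent",
         "value_labels": {1: "Male", 2: "Female"}},
        {"name": "AGEP", "label": "Age"},
        {"name": "SERIALNO"},  # no label and no value labels
    ]
}


def variable_labels(metadata):
    """Map column name -> variable label, skipping unlabeled columns."""
    return {
        col["name"]: col["label"]
        for col in metadata.get("columns", [])
        if col.get("label")
    }


def value_labels(metadata, column_name):
    """Return the code -> label dict for one column (empty if none)."""
    for col in metadata.get("columns", []):
        if col["name"] == column_name:
            return col.get("value_labels") or {}
    return {}


print(variable_labels(metadata))
print(value_labels(metadata, "SEX"))
```

With a real file, the same helpers could be applied to reader.metadata from the ScanReadstat example above, provided the dict has the shape described.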

3) Write (Experimental)

Writing support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. Please report issues.

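from polars_readstat import write_readstat, write_sas_csv_import

# df is a polars.DataFrame, e.g. the collected result of the lazy scan above
write_readstat(df, "/path/out.dta")  # Stata
write_readstat(df, "/path/out.sav")  # SPSS
write_sas_csv_import(df, "/path/out/sas_bundle", dataset_name="my_data")  # .csv + .sas import script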

write_readstat supports Stata (dta) and SPSS (sav).
Use write_sas_csv_import for SAS-ingestible output (.csv + .sas import script). Binary .sas7bdat writing is not currently supported.
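
The format rules above can be summarized as a small dispatch on the output extension. This is only a sketch: pick_writer is a hypothetical helper, not part of the package, and it returns the name of the function to call instead of calling it, so it runs without the library installed. Treating every other extension (including .zsav) as unsupported for writing is an assumption made here because the text only mentions dta and sav.

```python
from pathlib import Path

# Extensions the text above says write_readstat can produce.
WRITE_READSTAT_SUFFIXES = {".dta", ".sav"}


def pick_writer(target):
    """Return the name of the writer function suited to `target`.

    Hypothetical helper for illustration only.
    """
    suffix = Path(target).suffix.lower()
    if suffix in WRITE_READSTAT_SUFFIXES:
        return "write_readstat"
    if suffix == ".sas7bdat":
        # Binary SAS output is not supported; point at the CSV route.
        raise ValueError(
            "binary .sas7bdat writing is not supported; "
            "use write_sas_csv_import with an output directory instead"
        )
    raise ValueError(f"no known writer for extension {suffix!r}")


print(pick_writer("/path/out.dta"))
print(pick_writer("/path/out.SAV"))
```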

Docs

View the docs at https://jrothbaum.github.io/polars_readstat/ for more information on the options you can pass to the scan and write functions.

Benchmark

Benchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • polars-readstat (Rust engine, v0.12.4) last run: February 24, 2026; comparison library timings for SAS/Stata (v0.11.1) last run: August 31, 2025
  • Versions tested: polars-readstat 0.12.4 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
  • Method: wall-clock timings via Python time.time()
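
For readers who want to take comparable measurements, a minimal wall-clock harness built on time.time() might look like the sketch below. The source only states that time.time() was used; the time_call helper, the repeat count, and the choice to keep the best run are assumptions of this example, and the workload is a placeholder for a real read such as a scan followed by collect().

```python
import time


def time_call(fn, repeats=3):
    """Run fn() `repeats` times and return the best wall-clock seconds.

    Hypothetical helper; best-of-N is an assumption, not the project's
    documented method.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        fn()
        elapsed = time.time() - start
        best = min(best, elapsed)
    return best


def placeholder_workload():
    # Stand-in for something like scan_readstat(path).collect()
    return sum(range(100_000))


seconds = time_call(placeholder_workload)
print(f"best of 3: {seconds:.4f} s")
```

time.perf_counter() is usually the better clock for short intervals; time.time() is used here only to mirror the stated method.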

Compared to pandas and pyreadstat (pyreadstat timings use read_file_multiprocessing for parallel processing)

SAS

All times in seconds; speedup relative to pandas in parentheses after each time.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.72 (2.9×) | 0.04 (51.5×) | 1.04 (2.9×) | 0.04 (52.5×) |
| polars_readstat, engine="cpp" (fastest for 0.11.1) | 1.31 (1.6×) | 0.09 (22.9×) | 1.56 (1.9×) | 0.09 (23.2×) |
| pandas | 2.07 | 2.06 | 3.03 | 2.09 |
| pyreadstat | 10.75 (0.2×) | 0.46 (4.5×) | 11.93 (0.3×) | 0.50 (4.2×) |

Stata

All times in seconds; speedup relative to pandas in parentheses after each time.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.17 (6.7×) | 0.12 (9.8×) | 0.24 (4.1×) | 0.11 (8.7×) |
| polars_readstat, engine="readstat" (the only option for 0.11.1) | 1.80 (0.6×) | 0.27 (4.4×) | 1.31 (0.8×) | 0.29 (3.3×) |
| pandas | 1.14 | 1.18 | 0.99 | 0.96 |
| pyreadstat | 7.46 (0.2×) | 2.18 (0.5×) | 7.66 (0.1×) | 2.24 (0.4×) |

SPSS

All times in seconds; speedup relative to pandas in parentheses after each time.

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.22 (6.6×) | 0.15 (9.1×) | 0.25 (6.0×) | 0.26 (4.5×) |
| pandas | 1.46 | 1.36 | 1.49 | 1.16 |
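
The speedup figures are the pandas time divided by the library's time for the same scenario. As a quick check, the sketch below recomputes the SPSS speedups from the timings reported above; the dictionary keys are just labels chosen for this example. Because the published times are rounded to two decimals, recomputed ratios for other rows may differ slightly from the published speedups.

```python
# Timings in seconds, copied from the SPSS results above.
pandas_seconds = {
    "full": 1.46, "subset": 1.36, "filter": 1.49, "subset+filter": 1.16,
}
rust_seconds = {
    "full": 0.22, "subset": 0.15, "filter": 0.25, "subset+filter": 0.26,
}

# Speedup relative to pandas: values above 1 mean faster than pandas.
speedups = {
    scenario: pandas_seconds[scenario] / rust_seconds[scenario]
    for scenario in pandas_seconds
}

for scenario, ratio in speedups.items():
    print(f"{scenario}: {ratio:.1f}x")
```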

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Tests run

Test coverage includes:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results in the Benchmark section above).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.
