Skip to main content

Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

Project description

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a Polars-first API. The project includes cross-library parity tests and roundtrip checks to reduce regressions.

The Rust engine is generally faster for many workloads, but performance varies by file shape and options. If you need the legacy C/C++ engine, use version 0.11.1 (see the prior version).

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • API is Polars-first (scan_readstat, read_readstat, write_readstat, write_sas_csv_import).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Getting metadata

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema      # polars.Schema
metadata = reader.metadata  # dict with file info and per-column details
lf = reader.df              # LazyFrame — same as calling scan_readstat(path)

metadata is a dict with a columns list. Each column entry includes:

  • "name" — column name
  • "label" — variable label (description), if present
  • "value_labels" — dict mapping coded values to label strings, if present

3) Write (Experimental)

Writing support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. Please report issues.

from polars_readstat import write_readstat, write_sas_csv_import

write_readstat(df, "/path/out.dta")
write_readstat(df, "/path/out.sav")
write_sas_csv_import(df, "/path/out/sas_bundle", dataset_name="my_data")

write_readstat supports Stata (dta) and SPSS (sav).
Use write_sas_csv_import for SAS-ingestible output (.csv + .sas import script). Binary .sas7bdat writing is not currently supported.

Docs

View the docs at https://jrothbaum.github.io/polars_readstat/ for more information on the options you can pass to the scan and write functions.

Benchmark

Benchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset:True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset:True, Filter: True).

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • polars-readstat (rust engine v0.12.4) last run: February 24, 2026; comparison library timings for SAS/Stata (v0.11.1) last run August 31, 2025
  • Version tested: polars-readstat 0.12.4 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines) and pandas and pyreadstat
  • Method: wall-clock timings via Python time.time()

Compared to Pandas and Pyreadstat (using read_file_multiprocessing for parallel processing in Pyreadstat)

SAS

all times in seconds (speedup relative to pandas in parenthesis below each)

Library Full File Subset: True Filter: True Subset: True, Filter: True
polars_readstat
New rust engine
0.72
(2.9×)
0.04
(51.5×)
1.04
(2.9×)
0.04
(52.5×)
polars_readstat
engine="cpp"
(fastest for 0.11.1)
1.31
(1.6×)
0.09
(22.9×)
1.56
(1.9×)
0.09
(23.2×)
pandas 2.07 2.06 3.03 2.09
pyreadstat 10.75
(0.2×)
0.46
(4.5×)
11.93
(0.3×)
0.50
(4.2×)

Stata

all times in seconds (speedup relative to pandas in parenthesis below each)

Library Full File Subset: True Filter: True Subset: True, Filter: True
polars_readstat
New rust engine
0.17
(6.7×)
0.12
(9.8×)
0.24
(4.1×)
0.11
(8.7×)
polars_readstat
engine="readstat"
(the only option for 0.11.1)
1.80
(0.6×)
0.27
(4.4×)
1.31
(0.8×)
0.29
(3.3×)
pandas 1.14 1.18 0.99 0.96
pyreadstat 7.46
(0.2×)
2.18
(0.5×)
7.66
(0.1×)
2.24
(0.4×)

SPSS

all times in seconds (speedup relative to pandas in parenthesis below each)

Library Full File Subset: True Filter: True Subset: True, Filter: True
polars_readstat
New rust engine
0.22
(6.6×)
0.15
(9.1×)
0.25
(6.0×)
0.26
(4.5×)
pandas 1.46 1.36 1.49 1.16

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Tests run

Test coverage includes:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results below).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

polars_readstat-0.14.4-cp39-abi3-win_amd64.whl (20.6 MB view details)

Uploaded CPython 3.9+Windows x86-64

polars_readstat-0.14.4-cp39-abi3-manylinux_2_28_x86_64.whl (19.0 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ x86-64

polars_readstat-0.14.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.0 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.17+ x86-64

polars_readstat-0.14.4-cp39-abi3-macosx_11_0_arm64.whl (16.8 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

polars_readstat-0.14.4-cp39-abi3-macosx_10_15_x86_64.whl (18.4 MB view details)

Uploaded CPython 3.9+macOS 10.15+ x86-64

File details

Details for the file polars_readstat-0.14.4-cp39-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.4-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 0e057a5cfc0e2a9ec6408dbe64b524be2a79e2c9ba4ba5baf10275b8c1cdb7cd
MD5 e6cfec6aaae2fd007cc931eea3bf5b71
BLAKE2b-256 e2067181f7a8cae6532c8278dfcf5f77fffab9e3b8887bf6003e67388cf6c326

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.4-cp39-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.4-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 cb2084e2a87147139c5999495eca9a987c202a04bbea85ac90c4191b3f9ecdbf
MD5 585c3479875813b603fc1e59740ca1db
BLAKE2b-256 b44aa83f9b4e6084362757c3190d542171b617be7181d3ee94ce05a36512b995

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 da54f17f4e0d1c9750e76b7beb899ebb9316f5ae44b28de05a4b7cbe77513275
MD5 cfc14f5f287c3a50a2b40afe3e737a78
BLAKE2b-256 09b5e3cce3ba7218ee8b1098937c57d92e0a33aa5fa3e97cedc2e222d91f1177

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.4-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.4-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5d10d974806ec47015505fc8b40af1421f6efb47cfcd288c4de1de85d5efce5d
MD5 fbed3a84eb7e2533008edd1ce0a218f8
BLAKE2b-256 05fc45613670fa385d9f230bb1ae6017105ec492df96307dc3337ff30af20b37

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.4-cp39-abi3-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.4-cp39-abi3-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 555bd4b51825088318c0355e0da8abe010e0b2d77096d94b352b362525ee2c9f
MD5 97c761a45fde13bfe5b357250a708f93
BLAKE2b-256 2e00c5fc53ec72ca7515a627c7ded57a7ec261e27106de1d0cf2ebc4bb05b689

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page