Skip to main content

Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

Project description

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a Polars-first API. The project includes cross-library parity tests and roundtrip checks to reduce regressions.

The Rust engine is generally faster for many workloads, but performance varies by file shape and options. If you need the legacy C/C++ engine, use version 0.11.1 (see the prior version).

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • API is Polars-first (scan_readstat, read_readstat, write_readstat, write_sas_csv_import).

Install

pip install polars-readstat

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()

2) Getting metadata

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema      # polars.Schema
metadata = reader.metadata  # dict with file info and per-column details
lf = reader.df              # LazyFrame — same as calling scan_readstat(path)

metadata is a dict with a columns list. Each column entry includes:

  • "name" — column name
  • "label" — variable label (description), if present
  • "value_labels" — dict mapping coded values to label strings, if present

3) Write (Experimental)

Writing support is experimental and compatibility varies across tools. Stata roundtrip tests are included; SPSS roundtrip coverage is limited. Please report issues.

from polars_readstat import write_readstat, write_sas_csv_import

write_readstat(df, "/path/out.dta")
write_readstat(df, "/path/out.sav")
write_sas_csv_import(df, "/path/out/sas_bundle", dataset_name="my_data")

write_readstat supports Stata (dta) and SPSS (sav).
Use write_sas_csv_import for SAS-ingestible output (.csv + .sas import script). Binary .sas7bdat writing is not currently supported.

Docs

View the docs at https://jrothbaum.github.io/polars_readstat/ for more information on the options you can pass to the scan and write functions.

Benchmark

Benchmarks compare four scenarios: 1) load the full file, 2) load a subset of columns (Subset:True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset:True, Filter: True).

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • polars-readstat (rust engine v0.12.4) last run: February 24, 2026; comparison library timings for SAS/Stata (v0.11.1) last run August 31, 2025
  • Version tested: polars-readstat 0.12.4 (new Rust engine) against polars-readstat 0.11.1 (prior C++ and C engines) and pandas and pyreadstat
  • Method: wall-clock timings via Python time.time()

Compared to Pandas and Pyreadstat (using read_file_multiprocessing for parallel processing in Pyreadstat)

SAS

all times in seconds (speedup relative to pandas in parenthesis below each)

Library Full File Subset: True Filter: True Subset: True, Filter: True
polars_readstat
New rust engine
0.72
(2.9×)
0.04
(51.5×)
1.04
(2.9×)
0.04
(52.5×)
polars_readstat
engine="cpp"
(fastest for 0.11.1)
1.31
(1.6×)
0.09
(22.9×)
1.56
(1.9×)
0.09
(23.2×)
pandas 2.07 2.06 3.03 2.09
pyreadstat 10.75
(0.2×)
0.46
(4.5×)
11.93
(0.3×)
0.50
(4.2×)

Stata

all times in seconds (speedup relative to pandas in parenthesis below each)

Library Full File Subset: True Filter: True Subset: True, Filter: True
polars_readstat
New rust engine
0.17
(6.7×)
0.12
(9.8×)
0.24
(4.1×)
0.11
(8.7×)
polars_readstat
engine="readstat"
(the only option for 0.11.1)
1.80
(0.6×)
0.27
(4.4×)
1.31
(0.8×)
0.29
(3.3×)
pandas 1.14 1.18 0.99 0.96
pyreadstat 7.46
(0.2×)
2.18
(0.5×)
7.66
(0.1×)
2.24
(0.4×)

SPSS

all times in seconds (speedup relative to pandas in parenthesis below each)

Library Full File Subset: True Filter: True Subset: True, Filter: True
polars_readstat
New rust engine
0.22
(6.6×)
0.15
(9.1×)
0.25
(6.0×)
0.26
(4.5×)
pandas 1.46 1.36 1.49 1.16

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Tests run

Test coverage includes:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results below).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

polars_readstat-0.14.6-cp39-abi3-win_amd64.whl (20.6 MB view details)

Uploaded CPython 3.9+Windows x86-64

polars_readstat-0.14.6-cp39-abi3-manylinux_2_28_x86_64.whl (19.1 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.28+ x86-64

polars_readstat-0.14.6-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.0 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.17+ x86-64

polars_readstat-0.14.6-cp39-abi3-macosx_11_0_arm64.whl (16.8 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

polars_readstat-0.14.6-cp39-abi3-macosx_10_15_x86_64.whl (18.4 MB view details)

Uploaded CPython 3.9+macOS 10.15+ x86-64

File details

Details for the file polars_readstat-0.14.6-cp39-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.6-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 3cd7f05f8c41d3189885ec675ae6131d2d870cd6affc1e03934173a58557f161
MD5 0dace2f9f75ada50a6c74cc400181704
BLAKE2b-256 43b8d4fa05b0690296b43300cb76a03e6a1fd8e2cec4c2d0df32e5fa2d14c73e

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.6-cp39-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.6-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 006b9d58207ce47d8e2687652843b44904882ecae8726ea63df962ad21964c3f
MD5 f91849e29f4d326cd6d7c3dbd5d4c34f
BLAKE2b-256 8e0de97726de86a7ee7b7fd7b83ea20f3cb77a6ae973dde3ff82f58e64e7527e

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.6-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.6-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d70612ef0000e4de7caf616473186a64d473b90a20ae71bab35682a76e58f11a
MD5 d4a389a4be5c245bc40968dd1e1144b6
BLAKE2b-256 6febda3eb3a0d05abfd929decdcfb3f312c808ee652ba4a33a28888d31acbb20

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.6-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.6-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 473cd950340877dfce966161e5b02d0278ded760540720b56ed1b45213e55ee1
MD5 fe6a3af7d845789380ec53fc3d986a21
BLAKE2b-256 df00e06627edf2bcc48814ef07046fdef88cb03546346300a2c4bc4acdf2676a

See more details on using hashes here.

File details

Details for the file polars_readstat-0.14.6-cp39-abi3-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for polars_readstat-0.14.6-cp39-abi3-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 7fddc823bc1a30face61739e29b0094edba127ceff3f6f046f601f55ba1ced97
MD5 46665ddf2566cf4769d21fad70b82428
BLAKE2b-256 ef7010a099e0fd08a206fb9e24ee0757477a08ea7ffd07fdfde97ef2ee0b876a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page