Read SAS (sas7bdat), Stata (dta), and SPSS (sav) files with polars

Project description

polars_readstat

Polars plugin for SAS (.sas7bdat), Stata (.dta), and SPSS (.sav/.zsav) files.

The Python package wraps the Rust core in polars_readstat_rs and exposes a simple Polars-first API. I have tried to make sure this release has no errors or regressions (it was tested against 178 test files from pandas, pyreadstat, and other sources). If I missed something, the README for the prior version has the details, and you can install v0.11.1 from PyPI.

Why use this?

  • In project benchmarks, the new Rust-backed engine is typically faster than pandas/pyreadstat on large SAS/Stata files, especially for subset/filter workloads.
  • It avoids the older C/C++ toolchain complexity and ships as standard Python wheels.
  • API is Polars-first (scan_readstat, read_readstat, write_readstat).

Install

pip install polars-readstat
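
To confirm which version ended up in the environment (for instance after pinning back to v0.11.1), something like the following may help. It is a small sketch using only the standard library; `installed_version` is a hypothetical helper name and not part of polars_readstat.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "polars-readstat") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or a short notice when the package is missing.
print(installed_version() or "polars-readstat is not installed")
```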

Core API

1) Lazy scan

import polars as pl
from polars_readstat import scan_readstat

lf = scan_readstat("/path/file.sas7bdat", preserve_order=True)
df = lf.select(["SERIALNO", "AGEP"]).filter(pl.col("AGEP") >= 18).collect()
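
As rough intuition for why selecting columns and filtering rows before collecting can pay off, here is a pure-Python toy that streams records, drops rows that fail the predicate, and copies only the requested columns. It is an illustrative sketch and makes no claim about how polars_readstat works internally; `scan_rows` and the sample records are made up for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def scan_rows(
    rows: Iterable[Row],
    columns: List[str],
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Stream rows, keeping only matching rows and the requested columns."""
    for row in rows:
        if predicate(row):  # filter first, so rejected rows are never copied
            yield {name: row[name] for name in columns}


# Made-up records standing in for a file on disk.
records = [
    {"SERIALNO": "A1", "AGEP": 17, "STATE": "CA"},
    {"SERIALNO": "B2", "AGEP": 42, "STATE": "NY"},
    {"SERIALNO": "C3", "AGEP": 18, "STATE": "TX"},
]

adults = list(scan_rows(records, ["SERIALNO", "AGEP"], lambda r: r["AGEP"] >= 18))
print(adults)
```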

2) Eager read

from polars_readstat import read_readstat

df = read_readstat("/path/file.dta")
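
When a folder mixes several formats, it can be handy to collect the readable files first. The sketch below uses only the standard library and the extensions named in the project description; `find_readable_files` is a hypothetical helper, not something the package provides.

```python
from pathlib import Path
from typing import List

# Extensions named in the project description.
READABLE_SUFFIXES = {".sas7bdat", ".dta", ".sav", ".zsav"}


def find_readable_files(folder: str) -> List[Path]:
    """Return files under `folder` whose extension is one of the supported formats."""
    return sorted(
        p
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in READABLE_SUFFIXES
    )
```

Each returned path could then be handed to read_readstat as `str(path)`; converting to a string is a cautious assumption, since the text above only shows string paths.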

3) Metadata + schema

from polars_readstat import ScanReadstat

reader = ScanReadstat(path="/path/file.sav")
schema = reader.schema
metadata = reader.metadata

4) Write (Stata/SPSS)

from polars_readstat import write_readstat

# df is an existing Polars DataFrame, e.g. the result of read_readstat(...) above
write_readstat(df, "/path/out.dta", threads=8)
write_readstat(df, "/path/out.sav")

write_readstat supports Stata (dta) and SPSS (sav). SAS writing is not supported.
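
Because only two output formats are listed, a caller may want to check the target path before writing. The following is a minimal sketch with a hypothetical helper, `check_writable`, built on the standard library. It treats anything other than .dta and .sav as unsupported, including .zsav, since the text does not say whether compressed SPSS output can be written.

```python
from pathlib import Path

# Formats the text above lists as writable; SAS output is not supported.
WRITABLE_SUFFIXES = {".dta": "Stata", ".sav": "SPSS"}


def check_writable(path: str) -> str:
    """Return the target format name, or raise ValueError for unsupported output."""
    suffix = Path(path).suffix.lower()
    if suffix not in WRITABLE_SUFFIXES:
        shown = suffix or "a path without an extension"
        raise ValueError(f"cannot write {shown}: only .dta and .sav are listed as writable")
    return WRITABLE_SUFFIXES[suffix]


print(check_writable("/path/out.dta"))
```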

Tests run

We’ve tried to test this thoroughly:

  • Cross-library comparisons on the pyreadstat and pandas test data, checking results against polars-readstat==0.11.1, pyreadstat, and pandas.
  • Stata/SPSS read/write roundtrip tests.
  • Large-file read/write benchmark runs on real-world data (results below).

If you want to run the same checks locally, helper scripts and tests are in scripts/ and tests/.

Benchmark

For each file, I compared four scenarios: 1) load the full file, 2) load a subset of columns (Subset: True), 3) filter to a subset of rows (Filter: True), 4) load a subset of columns and filter to a subset of rows (Subset: True, Filter: True).

Benchmark context:

  • Machine: AMD Ryzen 7 8845HS (16 cores), 14 GiB RAM, Linux Mint 22
  • Storage: external SSD
  • Last run: August 31, 2025
  • Version tested: polars-readstat 0.12 (new Rust engine), compared against polars-readstat 0.11.1 (prior C++ and C engines), pandas, and pyreadstat
  • Method: wall-clock timings via Python time.time()
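
A timing loop in the spirit of the method above might look like the sketch below. It is not the project's benchmark script: the best-of-N policy, the `time_call` and `run_grid` names, and the `fake_load` stand-in are all assumptions made for illustration. Only the use of time.time() and the four subset/filter scenarios come from the text.

```python
import itertools
import time
from typing import Callable, Dict, Tuple


def time_call(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.time()
        fn()
        best = min(best, time.time() - start)
    return best


def run_grid(load: Callable[[bool, bool], object]) -> Dict[Tuple[bool, bool], float]:
    """Time `load(subset, filter_rows)` for all four subset/filter combinations."""
    return {
        (subset, filter_rows): time_call(lambda: load(subset, filter_rows))
        for filter_rows, subset in itertools.product([False, True], repeat=2)
    }


def fake_load(subset: bool, filter_rows: bool) -> None:
    """Stand-in for a real reader call; it just sleeps briefly."""
    time.sleep(0.001)


results = run_grid(fake_load)
for (subset, filter_rows), seconds in results.items():
    print(f"Subset: {subset}, Filter: {filter_rows}: {seconds:.4f}s")
```

For new measurements, time.perf_counter() is a monotonic alternative to time.time() and is less sensitive to system clock adjustments.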

Compared to Pandas and Pyreadstat (using read_file_multiprocessing for parallel processing in Pyreadstat)

SAS

All times in seconds (speedup relative to pandas in parentheses).

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.90 (1.7×) | 0.07 (29.4×) | 1.23 (2.5×) | 0.07 (29.9×) |
| polars_readstat, engine="cpp" (fastest for 0.11.1) | 1.31 (1.6×) | 0.09 (22.9×) | 1.56 (1.9×) | 0.09 (23.2×) |
| pandas | 2.07 | 2.06 | 3.03 | 2.09 |
| pyreadstat | 10.75 (0.2×) | 0.46 (4.5×) | 11.93 (0.3×) | 0.50 (4.2×) |

Stata

All times in seconds (speedup relative to pandas in parentheses).

| Library | Full File | Subset: True | Filter: True | Subset: True, Filter: True |
|---|---|---|---|---|
| polars_readstat, new Rust engine | 0.17 (6.7×) | 0.12 (9.8×) | 0.24 (4.1×) | 0.11 (8.7×) |
| polars_readstat, engine="readstat" (the only option for 0.11.1) | 1.80 (0.6×) | 0.27 (4.4×) | 1.31 (0.8×) | 0.29 (3.3×) |
| pandas | 1.14 | 1.18 | 0.99 | 0.96 |
| pyreadstat | 7.46 (0.2×) | 2.18 (0.5×) | 7.66 (0.1×) | 2.24 (0.4×) |
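
The speedup figures appear to be the pandas time divided by the library's time for the same scenario. That reading is an assumption, though it reproduces the Stata values for the new Rust engine. A short sketch with the numbers copied from the Stata table:

```python
def speedup(pandas_seconds: float, library_seconds: float) -> float:
    """Speedup relative to pandas: values above 1 mean faster than pandas."""
    return pandas_seconds / library_seconds


# Stata timings for pandas and the new Rust engine, copied from the table above.
pandas_times = {"full": 1.14, "subset": 1.18, "filter": 0.99, "subset+filter": 0.96}
rust_times = {"full": 0.17, "subset": 0.12, "filter": 0.24, "subset+filter": 0.11}

for scenario, baseline in pandas_times.items():
    print(f"{scenario}: {speedup(baseline, rust_times[scenario]):.1f}x")
```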

Detailed benchmark notes and dataset descriptions are in BENCHMARKS.md.

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

polars_readstat-0.12.1-cp39-abi3-win_amd64.whl (20.3 MB)
Uploaded: CPython 3.9+, Windows x86-64

polars_readstat-0.12.1-cp39-abi3-manylinux_2_28_x86_64.whl (18.8 MB)
Uploaded: CPython 3.9+, manylinux: glibc 2.28+ x86-64

polars_readstat-0.12.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.8 MB)
Uploaded: CPython 3.9+, manylinux: glibc 2.17+ x86-64

polars_readstat-0.12.1-cp39-abi3-macosx_11_0_arm64.whl (16.6 MB)
Uploaded: CPython 3.9+, macOS 11.0+ ARM64

polars_readstat-0.12.1-cp39-abi3-macosx_10_15_x86_64.whl (18.2 MB)
Uploaded: CPython 3.9+, macOS 10.15+ x86-64
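
The wheel file names follow the standard dash-separated layout: distribution, version, an optional build tag, Python tag, ABI tag, and platform tag. As a sketch of how to read them, the hypothetical `parse_wheel_name` helper below splits a name into those parts using only the standard library. Dedicated packaging tools handle more edge cases.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, python_tag, abi_tag, platform_tag)


info = parse_wheel_name("polars_readstat-0.12.1-cp39-abi3-manylinux_2_28_x86_64.whl")
print(info.python_tag, info.abi_tag, info.platform_tag)
```

Here `cp39` with `abi3` indicates a wheel built against CPython's stable ABI starting at 3.9, which is consistent with the "CPython 3.9+" label on each upload.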

File details

Details for the file polars_readstat-0.12.1-cp39-abi3-win_amd64.whl.

File hashes

Hashes for polars_readstat-0.12.1-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 9c0349be2dceaeb4f6eece318e1788c863c64720988e59c8c96d1ec0a84d49d0
MD5 54b63c23912dd713795fdf0a6c9836a1
BLAKE2b-256 02d7c5520e60c41dad8e628f217ef986a30665b100e2b5d9d787c926560d69ab

File details

Details for the file polars_readstat-0.12.1-cp39-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for polars_readstat-0.12.1-cp39-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 edc4a71c32f0681b245660a932491ba01c2f8512391cbc7690d598cf812665cd
MD5 c9cc05c0a96898d3cb14536818a59a99
BLAKE2b-256 e433270b1f883e6821433d8499ea5d48de0ac298b84e095d7f87dc9aeabe210a

File details

Details for the file polars_readstat-0.12.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for polars_readstat-0.12.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 086ff55cbff4740ba079477fc3e6f6ded93fd2b8de6f135bbac7fa64e8913485
MD5 d6dfa82e4915446b61e292d40a23d759
BLAKE2b-256 f52fbf1e63c85d751376d2106f8257f34b17a8c41eb6fd6dd78872340ef95988

File details

Details for the file polars_readstat-0.12.1-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_readstat-0.12.1-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e9977574f1c5940a25dd7ba8d061f8ab9cb5360cc7c42a8ce1ce6fa8e2013fcd
MD5 3df0b7317c07ecc7ca527be7181b9fdd
BLAKE2b-256 edab3b79118b660f14d1cf8f0e8c1421421138c35dd07b80992553e7e72b7e65

File details

Details for the file polars_readstat-0.12.1-cp39-abi3-macosx_10_15_x86_64.whl.

File hashes

Hashes for polars_readstat-0.12.1-cp39-abi3-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 ac4896257b5fa904c5d2471eb148f32dea83a31ab0dd6962b73dcf873215eff8
MD5 ebb52fe84eb17dade94b3afb00799016
BLAKE2b-256 b1f9d5e23d6d29fc8530b092a4b1d4169fc969998068fe8ecaad56f9e0e17d94
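
A downloaded wheel can be checked against the SHA256 digests listed above. The sketch below uses only hashlib from the standard library; the helper names `sha256_of` and `matches_published_hash` are made up for this example, and the wheel path in the trailing comment is a placeholder.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, with the wheel saved in the current directory:
# matches_published_hash(
#     "polars_readstat-0.12.1-cp39-abi3-win_amd64.whl",
#     "9c0349be2dceaeb4f6eece318e1788c863c64720988e59c8c96d1ec0a84d49d0",
# )
```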
