Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

ambers

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked at 6–16x faster reads and 4–17x faster writes than pyreadstat, and 1.1–3.6x faster reads than polars_readstat (see Performance)

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)

# Write as .zsav (zlib compressed)
am.write_sav(df, "compressed.zsav", meta=meta)

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

Use .sav for bytecode compression (default), .zsav for zlib compression. Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame. Individual writable fields (e.g., variable_labels, variable_value_labels) can also be passed directly as keyword arguments for fine-grained control.
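
The extension rule above can be pictured as a small lookup. This is only a sketch of the documented convention, not ambers' internal code: `compression_for` is a hypothetical helper name, the strings "bytecode" and "zlib" are descriptive labels taken from the text, and raising an error for any other extension is an assumption made for the example.

```python
from pathlib import Path

# Mapping taken from the paragraph above: .sav -> bytecode, .zsav -> zlib.
_COMPRESSION_BY_SUFFIX = {".sav": "bytecode", ".zsav": "zlib"}


def compression_for(path: str) -> str:
    """Return the compression implied by an output path's extension.

    Hypothetical helper for illustration; not part of the ambers API.
    """
    suffix = Path(path).suffix.lower()
    try:
        return _COMPRESSION_BY_SUFFIX[suffix]
    except KeyError:
        # Assumption: this sketch rejects anything other than .sav/.zsav.
        raise ValueError(f"unsupported extension: {suffix!r}") from None


print(compression_for("filtered.sav"))
print(compression_for("compressed.zsav"))
```

A check like this can be useful before a long write, so a typo in the file name is caught early rather than after the data has been processed.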

Rust

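use ambers::{read_sav, read_sav_metadata};

// Assumes the ambers error type implements std::error::Error,
// so `?` can convert it into a boxed error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read data + metadata
    let (batch, meta) = read_sav("survey.sav")?;
    println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

    // Read metadata only
    let meta = read_sav_metadata("survey.sav")?;
    println!("{}", meta.label("Q1").unwrap_or("(no label)"));

    Ok(())
}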

Metadata API (Python)

| Method | Description |
| --- | --- |
| meta.summary() | Formatted overview: file info, type distribution, annotations |
| meta.describe("Q1") | Deep-dive into a single variable (or list of variables) |
| meta.diff(other) | Compare two metadata objects, returns MetaDiff |
| meta.label("Q1") | Variable label |
| meta.value("Q1") | Value labels dict |
| meta.format("Q1") | SPSS format string (e.g. "F8.2", "A50") |
| meta.measure("Q1") | Measurement level ("nominal", "ordinal", "scale") |
| meta.schema | Full metadata as a nested Python dict |

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

The table below lists every field the reader returns. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields (encoding, timestamps, row/column counts, etc.) are set automatically.

Note: This is a first pass — field names and behavior may change without warning in future releases.

| Field | Read | Write | Type |
| --- | --- | --- | --- |
| variable_names | yes | yes | list[str] |
| variable_labels | yes | yes | dict[str, str] |
| variable_value_labels | yes | yes | dict[str, dict[float \| str, str]] |
| variable_measure | yes | yes | dict[str, str] |
| variable_alignment | yes | yes | dict[str, str] |
| variable_display_width | yes | yes | dict[str, int] |
| variable_storage_width | yes | yes | dict[str, int] |
| variable_missing | yes | yes | dict[str, list[dict]] |
| spss_variable_types | yes | yes | dict[str, str] |
| rust_variable_types | yes | no | dict[str, str] |
| weight_variable | yes | yes | str \| None |
| mr_sets | yes | yes | dict[str, dict] |
| file_label | yes | yes | str |
| file_format | yes | no | str |
| file_encoding | yes | no | str |
| creation_time | yes | no | str |
| modification_time | yes | no | str |
| number_rows | yes | no | int \| None |
| number_columns | yes | no | int |
| compression | yes | no | str |
| notes | yes | yes | list[str] |
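
To make the nested type of variable_value_labels concrete, here is a minimal sketch that decodes coded values using a dict of that shape. The sample data and the `decode` helper are made up for illustration. Passing unlabeled codes through unchanged is a choice made for this example, not documented ambers behavior.

```python
# Made-up sample shaped like variable_value_labels:
# dict[str, dict[float | str, str]]
variable_value_labels = {
    "Q1": {1.0: "Yes", 2.0: "No"},
    "region": {"N": "North", "S": "South"},
}


def decode(column, values, value_labels):
    """Replace coded values with their labels; unlabeled values pass through."""
    mapping = value_labels.get(column, {})
    return [mapping.get(v, v) for v in values]


print(decode("Q1", [1.0, 2.0, 9.0], variable_value_labels))
# ['Yes', 'No', 9.0]
print(decode("region", ["N", "S"], variable_value_labels))
# ['North', 'South']
```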

Streaming Reader (Rust)

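// Assumes the ambers error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scanner = ambers::scan_sav("survey.sav")?;
    scanner.select(&["age", "gender"])?;
    scanner.limit(1000);

    // Pull record batches until the scanner is exhausted
    while let Some(batch) = scanner.next_batch()? {
        println!("Batch: {} rows", batch.num_rows());
    }

    Ok(())
}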
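
For readers coming from the Python side, the select / limit / next-batch pattern can be pictured with plain Python. This is a conceptual sketch over an in-memory list, not the ambers scanner: `iter_batches` and its parameters are hypothetical names, and the sample rows are made up.

```python
from itertools import islice


def iter_batches(rows, columns, batch_size, limit=None):
    """Yield batches of projected rows, stopping once `limit` rows were produced."""
    # Applying the limit to the source means later rows are never touched.
    source = iter(rows) if limit is None else islice(rows, limit)
    while True:
        # Keep only the requested columns for each row in this batch.
        batch = [{c: row[c] for c in columns} for row in islice(source, batch_size)]
        if not batch:
            return
        yield batch


rows = [{"age": 20 + i, "gender": i % 2, "Q1": i} for i in range(10)]
for batch in iter_batches(rows, ["age", "gender"], batch_size=4, limit=6):
    print(f"Batch: {len(batch)} rows")
```

With a limit of 6 and a batch size of 4, the loop yields one full batch and one short batch, then stops without visiting the remaining rows.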

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, 24-core machine.

| File | Size | Rows | Cols | ambers | polars_readstat | pyreadstat | vs polars_readstat | vs pyreadstat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | < 0.01s | < 0.01s | 0.011s | | |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | 0.286s | 0.897s | 3.524s | 3.1x | 12x |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | 0.322s | 1.150s | 4.918s | 3.6x | 15x |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | 0.002s | 0.003s | 0.012s | 1.5x | 6x |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | 0.002s | 0.003s | 0.016s | 1.5x | 8x |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | 1.600s | 1.752s | 25.214s | 1.1x | 16x |

  • 1.1–3.6x faster than polars_readstat on every file where a speedup could be measured (test_2 to test_6)
  • 6–16x faster than pyreadstat on the same files
  • No PyArrow dependency: uses the Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

| File (size) | Full collect | Select 5 cols | Head 1000 rows | Select 5 + head 1000 |
| --- | --- | --- | --- | --- |
| test_2 (147 MB, 22K × 677) | 0.903s | 0.363s (2.5x) | 0.181s (5.0x) | 0.157s (5.7x) |
| test_3 (1.1 GB, 79K × 915) | 0.700s | 0.554s (1.3x) | 0.020s (35x) | 0.012s (58x) |
| test_6 (5.4 GB, 395K × 916) | 3.062s | 2.343s (1.3x) | 0.022s (139x) | 0.013s (236x) |

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
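
The speedup figures are simply the full-collect time divided by the time of the narrower query. A short check of the test_6 row, using only the timings printed in the table (the `speedup` helper is just for this example):

```python
# Timings in seconds, copied from the test_6 row of the lazy-read table.
full_collect = 3.062
timings = {
    "select 5 cols": 2.343,
    "head 1000 rows": 0.022,
    "select 5 + head 1000": 0.013,
}


def speedup(baseline, t):
    """Ratio of the baseline time to a faster time."""
    return baseline / t


for name, t in timings.items():
    print(f"{name}: {speedup(full_collect, t):.1f}x")
```

Rounded to whole numbers, the last two ratios come out to 139x and 236x, matching the table.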

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

| File | Size | Rows | Cols | Mode | ambers | pyreadstat | Speedup |
| --- | --- | --- | --- | --- | --- | --- | --- |
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | .sav | 0.001s | 0.019s | 13x |
| | | | | .zsav | 0.004s | 0.026s | 7x |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | .sav | 0.567s | 3.849s | 7x |
| | | | | .zsav | 1.088s | 4.415s | 4x |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | .sav | 0.950s | 16.152s | 17x |
| | | | | .zsav | 1.774s | 17.362s | 10x |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | .sav | 5.700s | 79.999s | 14x |
| | | | | .zsav | 8.193s | 85.491s | 10x |

  • 4–17x faster than pyreadstat on writes across the files and compression modes above
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Currently supports read and write with Polars DataFrames (eager and lazy) — extending to pandas, Narwhals, DuckDB, and others

License

MIT

Download files

Source Distribution

ambers-0.3.0.tar.gz (100.0 kB)

Built Distributions

| File | Size | Python | Platform |
| --- | --- | --- | --- |
| ambers-0.3.0-cp314-cp314-win_amd64.whl | 938.0 kB | CPython 3.14 | Windows x86-64 |
| ambers-0.3.0-cp314-cp314-manylinux_2_34_x86_64.whl | 1.1 MB | CPython 3.14 | manylinux: glibc 2.34+ x86-64 |
| ambers-0.3.0-cp314-cp314-macosx_11_0_arm64.whl | 1.0 MB | CPython 3.14 | macOS 11.0+ ARM64 |
| ambers-0.3.0-cp313-cp313-win_amd64.whl | 941.0 kB | CPython 3.13 | Windows x86-64 |
| ambers-0.3.0-cp313-cp313-manylinux_2_34_x86_64.whl | 1.1 MB | CPython 3.13 | manylinux: glibc 2.34+ x86-64 |
| ambers-0.3.0-cp313-cp313-macosx_11_0_arm64.whl | 1.0 MB | CPython 3.13 | macOS 11.0+ ARM64 |
| ambers-0.3.0-cp312-cp312-win_amd64.whl | 941.9 kB | CPython 3.12 | Windows x86-64 |
| ambers-0.3.0-cp312-cp312-manylinux_2_34_x86_64.whl | 1.1 MB | CPython 3.12 | manylinux: glibc 2.34+ x86-64 |
| ambers-0.3.0-cp312-cp312-macosx_11_0_arm64.whl | 1.0 MB | CPython 3.12 | macOS 11.0+ ARM64 |

File details

Details for the file ambers-0.3.0.tar.gz.

File metadata

  • Download URL: ambers-0.3.0.tar.gz
  • Size: 100.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.0.tar.gz
Algorithm Hash digest
SHA256 ecb9e7e1c8d00aefa62e84584cb43c87f4dcf944c4eadc8cb8e203e46eeffc59
MD5 e62bc6e89c97ad6d9e877598a7925cbc
BLAKE2b-256 6151c16e4c2686b54f9ee1cd4bc6f0a05a56164c0e8cfc49855e477d31a21cc1

Provenance

The following attestation bundles were made for ambers-0.3.0.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

