Skip to main content

Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

Project description

ambers

ambers banner

Crates.io PyPI License: MIT

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked up to 3–10x faster reads and 4–20x faster writes compared to current popular SPSS I/O libraries

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)                        # bytecode (default for .sav)
am.write_sav(df, "compressed.zsav", meta=meta)                     # zlib (default for .zsav)
am.write_sav(df, "raw.sav", meta=meta, compression="uncompressed") # no compression
am.write_sav(df, "fast.zsav", meta=meta, compression_level=1)      # fast zlib

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

.sav uses bytecode compression by default, .zsav uses zlib. Pass compression= to override ("uncompressed", "bytecode", "zlib"). Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame.

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

Method Description
meta.summary() Formatted overview: file info, type distribution, annotations
meta.describe("Q1") Deep-dive into a single variable (or list of variables)
meta.diff(other) Compare two metadata objects, returns MetaDiff
meta.label("Q1") Variable label
meta.value("Q1") Value labels dict
meta.format("Q1") SPSS format string (e.g. "F8.2", "A50")
meta.measure("Q1") Measurement level ("nominal", "ordinal", "scale")
meta.role("Q1") Variable role ("input", "target", "both", "none", "partition", "split")
meta.attribute("Q1", "CustomNote") Custom attribute values (list[str] or None)
meta.schema Full metadata as a nested Python dict

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

Field Read Write Type
file_label yes yes str
file_format yes str
file_encoding yes str
creation_time yes str
compression yes str
number_columns yes int
number_rows yes int | None
weight_variable yes yes str | None
notes yes yes list[str]
variable_names yes list[str]
variable_labels yes yes dict[str, str]
variable_value_labels yes yes dict[str, dict[float|str, str]]
variable_formats yes yes dict[str, str]
variable_measures yes yes dict[str, str]
variable_alignments yes yes dict[str, str]
variable_storage_widths yes dict[str, int]
variable_display_widths yes yes dict[str, int]
variable_roles yes yes dict[str, str]
variable_missing_values yes yes dict[str, dict]
variable_attributes yes yes dict[str, dict[str, list[str]]]
mr_sets yes yes dict[str, dict]
arrow_data_types yes dict[str, str]

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, Intel Core Ultra 9 275HX (24C), 64 GB RAM (6400 MT/s).

File Size Rows Cols ambers polars_readstat pyreadstat vs prs vs pyreadstat
test_1 (bytecode) 0.2 MB 1,500 75 < 0.01s < 0.01s 0.011s
test_2 (bytecode) 147 MB 22,070 677 0.286s 0.897s 3.524s 3.1x 12x
test_3 (uncompressed) 1.1 GB 79,066 915 0.322s 1.150s 4.918s 3.6x 15x
test_4 (uncompressed) 0.6 MB 201 158 0.002s 0.003s 0.012s 1.5x 6x
test_5 (uncompressed) 0.6 MB 203 136 0.002s 0.003s 0.016s 1.5x 8x
test_6 (uncompressed) 5.4 GB 395,330 916 1.600s 1.752s 25.214s 1.1x 16x
  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

File (size) Full collect Select 5 cols Head 1000 rows Select 5 + head 1000
test_2 (147 MB, 22K × 677) 0.903s 0.363s (2.5x) 0.181s (5.0x) 0.157s (5.7x)
test_3 (1.1 GB, 79K × 915) 0.700s 0.554s (1.3x) 0.020s (35x) 0.012s (58x)
test_6 (5.4 GB, 395K × 916) 3.062s 2.343s (1.3x) 0.022s (139x) 0.013s (236x)

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

File Size Rows Cols Mode ambers pyreadstat Speedup
test_1 (bytecode) 0.2 MB 1,500 75 .sav 0.001s 0.019s 13x
.zsav 0.004s 0.025s 6x
test_2 (bytecode) 147 MB 22,070 677 .sav 0.539s 3.622s 7x
.zsav 0.386s 4.174s 11x
test_3 (uncompressed) 1.1 GB 79,066 915 .sav 0.439s 13.963s 32x
.zsav 0.436s 17.991s 41x
test_4 (uncompressed) 0.6 MB 201 158 .sav 0.002s 0.027s 16x
.zsav 0.004s 0.035s 9x
test_5 (uncompressed) 0.6 MB 203 136 .sav 0.001s 0.023s 17x
.zsav 0.003s 0.027s 9x
test_6 (uncompressed) 5.4 GB 395,330 916 .sav 2.511s 84.836s 34x
.zsav 2.255s 90.499s 40x
  • 6–41x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Currently supports read and write with Polars DataFrames (eager and lazy) — extending to pandas, Narwhals, DuckDB, and others

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ambers-0.3.6.tar.gz (143.2 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

ambers-0.3.6-cp314-cp314-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.14Windows x86-64

ambers-0.3.6-cp314-cp314-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.34+ x86-64

ambers-0.3.6-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.14macOS 11.0+ ARM64

ambers-0.3.6-cp313-cp313-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.13Windows x86-64

ambers-0.3.6-cp313-cp313-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ x86-64

ambers-0.3.6-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

ambers-0.3.6-cp312-cp312-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.12Windows x86-64

ambers-0.3.6-cp312-cp312-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

ambers-0.3.6-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file ambers-0.3.6.tar.gz.

File metadata

  • Download URL: ambers-0.3.6.tar.gz
  • Upload date:
  • Size: 143.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.6.tar.gz
Algorithm Hash digest
SHA256 60dd428396f30b67861a6cd3e2d82501f0570791064a5552e66a99a21c98b51b
MD5 0eae5d2a2a2e2ffb2efb498a2e287638
BLAKE2b-256 7862edbdbdd4190cf52162fdc9809eed6a54e61bfeb336bac0c6f74476761d0c

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.6-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.6-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 754ddf0ddc9d996d6cf3cc8193e92bfe645ab0853b13781464aaf67110c6374b
MD5 698f84f6ff576a040607c3c36383d698
BLAKE2b-256 755a84c318a39e47969cee0239b267318bb63de45508c47996f5bbd47e2c1736

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.6-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e4504698610bc0c258e6b3ad64ad6ac3a08d91f7e6bf8989d86aadde4ba758d0
MD5 ca7d9858733e6373470c0a1064fcd26c
BLAKE2b-256 4faa13c241280530324ad786711811d4f929084646cd502a4f00d5d021c39a13

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.6-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4df58a26e23ec15a3738a50b74f57e3068a45613df30d5d3ec91cfad9b5e1729
MD5 c78819ec3319365c11361a1b6e90a151
BLAKE2b-256 2b0591fe6b7bf2b0e9a5c5b8fee7f1fe306bc0a5e2840b321461fe6aa93b21da

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.6-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.6-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 5fce9e4f40b8a21b3b7a56b13092fc05b9dd50102d39d8d270e54915b0de362c
MD5 1448c7abec5d9daf9baa903e0013942c
BLAKE2b-256 ae7a6b72dfc12e392d5cce56b752354759cf7d787f62564c42188c007d36667b

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.6-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 c59bb748ec1ac7d9d4778b7e4a3542f1501f1bf90e55fa4139503fdd2c762590
MD5 deb1fcd7a32abadd0cd3687ee139a8c6
BLAKE2b-256 40dad0c1534039c3500ed88ffd8c44ab62a2b2a72ae99ad38461cb433ac75c1d

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.6-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a04ca484a8491de36370a8e78479f9b66ad5da848e38183ecf63961a2c851db1
MD5 09f31a28a0b123fc8d2f67460a761b98
BLAKE2b-256 f0917a8fa779fb4dfac8f3691b9b9e321358c4ddd50c816117e6aa7901adfd69

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.6-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.6-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 8568455f72f3dc5b7a9034dc56af70dfd50203a912a3bdad1eb2fb5bd1e27942
MD5 447349d3eea1616932ada6c4e30c59b4
BLAKE2b-256 38942b3fab2f442f7c7a825496218ef524f59ae2c459f2d74c466f78c21e3b7a

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.6-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 18c413f862a51c3d2bdb3090665790b5485de4905427b442e2dda3c377f7078e
MD5 cb2643973673373cd8eb46683d3d9270
BLAKE2b-256 c3b014402ea982b0a594fc30d4c90385217b6e2e3e248dcebf7d2864752a6df2

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.6-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.6-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 34dc26f2f2be20b3498fb63ecd7fafe3d282e304c39263fc430f059bd3fe52fe
MD5 f65e3d32ff3ff296d6b2b7e2cf2370c0
BLAKE2b-256 0a40bdfb1957021cd98ab5852e505d2742d32c208bc236d0c40b14cf960f7c94

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.6-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page