Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output


ambers


Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked reads 1.1–3.6x faster than polars_readstat and 6–16x faster than pyreadstat; writes 6–41x faster than pyreadstat (see Performance below)

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)                        # bytecode (default for .sav)
am.write_sav(df, "compressed.zsav", meta=meta)                     # zlib (default for .zsav)
am.write_sav(df, "raw.sav", meta=meta, compression="uncompressed") # no compression
am.write_sav(df, "fast.zsav", meta=meta, compression_level=1)      # fast zlib

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

.sav uses bytecode compression by default, .zsav uses zlib. Pass compression= to override ("uncompressed", "bytecode", "zlib"). Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame.

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

| Method | Description |
|---|---|
| meta.summary() | Formatted overview: file info, type distribution, annotations |
| meta.describe("Q1") | Deep-dive into a single variable (or list of variables) |
| meta.diff(other) | Compare two metadata objects, returns MetaDiff |
| meta.label("Q1") | Variable label |
| meta.value("Q1") | Value labels dict |
| meta.format("Q1") | SPSS format string (e.g. "F8.2", "A50") |
| meta.measure("Q1") | Measurement level ("nominal", "ordinal", "scale") |
| meta.role("Q1") | Variable role ("input", "target", "both", "none", "partition", "split") |
| meta.attribute("Q1", "CustomNote") | Custom attribute values (list[str] or None) |
| meta.schema | Full metadata as a nested Python dict |

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

| Field | Read | Write | Type |
|---|---|---|---|
| file_label | yes | yes | str |
| file_format | yes | no | str |
| file_encoding | yes | no | str |
| creation_time | yes | no | str |
| compression | yes | no | str |
| number_columns | yes | no | int |
| number_rows | yes | no | int \| None |
| weight_variable | yes | yes | str \| None |
| notes | yes | yes | list[str] |
| variable_names | yes | no | list[str] |
| variable_labels | yes | yes | dict[str, str] |
| variable_value_labels | yes | yes | dict[str, dict[float \| str, str]] |
| variable_formats | yes | yes | dict[str, str] |
| variable_measures | yes | yes | dict[str, str] |
| variable_alignments | yes | yes | dict[str, str] |
| variable_storage_widths | yes | no | dict[str, int] |
| variable_display_widths | yes | yes | dict[str, int] |
| variable_roles | yes | yes | dict[str, str] |
| variable_missing_values | yes | yes | dict[str, dict] |
| variable_attributes | yes | yes | dict[str, dict[str, list[str]]] |
| mr_sets | yes | yes | dict[str, dict] |
| arrow_data_types | yes | no | dict[str, str] |

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.
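This copy-on-write pattern can be modeled in plain Python with a frozen dataclass. The sketch below is illustrative only (MetaSketch is a made-up stand-in, not ambers internals):

```python
import dataclasses

# Illustrative model of the immutable with_*() pattern (not ambers
# internals): each setter returns a modified copy and the original
# instance is never touched.
@dataclasses.dataclass(frozen=True)
class MetaSketch:
    file_label: str = ""

    def with_file_label(self, label: str) -> "MetaSketch":
        return dataclasses.replace(self, file_label=label)

m1 = MetaSketch(file_label="Customer Survey 2026")
m2 = m1.with_file_label("Updated Survey")
# m1.file_label is still "Customer Survey 2026"; m2 is a new instance
```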

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".
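The overlay rule for dict fields can be sketched in plain Python. overlay_merge is a hypothetical helper written to mirror the behavior described above, not the ambers implementation:

```python
# Hypothetical helper mirroring the overlay rule for dict fields:
# new keys are added, existing keys are overwritten, {key: None}
# removes a key, and all other keys are preserved.
def overlay_merge(current: dict, updates: dict) -> dict:
    merged = dict(current)
    for key, value in updates.items():
        if value is None:
            merged.pop(key, None)  # {key: None} deletes the entry
        else:
            merged[key] = value
    return merged

labels = {"Q1": "Satisfaction", "Q2": "Loyalty"}
merged = overlay_merge(labels, {"Q3": "NPS", "Q2": None})
# Q1 preserved, Q3 added, Q2 removed; the input dict is untouched
```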

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, Intel Core Ultra 9 275HX (24C), 64 GB RAM (6400 MT/s).

| File | Size | Rows | Cols | ambers | polars_readstat | pyreadstat | vs prs | vs pyreadstat |
|---|---|---|---|---|---|---|---|---|
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | < 0.01s | < 0.01s | 0.011s | | |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | 0.286s | 0.897s | 3.524s | 3.1x | 12x |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | 0.322s | 1.150s | 4.918s | 3.6x | 15x |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | 0.002s | 0.003s | 0.012s | 1.5x | 6x |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | 0.002s | 0.003s | 0.016s | 1.5x | 8x |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | 1.600s | 1.752s | 25.214s | 1.1x | 16x |

  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

| File (size) | Full collect | Select 5 cols | Head 1000 rows | Select 5 + head 1000 |
|---|---|---|---|---|
| test_2 (147 MB, 22K × 677) | 0.903s | 0.363s (2.5x) | 0.181s (5.0x) | 0.157s (5.7x) |
| test_3 (1.1 GB, 79K × 915) | 0.700s | 0.554s (1.3x) | 0.020s (35x) | 0.012s (58x) |
| test_6 (5.4 GB, 395K × 916) | 3.062s | 2.343s (1.3x) | 0.022s (139x) | 0.013s (236x) |

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
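Row-limit pushdown wins because skipped cases are never decoded at all. A toy generator model (unrelated to the actual reader) shows why a pushed-down head(n) scales with n rather than with file size:

```python
import itertools

# Toy model of row-limit pushdown (not the ambers reader): a lazy scan
# yields decoded rows on demand, so taking the first n rows never
# produces the remaining cases.
def scan_rows(total_rows):
    for i in range(total_rows):
        yield {"row": i}  # stand-in for one decoded SPSS case

# Only the first 1000 cases are ever materialized, out of 395,330.
first = list(itertools.islice(scan_rows(395_330), 1000))
```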

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

| File | Size | Rows | Cols | Mode | ambers | pyreadstat | Speedup |
|---|---|---|---|---|---|---|---|
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | .sav | 0.001s | 0.019s | 13x |
| | | | | .zsav | 0.004s | 0.025s | 6x |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | .sav | 0.539s | 3.622s | 7x |
| | | | | .zsav | 0.386s | 4.174s | 11x |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | .sav | 0.439s | 13.963s | 32x |
| | | | | .zsav | 0.436s | 17.991s | 41x |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | .sav | 0.002s | 0.027s | 16x |
| | | | | .zsav | 0.004s | 0.035s | 9x |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | .sav | 0.001s | 0.023s | 17x |
| | | | | .zsav | 0.003s | 0.027s | 9x |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | .sav | 2.511s | 84.836s | 34x |
| | | | | .zsav | 2.255s | 90.499s | 40x |

  • 6–41x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Broader DataFrame interop: read and write currently target Polars DataFrames (eager and lazy), with pandas, Narwhals, DuckDB, and others planned

License

MIT
