Skip to main content

Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

Project description

ambers

ambers banner

Crates.io PyPI License: MIT

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked up to 3–10x faster reads and 4–20x faster writes compared to current popular SPSS I/O libraries

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)                        # bytecode (default for .sav)
am.write_sav(df, "compressed.zsav", meta=meta)                     # zlib (default for .zsav)
am.write_sav(df, "raw.sav", meta=meta, compression="uncompressed") # no compression
am.write_sav(df, "fast.zsav", meta=meta, compression_level=1)      # fast zlib

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

.sav uses bytecode compression by default, .zsav uses zlib. Pass compression= to override ("uncompressed", "bytecode", "zlib"). Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame.

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

Method Description
meta.summary() Formatted overview: file info, type distribution, annotations
meta.describe("Q1") Deep-dive into a single variable (or list of variables)
meta.diff(other) Compare two metadata objects, returns MetaDiff
meta.label("Q1") Variable label
meta.value("Q1") Value labels dict
meta.format("Q1") SPSS format string (e.g. "F8.2", "A50")
meta.measure("Q1") Measurement level ("nominal", "ordinal", "scale")
meta.role("Q1") Variable role ("input", "target", "both", "none", "partition", "split")
meta.attribute("Q1", "CustomNote") Custom attribute values (list[str] or None)
meta.schema Full metadata as a nested Python dict

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

Field Read Write Type
file_label yes yes str
file_format yes str
file_encoding yes str
creation_time yes str
compression yes str
number_columns yes int
number_rows yes int | None
weight_variable yes yes str | None
notes yes yes list[str]
variable_names yes list[str]
variable_labels yes yes dict[str, str]
variable_value_labels yes yes dict[str, dict[float|str, str]]
variable_formats yes yes dict[str, str]
variable_measures yes yes dict[str, str]
variable_alignments yes yes dict[str, str]
variable_storage_widths yes dict[str, int]
variable_display_widths yes yes dict[str, int]
variable_roles yes yes dict[str, str]
variable_missing_values yes yes dict[str, dict]
variable_attributes yes yes dict[str, dict[str, list[str]]]
mr_sets yes yes dict[str, dict]
arrow_data_types yes dict[str, str]

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, 24-core machine.

File Size Rows Cols ambers polars_readstat pyreadstat vs prs vs pyreadstat
test_1 (bytecode) 0.2 MB 1,500 75 < 0.01s < 0.01s 0.011s
test_2 (bytecode) 147 MB 22,070 677 0.286s 0.897s 3.524s 3.1x 12x
test_3 (uncompressed) 1.1 GB 79,066 915 0.322s 1.150s 4.918s 3.6x 15x
test_4 (uncompressed) 0.6 MB 201 158 0.002s 0.003s 0.012s 1.5x 6x
test_5 (uncompressed) 0.6 MB 203 136 0.002s 0.003s 0.016s 1.5x 8x
test_6 (uncompressed) 5.4 GB 395,330 916 1.600s 1.752s 25.214s 1.1x 16x
  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

File (size) Full collect Select 5 cols Head 1000 rows Select 5 + head 1000
test_2 (147 MB, 22K × 677) 0.903s 0.363s (2.5x) 0.181s (5.0x) 0.157s (5.7x)
test_3 (1.1 GB, 79K × 915) 0.700s 0.554s (1.3x) 0.020s (35x) 0.012s (58x)
test_6 (5.4 GB, 395K × 916) 3.062s 2.343s (1.3x) 0.022s (139x) 0.013s (236x)

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

File Size Rows Cols Mode ambers pyreadstat Speedup
test_1 (bytecode) 0.2 MB 1,500 75 .sav 0.001s 0.019s 13x
.zsav 0.004s 0.026s 7x
test_2 (bytecode) 147 MB 22,070 677 .sav 0.567s 3.849s 7x
.zsav 1.088s 4.415s 4x
test_3 (uncompressed) 1.1 GB 79,066 915 .sav 0.950s 16.152s 17x
.zsav 1.774s 17.362s 10x
test_6 (uncompressed) 5.4 GB 395,330 916 .sav 5.700s 79.999s 14x
.zsav 8.193s 85.491s 10x
  • 4–20x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Currently supports read and write with Polars DataFrames (eager and lazy) — extending to pandas, Narwhals, DuckDB, and others

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ambers-0.3.3.tar.gz (129.7 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

ambers-0.3.3-cp314-cp314-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.14Windows x86-64

ambers-0.3.3-cp314-cp314-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.34+ x86-64

ambers-0.3.3-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.14macOS 11.0+ ARM64

ambers-0.3.3-cp313-cp313-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.13Windows x86-64

ambers-0.3.3-cp313-cp313-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ x86-64

ambers-0.3.3-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

ambers-0.3.3-cp312-cp312-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.12Windows x86-64

ambers-0.3.3-cp312-cp312-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

ambers-0.3.3-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file ambers-0.3.3.tar.gz.

File metadata

  • Download URL: ambers-0.3.3.tar.gz
  • Upload date:
  • Size: 129.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.3.tar.gz
Algorithm Hash digest
SHA256 f5964701cff70f7c439b352fbeddd489b36283bb91d2a08ad6e780301ba3bf75
MD5 13e5e776ebab865cc8ffafba617f764c
BLAKE2b-256 b328cd7a343519d2b671107fb64363c03c3af0c5cf0fd4558b357910ab5b381a

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.3-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.3-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 21d5d6cf2e2e43327179aaa61c79ed7c0e13bc942d6291a191cd51ee99620687
MD5 85e3f2dad1cfbf20ced8b06d46fd9418
BLAKE2b-256 0318ebbde26e10b08f39602cf722569b63bdc1770ede741c8529119f497ebcb8

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.3-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 ccd17641e143639b5ebd88cb32e9ec4ef7374af5ea29d0044dc1901a4ff44fa2
MD5 895bd002b80d8af0e5198ad0fe514a41
BLAKE2b-256 3e3bcc31830daef57d3072050a4153f4e81638673beab622a1e03fcb289b51c5

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.3-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a770900f4371ad8a680b9602edee23ef60f325e9b639e5d074a16c1225af91f1
MD5 436bddcb4ea53877d9db71c5ab36270d
BLAKE2b-256 4b6d44c60191c9168fdfba92ab0ed917960391be8fd30551b2672af1659aa663

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.3-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.3-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 48647b1dbe1fa9f31b62697e95aa94dd37d62091cbc94731a67adfd7a365500d
MD5 0540d26762dfe2068518308f358a9eda
BLAKE2b-256 da5eae006527737b288d80c0afbcda4ada8f4be38bcf206fe0d8d62b5330180d

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.3-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 2dae9b8cd3bd44c80d302d529899096ddfc244521bf08d12078210dc21b29b87
MD5 24285b121b5737629a149bc381c41147
BLAKE2b-256 5c37497b2acafd6eeb66cbca3f4e78f427829b900b8d394143eb0f2bbcea0a7a

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.3-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e718ef3449fb92d4b1e99956a518a15e7a4609098ce98182961ce1f3e51e7746
MD5 80da64fe0a32ba1356dc687f3d6c6e3c
BLAKE2b-256 bf609a23e70876e73e03c1b8958ce6cc812d8b9f7fe734824c7d584d98fc18f2

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.3-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 94580e7d3545f4571ea0b4b8e6d50eb569825c7cdda22c89d55d8b4b0abad344
MD5 d1663695e4257c671cca5309dd0315ac
BLAKE2b-256 7c9b160abdc810cb9593f97bb72952979f5f7a08d561cc973bc4e18d55f0ef40

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.3-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 f633ca05904fe7aa3c0a2748ffb2bf4495e727634667a534fa858f514167e645
MD5 a636cfc5d901471925fe00a223638076
BLAKE2b-256 40683f7cf9d26bc166c1e714c4e449fc04104c0cc01d6b19a809e2f258850cba

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.3-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ba8b9e937afafa97a251f765ade11b1e66cf93beac7105830da0161994f094af
MD5 d3c8c2125650fd44289e8ef73cf1d02c
BLAKE2b-256 fc391844af6f545f77d146aafc4ebf1b6ab93c61e39836566a6621721142f8b5

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.3-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page