Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

ambers

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files; see Performance for measured timings
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked 1.1–3.6x faster reads than polars_readstat, and 6–16x faster reads and 4–17x faster writes than pyreadstat (see Performance)

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)

# Write as .zsav (zlib compressed)
am.write_sav(df, "compressed.zsav", meta=meta)

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

Use .sav for bytecode compression (default), .zsav for zlib compression. Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame. Individual writable fields (e.g., variable_labels, variable_value_labels) can also be passed directly as keyword arguments for fine-grained control.
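
The extension rule above can be sketched as a small lookup. This is an illustration only: compression_for_path is a hypothetical helper, not part of the ambers API, and rejecting other extensions with ValueError is an assumption rather than documented behavior.

```python
from pathlib import Path

# Hypothetical helper (not part of ambers): mirrors the rule that the
# output file extension selects the compression mode.
_COMPRESSION_BY_SUFFIX = {".sav": "bytecode", ".zsav": "zlib"}


def compression_for_path(path: str) -> str:
    """Return the compression mode implied by the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return _COMPRESSION_BY_SUFFIX[suffix]
    except KeyError:
        # Assumption: any other extension is rejected rather than guessed.
        raise ValueError(f"unsupported extension: {suffix!r}") from None


print(compression_for_path("filtered.sav"))     # bytecode
print(compression_for_path("compressed.zsav"))  # zlib
```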

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

Method                               Description
-----------------------------------  ------------------------------------------------------------
meta.summary()                       Formatted overview: file info, type distribution, annotations
meta.describe("Q1")                  Deep-dive into a single variable (or list of variables)
meta.diff(other)                     Compare two metadata objects, returns MetaDiff
meta.label("Q1")                     Variable label
meta.value("Q1")                     Value labels dict
meta.format("Q1")                    SPSS format string (e.g. "F8.2", "A50")
meta.measure("Q1")                   Measurement level ("nominal", "ordinal", "scale")
meta.role("Q1")                      Variable role ("input", "target", "both", "none", "partition", "split")
meta.attribute("Q1", "CustomNote")   Custom attribute values (list[str] or None)
meta.schema                          Full metadata as a nested Python dict

All variable-name methods raise KeyError for unknown variables.
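
Because unknown names raise KeyError, code that probes for optional variables may want a fallback. The sketch below shows one way to wrap a lookup. label_or_default and _FakeMeta are hypothetical names introduced for illustration; _FakeMeta only stands in for a real metadata object so the example runs without a .sav file.

```python
from typing import Optional


def label_or_default(meta, name: str, default: Optional[str] = None) -> Optional[str]:
    """Look up a variable label, falling back to `default` for unknown names."""
    try:
        return meta.label(name)
    except KeyError:
        return default


class _FakeMeta:
    """Stand-in for illustration only; real metadata comes from am.read_sav()."""

    def __init__(self, labels):
        self._labels = labels

    def label(self, name):
        return self._labels[name]  # raises KeyError for unknown variables


meta = _FakeMeta({"Q1": "Satisfaction"})
print(label_or_default(meta, "Q1"))            # Satisfaction
print(label_or_default(meta, "Q99", "(n/a)"))  # (n/a)
```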

Metadata Fields

The table below lists all fields returned by the reader. Fields marked "yes" under Write are preserved when passed via meta= to write_sav(). Read-only fields (encoding, timestamps, row/column counts, etc.) are set automatically.

Note: This is a first pass — field names and behavior may change without warning in future releases.

Field                     Read  Write  Type
------------------------  ----  -----  ---------------------------------
variable_names            yes   -      list[str]
variable_labels           yes   yes    dict[str, str]
variable_value_labels     yes   yes    dict[str, dict[float|str, str]]
variable_formats          yes   yes    dict[str, str]
variable_measures         yes   yes    dict[str, str]
variable_alignments       yes   yes    dict[str, str]
variable_roles            yes   yes    dict[str, str]
variable_display_widths   yes   yes    dict[str, int]
variable_storage_widths   yes   -      dict[str, int]
variable_missing_values   yes   yes    dict[str, dict]
variable_attributes       yes   yes    dict[str, dict[str, list[str]]]
weight_variable           yes   yes    str | None
mr_sets                   yes   yes    dict[str, dict]
arrow_data_types          yes   -      dict[str, str]
file_label                yes   yes    str
file_format               yes   -      str
file_encoding             yes   -      str
creation_time             yes   -      str
number_rows               yes   -      int | None
number_columns            yes   -      int
compression               yes   -      str
notes                     yes   yes    list[str]
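
The Read/Write split in the table can be captured as a plain set, which may help when deciding which fields are worth editing before a write. This is a sketch: WRITABLE_FIELDS is copied from the Write column above, split_writable is a hypothetical helper, and the example values (including the encoding string) are made up for illustration.

```python
# Field names with "yes" in the Write column of the table above.
WRITABLE_FIELDS = frozenset({
    "variable_labels", "variable_value_labels", "variable_formats",
    "variable_measures", "variable_alignments", "variable_roles",
    "variable_display_widths", "variable_missing_values",
    "variable_attributes", "weight_variable", "mr_sets",
    "file_label", "notes",
})


def split_writable(fields: dict) -> tuple[dict, dict]:
    """Split a field dict into (writable, read_only) parts."""
    writable = {k: v for k, v in fields.items() if k in WRITABLE_FIELDS}
    read_only = {k: v for k, v in fields.items() if k not in WRITABLE_FIELDS}
    return writable, read_only


# Illustrative values only, not taken from a real file.
example = {
    "file_label": "Customer Survey 2026",
    "variable_labels": {"Q1": "Satisfaction"},
    "file_encoding": "UTF-8",
    "number_rows": 1500,
}
writable, read_only = split_writable(example)
print(sorted(writable))   # ['file_label', 'variable_labels']
print(sorted(read_only))  # ['file_encoding', 'number_rows']
```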

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".
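
The immutability and update rules above can be sketched with a toy model. ToyMeta is a hypothetical stand-in, not the ambers SpssMetadata class. It only mimics the documented behavior: dict fields merge as an overlay, a None value removes a key, scalars are replaced, and a new instance is always returned.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class ToyMeta:
    """Toy model of the documented update semantics (not the real class)."""

    file_label: str = ""
    variable_labels: dict = field(default_factory=dict)

    def update(self, *, file_label: Optional[str] = None,
               variable_labels: Optional[dict] = None) -> "ToyMeta":
        merged = dict(self.variable_labels)  # copy, so the original is untouched
        for key, value in (variable_labels or {}).items():
            if value is None:
                merged.pop(key, None)  # {key: None} removes the key
            else:
                merged[key] = value    # add a new key or overwrite an existing one
        return replace(
            self,
            file_label=self.file_label if file_label is None else file_label,
            variable_labels=merged,
        )


meta = ToyMeta(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
)
meta2 = meta.update(file_label="Updated Survey",
                    variable_labels={"Q3": "NPS", "Q2": None})

print(meta.variable_labels)   # {'Q1': 'Satisfaction', 'Q2': 'Loyalty'}
print(meta2.variable_labels)  # {'Q1': 'Satisfaction', 'Q3': 'NPS'}
print(meta2.file_label)       # Updated Survey
```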

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.
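
Since column renames are not tracked, one option is to build an overlay that carries each value over to the new name and marks the old name for removal with None, following the update logic described above. rename_overlay is a hypothetical helper. Passing its result to meta.update(variable_labels=...) is an assumption based on the documented merge rules, not something verified against ambers here.

```python
def rename_overlay(per_variable: dict, renames: dict) -> dict:
    """Build an overlay dict that moves entries from old to new variable names."""
    overlay = {}
    for old, new in renames.items():
        if old in per_variable:
            overlay[new] = per_variable[old]  # carry the value to the new name
            overlay[old] = None               # mark the old key for removal
    return overlay


labels = {"Q1": "Satisfaction", "Q2": "Loyalty"}
print(rename_overlay(labels, {"Q1": "Q1a"}))
# {'Q1a': 'Satisfaction', 'Q1': None}
```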

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, 24-core machine.

File                   Size    Rows     Cols  ambers   polars_readstat  pyreadstat  vs polars_readstat  vs pyreadstat
---------------------  ------  -------  ----  -------  ---------------  ----------  ------------------  -------------
test_1 (bytecode)      0.2 MB  1,500    75    < 0.01s  < 0.01s          0.011s      -                   -
test_2 (bytecode)      147 MB  22,070   677   0.286s   0.897s           3.524s      3.1x                12x
test_3 (uncompressed)  1.1 GB  79,066   915   0.322s   1.150s           4.918s      3.6x                15x
test_4 (uncompressed)  0.6 MB  201      158   0.002s   0.003s           0.012s      1.5x                6x
test_5 (uncompressed)  0.6 MB  203      136   0.002s   0.003s           0.016s      1.5x                8x
test_6 (uncompressed)  5.4 GB  395,330  916   1.600s   1.752s           25.214s     1.1x                16x

  • Faster than polars_readstat on every file with a reported ratio: 1.1–3.6x
  • 6–16x faster than pyreadstat on the same files (no ratio is reported for test_1, where ambers finishes in under 0.01s)
  • No PyArrow dependency: uses the Arrow PyCapsule Interface for zero-copy transfer
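
The "best of N runs with warmup" methodology mentioned above can be sketched as a small timing harness. This is a generic illustration, not the project's actual benchmark script: best_of is a hypothetical helper, and the workload passed to it here is a placeholder rather than a file read.

```python
import time
from typing import Callable


def best_of(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> float:
    """Return the fastest wall-clock time in seconds over `runs` timed calls."""
    for _ in range(warmup):
        fn()  # warm caches; these calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Placeholder workload; a real benchmark would call the reader under test.
elapsed = best_of(lambda: sum(range(100_000)), runs=3)
print(f"best of 3: {elapsed:.4f}s")
```

Taking the minimum rather than the mean reduces the influence of one-off stalls such as disk cache misses or background processes.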

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

File (size)                  Full collect  Select 5 cols   Head 1000 rows  Select 5 + head 1000
---------------------------  ------------  --------------  --------------  --------------------
test_2 (147 MB, 22K × 677)   0.903s        0.363s (2.5x)   0.181s (5.0x)   0.157s (5.7x)
test_3 (1.1 GB, 79K × 915)   0.700s        0.554s (1.3x)   0.020s (35x)    0.012s (58x)
test_6 (5.4 GB, 395K × 916)  3.062s        2.343s (1.3x)   0.022s (139x)   0.013s (236x)

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
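
The speedup factors in the table are the full-collect time divided by the time of each pushdown scenario. The short calculation below reproduces the test_6 figures from the timings listed above. The variable names are illustrative only.

```python
# Timings in seconds for test_6, copied from the table above.
full_collect = 3.062
scenarios = {
    "select 5 cols": 2.343,
    "head 1000 rows": 0.022,
    "select 5 + head 1000": 0.013,
}

# Speedup = full-collect time / scenario time.
speedups = {name: full_collect / t for name, t in scenarios.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
# select 5 cols: 1.3x
# head 1000 rows: 139.2x
# select 5 + head 1000: 235.5x
```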

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

File                   Size    Rows     Cols  Mode   ambers  pyreadstat  Speedup
---------------------  ------  -------  ----  -----  ------  ----------  -------
test_1 (bytecode)      0.2 MB  1,500    75    .sav   0.001s  0.019s      13x
test_1 (bytecode)      0.2 MB  1,500    75    .zsav  0.004s  0.026s      7x
test_2 (bytecode)      147 MB  22,070   677   .sav   0.567s  3.849s      7x
test_2 (bytecode)      147 MB  22,070   677   .zsav  1.088s  4.415s      4x
test_3 (uncompressed)  1.1 GB  79,066   915   .sav   0.950s  16.152s     17x
test_3 (uncompressed)  1.1 GB  79,066   915   .zsav  1.774s  17.362s     10x
test_6 (uncompressed)  5.4 GB  395,330  916   .sav   5.700s  79.999s     14x
test_6 (uncompressed)  5.4 GB  395,330  916   .zsav  8.193s  85.491s     10x

  • 4–17x faster than pyreadstat on writes across the files and compression modes listed above
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Currently supports read and write with Polars DataFrames (eager and lazy) — extending to pandas, Narwhals, DuckDB, and others

License

MIT

Download files

Source Distribution

  • ambers-0.3.2.tar.gz (125.0 kB): Source

Built Distributions

  • ambers-0.3.2-cp314-cp314-win_amd64.whl (1.0 MB): CPython 3.14, Windows x86-64
  • ambers-0.3.2-cp314-cp314-manylinux_2_34_x86_64.whl (1.2 MB): CPython 3.14, manylinux (glibc 2.34+), x86-64
  • ambers-0.3.2-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB): CPython 3.14, macOS 11.0+, ARM64
  • ambers-0.3.2-cp313-cp313-win_amd64.whl (1.0 MB): CPython 3.13, Windows x86-64
  • ambers-0.3.2-cp313-cp313-manylinux_2_34_x86_64.whl (1.2 MB): CPython 3.13, manylinux (glibc 2.34+), x86-64
  • ambers-0.3.2-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB): CPython 3.13, macOS 11.0+, ARM64
  • ambers-0.3.2-cp312-cp312-win_amd64.whl (1.0 MB): CPython 3.12, Windows x86-64
  • ambers-0.3.2-cp312-cp312-manylinux_2_34_x86_64.whl (1.2 MB): CPython 3.12, manylinux (glibc 2.34+), x86-64
  • ambers-0.3.2-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB): CPython 3.12, macOS 11.0+, ARM64

File details

Details for the file ambers-0.3.2.tar.gz.

File metadata

  • Download URL: ambers-0.3.2.tar.gz
  • Size: 125.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.2.tar.gz
Algorithm Hash digest
SHA256 df18e8d01e893cd1c1d6d2bb827e69bed095ad9c8940b52b38e92d5b9c59ddb4
MD5 c367fa842e93f93e8e038d6b9b1fab71
BLAKE2b-256 a016e0fa19368286b53ea822e9a97ba6ceeab5fa901c82b5c9270404499e34de

Provenance

The following attestation bundles were made for ambers-0.3.2.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
