
Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output


ambers


Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked 1.1–3.6x faster reads than polars_readstat, and 6–16x faster reads and 6–41x faster writes than pyreadstat (see Performance below)

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — returns SavFile with .data and .meta
sav = am.read_sav("survey.sav")
df, meta = sav.data, sav.meta

# Lazy read — .data is a Polars LazyFrame
sav = am.scan_sav("survey.sav")
lf, meta = sav.data, sav.meta
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_meta("survey.sav")

# Write back — roundtrip with full metadata
sav = am.read_sav("input.sav")
df, meta = sav.data, sav.meta
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)                        # bytecode (default for .sav)
am.write_sav(df, "compressed.zsav", meta=meta)                     # zlib (default for .zsav)
am.write_sav(df, "raw.sav", meta=meta, compression="uncompressed") # no compression
am.write_sav(df, "fast.zsav", meta=meta, compression_level=1)      # fast zlib

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

# Apply value labels — replace codes with labels for export/analysis
df, meta = sav.data, sav.meta
labeled = am.apply_labels(df, meta)                          # Enum dtype (ordered, strict)
labeled.write_excel("survey.xlsx")                            # Enum auto-casts to String
labeled = am.apply_labels(df, meta, output="string")          # String dtype for export
labeled = am.apply_labels(df, meta, output="enum_null")       # Enum, unmapped → null
labeled = am.apply_labels(df, meta, exclude=["weight", "id"])  # skip specific columns

# Apply missing values — nullify SPSS user-defined missing codes
clean = am.apply_missing(df, meta)                             # all columns with specs
clean = am.apply_missing(df, meta, columns=["Q1", "Q2"])       # specific columns only
clean = am.apply_missing(df, meta, exclude=["age"])            # skip specific columns

.sav uses bytecode compression by default, .zsav uses zlib. Pass compression= to override ("uncompressed", "bytecode", "zlib"). Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame.

SavFile

read_sav() and scan_sav() return a SavFile object with file-level metadata alongside the data:

>>> sav = am.read_sav("survey_2025.sav")
>>> sav
┌─ SavFile ──────────────────────────┐
│ Data        DataFrame (polars)     │
│ Shape       22,070 rows x 677 cols │
│ Source      survey_2025.sav        │
│ File size   146.5 MB, bytecode     │
│ Read time   0.286s                 │
└────────────────────────────────────┘
| Attribute | Type | Description |
|---|---|---|
| sav.data | DataFrame or LazyFrame | The data (eager from read_sav, lazy from scan_sav) |
| sav.meta | SpssMetadata | All variable metadata (labels, formats, value labels, etc.) |
| sav.source | str or None | Source file path |
| sav.shape | tuple[int, int] or None | (n_rows, n_cols) |
| sav.file_size | int or None | File size in bytes |
| sav.read_time | float or None | Wall-clock read time in seconds |
| sav.compression | str | "uncompressed", "bytecode", or "zlib" |

For scan_sav(), read_time measures metadata/schema reading only (not lazy collection).
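The attribute table above can be mirrored with a plain dataclass (an illustrative stand-in, not the real SavFile class):

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

@dataclass
class SavFileSketch:
    data: Any                                # DataFrame (eager) or LazyFrame (lazy)
    meta: Any                                # SpssMetadata
    source: Optional[str] = None             # source file path
    shape: Optional[Tuple[int, int]] = None  # (n_rows, n_cols)
    file_size: Optional[int] = None          # bytes
    read_time: Optional[float] = None        # seconds (schema-only for scan_sav)
    compression: str = "bytecode"            # "uncompressed", "bytecode", or "zlib"
```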

apply_labels

Replace numeric/string codes with their SPSS value labels. By default produces Polars Enum columns that preserve SPSS definition order — crucial for Likert scales and survey analysis.

sav = am.read_sav("survey.sav")
df, meta = sav.data, sav.meta

# Default: Enum output, strict validation
labeled = am.apply_labels(df, meta)
labeled.group_by("satisfaction").agg(pl.len())  # sorted by definition order
labeled.write_excel("survey.xlsx")              # Enum auto-casts to String

# String output for quick export
labeled = am.apply_labels(df, meta, output="string")

# Enum output with unmapped values as null
labeled = am.apply_labels(df, meta, output="enum_null")
| output= | Dtype | Unmapped values | Best for |
|---|---|---|---|
| "enum" (default) | pl.Enum (ordered) | Error | Analysis — strict, validated categories |
| "string" | pl.String | Stringified (3.0 → "3") | Export — readable text for Excel/CSV |
| "enum_null" | pl.Enum (ordered) | Null | Analysis — exclude unknowns from base |

Numeric columns without value labels are skipped. String columns always pass through unmapped text. See apply_labels.md for full documentation.
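The three output modes can be illustrated with a per-value sketch in plain Python (illustrative only; the real implementation operates on whole Polars columns):

```python
def map_value(code, labels: dict, mode: str = "enum"):
    """Map one coded value per the output= modes described above."""
    if code in labels:
        return labels[code]
    if mode == "enum":
        raise ValueError(f"unmapped value: {code!r}")  # strict validation
    if mode == "enum_null":
        return None                                    # unmapped -> null
    # mode == "string": stringify, dropping a trailing .0 (3.0 -> "3")
    s = str(code)
    return s[:-2] if s.endswith(".0") else s
```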

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

| Method | Description |
|---|---|
| meta.summary() | Formatted overview: file info, type distribution, annotations |
| meta.describe("Q1") | Deep-dive into a single variable (or list of variables) |
| meta.diff(other) | Compare two metadata objects, returns MetaDiff |
| meta.label("Q1") | Variable label |
| meta.value("Q1") | Value labels dict |
| meta.format("Q1") | SPSS format string (e.g. "F8.2", "A50") |
| meta.measure("Q1") | Measurement level ("nominal", "ordinal", "scale") |
| meta.role("Q1") | Variable role ("input", "target", "both", "none", "partition", "split") |
| meta.attribute("Q1", "CustomNote") | Custom attribute values (list[str] or None) |
| meta.schema | Full metadata as a nested Python dict |

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

| Field | Read | Write | Type |
|---|---|---|---|
| file_label | yes | yes | str |
| file_format | yes | no | str |
| file_encoding | yes | no | str |
| creation_time | yes | no | str |
| compression | yes | no | str |
| number_columns | yes | no | int |
| number_rows | yes | no | int or None |
| weight_variable | yes | yes | str or None |
| notes | yes | yes | list[str] |
| variable_names | yes | no | list[str] |
| variable_labels | yes | yes | dict[str, str] |
| variable_value_labels | yes | yes | dict[str, dict[float or str, str]] |
| variable_formats | yes | yes | dict[str, str] |
| variable_measures | yes | yes | dict[str, str] |
| variable_alignments | yes | yes | dict[str, str] |
| variable_storage_widths | yes | no | dict[str, int] |
| variable_display_widths | yes | yes | dict[str, int] |
| variable_roles | yes | yes | dict[str, str] |
| variable_missing_values | yes | yes | dict[str, dict] |
| variable_attributes | yes | yes | dict[str, dict[str, list[str]]] |
| mr_sets | yes | yes | dict[str, dict] |
| arrow_data_types | yes | no | dict[str, str] |

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.
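The immutable setter pattern can be sketched with a frozen dataclass (a tiny stand-in, not the real SpssMetadata implementation):

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class MetaSketch:
    # Minimal illustration of the .with_*() style used above.
    file_label: str = ""
    variable_labels: dict = field(default_factory=dict)

    def with_file_label(self, label: str) -> "MetaSketch":
        # Frozen dataclasses forbid mutation, so setters return a new instance.
        return replace(self, file_label=label)

    def with_variable_labels(self, updates: dict) -> "MetaSketch":
        # Dict fields merge as an overlay rather than being replaced wholesale.
        return replace(self, variable_labels={**self.variable_labels, **updates})
```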

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".
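The dict-field rule above can be sketched as a plain overlay merge (hypothetical helper, not part of the ambers API):

```python
def overlay_merge(current: dict, updates: dict) -> dict:
    # New keys are added, existing keys overwritten, all other keys preserved;
    # a value of None removes the key entirely.
    merged = dict(current)
    for key, value in updates.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```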

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, Intel Core Ultra 9 275HX (24C), 64 GB RAM (6400 MT/s).

| File | Size | Rows | Cols | ambers | polars_readstat | pyreadstat | vs polars_readstat | vs pyreadstat |
|---|---|---|---|---|---|---|---|---|
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | < 0.01s | < 0.01s | 0.011s | - | - |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | 0.286s | 0.897s | 3.524s | 3.1x | 12x |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | 0.322s | 1.150s | 4.918s | 3.6x | 15x |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | 0.002s | 0.003s | 0.012s | 1.5x | 6x |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | 0.002s | 0.003s | 0.016s | 1.5x | 8x |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | 1.600s | 1.752s | 25.214s | 1.1x | 16x |
  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

| File (size) | Full collect | Select 5 cols | Head 1000 rows | Select 5 + head 1000 |
|---|---|---|---|---|
| test_2 (147 MB, 22K × 677) | 0.903s | 0.363s (2.5x) | 0.181s (5.0x) | 0.157s (5.7x) |
| test_3 (1.1 GB, 79K × 915) | 0.700s | 0.554s (1.3x) | 0.020s (35x) | 0.012s (58x) |
| test_6 (5.4 GB, 395K × 916) | 3.062s | 2.343s (1.3x) | 0.022s (139x) | 0.013s (236x) |

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
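The shape of that win can be modeled in plain Python: with limit pushdown, the reader stops decoding rows once the limit is reached instead of materializing the whole file (a toy model using itertools.islice, not the actual reader):

```python
from itertools import islice

def row_stream(n_rows, counter):
    # Toy stand-in for a file reader that decodes one row at a time.
    for i in range(n_rows):
        counter["decoded"] += 1
        yield {"Q1": i, "Q2": i * 2, "padding": "x" * 100}

counter = {"decoded": 0}
# Limit pushdown: islice stops pulling after 1000 rows, so the remaining
# ~394K rows are never decoded; projection keeps only the wanted column.
first_1000 = [{"Q1": row["Q1"]} for row in islice(row_stream(395_330, counter), 1000)]
```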

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

| File | Size | Rows | Cols | Mode | ambers | pyreadstat | Speedup |
|---|---|---|---|---|---|---|---|
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | .sav | 0.001s | 0.019s | 13x |
| | | | | .zsav | 0.004s | 0.025s | 6x |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | .sav | 0.539s | 3.622s | 7x |
| | | | | .zsav | 0.386s | 4.174s | 11x |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | .sav | 0.439s | 13.963s | 32x |
| | | | | .zsav | 0.436s | 17.991s | 41x |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | .sav | 0.002s | 0.027s | 16x |
| | | | | .zsav | 0.004s | 0.035s | 9x |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | .sav | 0.001s | 0.023s | 17x |
| | | | | .zsav | 0.003s | 0.027s | 9x |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | .sav | 2.511s | 84.836s | 34x |
| | | | | .zsav | 2.255s | 90.499s | 40x |
  • 6–41x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • apply_missing_values() — apply SPSS missing value definitions to DataFrames
  • meta.validate(df) — validate metadata against a DataFrame
  • Codebook export — generate variable documentation from metadata
  • Continued I/O performance optimization
  • Currently Polars-only — pandas/other DataFrame libraries may be added later

License

MIT
