Skip to main content

Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

Project description

ambers

ambers banner

Crates.io PyPI License: MIT

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked up to 3–10x faster reads and 4–20x faster writes compared to current popular SPSS I/O libraries

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — returns SavFile with .data and .meta
sav = am.read_sav("survey.sav")
df, meta = sav.data, sav.meta

# Lazy read — .data is a Polars LazyFrame
sav = am.scan_sav("survey.sav")
lf, meta = sav.data, sav.meta
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_meta("survey.sav")

# Write back — roundtrip with full metadata
sav = am.read_sav("input.sav")
df, meta = sav.data, sav.meta
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)                        # bytecode (default for .sav)
am.write_sav(df, "compressed.zsav", meta=meta)                     # zlib (default for .zsav)
am.write_sav(df, "raw.sav", meta=meta, compression="uncompressed") # no compression
am.write_sav(df, "fast.zsav", meta=meta, compression_level=1)      # fast zlib

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

.sav uses bytecode compression by default, .zsav uses zlib. Pass compression= to override ("uncompressed", "bytecode", "zlib"). Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame.

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

Method Description
meta.summary() Formatted overview: file info, type distribution, annotations
meta.describe("Q1") Deep-dive into a single variable (or list of variables)
meta.diff(other) Compare two metadata objects, returns MetaDiff
meta.label("Q1") Variable label
meta.value("Q1") Value labels dict
meta.format("Q1") SPSS format string (e.g. "F8.2", "A50")
meta.measure("Q1") Measurement level ("nominal", "ordinal", "scale")
meta.role("Q1") Variable role ("input", "target", "both", "none", "partition", "split")
meta.attribute("Q1", "CustomNote") Custom attribute values (list[str] or None)
meta.schema Full metadata as a nested Python dict

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

Field Read Write Type
file_label yes yes str
file_format yes str
file_encoding yes str
creation_time yes str
compression yes str
number_columns yes int
number_rows yes int | None
weight_variable yes yes str | None
notes yes yes list[str]
variable_names yes list[str]
variable_labels yes yes dict[str, str]
variable_value_labels yes yes dict[str, dict[float|str, str]]
variable_formats yes yes dict[str, str]
variable_measures yes yes dict[str, str]
variable_alignments yes yes dict[str, str]
variable_storage_widths yes dict[str, int]
variable_display_widths yes yes dict[str, int]
variable_roles yes yes dict[str, str]
variable_missing_values yes yes dict[str, dict]
variable_attributes yes yes dict[str, dict[str, list[str]]]
mr_sets yes yes dict[str, dict]
arrow_data_types yes dict[str, str]

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, Intel Core Ultra 9 275HX (24C), 64 GB RAM (6400 MT/s).

File Size Rows Cols ambers polars_readstat pyreadstat vs prs vs pyreadstat
test_1 (bytecode) 0.2 MB 1,500 75 < 0.01s < 0.01s 0.011s
test_2 (bytecode) 147 MB 22,070 677 0.286s 0.897s 3.524s 3.1x 12x
test_3 (uncompressed) 1.1 GB 79,066 915 0.322s 1.150s 4.918s 3.6x 15x
test_4 (uncompressed) 0.6 MB 201 158 0.002s 0.003s 0.012s 1.5x 6x
test_5 (uncompressed) 0.6 MB 203 136 0.002s 0.003s 0.016s 1.5x 8x
test_6 (uncompressed) 5.4 GB 395,330 916 1.600s 1.752s 25.214s 1.1x 16x
  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

File (size) Full collect Select 5 cols Head 1000 rows Select 5 + head 1000
test_2 (147 MB, 22K × 677) 0.903s 0.363s (2.5x) 0.181s (5.0x) 0.157s (5.7x)
test_3 (1.1 GB, 79K × 915) 0.700s 0.554s (1.3x) 0.020s (35x) 0.012s (58x)
test_6 (5.4 GB, 395K × 916) 3.062s 2.343s (1.3x) 0.022s (139x) 0.013s (236x)

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

File Size Rows Cols Mode ambers pyreadstat Speedup
test_1 (bytecode) 0.2 MB 1,500 75 .sav 0.001s 0.019s 13x
.zsav 0.004s 0.025s 6x
test_2 (bytecode) 147 MB 22,070 677 .sav 0.539s 3.622s 7x
.zsav 0.386s 4.174s 11x
test_3 (uncompressed) 1.1 GB 79,066 915 .sav 0.439s 13.963s 32x
.zsav 0.436s 17.991s 41x
test_4 (uncompressed) 0.6 MB 201 158 .sav 0.002s 0.027s 16x
.zsav 0.004s 0.035s 9x
test_5 (uncompressed) 0.6 MB 203 136 .sav 0.001s 0.023s 17x
.zsav 0.003s 0.027s 9x
test_6 (uncompressed) 5.4 GB 395,330 916 .sav 2.511s 84.836s 34x
.zsav 2.255s 90.499s 40x
  • 6–41x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Currently supports read and write with Polars DataFrames (eager and lazy) — extending to pandas, Narwhals, DuckDB, and others

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ambers-0.3.9.tar.gz (145.5 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

ambers-0.3.9-cp314-cp314-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.14Windows x86-64

ambers-0.3.9-cp314-cp314-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.34+ x86-64

ambers-0.3.9-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.14macOS 11.0+ ARM64

ambers-0.3.9-cp313-cp313-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.13Windows x86-64

ambers-0.3.9-cp313-cp313-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ x86-64

ambers-0.3.9-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

ambers-0.3.9-cp312-cp312-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.12Windows x86-64

ambers-0.3.9-cp312-cp312-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

ambers-0.3.9-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file ambers-0.3.9.tar.gz.

File metadata

  • Download URL: ambers-0.3.9.tar.gz
  • Upload date:
  • Size: 145.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.9.tar.gz
Algorithm Hash digest
SHA256 04a8e5f908da5a6d914021dda690408c4b8dc4286b87ed7a70fcfba305ac2f61
MD5 4d19bd241a1f28928375535cc8d3176d
BLAKE2b-256 c263f79b29f1a53569d9e48a53e267fe1e9ded4de885cfce0f3ac1081b16076a

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.9-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.9-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 badac5fda92310f091cfab694329ae064c601adb719372c5f48aa130eb7cb921
MD5 f9cad0fbc843d1fad238febe0474649b
BLAKE2b-256 dffbe3fa66a78645e8e1c6b044cb29d756644f3c20d9e6ab2f6fc68dc2a5c7f9

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.9-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1c6afb6c3f82e7dcf89b34dd6808e1d3177911b774f28c59d48be17f7174a4f7
MD5 e9f685a45994d18979987162623863a7
BLAKE2b-256 25f98275026df7ec4dca829982af98e19542f9c34a0424ce6895cde33888fad7

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.9-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ef9d097598659ea422f15b8492fda1739059634e37eead9cc81c67207a82d40a
MD5 4ae0e2d7fb17615f36877780e6993b87
BLAKE2b-256 0d61264d07100d1e1d6ce9c3e521848b3203828cb11c3e63acbff7a9b76f96e3

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.9-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.9-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 9b37565354491c3a8aa638fcfad66669bb58fee67dad8c0c93599fdef75c8a43
MD5 946024e15be7f9f1ba9102b2bc8f4e65
BLAKE2b-256 cb488e2bddac59c539edcc7d796c15cfab93ba41de28fce14fc76cb00c8a14e5

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.9-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 d64ad6f807f8afd112c9632bf20a29a59f48f3a37cbb2343b7c6c781fb642962
MD5 4e623103b4be4ff51e77841c2176c65c
BLAKE2b-256 a55868caedb26d4478441e79604ce25cff2739aa12487ced79f6f60309c33dc0

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.9-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 01c217de35994343ccd8e9c8f80f460c1bb449907a53c7a595649c3a0cdd66d6
MD5 961ee1f8edcdd357807cd6c89780ad55
BLAKE2b-256 0933e1a388bcb2e4dd77c74da0073e625c4bfb8eb1ebcea6f5c038e4570a1530

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.9-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.9-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 063bcb4d8a49064452b58075d77d898ae1348600b7eb2e7b3dc3c008542a96ed
MD5 7a2f76012281a7216e426dc655a61eb6
BLAKE2b-256 030cf035476fd57b606d737a4e80c6f807a8f37116f1dc60d904ffa44fb7036c

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.9-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 674cfa21c30a102d547617398f723cef54f6172c4c4535ee9eb1b713a89acf07
MD5 99f954e0428567b518c2536654561e24
BLAKE2b-256 9da326f3f6a0e2548372fef2f52dced040c9ba4eaed53a99080bb129178fcdd9

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.9-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.9-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e1c8166ffc7953724ab67ce8d8b5e4f14a37e060f50efc3b516399936b0f7661
MD5 d96aafc34b6f5b083e47649968b611fb
BLAKE2b-256 680d2eee3462b562a30b2fa3d23d817d56ca4fcc6bf9ee75af2b3a240d413f79

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.9-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page