Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

ambers

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked at 1.1–3.6x faster reads than polars_readstat, 6–16x faster reads than pyreadstat, and 4–20x faster writes than pyreadstat (see Performance)

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)

# Write as .zsav (zlib compressed)
am.write_sav(df, "compressed.zsav", meta=meta)

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

Use .sav for bytecode compression (default), .zsav for zlib compression. Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame. Individual writable fields (e.g., variable_labels, variable_value_labels) can also be passed directly as keyword arguments for fine-grained control.
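
The extension rule above can be pictured with a small stand-alone helper. This is only an illustrative sketch: compression_for_path is a hypothetical name, not part of the ambers API, and the real writer may detect the format differently (case-insensitive matching, for instance, is an assumption made here).

```python
from pathlib import Path

# Hypothetical helper (not part of ambers): mirrors the documented rule that
# a .sav target means bytecode compression and a .zsav target means zlib.
_COMPRESSION_BY_SUFFIX = {".sav": "bytecode", ".zsav": "zlib"}


def compression_for_path(path: str) -> str:
    """Return the compression implied by the output file's extension."""
    suffix = Path(path).suffix.lower()  # assumption: match is case-insensitive
    try:
        return _COMPRESSION_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"expected a .sav or .zsav path, got {path!r}") from None


if __name__ == "__main__":
    for target in ("filtered.sav", "compressed.zsav"):
        print(target, "->", compression_for_path(target))
```

Rejecting unknown extensions with ValueError is a choice made for this sketch; how write_sav() itself treats other extensions is not stated here.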

Rust

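use ambers::{read_sav, read_sav_metadata};

// Assumption: the crate's error type implements std::error::Error,
// so it can be boxed and propagated with `?` from main.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read data + metadata
    let (batch, meta) = read_sav("survey.sav")?;
    println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

    // Read metadata only
    let meta = read_sav_metadata("survey.sav")?;
    println!("{}", meta.label("Q1").unwrap_or("(no label)"));

    Ok(())
}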

Metadata API (Python)

Method                              Description
meta.summary()                      Formatted overview: file info, type distribution, annotations
meta.describe("Q1")                 Deep-dive into a single variable (or list of variables)
meta.diff(other)                    Compare two metadata objects, returns MetaDiff
meta.label("Q1")                    Variable label
meta.value("Q1")                    Value labels dict
meta.format("Q1")                   SPSS format string (e.g. "F8.2", "A50")
meta.measure("Q1")                  Measurement level ("nominal", "ordinal", "scale")
meta.role("Q1")                     Variable role ("input", "target", "both", "none", "partition", "split")
meta.attribute("Q1", "CustomNote")  Custom attribute values (list[str] or None)
meta.schema                         Full metadata as a nested Python dict

All variable-name methods raise KeyError for unknown variables.
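
Because lookups raise KeyError for unknown variables, code that handles files with differing variable sets may want a guarded lookup. A minimal sketch follows. StubMeta is a stand-in written for this example so that it runs without a .sav file, and label_or_default is a hypothetical helper, not an ambers function. Only the label(name) method and its KeyError behavior are taken from the table above; the sample label text is invented.

```python
class StubMeta:
    """Stand-in for a metadata object; only mimics label() and its KeyError."""

    def __init__(self, labels):
        self._labels = dict(labels)

    def label(self, name):
        # Same contract as documented: unknown variable -> KeyError.
        return self._labels[name]


def label_or_default(meta, name, default="(no label)"):
    """Hypothetical helper: the variable label, or a default for unknown variables."""
    try:
        return meta.label(name)
    except KeyError:
        return default


meta = StubMeta({"Q1": "Overall satisfaction"})  # invented sample label
print(label_or_default(meta, "Q1"))
print(label_or_default(meta, "Q99"))
```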

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

Field                    Read  Write  Type
variable_names           yes   yes    list[str]
variable_labels          yes   yes    dict[str, str]
variable_value_labels    yes   yes    dict[str, dict[float|str, str]]
variable_formats         yes   yes    dict[str, str]
variable_measures        yes   yes    dict[str, str]
variable_alignments      yes   yes    dict[str, str]
variable_roles           yes   yes    dict[str, str]
variable_display_widths  yes   yes    dict[str, int]
variable_storage_widths  yes   yes    dict[str, int]
variable_missing_values  yes   yes    dict[str, list[dict]]
variable_attributes      yes   yes    dict[str, dict[str, list[str]]]
weight_variable          yes   yes    str | None
mr_sets                  yes   yes    dict[str, dict]
arrow_data_types         yes   no     dict[str, str]
file_label               yes   yes    str
file_format              yes   no     str
file_encoding            yes   no     str
creation_time            yes   no     str
number_rows              yes   no     int | None
number_columns           yes   no     int
compression              yes   no     str
notes                    yes   yes    list[str]
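
As an illustration of how the dict-shaped fields can be used, the sketch below maps raw codes to their labels with a dict shaped like variable_value_labels. The sample data is invented, decode_column is a hypothetical helper rather than an ambers function, and passing unlabeled codes through unchanged is a design choice made for this example.

```python
# Invented sample shaped like variable_value_labels:
# dict[str, dict[float|str, str]]
variable_value_labels = {
    "Q1": {1.0: "Yes", 2.0: "No"},
    "region": {"N": "North", "S": "South"},
}


def decode_column(name, values, value_labels):
    """Hypothetical helper: replace codes with labels.

    Codes without a label (and None) pass through unchanged, and a
    variable with no value labels at all is returned as-is.
    """
    mapping = value_labels.get(name, {})
    return [mapping.get(v, v) for v in values]


print(decode_column("Q1", [1.0, 2.0, 3.0, None], variable_value_labels))
print(decode_column("region", ["N", "S"], variable_value_labels))
```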

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

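// Assumption: the crate's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Restrict the scan to two columns and at most 1000 rows
    let mut scanner = ambers::scan_sav("survey.sav")?;
    scanner.select(&["age", "gender"])?;
    scanner.limit(1000);

    // Pull batches until the scanner is exhausted
    while let Some(batch) = scanner.next_batch()? {
        println!("Batch: {} rows", batch.num_rows());
    }

    Ok(())
}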

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, 24-core machine.

File                   Size    Rows     Cols  ambers   polars_readstat  pyreadstat  vs polars_readstat  vs pyreadstat
test_1 (bytecode)      0.2 MB  1,500    75    < 0.01s  < 0.01s          0.011s      -                   -
test_2 (bytecode)      147 MB  22,070   677   0.286s   0.897s           3.524s      3.1x                12x
test_3 (uncompressed)  1.1 GB  79,066   915   0.322s   1.150s           4.918s      3.6x                15x
test_4 (uncompressed)  0.6 MB  201      158   0.002s   0.003s           0.012s      1.5x                6x
test_5 (uncompressed)  0.6 MB  203      136   0.002s   0.003s           0.016s      1.5x                8x
test_6 (uncompressed)  5.4 GB  395,330  916   1.600s   1.752s           25.214s     1.1x                16x

  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer
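
The "vs" columns are ratios of the other library's time to the ambers time. A quick check of that arithmetic, using the test_2 and test_6 timings from the table (the helper name speedup exists only for this example):

```python
def speedup(other_seconds: float, ambers_seconds: float) -> float:
    """How many times faster ambers is: the other library's time over ambers' time."""
    return other_seconds / ambers_seconds


# (file, ambers, polars_readstat, pyreadstat) timings copied from the table
rows = [
    ("test_2", 0.286, 0.897, 3.524),
    ("test_6", 1.600, 1.752, 25.214),
]

for name, ambers, prs, pyreadstat in rows:
    print(
        f"{name}: {speedup(prs, ambers):.1f}x vs polars_readstat, "
        f"{speedup(pyreadstat, ambers):.0f}x vs pyreadstat"
    )
```

This reproduces the 3.1x and 12x figures for test_2 and the 1.1x and 16x figures for test_6.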

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

File (size)                  Full collect  Select 5 cols  Head 1000 rows  Select 5 + head 1000
test_2 (147 MB, 22K × 677)   0.903s        0.363s (2.5x)  0.181s (5.0x)   0.157s (5.7x)
test_3 (1.1 GB, 79K × 915)   0.700s        0.554s (1.3x)  0.020s (35x)    0.012s (58x)
test_6 (5.4 GB, 395K × 916)  3.062s        2.343s (1.3x)  0.022s (139x)   0.013s (236x)

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
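
To see why pushdown helps, consider a toy row-oriented scanner in plain Python. It is not how ambers is implemented. It only illustrates the two ideas named above: projection pushdown (materialize only the requested columns) and row-limit pushdown (stop reading once enough rows have been produced). All names and the simulated data are made up for this sketch.

```python
from itertools import islice


def scan(records, columns=None, limit=None):
    """Toy scanner: lazily yield rows, keeping only `columns`, stopping after `limit`."""
    stream = iter(records)
    if limit is not None:
        # Row-limit pushdown: records past the limit are never pulled.
        stream = islice(stream, limit)
    for record in stream:
        if columns is None:
            yield dict(record)
        else:
            # Projection pushdown: only the requested fields are materialized.
            yield {name: record[name] for name in columns}


def fake_file(n_rows, touched):
    """Simulated source that records which rows were actually read."""
    for i in range(n_rows):
        touched.append(i)
        yield {"Q1": i % 5, "Q2": i % 3, "age": 18 + i % 60, "notes": "x" * 100}


touched = []
rows = list(scan(fake_file(1_000_000, touched), columns=["Q1", "age"], limit=1000))
print(len(rows), "rows returned;", len(touched), "records read from the source")
```

Because the source is consumed lazily, the work done depends on the limit rather than on the size of the simulated file, which is the same effect the table shows for the head-1000 columns.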

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

File                   Size    Rows     Cols  Mode   ambers  pyreadstat  Speedup
test_1 (bytecode)      0.2 MB  1,500    75    .sav   0.001s  0.019s      13x
test_1 (bytecode)      0.2 MB  1,500    75    .zsav  0.004s  0.026s      7x
test_2 (bytecode)      147 MB  22,070   677   .sav   0.567s  3.849s      7x
test_2 (bytecode)      147 MB  22,070   677   .zsav  1.088s  4.415s      4x
test_3 (uncompressed)  1.1 GB  79,066   915   .sav   0.950s  16.152s     17x
test_3 (uncompressed)  1.1 GB  79,066   915   .zsav  1.774s  17.362s     10x
test_6 (uncompressed)  5.4 GB  395,330  916   .sav   5.700s  79.999s     14x
test_6 (uncompressed)  5.4 GB  395,330  916   .zsav  8.193s  85.491s     10x

  • 4–20x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Support for more DataFrame libraries: pandas, Narwhals, DuckDB, and others (currently Polars only, eager and lazy)

License

MIT

Download files

Source Distribution

  • ambers-0.3.1.tar.gz (119.9 kB): Source

Built Distributions

  • ambers-0.3.1-cp314-cp314-win_amd64.whl (1.0 MB): CPython 3.14, Windows x86-64
  • ambers-0.3.1-cp314-cp314-manylinux_2_34_x86_64.whl (1.2 MB): CPython 3.14, manylinux (glibc 2.34+), x86-64
  • ambers-0.3.1-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB): CPython 3.14, macOS 11.0+, ARM64
  • ambers-0.3.1-cp313-cp313-win_amd64.whl (1.0 MB): CPython 3.13, Windows x86-64
  • ambers-0.3.1-cp313-cp313-manylinux_2_34_x86_64.whl (1.2 MB): CPython 3.13, manylinux (glibc 2.34+), x86-64
  • ambers-0.3.1-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB): CPython 3.13, macOS 11.0+, ARM64
  • ambers-0.3.1-cp312-cp312-win_amd64.whl (1.0 MB): CPython 3.12, Windows x86-64
  • ambers-0.3.1-cp312-cp312-manylinux_2_34_x86_64.whl (1.2 MB): CPython 3.12, manylinux (glibc 2.34+), x86-64
  • ambers-0.3.1-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB): CPython 3.12, macOS 11.0+, ARM64

File details

Details for the file ambers-0.3.1.tar.gz.

File metadata

  • Download URL: ambers-0.3.1.tar.gz
  • Size: 119.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.1.tar.gz
Algorithm    Hash digest
SHA256       826122bcab1d856ecd22866a104782f9ef6254a98d96de319d10cd0d012b1bed
MD5          0f3784adbfcfb445f5c11b579139a1bb
BLAKE2b-256  261df99a47dd3a80bba65b170f2a18faffc1c7b9f88733513f0c606586fa2042

Provenance

The following attestation bundles were made for ambers-0.3.1.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
