Skip to main content

Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

Project description

ambers

ambers banner

Crates.io PyPI License: MIT

Pure Rust SPSS .sav/.zsav reader and writer — Arrow-native, zero C dependencies.

Features

  • Blazing fast read and write for SPSS .sav (bytecode) and .zsav (zlib) files
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels, and more
  • Lazy reader via scan_sav() — Polars LazyFrame with projection and row limit pushdown
  • Pure Rust with a native Python API — native Arrow integration, no C dependencies
  • Benchmarked up to 3–10x faster reads and 4–20x faster writes compared to current popular SPSS I/O libraries

Installation

Python:

uv add ambers

Rust:

cargo add ambers

Python

import ambers as am
import polars as pl

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

# Write back — roundtrip with full metadata
df = df.filter(pl.col("age") > 18)
am.write_sav(df, "filtered.sav", meta=meta)                        # bytecode (default for .sav)
am.write_sav(df, "compressed.zsav", meta=meta)                     # zlib (default for .zsav)
am.write_sav(df, "raw.sav", meta=meta, compression="uncompressed") # no compression
am.write_sav(df, "fast.zsav", meta=meta, compression_level=1)      # fast zlib

# From scratch — metadata is optional, inferred from DataFrame schema
am.write_sav(df, "new.sav")

.sav uses bytecode compression by default, .zsav uses zlib. Pass compression= to override ("uncompressed", "bytecode", "zlib"). Pass meta= to preserve all metadata from a prior read_sav(), or omit it to infer formats from the DataFrame.

Rust

use ambers::{read_sav, read_sav_metadata};

// Read data + metadata
let (batch, meta) = read_sav("survey.sav")?;
println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

// Read metadata only
let meta = read_sav_metadata("survey.sav")?;
println!("{}", meta.label("Q1").unwrap_or("(no label)"));

Metadata API (Python)

Method Description
meta.summary() Formatted overview: file info, type distribution, annotations
meta.describe("Q1") Deep-dive into a single variable (or list of variables)
meta.diff(other) Compare two metadata objects, returns MetaDiff
meta.label("Q1") Variable label
meta.value("Q1") Value labels dict
meta.format("Q1") SPSS format string (e.g. "F8.2", "A50")
meta.measure("Q1") Measurement level ("nominal", "ordinal", "scale")
meta.role("Q1") Variable role ("input", "target", "both", "none", "partition", "split")
meta.attribute("Q1", "CustomNote") Custom attribute values (list[str] or None)
meta.schema Full metadata as a nested Python dict

All variable-name methods raise KeyError for unknown variables.

Metadata Fields

All fields returned by the reader. Fields marked Write are preserved when passed via meta= to write_sav(). Read-only fields are set automatically (encoding, timestamps, row/column counts, etc.).

Note: This is a first pass — field names and behavior may change without warning in future releases.

Field Read Write Type
file_label yes yes str
file_format yes str
file_encoding yes str
creation_time yes str
compression yes str
number_columns yes int
number_rows yes int | None
weight_variable yes yes str | None
notes yes yes list[str]
variable_names yes list[str]
variable_labels yes yes dict[str, str]
variable_value_labels yes yes dict[str, dict[float|str, str]]
variable_formats yes yes dict[str, str]
variable_measures yes yes dict[str, str]
variable_alignments yes yes dict[str, str]
variable_storage_widths yes dict[str, int]
variable_display_widths yes yes dict[str, int]
variable_roles yes yes dict[str, str]
variable_missing_values yes yes dict[str, dict]
variable_attributes yes yes dict[str, dict[str, list[str]]]
mr_sets yes yes dict[str, dict]
arrow_data_types yes dict[str, str]

Creating metadata from scratch:

meta = am.SpssMetadata(
    file_label="Customer Survey 2026",
    variable_labels={"Q1": "Satisfaction", "Q2": "Loyalty"},
    variable_value_labels={"Q1": {1: "Low", 5: "High"}},
    variable_measures={"Q1": "ordinal", "Q2": "nominal"},
)
am.write_sav(df, "output.sav", meta=meta)

Modifying existing metadata (from read_sav() or a previously created SpssMetadata):

# .update() — bulk update multiple fields at once, merges dicts, replaces scalars
meta2 = meta.update(
    file_label="Updated Survey",
    variable_labels={"Q3": "NPS"},        # Q1/Q2 labels preserved, Q3 added
    variable_measures={"Q3": "scale"},
)

# .with_*() — chainable single-field setters, with full IDE autocomplete and type hints
meta3 = (meta
    .with_file_label("Updated Survey")
    .with_variable_labels({"Q3": "NPS"})
    .with_variable_measures({"Q3": "scale"})
)

Immutability: SpssMetadata is immutable. .update() and .with_*() always return a new instance — the original is never modified. Assign to a new variable if you need to keep both copies.

Update logic:

  • Dict fields (labels, formats, measures, etc.) merge as an overlay — new keys are added, existing keys are overwritten, all other keys are preserved. Pass {key: None} to remove a key.
  • Scalar fields (file_label, weight_variable) and notes are replaced entirely.
  • Column renames are not tracked. If you rename "Q1" to "Q1a" in your DataFrame, metadata for "Q1" does not carry over — you must explicitly provide metadata for "Q1a".

See metadata.md for the full API reference including update logic details, missing values, MR sets, and validation rules.

SPSS tip: Custom variable attributes are not shown in SPSS's Variable View by default. Go to View > Customize Variable View and click OK, or run DISPLAY ATTRIBUTES in SPSS syntax.

Streaming Reader (Rust)

let mut scanner = ambers::scan_sav("survey.sav")?;
scanner.select(&["age", "gender"])?;
scanner.limit(1000);

while let Some(batch) = scanner.next_batch()? {
    println!("Batch: {} rows", batch.num_rows());
}

Performance

Eager Read

All results return a Polars DataFrame. Best of 3–5 runs (with warmup) on Windows 11, Python 3.13, Intel Core Ultra 9 275HX (24C), 64 GB RAM (6400 MT/s).

File Size Rows Cols ambers polars_readstat pyreadstat vs prs vs pyreadstat
test_1 (bytecode) 0.2 MB 1,500 75 < 0.01s < 0.01s 0.011s
test_2 (bytecode) 147 MB 22,070 677 0.286s 0.897s 3.524s 3.1x 12x
test_3 (uncompressed) 1.1 GB 79,066 915 0.322s 1.150s 4.918s 3.6x 15x
test_4 (uncompressed) 0.6 MB 201 158 0.002s 0.003s 0.012s 1.5x 6x
test_5 (uncompressed) 0.6 MB 203 136 0.002s 0.003s 0.016s 1.5x 8x
test_6 (uncompressed) 5.4 GB 395,330 916 1.600s 1.752s 25.214s 1.1x 16x
  • Faster than polars_readstat on all tested files — 1.1–3.6x faster
  • 6–16x faster than pyreadstat across all file sizes
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

File (size) Full collect Select 5 cols Head 1000 rows Select 5 + head 1000
test_2 (147 MB, 22K × 677) 0.903s 0.363s (2.5x) 0.181s (5.0x) 0.157s (5.7x)
test_3 (1.1 GB, 79K × 915) 0.700s 0.554s (1.3x) 0.020s (35x) 0.012s (58x)
test_6 (5.4 GB, 395K × 916) 3.062s 2.343s (1.3x) 0.022s (139x) 0.013s (236x)

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.

Write

write_sav() writes a Polars DataFrame + metadata back to .sav (bytecode) or .zsav (zlib). Best of 5 runs on the same machine.

File Size Rows Cols Mode ambers pyreadstat Speedup
test_1 (bytecode) 0.2 MB 1,500 75 .sav 0.001s 0.019s 13x
.zsav 0.004s 0.025s 6x
test_2 (bytecode) 147 MB 22,070 677 .sav 0.539s 3.622s 7x
.zsav 0.386s 4.174s 11x
test_3 (uncompressed) 1.1 GB 79,066 915 .sav 0.439s 13.963s 32x
.zsav 0.436s 17.991s 41x
test_4 (uncompressed) 0.6 MB 201 158 .sav 0.002s 0.027s 16x
.zsav 0.004s 0.035s 9x
test_5 (uncompressed) 0.6 MB 203 136 .sav 0.001s 0.023s 17x
.zsav 0.003s 0.027s 9x
test_6 (uncompressed) 5.4 GB 395,330 916 .sav 2.511s 84.836s 34x
.zsav 2.255s 90.499s 40x
  • 6–41x faster than pyreadstat on writes across all files and compression modes
  • Full metadata roundtrip: variable labels, value labels, missing values, MR sets, display properties
  • Bytecode (.sav) and zlib (.zsav) compression

Roadmap

  • Continued I/O performance optimization
  • Expanded SPSS metadata field coverage
  • Rich metadata manipulation — add, update, merge, and remove metadata programmatically
  • Individual metadata field overrides in write_sav() — pass variable_labels=, variable_value_labels=, etc. alongside meta= to selectively override fields
  • Currently supports read and write with Polars DataFrames (eager and lazy) — extending to pandas, Narwhals, DuckDB, and others

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ambers-0.3.7.tar.gz (143.2 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

ambers-0.3.7-cp314-cp314-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.14Windows x86-64

ambers-0.3.7-cp314-cp314-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.14manylinux: glibc 2.34+ x86-64

ambers-0.3.7-cp314-cp314-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.14macOS 11.0+ ARM64

ambers-0.3.7-cp313-cp313-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.13Windows x86-64

ambers-0.3.7-cp313-cp313-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.34+ x86-64

ambers-0.3.7-cp313-cp313-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

ambers-0.3.7-cp312-cp312-win_amd64.whl (1.0 MB view details)

Uploaded CPython 3.12Windows x86-64

ambers-0.3.7-cp312-cp312-manylinux_2_34_x86_64.whl (1.2 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.34+ x86-64

ambers-0.3.7-cp312-cp312-macosx_11_0_arm64.whl (1.1 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file ambers-0.3.7.tar.gz.

File metadata

  • Download URL: ambers-0.3.7.tar.gz
  • Upload date:
  • Size: 143.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.7.tar.gz
Algorithm Hash digest
SHA256 28cf98fb848de4af3c066fa4c7fc01409d570dcfcbd720f2c3a6a1e10d02f4b7
MD5 ef423f2159bffadd2657b8c14583f8c0
BLAKE2b-256 2a33f0cf45dce0d0d75c2979ce6c2b2e1c43c221cf5fc82e742ad9dadd090435

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.7-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.7-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 25e41877d6d6cd08714e3f24e20a0bffb634e08dfee3272892ea46c6cd330370
MD5 1c0773ee7434510cee6b67b39a1cd118
BLAKE2b-256 4b066c770ab0b8191cab707d2470bc8cb9d2538b2377482bcd2f9b6ca00edf87

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp314-cp314-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.7-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1321d50756f3aa06e36911b4cf8c0052d5723ff10d6ac8d05d18431c857962b6
MD5 e050059312e2f635da3db6da55b27ac5
BLAKE2b-256 e46eb5997a5d752c7ee200673bc72e3157589ff86a079bbe1521cdac37263fb5

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.7-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 93cc4b81e33f157608d556b8723199217c93018d32c1bf7adc7a6cec38d90074
MD5 04414644d53b37094a4b94435253ac90
BLAKE2b-256 3446984e5f83cc885f59bfe69ba385f04b80d6e7f3d91e2eeb6b30a787164c23

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.7-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.7-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 52db885e8feb29ed7a1295686f57e239116d4c55eed19340977a5839b0c9f370
MD5 7e4cac6cdfe64feff823ef4d6dc11788
BLAKE2b-256 fe68f577df786f8a2dd3416b4aae324de868f03124dbd3a0fc93ec7e7bdc9bd4

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp313-cp313-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.7-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 74423cdc25b13ae59935857e461e5d6176efb11343303eeae17d1e78a951ec7a
MD5 b13e6a5e3e8c1e79a6f31fd6731009a1
BLAKE2b-256 b22c5c5255aa394784765361de0b9e825442ad37b7bfcbcd3b5a1fcdb36d1f0d

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.7-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f26acc0f41e66a89378fc9fa2ffc336bb9412daa45d191dd342a281af42844c7
MD5 69970d97effa00a026965f7d690da03d
BLAKE2b-256 3e7f14ccdd5589620a34928df7959cefdc19c734c41e93df6b68b0c438c6078b

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: ambers-0.3.7-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 1.0 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.3.7-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 1c47c488d1f0eb19d25a23b447693644e3de446b34f09dd5c71938708803377a
MD5 24dce57c03da4089074fea256ae2e45d
BLAKE2b-256 11079f643b8a58fad6686cf50b349635ba61b79393c45632e75a11572875a977

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp312-cp312-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for ambers-0.3.7-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 035fcd9f3a6fc75511054c7e92d533c08a8f84ca349bdffb0282e4356954f0ea
MD5 1f0a89d03a563e93b744997a76a56060
BLAKE2b-256 9faf5499170078bb9f8e2605b4a634360f6bedb8103cbd091c3cc5e4c676a215

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.3.7-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for ambers-0.3.7-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 8883a976a37518fce451e116fa07c3f85b0d8d4b211fd903b2b8dc23827fb8f8
MD5 15485282d04cfa811e6010b88e230aca
BLAKE2b-256 97a4edd8b4908cd0f7e0e52d66b0ff090e21f9c3a3451c1f335ed4d98b963f59

See more details on using hashes here.

Provenance

The following attestation bundles were made for ambers-0.3.7-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page