Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

Project description

ambers

Pure Rust SPSS .sav/.zsav reader — Arrow-native, zero C dependencies.

Features

  • Read .sav (bytecode) and .zsav (zlib) files
  • Arrow RecordBatch output — zero-copy to Polars, DataFusion, DuckDB
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels
  • Lazy reader via scan_sav() — returns Polars LazyFrame with projection and row limit pushdown
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer
  • Fast: in the benchmarks below, up to 2.5x faster than polars_readstat (faster on 5 of 6 files) and 4–10x faster than pyreadstat
  • Python + Rust dual API from a single crate
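The "bytecode" compression used by many .sav files is a simple opcode scheme. The sketch below decodes a stream of numeric values following the layout described in the PSPP documentation of the SPSS system file format: data is split into blocks of eight 1-byte opcodes, where 0 is padding, 1–251 encode the small integer `code - bias` (the bias is normally 100), 252 marks the end of the data, 253 means a raw 8-byte value follows the opcode block, 254 stands for eight spaces in string data, and 255 is system-missing. It is an illustrative, numeric-focused sketch that assumes little-endian data and uses `None` as a stand-in for system-missing; it is not ambers' implementation.

```python
import struct

SYSMIS = None  # stand-in for SPSS system-missing in this sketch


def decode_bytecode(data: bytes, bias: float = 100.0) -> list:
    """Decode an SPSS bytecode-compressed value stream (little-endian sketch)."""
    values = []
    pos = 0
    while pos + 8 <= len(data):
        opcodes = data[pos:pos + 8]  # one block of eight 1-byte commands
        pos += 8                     # raw values (opcode 253) follow the block
        for op in opcodes:
            if op == 0:
                continue                    # padding, produces no value
            elif 1 <= op <= 251:
                values.append(op - bias)    # small integer stored inline
            elif op == 252:
                return values               # end of compressed data
            elif op == 253:
                (raw,) = struct.unpack_from("<d", data, pos)
                values.append(raw)          # uncompressed 8-byte double
                pos += 8
            elif op == 254:
                values.append(" " * 8)      # eight spaces (string data)
            else:                           # op == 255
                values.append(SYSMIS)
    return values
```

Each 253 opcode consumes the next 8 bytes after the current opcode block, in order, which is why `pos` advances past the block before any raw values are read.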

Installation

Python:

pip install ambers

Rust:

cargo add ambers

Quick Start

Python

import ambers as am

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")
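A metadata-only read can skip the data because an SPSS system file keeps its dictionary in records at the start of the file, ahead of the case data. As an illustration of the very first of those records, the sketch below parses the fixed 176-byte file header described in the PSPP documentation of the system file format: a `$FL2` magic string for .sav (`$FL3` for .zsav), then the product name, layout code, nominal case size, compression code (0 = none, 1 = bytecode, 2 = zlib), weight index, case count, compression bias, creation date and time, and file label. It assumes a little-endian file and is a standalone sketch, not part of the ambers API; a full metadata reader must also decode the variable, label, and extension records that follow.

```python
import struct

# Fixed 176-byte header of an SPSS system file (little-endian assumed).
HEADER = struct.Struct("<4s60siiiiid9s8s64s3x")


def parse_sav_header(raw: bytes) -> dict:
    """Parse the file header at the start of a .sav/.zsav file."""
    (magic, product, layout, case_size, compression,
     weight_index, ncases, bias, date, time, label) = HEADER.unpack_from(raw)
    if magic not in (b"$FL2", b"$FL3"):
        raise ValueError(f"not an SPSS system file: {magic!r}")

    def text(b: bytes) -> str:
        # header strings are space- or NUL-padded
        return b.decode("latin-1").rstrip(" \x00")

    return {
        "zsav": magic == b"$FL3",
        "product": text(product),
        "layout_code": layout,
        "nominal_case_size": case_size,  # 8-byte slots per case
        "compression": {0: "none", 1: "bytecode", 2: "zlib"}.get(compression, "unknown"),
        "weight_index": weight_index,
        "ncases": ncases,  # -1 when the writer did not record a count
        "bias": bias,
        "created": f"{text(date)} {text(time)}",
        "file_label": text(label),
    }
```

Because the header has a fixed size, reading it requires only the first 176 bytes of the file (for example `open(path, "rb").read(HEADER.size)`).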

Rust

use ambers::{read_sav, read_sav_metadata};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read data + metadata
    let (batch, meta) = read_sav("survey.sav")?;
    println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

    // Read metadata only
    let meta = read_sav_metadata("survey.sav")?;
    println!("{}", meta.label("Q1").unwrap_or("(no label)"));

    Ok(())
}

Metadata API (Python)

| Method | Description |
|---|---|
| meta.summary() | Formatted overview: file info, type distribution, annotations |
| meta.describe("Q1") | Deep-dive into a single variable (or list of variables) |
| meta.diff(other) | Compare two metadata objects, returns MetaDiff |
| meta.label("Q1") | Variable label |
| meta.value("Q1") | Value labels dict |
| meta.format("Q1") | SPSS format string (e.g. "F8.2", "A50") |
| meta.measure("Q1") | Measurement level ("nominal", "ordinal", "scale") |
| meta.schema | Full metadata as a nested Python dict |

All variable-name methods raise KeyError for unknown variables.
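The format strings returned by `meta.format()` use SPSS's `<type><width>.<decimals>` notation, for example `F8.2` (numeric, 8 characters wide, 2 decimals) or `A50` (a 50-character string). If you need the pieces separately, a small parser is enough for formats of this shape. `parse_spss_format` and `SpssFormat` below are hypothetical helpers, not ambers functions, and the sketch does not check whether a given type/width combination is legal in SPSS.

```python
import re
from typing import NamedTuple


class SpssFormat(NamedTuple):
    kind: str      # format type, e.g. "F", "A", "DATE"
    width: int     # total display width in characters
    decimals: int  # decimal places (0 when omitted)


_FORMAT_RE = re.compile(r"([A-Z]+)(\d+)(?:\.(\d+))?")


def parse_spss_format(fmt: str) -> SpssFormat:
    """Split an SPSS format string such as "F8.2" into type, width, decimals."""
    match = _FORMAT_RE.fullmatch(fmt.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised SPSS format: {fmt!r}")
    kind, width, decimals = match.groups()
    return SpssFormat(kind, int(width), int(decimals or 0))
```

For example, `parse_spss_format("F8.2")` gives `SpssFormat(kind='F', width=8, decimals=2)`, and `parse_spss_format("A50").width` is `50`.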

Streaming Reader (Rust)

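fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Stream only two columns and stop after 1,000 rows
    let mut scanner = ambers::scan_sav("survey.sav")?;
    scanner.select(&["age", "gender"])?;
    scanner.limit(1000);

    while let Some(batch) = scanner.next_batch()? {
        println!("Batch: {} rows", batch.num_rows());
    }
    Ok(())
}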
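The streaming scanner combines projection (`select`) and a row cap (`limit`) with batch-at-a-time reading. The Python sketch below mimics those semantics over an in-memory list of rows so the behaviour is easy to check: only the selected columns are kept, and the last batch is shortened so that no more than `limit` rows are produced in total. The `scan_batches` generator, its `batch_size` parameter, and the dict-per-row data model are assumptions for illustration only; they are not the ambers Rust API, which works on Arrow batches rather than Python lists.

```python
from typing import Iterator, Optional, Sequence


def scan_batches(
    rows: Sequence[dict],
    columns: Sequence[str],
    limit: Optional[int] = None,
    batch_size: int = 1024,
) -> Iterator[list]:
    """Yield projected batches of rows, stopping once `limit` rows are emitted."""
    remaining = len(rows) if limit is None else min(limit, len(rows))
    start = 0
    while remaining > 0:
        take = min(batch_size, remaining)  # shrink the final batch to honour the limit
        chunk = rows[start:start + take]
        # projection: keep only the requested columns
        yield [{name: row[name] for name in columns} for row in chunk]
        start += take
        remaining -= take
```

With 5 rows, `limit=3`, and `batch_size=2`, the generator yields one batch of 2 rows and one of 1 row, mirroring how the Rust loop above would see a shorter final batch.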

Performance

Eager Read

All readers return a Polars DataFrame. Timings are the average of 5 runs on Windows 11 with Python 3.13 on a 24-core machine.

| File | Size | Rows | Cols | ambers | polars_readstat | ambers vs polars_readstat | pyreadstat | pyreadstat multiprocess (4 workers) | ambers vs pyreadstat |
|---|---|---|---|---|---|---|---|---|---|
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | 0.002s | 0.004s | 2.0x faster | 0.010s | 0.493s | 5.0x faster |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | 0.812s | 0.991s | 1.2x faster | 3.564s | 1.781s | 4.4x faster |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | 0.509s | 1.279s | 2.5x faster | 4.849s | 2.764s | 9.5x faster |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | 0.002s | 0.004s | 2.0x faster | 0.018s | 0.470s | 9.0x faster |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | 0.002s | 0.004s | 2.0x faster | 0.015s | 0.454s | 7.5x faster |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | 2.801s | 1.809s | 1.5x slower | 24.199s | 11.718s | 8.6x faster |
  • vs polars_readstat: faster on 5 of 6 files — 1.2–2.5x faster (test_6 at 5.4 GB is 1.5x slower)
  • vs pyreadstat: 4–10x faster across all file sizes
  • vs pyreadstat multiprocess (4 workers): ambers single-threaded still faster on every file

pyreadstat multiprocess returns pandas; timing includes pl.from_pandas() conversion.
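The "x faster" columns are ratios of two timings, rounded to one decimal place. The snippet below recomputes them from the eager-read timings in the table above (ambers, polars_readstat, and single-process pyreadstat), which makes it easy to check the summary bullets, including the 1.5x slowdown against polars_readstat on test_6.

```python
# (ambers, polars_readstat, pyreadstat) eager-read times in seconds, from the table above.
TIMINGS = {
    "test_1": (0.002, 0.004, 0.010),
    "test_2": (0.812, 0.991, 3.564),
    "test_3": (0.509, 1.279, 4.849),
    "test_4": (0.002, 0.004, 0.018),
    "test_5": (0.002, 0.004, 0.015),
    "test_6": (2.801, 1.809, 24.199),
}


def compare(ambers: float, other: float) -> str:
    """Describe ambers relative to another reader as 'N.Nx faster' or 'N.Nx slower'."""
    if ambers <= other:
        return f"{other / ambers:.1f}x faster"
    return f"{ambers / other:.1f}x slower"


for name, (amb, prs, pyrs) in TIMINGS.items():
    print(name, "vs polars_readstat:", compare(amb, prs),
          "| vs pyreadstat:", compare(amb, pyrs))
```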

Lazy Read with Pushdown

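scan_sav() returns a Polars LazyFrame. Unlike an eager read, it pushes column selection and row limits down into the reader, so it can skip work for data you don't ask for. Speedups relative to a full collect are shown in parentheses: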

| File (size, rows × cols) | Full collect | Select 5 cols | Head 1000 rows | Select 5 + head 1000 |
|---|---|---|---|---|
| test_2 (147 MB, 22K × 677) | 0.903s | 0.363s (2.5x) | 0.181s (5.0x) | 0.157s (5.7x) |
| test_3 (1.1 GB, 79K × 915) | 0.700s | 0.554s (1.3x) | 0.020s (35x) | 0.012s (58x) |
| test_6 (5.4 GB, 395K × 916) | 3.062s | 2.343s (1.3x) | 0.022s (139x) | 0.013s (236x) |

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
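One plausible explanation for why row limits pay off far more than column selection on the large files (an inference, not something the benchmark itself establishes): SPSS system files store data case by case, so a row limit lets a reader stop after the first cases, while selecting columns still means visiting every case. The sketch below shows this on a simplified uncompressed layout in which each case is a fixed run of 8-byte little-endian doubles. `read_cases` is a hypothetical helper, not ambers code, and it ignores string variables, compression, and missing values.

```python
import struct
from typing import List, Optional, Sequence, Tuple


def read_cases(
    buf: bytes,
    n_vars: int,
    select: Sequence[int],
    limit: Optional[int] = None,
) -> Tuple[List[tuple], int]:
    """Read selected numeric columns from row-major cases of 8-byte doubles.

    Returns (rows, cases_visited) so the effect of the row limit is visible.
    """
    case_width = 8 * n_vars
    n_cases = len(buf) // case_width
    if limit is not None:
        n_cases = min(n_cases, limit)  # row limit: later cases are never touched
    rows = []
    for case in range(n_cases):
        base = case * case_width
        # projection: unpack only the requested slots, but every case is still visited
        rows.append(tuple(struct.unpack_from("<d", buf, base + 8 * i)[0] for i in select))
    return rows, n_cases
```

With bytecode or zlib compression, cases no longer sit at fixed offsets, so a reader has to decode every case it passes over; a row limit still allows an early stop, but column selection alone saves even less.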

License

MIT

Download files

Source Distribution

ambers-0.2.4.tar.gz (68.3 kB)

Built Distributions

ambers-0.2.4-cp314-cp314-win_amd64.whl (843.4 kB): CPython 3.14, Windows x86-64

ambers-0.2.4-cp314-cp314-manylinux_2_34_x86_64.whl (1.0 MB): CPython 3.14, manylinux: glibc 2.34+ x86-64

ambers-0.2.4-cp314-cp314-macosx_11_0_arm64.whl (911.3 kB): CPython 3.14, macOS 11.0+ ARM64

ambers-0.2.4-cp313-cp313-win_amd64.whl (844.8 kB): CPython 3.13, Windows x86-64

ambers-0.2.4-cp313-cp313-manylinux_2_34_x86_64.whl (1.0 MB): CPython 3.13, manylinux: glibc 2.34+ x86-64

ambers-0.2.4-cp313-cp313-macosx_11_0_arm64.whl (911.0 kB): CPython 3.13, macOS 11.0+ ARM64

ambers-0.2.4-cp312-cp312-win_amd64.whl (844.9 kB): CPython 3.12, Windows x86-64

ambers-0.2.4-cp312-cp312-manylinux_2_34_x86_64.whl (1.0 MB): CPython 3.12, manylinux: glibc 2.34+ x86-64

ambers-0.2.4-cp312-cp312-macosx_11_0_arm64.whl (910.8 kB): CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file ambers-0.2.4.tar.gz.

File metadata

  • Download URL: ambers-0.2.4.tar.gz
  • Upload date:
  • Size: 68.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.2.4.tar.gz
Algorithm Hash digest
SHA256 cd784192ff2803b5db712967d8905a1d66591c7044be2406e459ee7ceb4d70f5
MD5 a7d8ba778227c85859b44e32073d1769
BLAKE2b-256 94c8335f734ea0321926280e95a18e4d4d856d61a068d19d4c81bf6b137435dc

Provenance

The following attestation bundles were made for ambers-0.2.4.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.2.4-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: ambers-0.2.4-cp314-cp314-win_amd64.whl
  • Upload date:
  • Size: 843.4 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.2.4-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 cbccc4c769e58b6085dac7df3d4f5a6253eb87b4aa5c35a6ae7684a2175cebb6
MD5 8ff9b1de43b2501ceec250b940e52a02
BLAKE2b-256 8e75d020b6df04fb1ab65fef80f40e9c257a4d3a651f96dc6b7b13a2837c0e36

Provenance

The following attestation bundles were made for ambers-0.2.4-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp314-cp314-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ambers-0.2.4-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 d9bcd578839007d0d79259f2e916e5f04e31986ea5c5f80960dcb0e9a31a8a1b
MD5 2df1d14232395ca7084eb429aa2d1de7
BLAKE2b-256 39bd0d189d5c5d848d55a65854f089e2d78fc9622d5c89c09c472d748bf94bb1

Provenance

The following attestation bundles were made for ambers-0.2.4-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for ambers-0.2.4-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2330cf76c578d8db0fd4cd4d4999bebefca23c92d0588e8e787931d3d0959101
MD5 52fcd1a8781f0d4f37810eafb9cbfdde
BLAKE2b-256 bf93192243410aa9bdba13a73b79fe7b46a8c6ae03f3a305f14b4ce076c210ad

Provenance

The following attestation bundles were made for ambers-0.2.4-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: ambers-0.2.4-cp313-cp313-win_amd64.whl
  • Upload date:
  • Size: 844.8 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.2.4-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 1dd0651aad63cc3681e689e4c371e785af51c45196af20c42d188a7a9e8fc791
MD5 88cd1d0b8be21071e246e4646abb50ec
BLAKE2b-256 def223ee93976db577b174216f842230bd722d3d82f6ec57a8fa64a1de5c73fa

Provenance

The following attestation bundles were made for ambers-0.2.4-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ambers-0.2.4-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 aef83e36ce8cadb56f5dbe8fd5e09ed23b8130074c0acd706eea9d57c2924850
MD5 da76f1bf7d831852d4e1628016d7d520
BLAKE2b-256 ff0aa36953c1ad84802b40adeff5a7193bca5d4326b7a9c68f8a314d57af4342

Provenance

The following attestation bundles were made for ambers-0.2.4-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for ambers-0.2.4-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 30d4b7873ee51d7e8118d5be87fe746b0ad7e2c64ad12548ddce687f536e383b
MD5 04e4f5d8aac2c2bc54fa999de1db1f61
BLAKE2b-256 c38f17f1a8d29df792239e19a12fceb4b67c8164fc254a33dccd1cd4da988058

Provenance

The following attestation bundles were made for ambers-0.2.4-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: ambers-0.2.4-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 844.9 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.2.4-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 365d0020a141ae76ebb9a35c8ec16b0ed37d82b5d297e333cb417fec5318ecb3
MD5 6a1ed2ae9579f9ca214043e390c99956
BLAKE2b-256 77b3128e8ea9488f453dea1d042eede89ab7c2f8f3f6453102a63568a45bf021

Provenance

The following attestation bundles were made for ambers-0.2.4-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ambers-0.2.4-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 f7bbc1ea7c644abc4a949c26207fa67b98fff7452ac222e73ba060bf95c0222e
MD5 d7f43f32becc986a2acdd706ca382427
BLAKE2b-256 f9dc3a6a720a99bb0b16a63d88cedb04c300b4369901c83b486f06edc675fd75

Provenance

The following attestation bundles were made for ambers-0.2.4-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.2.4-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for ambers-0.2.4-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0a79f49190ce5fce3e47313a4c07aecfb62a94054f9681dc2a5a86fec30df857
MD5 26d8e6e308e1b24ce4e4b31b898c7b95
BLAKE2b-256 069d97f0acd01e35068f7f1a108c63e822de034354639734d2efa6f7bc5dd13e

Provenance

The following attestation bundles were made for ambers-0.2.4-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers
