Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

ambers

Pure Rust SPSS .sav/.zsav reader — Arrow-native, zero C dependencies.

Features

  • Read .sav (bytecode) and .zsav (zlib) files
  • Arrow RecordBatch output — zero-copy to Polars, DataFusion, DuckDB
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels
  • Lazy reader via scan_sav() — returns Polars LazyFrame with projection and row limit pushdown
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer
  • One of the fastest SPSS readers: up to 2.5x faster than polars_readstat and roughly 4–10x faster than pyreadstat in the benchmarks below
  • Python + Rust dual API from a single crate

Installation

Python:

pip install ambers

Rust:

cargo add ambers

Quick Start

Python

import ambers as am

# Eager read — data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read — returns Polars LazyFrame
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")

Rust

use ambers::{read_sav, read_sav_metadata};

// Box<dyn Error> assumes the ambers error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read data + metadata
    let (batch, meta) = read_sav("survey.sav")?;
    println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

    // Read metadata only
    let meta = read_sav_metadata("survey.sav")?;
    println!("{}", meta.label("Q1").unwrap_or("(no label)"));

    Ok(())
}

Metadata API (Python)

| Method | Description |
| --- | --- |
| meta.summary() | Formatted overview: file info, type distribution, annotations |
| meta.describe("Q1") | Deep-dive into a single variable (or list of variables) |
| meta.diff(other) | Compare two metadata objects, returns MetaDiff |
| meta.label("Q1") | Variable label |
| meta.value("Q1") | Value labels dict |
| meta.format("Q1") | SPSS format string (e.g. "F8.2", "A50") |
| meta.measure("Q1") | Measurement level ("nominal", "ordinal", "scale") |
| meta.schema | Full metadata as a nested Python dict |

All variable-name methods raise KeyError for unknown variables.
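
Because unknown names raise KeyError, code that walks a list of candidate variable names may want a small guard. The sketch below shows one way to write it. Both `label_or_default` and the `_StubMeta` class are hypothetical names introduced here so the snippet runs without a .sav file; with ambers you would pass the `meta` object returned by `read_sav_metadata()` instead.

```python
def label_or_default(meta, name, default="(no label)"):
    """Return meta.label(name), or `default` if the variable is unknown.

    Relies only on the documented behaviour that variable-name methods
    raise KeyError for unknown variables.
    """
    try:
        return meta.label(name)
    except KeyError:
        return default


class _StubMeta:
    """Hypothetical stand-in for an ambers metadata object (illustration only)."""

    def __init__(self, labels):
        self._labels = labels

    def label(self, name):
        # A plain dict lookup raises KeyError for unknown names,
        # which mirrors the behaviour described above.
        return self._labels[name]


meta = _StubMeta({"Q1": "Overall satisfaction"})
print(label_or_default(meta, "Q1"))
print(label_or_default(meta, "Q99"))
```

The same try/except shape should apply to the other name-based lookups listed in the table, since they are described as raising the same exception.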

Streaming Reader (Rust)

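// Box<dyn Error> assumes the ambers error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut scanner = ambers::scan_sav("survey.sav")?;
    scanner.select(&["age", "gender"])?;
    scanner.limit(1000);

    while let Some(batch) = scanner.next_batch()? {
        println!("Batch: {} rows", batch.num_rows());
    }

    Ok(())
}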

Performance

Eager Read

All results return a Polars DataFrame. Average of 5 runs on Windows 11, Python 3.13, 24-core machine.

| File | Size | Rows | Cols | ambers | polars_readstat | ambers vs polars_readstat | pyreadstat | pyreadstat multiprocess (4 workers) | ambers vs pyreadstat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | 0.002s | 0.004s | 2.0x faster | 0.010s | 0.493s | 5.0x faster |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | 0.812s | 0.991s | 1.2x faster | 3.564s | 1.781s | 4.4x faster |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | 0.509s | 1.279s | 2.5x faster | 4.849s | 2.764s | 9.5x faster |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | 0.002s | 0.004s | 2.0x faster | 0.018s | 0.470s | 9.0x faster |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | 0.002s | 0.004s | 2.0x faster | 0.015s | 0.454s | 7.5x faster |
| test_6 (uncompressed) | 5.4 GB | 395,330 | 916 | 2.801s | 1.809s | 1.5x slower | 24.199s | 11.718s | 8.6x faster |

  • vs polars_readstat: faster on 5 of 6 files, by 1.2–2.5x (test_6 at 5.4 GB is 1.5x slower)
  • vs pyreadstat: 4–10x faster across all file sizes
  • vs pyreadstat multiprocess (4 workers): ambers single-threaded still faster on every file

pyreadstat multiprocess returns pandas; timing includes pl.from_pandas() conversion.
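
The "x faster" columns are ratios of the timings in the same row. As a quick check, the sketch below recomputes them from the published numbers. The `timings` dict and the `speedup` helper are names made up for this example; the values are copied from the table, not re-measured.

```python
# Timings in seconds, copied from the eager-read table (not re-measured).
timings = {
    "test_1": {"ambers": 0.002, "polars_readstat": 0.004, "pyreadstat": 0.010},
    "test_2": {"ambers": 0.812, "polars_readstat": 0.991, "pyreadstat": 3.564},
    "test_3": {"ambers": 0.509, "polars_readstat": 1.279, "pyreadstat": 4.849},
    "test_4": {"ambers": 0.002, "polars_readstat": 0.004, "pyreadstat": 0.018},
    "test_5": {"ambers": 0.002, "polars_readstat": 0.004, "pyreadstat": 0.015},
    "test_6": {"ambers": 2.801, "polars_readstat": 1.809, "pyreadstat": 24.199},
}


def speedup(baseline_s, ambers_s):
    """Ratio of baseline time to ambers time; above 1 means ambers is faster."""
    return baseline_s / ambers_s


for name, t in timings.items():
    vs_prs = speedup(t["polars_readstat"], t["ambers"])
    vs_pyr = speedup(t["pyreadstat"], t["ambers"])
    print(f"{name}: {vs_prs:.2f}x vs polars_readstat, {vs_pyr:.2f}x vs pyreadstat")
```

A ratio below 1 means ambers was slower. The table reports that case as the inverse, so the test_6 ratio of about 0.65 against polars_readstat appears as "1.5x slower".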

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike eager reads, it only reads the data you ask for:

| File (size) | Full collect | Select 5 cols | Head 1000 rows | Select 5 + head 1000 |
| --- | --- | --- | --- | --- |
| test_2 (147 MB, 22K × 677) | 0.903s | 0.363s (2.5x) | 0.181s (5.0x) | 0.157s (5.7x) |
| test_3 (1.1 GB, 79K × 915) | 0.700s | 0.554s (1.3x) | 0.020s (35x) | 0.012s (58x) |
| test_6 (5.4 GB, 395K × 916) | 3.062s | 2.343s (1.3x) | 0.022s (139x) | 0.013s (236x) |

On the 5.4 GB file, selecting 5 columns and 1000 rows completes in 13ms — 236x faster than reading the full dataset.
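
To build intuition for pushdown, it can help to model it on a toy row source. The sketch below is a plain-Python analogy, not how ambers is implemented: `scan`, `ALL_COLUMNS`, and the `stats` counter are hypothetical names. The point is that the projection and the row limit are applied while reading, so skipped cells are never decoded.

```python
from itertools import islice

ALL_COLUMNS = [f"Q{i}" for i in range(1, 11)]  # 10 toy columns


def scan(n_rows, columns=None, limit=None, stats=None):
    """Toy lazy reader: build only the requested columns, stop at `limit` rows."""
    wanted = ALL_COLUMNS if columns is None else columns
    # Generator: a row is only materialised when the consumer asks for it.
    rows = ({c: (r, c) for c in wanted} for r in range(n_rows))
    if limit is not None:
        rows = islice(rows, limit)  # stop pulling rows once the limit is reached
    out = []
    for row in rows:
        if stats is not None:
            stats["cells"] += len(row)  # count cells that were actually built
        out.append(row)
    return out


full = {"cells": 0}
scan(10_000, stats=full)

pushed = {"cells": 0}
scan(10_000, columns=["Q1", "Q2"], limit=1000, stats=pushed)

print(full["cells"], pushed["cells"])
```

In this toy model every cell costs the same, so projection looks as effective as the row limit. The measured table shows the limit mattering far more on real files, plausibly because .sav data is laid out row by row, so narrowing the columns still means passing over each row.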

License

MIT
