Pure Rust SPSS .sav/.zsav reader with Polars DataFrame output

ambers

Pure Rust SPSS .sav/.zsav reader — Arrow-native, zero C dependencies.

Features

  • Read .sav (uncompressed or bytecode-compressed) and .zsav (zlib-compressed) files
  • Arrow RecordBatch output — zero-copy to Polars, DataFusion, DuckDB
  • Rich metadata: variable labels, value labels, missing values, MR sets, measure levels
  • Lazy reader via scan_sav() — returns Polars LazyFrame with projection and row limit pushdown
  • No PyArrow dependency — uses Arrow PyCapsule Interface for zero-copy transfer
  • Fast: in the benchmarks below, 1.1–2.0x faster than polars_readstat and 1.7–175x faster than pyreadstat
  • Python + Rust dual API from a single crate

Installation

Python:

pip install ambers

Rust:

cargo add ambers

Quick Start

Python

```python
import ambers as am

# Eager read: data + metadata
df, meta = am.read_sav("survey.sav")

# Lazy read: returns a Polars LazyFrame; select/head are pushed down into the reader
lf, meta = am.scan_sav("survey.sav")
df = lf.select(["Q1", "Q2", "age"]).head(1000).collect()

# Explore metadata
meta.summary()
meta.describe("Q1")
meta.value("Q1")

# Read metadata only (fast, skips data)
meta = am.read_sav_metadata("survey.sav")
```
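
The Metadata API table describes meta.value() as returning a dict of value labels. Assuming that dict maps each code to its label string (the exact key type, such as float for numeric variables, is an assumption here), a hypothetical helper like apply_value_labels could turn raw codes into labels in plain Python:

```python
from typing import Dict, List, Optional, Sequence


def apply_value_labels(
    codes: Sequence[Optional[float]],
    labels: Dict[float, str],
) -> List[object]:
    """Replace each code with its label; unlabeled codes and None pass through unchanged.

    Assumption: `labels` is shaped like {code: label}, e.g. what meta.value("Q1")
    is described as returning. This helper is hypothetical, not part of ambers.
    """
    return [labels.get(code, code) for code in codes]


# Hypothetical value-label dict for a yes/no question
q1_labels = {1.0: "Yes", 2.0: "No"}
print(apply_value_labels([1.0, 2.0, 9.0, None], q1_labels))
# ['Yes', 'No', 9.0, None]
```

Because Python treats 1 and 1.0 as the same dict key, integer codes look up float-keyed labels as well.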

Rust

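```rust
use ambers::{read_sav, read_sav_metadata};

// `?` needs a function that returns Result; this assumes ambers' error type
// implements std::error::Error so it can be boxed.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read data + metadata
    let (batch, meta) = read_sav("survey.sav")?;
    println!("{} rows, {} cols", batch.num_rows(), meta.number_columns);

    // Read metadata only
    let meta = read_sav_metadata("survey.sav")?;
    println!("{}", meta.label("Q1").unwrap_or("(no label)"));

    Ok(())
}
```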

Metadata API (Python)

| Method | Description |
|---|---|
| meta.summary() | Formatted overview: file info, type distribution, annotations |
| meta.describe("Q1") | Deep-dive into a single variable (or a list of variables) |
| meta.diff(other) | Compare two metadata objects; returns a MetaDiff |
| meta.label("Q1") | Variable label |
| meta.value("Q1") | Value labels dict |
| meta.format("Q1") | SPSS format string (e.g. "F8.2", "A50") |
| meta.measure("Q1") | Measurement level ("nominal", "ordinal", "scale") |
| meta.schema | Full metadata as a nested Python dict |

All variable-name methods raise KeyError for unknown variables.
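
meta.format() returns SPSS format strings such as "F8.2" (numeric, 8 characters wide, 2 decimals) and "A50" (string, 50 characters wide). A minimal sketch of splitting such a string into type, width, and decimals, assuming the common TYPEw.d shape; parse_spss_format and SpssFormat are hypothetical names, not part of ambers:

```python
import re
from typing import NamedTuple


class SpssFormat(NamedTuple):
    kind: str      # format type, e.g. "F" (numeric) or "A" (string)
    width: int     # total display width in characters
    decimals: int  # digits after the decimal point (0 when omitted)


# TYPEw or TYPEw.d, e.g. "F8.2", "A50", "COMMA10.2"
_FORMAT_RE = re.compile(r"^([A-Z]+)(\d+)(?:\.(\d+))?$")


def parse_spss_format(fmt: str) -> SpssFormat:
    """Split an SPSS format string into its type, width, and decimals."""
    match = _FORMAT_RE.match(fmt.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized SPSS format string: {fmt!r}")
    kind, width, decimals = match.groups()
    return SpssFormat(kind, int(width), int(decimals or 0))


print(parse_spss_format("F8.2"))  # SpssFormat(kind='F', width=8, decimals=2)
print(parse_spss_format("A50"))   # SpssFormat(kind='A', width=50, decimals=0)
```

Strings that do not follow the TYPEw.d pattern raise ValueError rather than being guessed at.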

Streaming Reader (Rust)

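```rust
// `?` needs a function that returns Result; this assumes ambers' error type
// implements std::error::Error so it can be boxed.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open a streaming scanner, project two columns, and cap the row count
    let mut scanner = ambers::scan_sav("survey.sav")?;
    scanner.select(&["age", "gender"])?;
    scanner.limit(1000);

    // Pull Arrow RecordBatches until the scanner is exhausted
    while let Some(batch) = scanner.next_batch()? {
        println!("Batch: {} rows", batch.num_rows());
    }
    Ok(())
}
```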
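
The scanner loop above combines a column projection (select) with a row limit (limit). The following Python sketch models that contract over an in-memory row source: it keeps only the requested columns, emits fixed-size batches, and stops pulling rows once the limit is reached. scan_batches and numbered_rows are hypothetical names, and this is not how ambers decodes .sav records internally.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

Row = Dict[str, object]


def scan_batches(
    rows: Iterable[Row],
    columns: Sequence[str],
    limit: int,
    batch_size: int,
) -> Iterator[List[Row]]:
    """Yield batches of projected rows, reading at most `limit` rows from the source."""
    if limit <= 0:
        return
    batch: List[Row] = []
    produced = 0
    for row in rows:
        batch.append({name: row[name] for name in columns})  # projection: keep selected columns
        produced += 1
        if len(batch) == batch_size:
            yield batch
            batch = []
        if produced >= limit:
            break  # limit reached: stop pulling rows from the source
    if batch:
        yield batch


def numbered_rows(n: int, counter: List[int]) -> Iterator[Row]:
    """Synthetic source that records how many rows the consumer actually pulled."""
    for i in range(n):
        counter[0] += 1
        yield {"age": 20 + i % 50, "gender": i % 2, "income": i * 10}


counter = [0]
batches = list(scan_batches(numbered_rows(10_000, counter), ["age", "gender"], limit=1000, batch_size=256))
print([len(b) for b in batches])  # [256, 256, 256, 232]
print(counter[0])                 # 1000
print(batches[0][0])              # {'age': 20, 'gender': 0}
```

Checking the limit right after appending, instead of at the top of the loop, avoids pulling one extra row from the source.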

Performance

Eager Read

Every benchmark ends with a Polars DataFrame. Timings are the average of 5 runs on Windows 11 with Python 3.13 on a 24-core machine.

| File | Size | Rows | Cols | ambers | polars_readstat | pyreadstat | pyreadstat mp (4 workers) | ambers vs polars_readstat | ambers vs pyreadstat |
|---|---|---|---|---|---|---|---|---|---|
| test_1 (bytecode) | 0.2 MB | 1,500 | 75 | 0.002s | 0.004s | 0.328s | 0.504s | 2.0x faster | 175x faster |
| test_2 (bytecode) | 147 MB | 22,070 | 677 | 0.880s | 0.949s | 3.618s | 1.772s | 1.1x faster | 4.1x faster |
| test_3 (uncompressed) | 1.1 GB | 79,066 | 915 | 1.094s | 1.359s | 5.002s | 2.740s | 1.2x faster | 4.6x faster |
| test_4 (uncompressed) | 0.6 MB | 201 | 158 | 0.013s | 0.015s | 0.022s | 0.519s | 1.1x faster | 1.7x faster |
| test_5 (uncompressed) | 0.6 MB | 203 | 136 | 0.002s | 0.004s | 0.016s | 0.477s | 1.9x faster | 8.2x faster |

  • vs polars_readstat: faster on every file — 1.1–2.0x faster
  • vs pyreadstat: 1.7–175x faster across all file sizes
  • vs pyreadstat multiprocess (4 workers): ambers single-threaded still faster on every file

pyreadstat's multiprocess reader returns a pandas DataFrame, so its timings include the pl.from_pandas() conversion.
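
The speedup columns are ratios of wall-clock times: baseline seconds divided by ambers seconds. A small sketch recomputing them from the rounded timings in the table; TIMINGS and speedup are illustrative names:

```python
from typing import Dict

# Timings in seconds, copied from the eager-read table (rounded to the millisecond)
TIMINGS: Dict[str, Dict[str, float]] = {
    "test_1": {"ambers": 0.002, "polars_readstat": 0.004, "pyreadstat": 0.328},
    "test_2": {"ambers": 0.880, "polars_readstat": 0.949, "pyreadstat": 3.618},
    "test_3": {"ambers": 1.094, "polars_readstat": 1.359, "pyreadstat": 5.002},
    "test_4": {"ambers": 0.013, "polars_readstat": 0.015, "pyreadstat": 0.022},
    "test_5": {"ambers": 0.002, "polars_readstat": 0.004, "pyreadstat": 0.016},
}


def speedup(baseline_s: float, ambers_s: float) -> float:
    """How many times faster ambers is than a baseline reader."""
    return baseline_s / ambers_s


for name, t in TIMINGS.items():
    vs_prs = speedup(t["polars_readstat"], t["ambers"])
    vs_pr = speedup(t["pyreadstat"], t["ambers"])
    print(f"{name}: {vs_prs:.1f}x vs polars_readstat, {vs_pr:.1f}x vs pyreadstat")
```

From the rounded values, test_4 comes out at about 1.7x vs pyreadstat, the low end of the range. For test_1 the rounded 0.002s gives about 164x rather than the reported 175x, presumably because the reported ratio was computed from unrounded timings.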

Lazy Read with Pushdown

scan_sav() returns a Polars LazyFrame. Unlike an eager read, it reads only the columns and rows the query asks for. Speedups in parentheses are relative to a full collect:

| File (size) | Full collect | Select 5 cols | Head 1000 rows | Select 5 + head 1000 |
|---|---|---|---|---|
| test_2 (147 MB, 22K × 677) | 0.833s | 0.310s (2.7x) | 0.106s (7.8x) | 0.084s (9.9x) |
| test_3 (1.1 GB, 79K × 915) | 1.036s | 0.234s (4.4x) | 0.019s (55.7x) | 0.006s (167x) |

On the 1.1 GB file, selecting 5 columns and 1000 rows completes in 6ms — 167x faster than reading the full dataset.
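
To put the pushdown numbers in perspective, the share of cells covered by "select 5 + head 1000" on test_3 (79,066 rows × 915 columns) can be computed from the table's dimensions; cell_fraction is an illustrative helper:

```python
def cell_fraction(rows_read: int, cols_read: int, total_rows: int, total_cols: int) -> float:
    """Fraction of a table's cells covered by a projected, row-limited query."""
    covered = min(rows_read, total_rows) * min(cols_read, total_cols)
    return covered / (total_rows * total_cols)


frac = cell_fraction(1000, 5, 79_066, 915)
print(f"{frac:.4%} of cells, {1 / frac:,.0f}x fewer than a full read")
# 0.0069% of cells, 14,469x fewer than a full read
```

The measured 167x speedup is far smaller than this roughly 14,469x reduction in cells, so the remaining time is not proportional to the number of cells read alone.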

License

MIT

Download files

Source Distribution

  • ambers-0.1.7.tar.gz (67.6 kB)

Built Distributions

  • ambers-0.1.7-cp314-cp314-win_amd64.whl (753.6 kB): CPython 3.14, Windows x86-64
  • ambers-0.1.7-cp314-cp314-manylinux_2_34_x86_64.whl (899.9 kB): CPython 3.14, manylinux glibc 2.34+ x86-64
  • ambers-0.1.7-cp314-cp314-macosx_11_0_arm64.whl (821.1 kB): CPython 3.14, macOS 11.0+ ARM64
  • ambers-0.1.7-cp313-cp313-win_amd64.whl (755.4 kB): CPython 3.13, Windows x86-64
  • ambers-0.1.7-cp313-cp313-manylinux_2_34_x86_64.whl (900.2 kB): CPython 3.13, manylinux glibc 2.34+ x86-64
  • ambers-0.1.7-cp313-cp313-macosx_11_0_arm64.whl (820.7 kB): CPython 3.13, macOS 11.0+ ARM64
  • ambers-0.1.7-cp312-cp312-win_amd64.whl (755.8 kB): CPython 3.12, Windows x86-64
  • ambers-0.1.7-cp312-cp312-manylinux_2_34_x86_64.whl (900.8 kB): CPython 3.12, manylinux glibc 2.34+ x86-64
  • ambers-0.1.7-cp312-cp312-macosx_11_0_arm64.whl (820.8 kB): CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file ambers-0.1.7.tar.gz.

File metadata

  • Download URL: ambers-0.1.7.tar.gz
  • Size: 67.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.1.7.tar.gz
Algorithm Hash digest
SHA256 cac95c3b2080f60921dfa2578f637b3b779d8ff9f39dcb172cb15dc212459d1f
MD5 6557f7cad14ed57009e075fb5123276f
BLAKE2b-256 69fd8b691799455e0099477c74bdfe6d2c0cc9ed673ddf47dd11b0725de4c032
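
The published digests let you check a downloaded file before installing it. A minimal sketch using Python's standard hashlib module; sha256_of and verify are illustrative names, and the expected value is copied from the SHA256 row of the relevant file:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(Path("ambers-0.1.7.tar.gz"), "cac95c3b2080f60921dfa2578f637b3b779d8ff9f39dcb172cb15dc212459d1f") should return True for an intact download. pip's hash-checking mode (--require-hashes with --hash=sha256:<digest> entries in a requirements file) performs the same check at install time.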

Provenance

The following attestation bundles were made for ambers-0.1.7.tar.gz:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ambers-0.1.7-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: ambers-0.1.7-cp314-cp314-win_amd64.whl
  • Size: 753.6 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.1.7-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 ef5adbda14f476c039932c7ab501821dc2cc2fcf27d2fdb04104f86db838622f
MD5 cbbeb9a81c917d617efcb41123891ee8
BLAKE2b-256 2fa7777afb4277f70d44b8005cf4ced267a84651444288c0a9a926ca75a2179c

Provenance

The following attestation bundles were made for ambers-0.1.7-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp314-cp314-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ambers-0.1.7-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e0a5b3445afd87a60f2c2199505aaa9cf0364a7b80221a651687cdafc673b374
MD5 87cb6f35f731e91dc9a93f043afe052e
BLAKE2b-256 3fe5ec139cd7c4014abdd714d8810e34e0fe9ee730619277f059bca98ebd1206

Provenance

The following attestation bundles were made for ambers-0.1.7-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for ambers-0.1.7-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 06140a36a1be5f81bf52cd96687b2b9a94347bc18aebe26463e8369c8f7e9429
MD5 8b67d5f743d5c58287f7a75ad9d69377
BLAKE2b-256 203fbb4958844ad0d99e1093b1025cbad0bfa293c99e4ddf23540f3f0690c9bb

Provenance

The following attestation bundles were made for ambers-0.1.7-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: ambers-0.1.7-cp313-cp313-win_amd64.whl
  • Size: 755.4 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.1.7-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 dda2382b31838e4a5d1cbff7f9c50ac05701183f8b13aad35aec69e70411fa17
MD5 289fec58850866c2ed97d80bbce5644a
BLAKE2b-256 5fddf3df4ee2d45bd7d2ed8cfb11a968a2e634218b06a8e7fe2c714e6b14fd55

Provenance

The following attestation bundles were made for ambers-0.1.7-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ambers-0.1.7-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 b2a04c02fe88ce2834682c538ed609d7800df993f38f1ce758a5f9c7e141048b
MD5 3f53a52dbc4a63d6041a043e06ed492e
BLAKE2b-256 83e34d9e33a1e0c724f0816e4ad73d1688f98ed5f0e7fea37418331176d90fb2

Provenance

The following attestation bundles were made for ambers-0.1.7-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for ambers-0.1.7-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b8dd9a563cddcaacc381fc4b22118c996c968ed31e421fb39f254ae12a72213a
MD5 dea9216f358681c2a765ae76a10a71cf
BLAKE2b-256 dc18c7fcde284706686904be31d489911806452a239203e513498bc2bb7acca2

Provenance

The following attestation bundles were made for ambers-0.1.7-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: ambers-0.1.7-cp312-cp312-win_amd64.whl
  • Size: 755.8 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ambers-0.1.7-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 585c4d8ed3eee0c46ef6e991b26396f6216540e21f4862a031a8063ee55257f5
MD5 f333755e1f1297f054f58ba1952a0d12
BLAKE2b-256 5380e592de46a8061e0d15e36a6b4d6b74c783f3dbc95c218c1157d7b522ffc9

Provenance

The following attestation bundles were made for ambers-0.1.7-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for ambers-0.1.7-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1af28ec23f4c7470fecc4c2d1adc2e767c38d71c698b4c6c1d6a91efb0fedfef
MD5 3d201f1671da94be9e1ac5012f6e891d
BLAKE2b-256 f287e622e4ca75e6746f4c5d491688590351c27289d8aea55d64e8c0dc40dce1

Provenance

The following attestation bundles were made for ambers-0.1.7-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/ambers

File details

Details for the file ambers-0.1.7-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for ambers-0.1.7-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ecf1f105c0fe858889ead1da31225110f32b8b10318f4f21d9345703f15a6584
MD5 0d5eeb61d9d86cc71620b47024bfd925
BLAKE2b-256 b503f92f60efb5df8f0d304c281189bb56367f4cdf50ef030d012c136e145e79

Provenance

The following attestation bundles were made for ambers-0.1.7-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/ambers

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page