
Super fast Rust-powered RIM (raking) survey weighting with Narwhals - supports polars and pandas


rimpy


Super fast, Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

PyPI License: MIT Python 3.12+ Rust

Features

  • 🚀 Fast: Rust-powered Arrow engine with zero Python objects in the data path
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals (+ pyarrow for pandas users)
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy. Check out their amazing work if you have more complex weighting needs.

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install "rimpy[polars]"  # For polars support (quotes keep shells like zsh from expanding the brackets)
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically — no Rust toolchain needed.

Quick Start

import polars as pl
import rimpy as rim

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rim.rake(df, targets)
print(weighted["weight"])

Architecture

rimpy uses a three-layer Rust design:

Python API  →  Narwhals (backend-agnostic DataFrames)
                  │
                  ▼  Arrow PyCapsule
              Binding Layer (PyO3)
                  │
                  ▼
              Arrow Middleware (language-agnostic)
                  │
                  ▼
              RIM Engine (pure Rust)

The bottom two layers have zero Python dependencies — they can be reused by R, Julia, or any language with Arrow FFI support.

How It Works

df (polars/pandas) → narwhals → Arrow → RIM engine → Arrow → narwhals → df with weights
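
For readers new to RIM weighting, the computation in the "RIM engine" step can be sketched in pure Python as iterative proportional fitting. This is an illustrative sketch, not rimpy's Rust implementation: the name rake_sketch, the convergence test, and the tolerance are assumptions made for the example, and it assumes every row's code appears in the targets and every target category has at least one respondent.

```python
def rake_sketch(rows, targets, max_iterations=1000, tol=1e-6):
    """Illustrative raking (iterative proportional fitting); not rimpy's engine.

    rows: list of dicts mapping variable name -> category code.
    targets: {variable: {code: percentage}}, each variable summing to 100.
    Returns one weight per row; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iterations):
        max_diff = 0.0  # largest gap (in percentage points) seen in this pass
        for var, dist in targets.items():
            # Current weighted total per category of this variable.
            sums = {code: 0.0 for code in dist}
            for row, w in zip(rows, weights):
                sums[row[var]] += w
            total = sum(sums.values())
            # Scale each category so its weighted share hits the target.
            factors = {}
            for code, pct in dist.items():
                share = 100.0 * sums[code] / total
                max_diff = max(max_diff, abs(share - pct))
                factors[code] = (pct / 100.0 * total) / sums[code]
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
        if max_diff < tol:
            break
    return weights


# Same data and targets as the Quick Start example.
sketch_rows = [
    {"gender": g, "age": a}
    for g, a in zip([1, 1, 1, 2, 2], [1, 2, 2, 1, 2])
]
sketch_targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
sketch_weights = rake_sketch(sketch_rows, sketch_targets)
```

Each pass adjusts one variable at a time, which is why the margins only agree with all targets simultaneously after several iterations.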

Performance

Benchmarks on synthetic survey data (polars backend), with zero Python objects in the hot path:

Scenario                                   Time
Small survey (n=1,000, 3 vars)             0.17 ms
Medium survey (n=10,000, 3 vars)           0.67 ms
Large survey (n=100,000, 3 vars)           10.60 ms
Very large survey (n=1,000,000, 3 vars)    126.14 ms
Grouped raking (n=100,000, 10 groups)      14.34 ms

Grouped raking uses Rayon to parallelize across groups.

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rim.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls from raking (they keep weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rim.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rim.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rim.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rim.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # 90.0%
print(result.group_results["DE"].iterations)  # 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rim.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rim.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
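
Steps (2) and (3) amount to one rescaling factor per group. The sketch below illustrates that arithmetic in plain Python on weights that have already been raked within each group. The helper name apply_group_totals is hypothetical, and keeping the current weighted sum when total is omitted is an assumption, not documented rimpy behavior.

```python
def apply_group_totals(weights, groups, group_totals, total=None):
    """Rescale per-row weights so each group takes its share of the base.

    weights: per-row weights already raked within each group (step 1).
    groups: per-row group labels, same length as weights.
    group_totals: {group: percentage of the overall weighted base}.
    total: optional absolute weighted base; defaults to the current sum
           (an assumption made for this sketch).
    """
    base = sum(weights) if total is None else float(total)
    # Current weighted size of each group.
    group_sums = {}
    for g, w in zip(groups, weights):
        group_sums[g] = group_sums.get(g, 0.0) + w
    pct_sum = sum(group_totals.values())
    # One factor per group: desired weighted size divided by current size.
    factors = {
        g: (group_totals[g] / pct_sum * base) / group_sums[g]
        for g in group_sums
    }
    return [w * factors[g] for g, w in zip(groups, weights)]


region_labels = ["North", "North", "North", "South", "South"]
scaled = apply_group_totals(
    [1.0] * 5,
    region_labels,
    group_totals={"North": 40, "South": 60},
    total=10_000,
)
north_sum = sum(w for g, w in zip(region_labels, scaled) if g == "North")
south_sum = sum(w for g, w in zip(region_labels, scaled) if g == "South")
```

Because every row in a group is multiplied by the same factor, the within-group distributions produced by step (1) are preserved.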

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rim.weight_summary(df, "weight")

# By country
summary = rim.weight_summary(df, "weight", by="country")

Returns a DataFrame with the following columns:

Column          Description
n               Sample size
effective_n     Effective sample size after weighting
efficiency_pct  Weighting efficiency (0-100%)
weight_mean     Mean weight (should be ~1.0)
weight_std      Standard deviation of weights
weight_median   Median weight
weight_min      Minimum weight
weight_max      Maximum weight
weight_ratio    Ratio of max to min weight
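
Effective sample size and efficiency are commonly computed with Kish's formula. The sketch below assumes that definition, since this README does not spell out rimpy's exact formulas, and the helper name kish_summary is hypothetical.

```python
def kish_summary(weights):
    """Kish effective sample size and efficiency for a list of weights."""
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    # Kish: n_eff = (sum of weights)^2 / (sum of squared weights).
    effective_n = total * total / sum_sq
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_ratio": max(weights) / min(weights),
    }


equal = kish_summary([1.0, 1.0, 1.0, 1.0])
uneven = kish_summary([0.5, 0.5, 1.5, 1.5])
```

Under this definition equal weights give 100% efficiency, and efficiency drops as the weights spread out.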

validate_targets(df, targets)

Check targets for errors before weighting.

report = rim.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rim.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rim.load_schemes("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rim.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

scheme_key  target_var  target_code  target_pct
20230001    gender      1            49.85
20230001    gender      2            49.85
20230001    gender      3            0.3
20230001    smoker      1            21
20230001    smoker      2            79
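
To make the mapping concrete, the sketch below groups long-format rows into the nested dict shape that rake_by_scheme() takes in the examples above. It is a plain-Python illustration with a hypothetical helper name, and it assumes (rather than confirms) that load_schemes() returns this shape.

```python
def schemes_from_long(rows):
    """Group (scheme_key, target_var, target_code, target_pct) rows.

    Returns {scheme_key: {target_var: {target_code: target_pct}}}.
    """
    schemes = {}
    for key, var, code, pct in rows:
        schemes.setdefault(key, {}).setdefault(var, {})[code] = pct
    return schemes


long_rows = [
    (20230001, "gender", 1, 49.85),
    (20230001, "gender", 2, 49.85),
    (20230001, "gender", 3, 0.3),
    (20230001, "smoker", 1, 21),
    (20230001, "smoker", 2, 79),
]
long_schemes = schemes_from_long(long_rows)
```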

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rim.load_schemes_wide("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

target_var  target_code  20230001  20240001  20230002
gender      1            49.85     49.9      49.9
gender      2            49.85     49.9      49.9
gender      3            0.3       0.2       0.2
smoker      1            21        9         10
smoker      2            79        91        90
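
In the wide layout each scheme key is a column, so the same nested structure can be built by walking the header. This is an illustrative sketch with a hypothetical helper name. The output shape is assumed to match what rake_by_scheme() accepts; load_schemes_wide() itself may handle more cases.

```python
def schemes_from_wide(header, rows):
    """Convert a wide table into {scheme_key: {var: {code: pct}}}.

    header: ["target_var", "target_code", key1, key2, ...]
    rows: tuples laid out in the same column order as header.
    """
    keys = header[2:]
    schemes = {key: {} for key in keys}
    for var, code, *pcts in rows:
        # Each remaining cell belongs to the scheme named in the header.
        for key, pct in zip(keys, pcts):
            schemes[key].setdefault(var, {})[code] = pct
    return schemes


wide_header = ["target_var", "target_code", 20230001, 20240001, 20230002]
wide_rows = [
    ("gender", 1, 49.85, 49.9, 49.9),
    ("gender", 2, 49.85, 49.9, 49.9),
    ("gender", 3, 0.3, 0.2, 0.2),
    ("smoker", 1, 21, 9, 10),
    ("smoker", 2, 79, 91, 90),
]
wide_schemes = schemes_from_wide(wide_header, wide_rows)
```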

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects.
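
One plausible way to auto-detect the scale is to look at what the values sum to. The sketch below is a guess at such a heuristic, not rimpy's documented rule. The helper name and the tolerances are assumptions.

```python
import math


def to_proportions(dist):
    """Normalize {code: value} to proportions, guessing the scale from the sum."""
    total = sum(dist.values())
    if math.isclose(total, 1.0, abs_tol=1e-6):
        scale = 1.0  # already proportions
    elif math.isclose(total, 100.0, abs_tol=1e-4):
        scale = 100.0  # percentages
    else:
        raise ValueError(f"targets sum to {total}; expected 1 or 100")
    return {code: value / scale for code, value in dist.items()}


from_percentages = to_proportions({1: 49, 2: 51})
from_proportions = to_proportions({1: 0.49, 2: 0.51})
```

Both calls yield the same proportions, which is the behavior the sentence above describes from the caller's point of view.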

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rim.convert_from_weightipy(weightipy_targets)
weighted = rim.rake_by_scheme(df, schemes, by="country_code")
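
Conceptually, the conversion merges each list of single-variable dicts into one dict per scheme key. The sketch below shows that idea in plain Python. The helper name is hypothetical, and the exact output of rim.convert_from_weightipy() is assumed to have this shape rather than confirmed.

```python
def merge_weightipy_scheme(target_list):
    """Merge a weightipy-style list of {variable: {code: pct}} dicts into one dict."""
    merged = {}
    for entry in target_list:
        for var, dist in entry.items():
            if var in merged:
                raise ValueError(f"duplicate target variable: {var!r}")
            merged[var] = dict(dist)  # copy so the input is left untouched
    return merged


weightipy_example = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
converted = {
    key: merge_weightipy_scheme(target_list)
    for key, target_list in weightipy_example.items()
}
```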

Special edge cases

RIM weighting has a handful of edge cases where rimpy's behavior is non-obvious or diverges from professional weighting tools like Q, SPSS, and weightipy. Notable examples include:

  • target = 0 on a category that has respondents — the literal interpretation ("weighted % must be 0") is ambiguous in the algorithm and tools disagree on how to handle it. rimpy's default refuses with an actionable error; opt-in modes (hard_zero, near_zero) cover the Q / SPSS / weightipy conventions.
  • Empty target categories — when a non-zero target is supplied for a category code that doesn't exist in the data. rimpy emits a UserWarning.

See edge_cases.md for the full treatment of each case, the empirical comparison against Q's R-engine output, and recommendations on which mode to use in production parallel-validation workflows.
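
Both edge cases can be spotted before weighting by comparing target codes with observed counts. The sketch below is a standalone illustration that does not call rimpy. The helper name and return shape are hypothetical, and it does not reproduce rimpy's error or warning messages.

```python
from collections import Counter


def find_edge_cases(values, dist):
    """Classify the two edge cases for one variable.

    values: observed category codes for the variable.
    dist: {code: target}.
    Returns (zero_target_with_respondents, empty_target_categories).
    """
    counts = Counter(values)  # missing codes count as 0
    zero_with_rows = sorted(
        code for code, target in dist.items() if target == 0 and counts[code] > 0
    )
    empty_nonzero = sorted(
        code for code, target in dist.items() if target > 0 and counts[code] == 0
    )
    return zero_with_rows, empty_nonzero


# Code 3 has a respondent but a target of 0; code 4 has a target but no respondents.
zero_hits, empty_hits = find_edge_cases([1, 1, 2, 3], {1: 60, 2: 30, 3: 0, 4: 10})
```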

License

MIT


