rimpy

Super-fast, Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

Python 3.12+

Features

  • 🚀 Fast: Rust-powered engine with pure Python/NumPy fallback
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals and numpy
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy; check out their amazing work if you have more complex weighting needs

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install rimpy[polars]  # For polars support
pip install rimpy[all]     # For both polars and pandas

Pre-built wheels are available for Linux (x86-64, glibc 2.34+), Windows (x86-64), and macOS 11.0+ (arm64) on Python 3.12–3.14. The Rust engine is included automatically; no Rust toolchain is needed.

Quick Start

import polars as pl
import rimpy

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rimpy.rake(df, targets)
print(weighted["weight"])

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rimpy.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # rows with nulls skip raking, keep weight=1.0
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rimpy.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
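The scaling step can be sketched in plain Python. This is a minimal illustration, not rimpy's implementation: the helper name `scale_to_total` and the `included` mask are hypothetical, and it assumes the raked rows are multiplied by one common factor so that they sum to the target while excluded rows stay at 1.0.

```python
def scale_to_total(weights, included, total):
    """Scale the weights of included rows so that they sum to `total`.

    Rows where `included` is False keep their weight unchanged
    (1.0 by the convention described above).
    """
    raked_sum = sum(w for w, keep in zip(weights, included) if keep)
    if raked_sum <= 0:
        raise ValueError("no positive weight among included rows")
    factor = total / raked_sum
    return [w * factor if keep else w for w, keep in zip(weights, included)]


# Three raked rows plus one row that was excluded from raking.
weights = [0.8, 1.2, 1.0, 1.0]
included = [True, True, True, False]
scaled = scale_to_total(weights, included, total=300)
print(scaled)
```

Because every raked row is multiplied by the same factor, the weighted proportions reached by raking are unchanged; only the base moves.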

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rimpy.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rimpy.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rimpy.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # e.g. 90.0%
print(result.group_results["DE"].iterations)  # e.g. 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rimpy.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rimpy.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
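Steps (2) and (3) can be sketched as a per-group rescale applied to weights that have already been raked within each group. The function name `apply_group_totals` is hypothetical, and the default base (the number of rows when `total` is not given) is an assumption rather than documented rimpy behavior.

```python
def apply_group_totals(weights, groups, group_totals, total=None):
    """Rescale raked weights to global group shares, then to an absolute base.

    weights:      already-raked weights, one per row
    groups:       group label per row
    group_totals: desired share per group (e.g. {"North": 40, "South": 60})
    total:        optional absolute weighted base (assumed default: row count)
    """
    base = float(total) if total is not None else float(len(weights))
    share_sum = sum(group_totals.values())

    # Current weighted size of each group after within-group raking.
    group_sums = {}
    for w, g in zip(weights, groups):
        group_sums[g] = group_sums.get(g, 0.0) + w

    # One factor per group moves its weighted size to share * base.
    factors = {
        g: (group_totals[g] / share_sum) * base / group_sums[g]
        for g in group_sums
    }
    return [w * factors[g] for w, g in zip(weights, groups)]


raked = [0.5, 1.5, 1.0, 1.0, 0.8, 1.2]
region = ["North", "North", "North", "South", "South", "South"]
final = apply_group_totals(raked, region, {"North": 40, "South": 60}, total=10_000)
north = sum(w for w, g in zip(final, region) if g == "North")
south = sum(w for w, g in zip(final, region) if g == "South")
print(round(north), round(south))
```

Since each group gets a single factor, the within-group balance achieved in step (1) is preserved.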

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rimpy.weight_summary(df, "weight")

# By country
summary = rimpy.weight_summary(df, "weight", by="country")

Returns DataFrame with:

Column          Description
n               Sample size
effective_n     Effective sample size after weighting
efficiency_pct  Weighting efficiency (0-100%)
weight_mean     Mean weight (should be ~1.0)
weight_std      Standard deviation of weights
weight_median   Median weight
weight_min      Minimum weight
weight_max      Maximum weight
weight_ratio    Ratio of max to min weight
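For orientation, these statistics can be computed by hand from a list of weights. The sketch below uses the standard Kish definition of effective sample size, (Σw)² / Σw², and the sample standard deviation; whether rimpy uses exactly these definitions is an assumption, and `summarize_weights` is a hypothetical name.

```python
import statistics


def summarize_weights(weights):
    """Compute the summary columns listed above for one set of weights."""
    n = len(weights)
    total = sum(weights)
    # Kish effective sample size: (sum of weights)^2 / sum of squared weights.
    effective_n = total ** 2 / sum(w * w for w in weights)
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_mean": total / n,
        "weight_std": statistics.stdev(weights) if n > 1 else 0.0,
        "weight_median": statistics.median(weights),
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


summary = summarize_weights([0.5, 1.0, 1.5, 1.0])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under this definition, equal weights give an efficiency of 100%, and the figure drops as the weights spread out.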

validate_targets(df, targets)

Check targets for errors before weighting.

report = rimpy.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
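To give a feel for what such a report can contain, here is a small stand-alone check. The report keys mirror the ones shown above, but the specific checks and their split into errors and warnings are assumptions for illustration; `check_targets` is a hypothetical helper, not rimpy's validator.

```python
def check_targets(column_values, targets, tolerance=0.01):
    """Return {"errors": [...], "warnings": [...]} for one set of targets.

    column_values: mapping of column name -> list of observed codes
    targets:       mapping of column name -> {code: target percentage}
    """
    report = {"errors": [], "warnings": []}
    for var, shares in targets.items():
        if var not in column_values:
            report["errors"].append(f"{var}: column not found in data")
            continue
        total = sum(shares.values())
        if abs(total - 100.0) > tolerance:
            report["warnings"].append(f"{var}: targets sum to {total:g}, not 100")
        observed = {v for v in column_values[var] if v is not None}
        for code in shares:
            if code not in observed:
                report["errors"].append(f"{var}: target code {code!r} has no rows")
        for code in sorted(observed - set(shares), key=repr):
            report["warnings"].append(f"{var}: data code {code!r} has no target")
    return report


data = {"gender": [1, 1, 2, 2, 3], "age": [1, 2, 2, 1, 2]}
example_targets = {
    "gender": {1: 49, 2: 51},   # code 3 appears in the data but has no target
    "age": {1: 40, 2: 50},      # sums to 90
    "region": {1: 100},         # column missing from the data
}
example_report = check_targets(data, example_targets)
print(example_report["errors"])
print(example_report["warnings"])
```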

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rimpy.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rimpy.load_schemes("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rimpy.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

scheme_key  target_var  target_code  target_pct
20230001    gender      1            49.85
20230001    gender      2            49.85
20230001    gender      3            0.3
20230001    smoker      1            21
20230001    smoker      2            79
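The long format maps directly onto the nested dict that rake_by_scheme() takes. A minimal sketch of that mapping, with the rows written out as plain dicts instead of being read from a file; `schemes_from_long` is a hypothetical name, and the default column names are assumed to match the header shown above.

```python
def schemes_from_long(rows, key_col="scheme_key", var_col="target_var",
                      code_col="target_code", target_col="target_pct"):
    """Group long-format rows into {scheme_key: {variable: {code: pct}}}."""
    schemes = {}
    for row in rows:
        scheme = schemes.setdefault(row[key_col], {})
        scheme.setdefault(row[var_col], {})[row[code_col]] = row[target_col]
    return schemes


long_rows = [
    {"scheme_key": 20230001, "target_var": "gender", "target_code": 1, "target_pct": 49.85},
    {"scheme_key": 20230001, "target_var": "gender", "target_code": 2, "target_pct": 49.85},
    {"scheme_key": 20230001, "target_var": "gender", "target_code": 3, "target_pct": 0.3},
    {"scheme_key": 20230001, "target_var": "smoker", "target_code": 1, "target_pct": 21},
    {"scheme_key": 20230001, "target_var": "smoker", "target_code": 2, "target_pct": 79},
]
long_schemes = schemes_from_long(long_rows)
print(long_schemes)
```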

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rimpy.load_schemes_wide("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

target_var  target_code  20230001  20240001  20230002
gender      1            49.85     49.9      49.9
gender      2            49.85     49.9      49.9
gender      3            0.3       0.2       0.2
smoker      1            21        9         10
smoker      2            79        91        90
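In the wide layout, every column other than the variable and code columns is one scheme. A sketch of how such rows could be pivoted into the nested dict form; `schemes_from_wide` is a hypothetical name, and skipping empty cells is an assumption.

```python
def schemes_from_wide(rows, var_col="target_var", code_col="target_code"):
    """Pivot wide-format rows into {scheme_key: {variable: {code: pct}}}."""
    schemes = {}
    for row in rows:
        for column, value in row.items():
            # Every remaining column is a scheme key; blank cells are skipped.
            if column in (var_col, code_col) or value is None:
                continue
            scheme = schemes.setdefault(column, {})
            scheme.setdefault(row[var_col], {})[row[code_col]] = value
    return schemes


wide_rows = [
    {"target_var": "gender", "target_code": 1, 20230001: 49.85, 20240001: 49.9, 20230002: 49.9},
    {"target_var": "gender", "target_code": 2, 20230001: 49.85, 20240001: 49.9, 20230002: 49.9},
    {"target_var": "gender", "target_code": 3, 20230001: 0.3, 20240001: 0.2, 20230002: 0.2},
    {"target_var": "smoker", "target_code": 1, 20230001: 21, 20240001: 9, 20230002: 10},
    {"target_var": "smoker", "target_code": 2, 20230001: 79, 20240001: 91, 20230002: 90},
]
wide_schemes = schemes_from_wide(wide_rows)
print(sorted(wide_schemes))
print(wide_schemes[20240001]["smoker"])
```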

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100); rimpy auto-detects which scale is in use.
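The detection rule itself is not spelled out here. One plausible heuristic, shown purely as an assumption, is to look at what the values of a variable sum to: about 1 means proportions, about 100 means percentages. `to_proportions` is a hypothetical helper.

```python
def to_proportions(shares, tolerance=0.01):
    """Normalize one variable's targets to proportions that sum to 1.

    Accepts values summing to about 1 (proportions) or about 100
    (percentages); anything else is rejected as ambiguous.
    """
    total = sum(shares.values())
    looks_like_proportions = abs(total - 1.0) <= tolerance
    looks_like_percentages = abs(total - 100.0) <= tolerance * 100
    if not (looks_like_proportions or looks_like_percentages):
        raise ValueError(f"targets sum to {total:g}; expected about 1 or about 100")
    return {code: value / total for code, value in shares.items()}


from_pct = to_proportions({1: 49, 2: 51})
from_prop = to_proportions({1: 0.4, 2: 0.6})
print(from_pct, from_prop)
```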

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rimpy.convert_from_weightipy(weightipy_targets)
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")
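Conceptually, the conversion flattens each scheme's list of single-variable dicts into one dict per scheme. The sketch below illustrates that idea under the assumption that the output is the nested dict rake_by_scheme() takes; the function names are hypothetical and this is not the body of convert_from_weightipy().

```python
def merge_target_list(target_list):
    """Merge [{"gender": {...}}, {"age": {...}}] into one dict of variables."""
    merged = {}
    for entry in target_list:
        for var, shares in entry.items():
            if var in merged:
                raise ValueError(f"duplicate target variable: {var}")
            merged[var] = dict(shares)
    return merged


def convert_schemes(list_format_schemes):
    """Apply merge_target_list to every scheme key."""
    return {key: merge_target_list(items) for key, items in list_format_schemes.items()}


list_format = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
converted = convert_schemes(list_format)
print(converted)
```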

Performance

rimpy uses a Rust engine (via PyO3) for the core raking loop, with automatic fallback to pure Python/NumPy on unsupported platforms. Benchmark on synthetic survey data:

Scenario                              Python/NumPy  Rust     Speedup
Small survey (n=500, 3 vars)          0.25 ms       0.03 ms  8.7x
Medium survey (n=5,000, 5 vars)       1.48 ms       0.44 ms  3.3x
Large survey (n=50,000, 5 vars)       8.66 ms       5.83 ms  1.5x
25 countries × 5,000 each (parallel)  38.40 ms      5.72 ms  6.7x

The biggest gains come from grouped raking (multi-country surveys), where Rayon parallelizes across groups.

How It Works

rimpy implements iterative proportional fitting (IPF/raking):

  1. Preprocessing: Uses narwhals for backend-agnostic DataFrame operations
  2. Index caching: Pre-computes row indices for each target category
  3. Iteration: Rust engine with zero-allocation inner loop (falls back to NumPy if needed)
  4. Parallel groups: Multi-country raking runs across CPU cores via Rayon
  5. Output: Returns DataFrame in same format as input (polars in → polars out)
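The index-caching and iteration steps can be illustrated with a small pure-Python version of iterative proportional fitting. This is a teaching sketch, not the Rust engine: `rake_sketch` is a hypothetical name, it assumes every row carries a code that has a target, and the convergence rule (largest gap between a weighted margin and its target, in percentage points) is an assumption about what the threshold measures.

```python
def rake_sketch(columns, targets, max_iterations=1000, convergence_threshold=0.01):
    """Iterative proportional fitting on categorical columns.

    columns: mapping of variable name -> list of codes, one per row
    targets: mapping of variable name -> {code: target percentage}
    Returns (weights, converged, iterations).
    """
    n = len(next(iter(columns.values())))
    weights = [1.0] * n

    # Index caching: row positions for every (variable, code) pair.
    index = {
        var: {code: [i for i, v in enumerate(columns[var]) if v == code]
              for code in shares}
        for var, shares in targets.items()
    }

    for iteration in range(1, max_iterations + 1):
        # Adjust one variable at a time so its weighted margins hit the targets.
        for var, shares in targets.items():
            total = sum(weights)
            for code, pct in shares.items():
                rows = index[var][code]
                current = sum(weights[i] for i in rows)
                if current > 0:
                    factor = (pct / 100.0) * total / current
                    for i in rows:
                        weights[i] *= factor

        # Largest remaining gap between a weighted margin and its target.
        total = sum(weights)
        worst = max(
            abs(100.0 * sum(weights[i] for i in index[var][code]) / total - pct)
            for var, shares in targets.items()
            for code, pct in shares.items()
        )
        if worst < convergence_threshold:
            return weights, True, iteration
    return weights, False, max_iterations


columns = {"gender": [1, 1, 1, 2, 2], "age": [1, 2, 2, 1, 2]}
sketch_targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
w, converged, iterations = rake_sketch(columns, sketch_targets)
gender_1_share = 100.0 * sum(w[i] for i, g in enumerate(columns["gender"]) if g == 1) / sum(w)
print(converged, iterations, round(gender_1_share, 2))
```

Adjusting one variable disturbs the margins of the others, which is why the loop repeats until all margins are close to their targets at the same time.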

License

MIT
