rimpy

Super fast, Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

Python 3.12+

Features

  • 🚀 Fast: Rust-powered engine with pure Python/NumPy fallback
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals and numpy
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy; check out their amazing work if you have more complex weighting needs

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies (quoted so shells such as zsh do not expand the brackets)
pip install "rimpy[polars]"  # For polars support
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically, so no Rust toolchain is needed.

Quick Start

import polars as pl
import rimpy as rim

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rim.rake(df, targets)
print(weighted["weight"])

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rim.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls from raking (they keep weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rim.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
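The scaling step can be pictured with a small NumPy sketch. The helper `scale_to_total` and the `raked_mask` argument are hypothetical names for this illustration, and the sketch assumes that `total` applies to the raked rows only; the text above does not say whether excluded rows count toward the total.

```python
import numpy as np

def scale_to_total(weights, raked_mask, total):
    """Hypothetical helper: scale only the raked rows so they sum to `total`."""
    weights = np.asarray(weights, dtype=float).copy()
    raked_sum = weights[raked_mask].sum()
    # Excluded rows (mask is False) are left untouched at their original weight.
    weights[raked_mask] *= total / raked_sum
    return weights

weights = np.array([0.8, 1.2, 1.0, 1.0])      # last row was excluded (null)
raked = np.array([True, True, True, False])
scaled = scale_to_total(weights, raked, total=300)
print(scaled)  # raked rows now sum to 300, the excluded row stays at 1.0
```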

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rim.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rim.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rim.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # e.g. 90.0
print(result.group_results["DE"].iterations)  # e.g. 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rim.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rim.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
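Steps (2) and (3) can be sketched in NumPy as follows. This is an illustration under assumptions, not rimpy's implementation: `apply_group_totals` is a hypothetical helper, and the input weights stand in for the result of step (1), the within-group raking.

```python
import numpy as np

def apply_group_totals(weights, groups, group_totals, total=None):
    """Hypothetical helper: rescale groups to target shares, then to a total."""
    weights = np.asarray(weights, dtype=float).copy()
    groups = np.asarray(groups)
    grand = weights.sum()
    share_sum = sum(group_totals.values())
    # Step 2: make each group's weighted sum match its share of the grand total.
    for name, share in group_totals.items():
        mask = groups == name
        wanted = grand * share / share_sum
        weights[mask] *= wanted / weights[mask].sum()
    # Step 3: optionally scale everything to an absolute weighted base.
    if total is not None:
        weights *= total / weights.sum()
    return weights

groups = np.array(["North"] * 4 + ["South"] * 2)
within_group = np.ones(6)  # placeholder for weights produced by step 1
nested = apply_group_totals(
    within_group, groups, {"North": 40, "South": 60}, total=10_000
)
print(nested[groups == "North"].sum(), nested[groups == "South"].sum())
```

Because step (2) multiplies every weight in a group by the same factor, the within-group distributions from step (1) are preserved.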

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rim.weight_summary(df, "weight")

# By country
summary = rim.weight_summary(df, "weight", by="country")

Returns DataFrame with:

Column          Description
n               Sample size
effective_n     Effective sample size after weighting
efficiency_pct  Weighting efficiency (0-100%)
weight_mean     Mean weight (should be ~1.0)
weight_std      Standard deviation of weights
weight_median   Median weight
weight_min      Minimum weight
weight_max      Maximum weight
weight_ratio    Ratio of max to min weight
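For orientation, the columns above can be approximated with plain NumPy. This sketch assumes the standard Kish definition of effective sample size, (Σw)² / Σw², and a sample standard deviation (ddof=1); the source does not state which formulas rimpy uses, and `summarize_weights` is a hypothetical name.

```python
import numpy as np

def summarize_weights(weights):
    """Hypothetical helper mirroring the summary columns for one group."""
    w = np.asarray(weights, dtype=float)
    n = w.size
    # Kish effective sample size (assumed definition).
    effective_n = w.sum() ** 2 / (w ** 2).sum()
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_mean": w.mean(),
        "weight_std": w.std(ddof=1),  # ddof=1 is an assumption
        "weight_median": float(np.median(w)),
        "weight_min": w.min(),
        "weight_max": w.max(),
        "weight_ratio": w.max() / w.min(),
    }

stats = summarize_weights([0.5, 1.5])
print(stats["effective_n"], stats["efficiency_pct"])
```

With equal weights the effective sample size equals n and the efficiency is 100%; the more the weights vary, the lower both become.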

validate_targets(df, targets)

Check targets for errors before weighting.

report = rim.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
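The source does not list which checks are run, so the following is only a sketch of the kind of checks such a validator might perform. `check_targets`, its arguments, and the split between errors and warnings are all assumptions for illustration.

```python
def check_targets(observed_codes, targets, tolerance=0.01):
    """Hypothetical validator.

    observed_codes: variable name -> iterable of codes seen in the data.
    targets: variable name -> {code: target value}.
    """
    report = {"errors": [], "warnings": []}
    for var, cats in targets.items():
        if var not in observed_codes:
            report["errors"].append(f"{var}: column not found in data")
            continue
        seen = set(observed_codes[var])
        total = sum(cats.values())
        # Targets are expected to sum to 100 (percentages) or 1 (proportions).
        if abs(total - 100) > tolerance and abs(total - 1) > tolerance:
            report["warnings"].append(f"{var}: targets sum to {total}")
        for code in cats:
            if code not in seen:
                report["errors"].append(f"{var}: target code {code} has no rows")
        for code in sorted(seen - set(cats)):
            report["warnings"].append(f"{var}: data code {code} has no target")
    return report

example = check_targets(
    {"gender": [1, 1, 3]},
    {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}},
)
print(example["errors"])
print(example["warnings"])
```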

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rim.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rim.load_schemes("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rim.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

scheme_key  target_var  target_code  target_pct
20230001    gender      1            49.85
20230001    gender      2            49.85
20230001    gender      3            0.3
20230001    smoker      1            21
20230001    smoker      2            79
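To show how long-format rows relate to the nested dict that rake_by_scheme() takes, here is a plain-Python sketch. `schemes_from_long` is a hypothetical function written for this example; it does not read files and is not how load_schemes() is implemented.

```python
def schemes_from_long(rows):
    """Hypothetical helper.

    rows: iterable of (scheme_key, target_var, target_code, target_pct).
    """
    schemes = {}
    for key, var, code, pct in rows:
        # Create the nested levels on first use, then store the target.
        schemes.setdefault(key, {}).setdefault(var, {})[code] = pct
    return schemes

rows = [
    (20230001, "gender", 1, 49.85),
    (20230001, "gender", 2, 49.85),
    (20230001, "gender", 3, 0.3),
    (20230001, "smoker", 1, 21),
    (20230001, "smoker", 2, 79),
]
long_schemes = schemes_from_long(rows)
# {20230001: {"gender": {1: 49.85, 2: 49.85, 3: 0.3}, "smoker": {1: 21, 2: 79}}}
```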

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rim.load_schemes_wide("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

target_var  target_code  20230001  20240001  20230002
gender      1            49.85     49.9      49.9
gender      2            49.85     49.9      49.9
gender      3            0.3       0.2       0.2
smoker      1            21        9         10
smoker      2            79        91        90
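In the wide layout each scheme is a column rather than a block of rows. The sketch below pivots such a table into the nested dict form. `schemes_from_wide` is a hypothetical helper, and skipping blank (None) cells is an assumption about how a scheme that omits a variable might be represented.

```python
def schemes_from_wide(header, rows):
    """Hypothetical helper.

    header: column names, starting with target_var and target_code.
    rows: one list of cell values per table row.
    """
    scheme_keys = header[2:]  # every remaining column is one scheme
    schemes = {key: {} for key in scheme_keys}
    for var, code, *values in rows:
        for key, pct in zip(scheme_keys, values):
            if pct is None:  # assumed: blank cell means "not used by this scheme"
                continue
            schemes[key].setdefault(var, {})[code] = pct
    return schemes

header = ["target_var", "target_code", 20230001, 20240001, 20230002]
table = [
    ["gender", 1, 49.85, 49.9, 49.9],
    ["gender", 2, 49.85, 49.9, 49.9],
    ["gender", 3, 0.3, 0.2, 0.2],
    ["smoker", 1, 21, 9, 10],
    ["smoker", 2, 79, 91, 90],
]
wide_schemes = schemes_from_wide(header, table)
print(wide_schemes[20240001]["smoker"])  # {1: 9, 2: 91}
```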

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects.
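The detection rule is not spelled out here. One plausible approach, sketched below, is to look at what the values of a variable sum to. `normalize_targets` and both tolerances are assumptions for illustration only.

```python
import math

def normalize_targets(cats):
    """Hypothetical helper: return targets as proportions that sum to 1."""
    total = sum(cats.values())
    is_proportion = math.isclose(total, 1.0, abs_tol=0.01)
    is_percentage = math.isclose(total, 100.0, abs_tol=1.0)
    if not (is_proportion or is_percentage):
        raise ValueError(f"targets sum to {total}; expected about 1 or 100")
    # Dividing by the actual total also absorbs small rounding errors.
    return {code: value / total for code, value in cats.items()}

as_pct = normalize_targets({1: 49, 2: 51})
as_prop = normalize_targets({1: 0.4, 2: 0.6})
print(as_pct, as_prop)
```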

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rim.convert_from_weightipy(weightipy_targets)
weighted = rim.rake_by_scheme(df, schemes, by="country_code")
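Conceptually, the conversion merges each scheme's list of single-variable dicts into one dict. The sketch below shows that idea in plain Python. `flatten_weightipy` is a hypothetical stand-in, and the output shape is inferred from the dict format described under Target Formats rather than from the source of convert_from_weightipy().

```python
def flatten_weightipy(weightipy_targets):
    """Hypothetical helper: {key: [{var: {...}}, ...]} -> {key: {var: {...}}}."""
    schemes = {}
    for key, variable_list in weightipy_targets.items():
        merged = {}
        for entry in variable_list:  # each entry is {variable: {code: pct}}
            merged.update(entry)
        schemes[key] = merged
    return schemes

weightipy_example = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
flat = flatten_weightipy(weightipy_example)
# {20230001: {"gender": {1: 49.95, 2: 49.95, 3: 0.1}, "age": {1: 32, 2: 37, 3: 31}}}
```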

Performance

rimpy uses a Rust engine (via PyO3) for the core raking loop, with automatic fallback to pure Python/NumPy on unsupported platforms. Benchmarks on synthetic survey data:

Scenario                              Python/NumPy  Rust     Speedup
Small survey (n=500, 3 vars)          0.25 ms       0.03 ms  8.7x
Medium survey (n=5,000, 5 vars)       1.48 ms       0.44 ms  3.3x
Large survey (n=50,000, 5 vars)       8.66 ms       5.83 ms  1.5x
25 countries × 5,000 each (parallel)  38.40 ms      5.72 ms  6.7x

The biggest gains come from grouped raking (multi-country surveys), where Rayon parallelizes across groups.

How It Works

rimpy implements iterative proportional fitting (IPF/raking):

  1. Preprocessing: Uses narwhals for backend-agnostic DataFrame operations
  2. Index caching: Pre-computes row indices for each target category
  3. Iteration: Rust engine with zero-allocation inner loop (falls back to NumPy if needed)
  4. Parallel groups: Multi-country raking runs across CPU cores via Rayon
  5. Output: Returns DataFrame in same format as input (polars in → polars out)
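The core loop behind these steps can be sketched in a few lines of NumPy. This is a minimal illustration of iterative proportional fitting, not rimpy's Rust engine: `rake_sketch`, its arguments, and the convergence test (largest margin error as a share of the total) are assumptions made for this example, and it has no caps or null handling.

```python
import numpy as np

def rake_sketch(codes, targets, max_iterations=1000, threshold=1e-6):
    """Toy raking loop (hypothetical helper).

    codes: variable name -> 1-D array of category codes, one per respondent.
    targets: variable name -> {code: target share in percent}.
    """
    n = len(next(iter(codes.values())))
    weights = np.ones(n)
    # Pre-compute a boolean row mask for every (variable, category) pair.
    masks = {
        var: {code: codes[var] == code for code in cats}
        for var, cats in targets.items()
    }
    for iteration in range(1, max_iterations + 1):
        max_diff = 0.0
        for var, cats in targets.items():
            total = weights.sum()
            for code, pct in cats.items():
                mask = masks[var][code]
                current = weights[mask].sum()
                wanted = total * pct / 100.0
                max_diff = max(max_diff, abs(current - wanted) / total)
                if current > 0:
                    # Scale this category so its weighted sum hits the target.
                    weights[mask] *= wanted / current
        if max_diff < threshold:
            break
    # Normalise so that the mean weight is 1.
    return weights * n / weights.sum(), iteration

codes = {
    "gender": np.array([1, 1, 1, 2, 2]),
    "age": np.array([1, 2, 2, 1, 2]),
}
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
weights, iterations = rake_sketch(codes, targets)
print(weights[codes["gender"] == 1].sum() / weights.sum())  # close to 0.49
```

Adjusting one variable disturbs the margins of the others, which is why the loop repeats until every margin is within the threshold at the start of a pass.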

License

MIT

Download files

Source Distribution

rimpy-0.1.5.tar.gz (28.8 kB): Source

Built Distributions

rimpy-0.1.5-cp314-cp314-win_amd64.whl (237.7 kB): CPython 3.14, Windows x86-64
rimpy-0.1.5-cp314-cp314-manylinux_2_34_x86_64.whl (331.1 kB): CPython 3.14, manylinux (glibc 2.34+), x86-64
rimpy-0.1.5-cp314-cp314-macosx_11_0_arm64.whl (293.0 kB): CPython 3.14, macOS 11.0+, ARM64
rimpy-0.1.5-cp313-cp313-win_amd64.whl (237.7 kB): CPython 3.13, Windows x86-64
rimpy-0.1.5-cp313-cp313-manylinux_2_34_x86_64.whl (331.0 kB): CPython 3.13, manylinux (glibc 2.34+), x86-64
rimpy-0.1.5-cp313-cp313-macosx_11_0_arm64.whl (293.1 kB): CPython 3.13, macOS 11.0+, ARM64
rimpy-0.1.5-cp312-cp312-win_amd64.whl (237.9 kB): CPython 3.12, Windows x86-64
rimpy-0.1.5-cp312-cp312-manylinux_2_34_x86_64.whl (331.5 kB): CPython 3.12, manylinux (glibc 2.34+), x86-64
rimpy-0.1.5-cp312-cp312-macosx_11_0_arm64.whl (293.5 kB): CPython 3.12, macOS 11.0+, ARM64
