Super fast Rust-powered RIM (raking) survey weighting with Narwhals - supports polars and pandas

rimpy

Super fast Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

Python 3.12+

Features

  • 🚀 Fast: Rust-powered engine with pure Python/NumPy fallback
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals and numpy
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy; check out their work if you have more complex weighting needs

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install "rimpy[polars]"  # For polars support (quotes keep shells like zsh from expanding the brackets)
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically — no Rust toolchain needed.

Quick Start

import polars as pl
import rimpy

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rimpy.rake(df, targets)
print(weighted["weight"])
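To show what the call above computes, here is a minimal pure-Python sketch of raking (iterative proportional fitting) on the same five rows. It is not rimpy's implementation: the name `rake_sketch`, the list-of-dicts layout, and the stopping tolerance are assumptions made for illustration, and the sketch assumes every target category has at least one row.

```python
def rake_sketch(rows, targets, max_iterations=1000, tol=1e-9):
    """Illustrative raking over a list of dict rows (hypothetical helper)."""
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iterations):
        largest_adjustment = 0.0
        for var, categories in targets.items():
            pct_total = sum(categories.values())
            for code, pct in categories.items():
                # Rows that fall into this category of this variable.
                members = [i for i, row in enumerate(rows) if row[var] == code]
                current = sum(weights[i] for i in members)
                desired = n * pct / pct_total
                factor = desired / current
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                for i in members:
                    weights[i] *= factor
        # Stop once a full pass barely changes any weight.
        if largest_adjustment < tol:
            break
    return weights


rows = [
    {"gender": 1, "age": 1},
    {"gender": 1, "age": 2},
    {"gender": 1, "age": 2},
    {"gender": 2, "age": 1},
    {"gender": 2, "age": 2},
]
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
weights = rake_sketch(rows, targets)


def weighted_share(var, code):
    """Share of the total weight held by rows with row[var] == code."""
    hit = sum(w for w, row in zip(weights, rows) if row[var] == code)
    return hit / sum(weights)


print(round(weighted_share("gender", 1), 4), round(weighted_share("age", 1), 4))
```

After convergence the weighted margins match the targets while the weights still average 1.0.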

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rimpy.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls from raking; they keep weight=1.0
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rimpy.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
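The scaling rule can be sketched in a few lines. This is an illustration, not rimpy's code: `scale_to_total` and the `raked` mask are hypothetical names, and the sketch assumes `total` applies to the raked rows only, which the text above does not spell out.

```python
def scale_to_total(weights, raked, total):
    """Scale raked rows so they sum to `total`; leave excluded rows untouched."""
    raked_sum = sum(w for w, used in zip(weights, raked) if used)
    factor = total / raked_sum
    return [w * factor if used else w for w, used in zip(weights, raked)]


weights = [0.5, 1.5, 1.0, 1.0]
raked = [True, True, True, False]  # last row was excluded (e.g. a null)
scaled = scale_to_total(weights, raked, total=300)
print(scaled)
```

The excluded row keeps its weight of 1.0, while the three raked rows keep their relative sizes and now sum to 300.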

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rimpy.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats
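For readers unfamiliar with these statistics, the sketch below computes them with the usual Kish formula, where effective sample size is (Σw)² / Σw² and efficiency is that value divided by n. Whether rimpy uses exactly this definition is an assumption; `weight_diagnostics` is a hypothetical helper, not part of the library.

```python
def weight_diagnostics(weights):
    """Common weight diagnostics based on Kish's effective sample size."""
    n = len(weights)
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    effective_n = total * total / sum_of_squares
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


stats = weight_diagnostics([0.5, 1.5, 1.0, 1.0])
print(stats)
```

Equal weights give an efficiency of 100%; the more the weights spread out, the lower the efficiency and the smaller the effective sample size.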

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)
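Conceptually, grouped weighting partitions the rows by the `by` column, weights each partition on its own, and writes the results back in the original row order. The sketch below shows that structure with a single weighting variable to stay short; `post_stratify` and `weight_by_group` are hypothetical names, and rimpy's actual grouping logic may differ.

```python
from collections import defaultdict


def post_stratify(rows, var, target_pct):
    """One-variable weights: target share divided by observed share."""
    n = len(rows)
    counts = defaultdict(int)
    for row in rows:
        counts[row[var]] += 1
    pct_total = sum(target_pct.values())
    return [
        (target_pct[row[var]] / pct_total) / (counts[row[var]] / n)
        for row in rows
    ]


def weight_by_group(rows, by, var, target_pct):
    """Apply the same targets independently within each group."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[by]].append(i)
    weights = [1.0] * len(rows)
    for members in groups.values():
        group_weights = post_stratify([rows[i] for i in members], var, target_pct)
        for i, w in zip(members, group_weights):
            weights[i] = w  # write back in original row order
    return weights


rows = [
    {"country": "US", "gender": 1},
    {"country": "US", "gender": 1},
    {"country": "US", "gender": 1},
    {"country": "US", "gender": 2},
    {"country": "UK", "gender": 1},
    {"country": "UK", "gender": 2},
]
weights = weight_by_group(rows, by="country", var="gender", target_pct={1: 50, 2: 50})
print(weights)
```

The UK sample is already balanced, so its weights stay at 1.0, while the US rows are adjusted without being influenced by the UK rows.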

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rimpy.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rimpy.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # e.g. 90.0
print(result.group_results["DE"].iterations)  # e.g. 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rimpy.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rimpy.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
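Steps (2) and (3) can be sketched as plain arithmetic on weights that have already been raked within each group. This is an illustration under assumptions, not rimpy's code: `nest_weights` is a hypothetical name, and the within-group weights are made-up inputs standing in for step (1).

```python
def nest_weights(weights, groups, group_totals, total=None):
    """Rescale groups to target shares, then optionally to an absolute total."""
    # Step 2: give each group its target share of the current weighted base.
    base = sum(weights)
    share_sum = sum(group_totals.values())
    group_sums = {}
    for w, g in zip(weights, groups):
        group_sums[g] = group_sums.get(g, 0.0) + w
    adjusted = [
        w * (base * group_totals[g] / share_sum) / group_sums[g]
        for w, g in zip(weights, groups)
    ]
    # Step 3: scale everything so the weighted base equals `total`.
    if total is not None:
        factor = total / sum(adjusted)
        adjusted = [w * factor for w in adjusted]
    return adjusted


groups = ["North", "North", "North", "South"]
within = [0.5, 1.5, 1.0, 1.0]  # stand-in for step 1 output
final = nest_weights(within, groups, {"North": 40, "South": 60}, total=10_000)
print(final)
```

Each step multiplies a whole group by one constant, so the within-group proportions achieved in step (1) are preserved.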

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rimpy.weight_summary(df, "weight")

# By country
summary = rimpy.weight_summary(df, "weight", by="country")

Returns DataFrame with:

| Column | Description |
| --- | --- |
| n | Sample size |
| effective_n | Effective sample size after weighting |
| efficiency_pct | Weighting efficiency (0-100%) |
| weight_mean | Mean weight (should be ~1.0) |
| weight_std | Standard deviation of weights |
| weight_median | Median weight |
| weight_min | Minimum weight |
| weight_max | Maximum weight |
| weight_ratio | Ratio of max to min weight |

validate_targets(df, targets)

Check targets for errors before weighting.

report = rimpy.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
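The kinds of checks such a validator might run can be sketched as follows. Everything here is an assumption for illustration: the name `validate_targets_sketch`, the specific checks, and which findings count as errors versus warnings are not taken from rimpy, only the `errors`/`warnings` report shape is.

```python
def validate_targets_sketch(rows, targets):
    """Hypothetical pre-flight checks returning an errors/warnings report."""
    report = {"errors": [], "warnings": []}
    for var, categories in targets.items():
        if any(var not in row for row in rows):
            report["errors"].append(f"column '{var}' is missing from the data")
            continue
        observed = {row[var] for row in rows if row[var] is not None}
        # A target with no matching rows cannot be reached by any weights.
        for code in sorted(set(categories) - observed):
            report["errors"].append(f"{var}: target code {code} has no rows")
        # Data codes without a target are suspicious but not fatal here.
        for code in sorted(observed - set(categories)):
            report["warnings"].append(f"{var}: data code {code} has no target")
        total = sum(categories.values())
        if abs(total - 100) > 1e-6 and abs(total - 1) > 1e-6:
            report["warnings"].append(f"{var}: targets sum to {total}, not 100 or 1")
    return report


rows = [{"gender": 1}, {"gender": 2}, {"gender": 3}]
targets = {"gender": {1: 49, 2: 50}, "age": {1: 40, 2: 60}}
report = validate_targets_sketch(rows, targets)
print(report["errors"])
print(report["warnings"])
```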

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rimpy.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rimpy.load_schemes("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rimpy.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

| scheme_key | target_var | target_code | target_pct |
| --- | --- | --- | --- |
| 20230001 | gender | 1 | 49.85 |
| 20230001 | gender | 2 | 49.85 |
| 20230001 | gender | 3 | 0.3 |
| 20230001 | smoker | 1 | 21 |
| 20230001 | smoker | 2 | 79 |
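To show how a long-format table maps onto the nested scheme dict, the sketch below parses the same rows from CSV text using only the standard library. It is not how `load_schemes` is implemented: `schemes_from_long` is a hypothetical helper, CSV stands in for the Excel file, and converting keys and codes to `int` is an assumption.

```python
import csv
import io

LONG_TABLE = """scheme_key,target_var,target_code,target_pct
20230001,gender,1,49.85
20230001,gender,2,49.85
20230001,gender,3,0.3
20230001,smoker,1,21
20230001,smoker,2,79
"""


def schemes_from_long(text):
    """Build {scheme_key: {variable: {code: pct}}} from long-format rows."""
    schemes = {}
    for record in csv.DictReader(io.StringIO(text)):
        key = int(record["scheme_key"])
        var = record["target_var"]
        code = int(record["target_code"])
        schemes.setdefault(key, {}).setdefault(var, {})[code] = float(record["target_pct"])
    return schemes


schemes = schemes_from_long(LONG_TABLE)
print(schemes)
```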

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rimpy.load_schemes_wide("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

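| target_var | target_code | 20230001 | 20240001 | 20230002 |
| --- | --- | --- | --- | --- |
| gender | 1 | 49.85 | 49.9 | 49.9 |
| gender | 2 | 49.85 | 49.9 | 49.9 |
| gender | 3 | 0.3 | 0.2 | 0.2 |
| smoker | 1 | 21 | 9 | 10 |
| smoker | 2 | 79 | 91 | 90 |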
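In the wide layout each scheme is a column, so building the nested dict amounts to a pivot. The sketch below does this for the same table from CSV text; `schemes_from_wide` is a hypothetical helper, CSV stands in for the Excel file, and treating every column other than `target_var` and `target_code` as a scheme key is an assumption.

```python
import csv
import io

WIDE_TABLE = """target_var,target_code,20230001,20240001,20230002
gender,1,49.85,49.9,49.9
gender,2,49.85,49.9,49.9
gender,3,0.3,0.2,0.2
smoker,1,21,9,10
smoker,2,79,91,90
"""


def schemes_from_wide(text):
    """Pivot wide-format rows into {scheme_key: {variable: {code: pct}}}."""
    reader = csv.DictReader(io.StringIO(text))
    scheme_cols = [
        col for col in reader.fieldnames if col not in ("target_var", "target_code")
    ]
    schemes = {int(col): {} for col in scheme_cols}
    for record in reader:
        var = record["target_var"]
        code = int(record["target_code"])
        for col in scheme_cols:
            schemes[int(col)].setdefault(var, {})[code] = float(record[col])
    return schemes


schemes = schemes_from_wide(WIDE_TABLE)
print(schemes[20240001])
```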

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100); rimpy auto-detects which scale is in use.
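One plausible way to auto-detect the scale is to look at what the values sum to. The sketch below does that; the rule, the tolerances, and the name `to_proportions` are assumptions for illustration and may not match rimpy's actual detection logic.

```python
import math


def to_proportions(categories):
    """Return targets on a 0-1 scale, guessing the input scale from its sum."""
    total = sum(categories.values())
    if math.isclose(total, 1.0, abs_tol=1e-6):
        return dict(categories)  # already proportions
    if math.isclose(total, 100.0, abs_tol=1e-4):
        return {code: value / 100.0 for code, value in categories.items()}
    raise ValueError(f"targets sum to {total}; expected about 1 or about 100")


print(to_proportions({1: 49, 2: 51}))
print(to_proportions({1: 0.4, 2: 0.6}))
```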

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rimpy.convert_from_weightipy(weightipy_targets)
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")
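The conversion amounts to merging each scheme's list of single-variable dicts into one dict. The sketch below shows that reshaping on the example above; `merge_weightipy_sketch` is a hypothetical stand-in, and `convert_from_weightipy` may do more (validation, for instance) than this.

```python
def merge_weightipy_sketch(weightipy_targets):
    """Turn {key: [{var: {...}}, ...]} into {key: {var: {...}, ...}}."""
    schemes = {}
    for key, target_list in weightipy_targets.items():
        merged = {}
        for entry in target_list:
            merged.update(entry)  # each entry holds one variable's targets
        schemes[key] = merged
    return schemes


weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
schemes = merge_weightipy_sketch(weightipy_targets)
print(schemes)
```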

Performance

rimpy uses a Rust engine (via PyO3) for the core raking loop, with automatic fallback to pure Python/NumPy on unsupported platforms. Benchmark on synthetic survey data:

| Scenario | Python/NumPy | Rust | Speedup |
| --- | --- | --- | --- |
| Small survey (n=500, 3 vars) | 0.25 ms | 0.03 ms | 8.7x |
| Medium survey (n=5,000, 5 vars) | 1.48 ms | 0.44 ms | 3.3x |
| Large survey (n=50,000, 5 vars) | 8.66 ms | 5.83 ms | 1.5x |
| 25 countries × 5,000 each (parallel) | 38.40 ms | 5.72 ms | 6.7x |

The largest absolute time savings come from grouped raking (multi-country surveys), where Rayon parallelizes across groups. The small single-survey case shows the highest relative speedup.

How It Works

rimpy implements iterative proportional fitting (IPF/raking):

  1. Preprocessing: Uses narwhals for backend-agnostic DataFrame operations
  2. Index caching: Pre-computes row indices for each target category
  3. Iteration: Rust engine with zero-allocation inner loop (falls back to NumPy if needed)
  4. Parallel groups: Multi-country raking runs across CPU cores via Rayon
  5. Output: Returns DataFrame in same format as input (polars in → polars out)
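Steps 2 and 3 can be sketched in pure Python: the row indices for every target category are computed once, and each raking pass then reuses them and updates the weights in place. This is only an illustration of the idea, with hypothetical names (`build_index_cache`, `rake_pass`); the Rust engine's data structures are not described in this README.

```python
def build_index_cache(rows, targets):
    """Pre-compute the row indices belonging to each (variable, code) pair."""
    cache = {}
    for var, categories in targets.items():
        for code in categories:
            cache[(var, code)] = [i for i, row in enumerate(rows) if row[var] == code]
    return cache


def rake_pass(weights, cache, targets):
    """One full pass over all variables, mutating `weights` in place."""
    n = len(weights)
    for var, categories in targets.items():
        pct_total = sum(categories.values())
        for code, pct in categories.items():
            members = cache[(var, code)]  # no rescanning of the data
            current = sum(weights[i] for i in members)
            factor = (n * pct / pct_total) / current
            for i in members:
                weights[i] *= factor
    return weights


rows = [
    {"gender": 1, "age": 1},
    {"gender": 1, "age": 2},
    {"gender": 1, "age": 2},
    {"gender": 2, "age": 1},
    {"gender": 2, "age": 2},
]
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
cache = build_index_cache(rows, targets)
weights = rake_pass([1.0] * len(rows), cache, targets)
print(cache[("gender", 1)], weights)
```

Because the cache depends only on the data and the targets, it can be built once and reused across all iterations.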

License

MIT
