Super fast rust-powered RIM (raking) survey weighting with narwhals - supports polars and pandas

rimpy

Super fast Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

Python 3.12+

Features

  • 🚀 Fast: Rust-powered engine with pure Python/NumPy fallback
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals and numpy
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy. Check out their amazing work if you have more complex weighting needs.

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install rimpy[polars]  # For polars support
pip install rimpy[all]     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically — no Rust toolchain needed.

Quick Start

import polars as pl
import rimpy

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rimpy.rake(df, targets)
print(weighted["weight"])

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rimpy.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls from raking (they keep weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rimpy.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
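The scaling step can be pictured with a small pure-Python sketch. It is not rimpy's implementation: `scale_to_total` and the `raked` flag list are hypothetical names, and the sketch assumes the target applies to the raked rows only, which the text above does not spell out.

```python
def scale_to_total(weights, raked, total):
    """Scale only the raked rows so that their weights sum to `total`.

    weights: list of floats, one per row
    raked:   list of bools, False for rows that were excluded from raking
    """
    raked_sum = sum(w for w, r in zip(weights, raked) if r)
    factor = total / raked_sum
    # Excluded rows are passed through untouched (they stay at 1.0).
    return [w * factor if r else w for w, r in zip(weights, raked)]


scaled = scale_to_total([0.8, 1.2, 1.0, 1.0], [True, True, True, False], total=300)
```

Here the three raked rows are multiplied by the same factor, so their relative weights are preserved while their sum moves to the requested base.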

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rimpy.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rimpy.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rimpy.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # 90.0%
print(result.group_results["DE"].iterations)  # 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rimpy.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rimpy.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
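Steps (2) and (3) can be sketched in plain Python, starting from weights that have already been raked within each group. This is an illustration of the stated order of operations, not rimpy's code; `apply_group_totals` is a hypothetical helper and it assumes every group present in the data has an entry in `group_totals`.

```python
def apply_group_totals(weights, groups, group_totals, total=None):
    """Rescale raked weights so groups hit their shares, then scale to `total`."""
    share_sum = sum(group_totals.values())  # 100 for percentages, 1 for proportions
    grand = sum(weights)
    group_sums = {}
    for w, g in zip(weights, groups):
        group_sums[g] = group_sums.get(g, 0.0) + w
    # Step 2: give each group its target share of the current weighted base.
    factors = {
        g: (group_totals[g] / share_sum) * grand / group_sums[g]
        for g in group_sums
    }
    adjusted = [w * factors[g] for w, g in zip(weights, groups)]
    # Step 3: optionally project the whole file to an absolute base.
    if total is not None:
        scale = total / sum(adjusted)
        adjusted = [w * scale for w in adjusted]
    return adjusted


weights = [1.0, 1.0, 1.0, 1.0, 1.0]
groups = ["North", "North", "North", "South", "South"]
adjusted = apply_group_totals(weights, groups, {"North": 40, "South": 60}, total=10_000)
```

Because every row in a group is multiplied by the same factor, the within-group distributions produced by step (1) are left intact.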

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rimpy.weight_summary(df, "weight")

# By country
summary = rimpy.weight_summary(df, "weight", by="country")

Returns DataFrame with:

| Column | Description |
| --- | --- |
| n | Sample size |
| effective_n | Effective sample size after weighting |
| efficiency_pct | Weighting efficiency (0-100%) |
| weight_mean | Mean weight (should be ~1.0) |
| weight_std | Standard deviation of weights |
| weight_median | Median weight |
| weight_min | Minimum weight |
| weight_max | Maximum weight |
| weight_ratio | Ratio of max to min weight |
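For orientation, most of these columns can be reproduced by hand. The sketch below assumes the common Kish definition of effective sample size, (Σw)² / Σw², with efficiency as that value divided by n. Whether rimpy uses exactly this formula is an assumption, and `weight_stats` is a hypothetical helper. `weight_std` is left out because the table does not say whether it is a sample or population standard deviation.

```python
from statistics import mean, median


def weight_stats(weights):
    """Compute summary statistics for a list of weights (Kish effective n)."""
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    effective_n = sum_w ** 2 / sum_w2
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100 * effective_n / n,
        "weight_mean": mean(weights),
        "weight_median": median(weights),
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


stats = weight_stats([0.5, 1.0, 1.5, 1.0])
```

Under this definition, equal weights give an efficiency of 100%, and the value drops as the weights become more dispersed.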

validate_targets(df, targets)

Check targets for errors before weighting.

report = rimpy.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
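To give a feel for the kind of checks involved, here is a minimal pure-Python sketch that builds a report with the same two keys. The specific checks and their split into errors and warnings are assumptions made for illustration, not a description of what rimpy actually validates. `check_targets` is a hypothetical name, and data is passed as a plain dict of column lists rather than a DataFrame.

```python
import math


def check_targets(columns, targets):
    """Return {"errors": [...], "warnings": [...]} for a dict of column lists."""
    report = {"errors": [], "warnings": []}
    for var, shares in targets.items():
        if var not in columns:
            report["errors"].append(f"{var}: column not found in data")
            continue
        total = sum(shares.values())
        sums_to_one = math.isclose(total, 1.0, abs_tol=1e-6)
        sums_to_hundred = math.isclose(total, 100.0, abs_tol=1e-4)
        if not (sums_to_one or sums_to_hundred):
            report["errors"].append(f"{var}: targets sum to {total}, expected 1 or 100")
        observed = {value for value in columns[var] if value is not None}
        # A target for a code that never occurs cannot be met by reweighting.
        for code in sorted(set(shares) - observed):
            report["errors"].append(f"{var}: target code {code} has no rows in the data")
        # Codes without a target are reported but treated as non-critical here.
        for code in sorted(observed - set(shares)):
            report["warnings"].append(f"{var}: data code {code} has no target")
    return report


columns = {"gender": [1, 1, 2, None], "age": [1, 2, 2, 3]}
report = check_targets(columns, {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}})
```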

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rimpy.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rimpy.load_schemes("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rimpy.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

| scheme_key | target_var | target_code | target_pct |
| --- | --- | --- | --- |
| 20230001 | gender | 1 | 49.85 |
| 20230001 | gender | 2 | 49.85 |
| 20230001 | gender | 3 | 0.3 |
| 20230001 | smoker | 1 | 21 |
| 20230001 | smoker | 2 | 79 |
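The long-format table maps onto the nested scheme dict used by rake_by_scheme(). A minimal sketch of that reshaping, using plain tuples instead of a spreadsheet, might look like the following. `schemes_from_long` is a hypothetical helper, not part of rimpy, and it does no file reading or validation.

```python
def schemes_from_long(rows):
    """Build {scheme_key: {variable: {code: pct}}} from (key, var, code, pct) rows."""
    schemes = {}
    for key, var, code, pct in rows:
        schemes.setdefault(key, {}).setdefault(var, {})[code] = pct
    return schemes


rows = [
    (20230001, "gender", 1, 49.85),
    (20230001, "gender", 2, 49.85),
    (20230001, "gender", 3, 0.3),
    (20230001, "smoker", 1, 21),
    (20230001, "smoker", 2, 79),
]
schemes = schemes_from_long(rows)
# schemes[20230001]["smoker"] is {1: 21, 2: 79}
```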

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rimpy.load_schemes_wide("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

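| target_var | target_code | 20230001 | 20240001 | 20230002 |
| --- | --- | --- | --- | --- |
| gender | 1 | 49.85 | 49.9 | 49.9 |
| gender | 2 | 49.85 | 49.9 | 49.9 |
| gender | 3 | 0.3 | 0.2 | 0.2 |
| smoker | 1 | 21 | 9 | 10 |
| smoker | 2 | 79 | 91 | 90 |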
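In the wide layout each scheme key is a column rather than a row value. As a rough illustration of how such a table unpacks into the nested scheme dict, here is a pure-Python sketch. `schemes_from_wide` is a hypothetical helper, and it assumes the first two columns are always the variable and the code.

```python
def schemes_from_wide(header, rows):
    """Build {scheme_key: {variable: {code: pct}}} from a wide table."""
    scheme_keys = header[2:]  # everything after target_var and target_code
    schemes = {key: {} for key in scheme_keys}
    for var, code, *values in rows:
        for key, pct in zip(scheme_keys, values):
            schemes[key].setdefault(var, {})[code] = pct
    return schemes


header = ["target_var", "target_code", 20230001, 20240001, 20230002]
rows = [
    ("gender", 1, 49.85, 49.9, 49.9),
    ("gender", 2, 49.85, 49.9, 49.9),
    ("gender", 3, 0.3, 0.2, 0.2),
    ("smoker", 1, 21, 9, 10),
    ("smoker", 2, 79, 91, 90),
]
schemes = schemes_from_wide(header, rows)
# schemes[20240001]["smoker"] is {1: 9, 2: 91}
```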

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects.
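How the auto-detection works is not documented here. One plausible rule, shown purely as an assumption, is to look at what the values of a variable sum to. `to_proportions` below is a hypothetical helper and not rimpy's API.

```python
import math


def to_proportions(shares, tol=1e-6):
    """Normalize one variable's targets to proportions that sum to 1."""
    total = sum(shares.values())
    if math.isclose(total, 1.0, abs_tol=tol):
        scale = 1.0      # already proportions
    elif math.isclose(total, 100.0, abs_tol=tol * 100):
        scale = 100.0    # percentages
    else:
        raise ValueError(f"targets sum to {total}, expected 1 or 100")
    return {code: value / scale for code, value in shares.items()}


from_pct = to_proportions({1: 49, 2: 51})
from_prop = to_proportions({1: 0.4, 2: 0.6})
```

A sum-based rule like this rejects ambiguous input instead of guessing, which is why the sketch raises when the total matches neither scale.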

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rimpy.convert_from_weightipy(weightipy_targets)
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")
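Conceptually, the conversion merges each scheme's list of single-variable dicts into one dict per scheme. The sketch below shows that reshaping only. `convert_sketch` is a hypothetical stand-in, and the real convert_from_weightipy() may do additional validation or normalization.

```python
def convert_sketch(weightipy_targets):
    """Turn {key: [{var: {...}}, ...]} into {key: {var: {...}, ...}}."""
    schemes = {}
    for key, variable_list in weightipy_targets.items():
        merged = {}
        for entry in variable_list:
            merged.update(entry)  # each entry holds one variable's targets
        schemes[key] = merged
    return schemes


weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
schemes = convert_sketch(weightipy_targets)
```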

Performance

rimpy uses a Rust engine (via PyO3) for the core raking loop, with automatic fallback to pure Python/NumPy on unsupported platforms. Benchmark on synthetic survey data:

| Scenario | Python/NumPy | Rust | Speedup |
| --- | --- | --- | --- |
| Small survey (n=500, 3 vars) | 0.25 ms | 0.03 ms | 8.7x |
| Medium survey (n=5,000, 5 vars) | 1.48 ms | 0.44 ms | 3.3x |
| Large survey (n=50,000, 5 vars) | 8.66 ms | 5.83 ms | 1.5x |
| 25 countries × 5,000 each (parallel) | 38.40 ms | 5.72 ms | 6.7x |

The biggest gains come from grouped raking (multi-country surveys), where Rayon parallelizes across groups.

How It Works

rimpy implements iterative proportional fitting (IPF/raking):

  1. Preprocessing: Uses narwhals for backend-agnostic DataFrame operations
  2. Index caching: Pre-computes row indices for each target category
  3. Iteration: Rust engine with zero-allocation inner loop (falls back to NumPy if needed)
  4. Parallel groups: Multi-country raking runs across CPU cores via Rayon
  5. Output: Returns DataFrame in same format as input (polars in → polars out)
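The core of steps 2 and 3 can be sketched in a few lines of pure Python. This is a textbook iterative proportional fitting loop written for illustration, not rimpy's Rust or NumPy engine. `rake_sketch` and its `tol` parameter are hypothetical, targets are given as proportions, and weight caps and null handling are left out.

```python
def rake_sketch(columns, targets, max_iterations=1000, tol=1e-9):
    """Minimal IPF: columns is {var: [codes]}, targets is {var: {code: share}}."""
    n = len(next(iter(columns.values())))
    weights = [1.0] * n
    # Cache the row indices of every target category once, before iterating.
    index = {
        var: {code: [i for i, v in enumerate(columns[var]) if v == code] for code in shares}
        for var, shares in targets.items()
    }
    for iteration in range(1, max_iterations + 1):
        max_diff = 0.0
        for var, shares in targets.items():
            total = sum(weights)
            for code, share in shares.items():
                rows = index[var][code]
                current = sum(weights[i] for i in rows) / total
                max_diff = max(max_diff, abs(current - share))
                if current > 0:
                    # Pull this category's weighted share onto its target.
                    factor = share / current
                    for i in rows:
                        weights[i] *= factor
        if max_diff < tol:
            break  # every margin was already within tolerance in this pass
    return weights, iteration


columns = {"gender": [1, 1, 1, 2, 2], "age": [1, 2, 2, 1, 2]}
targets = {"gender": {1: 0.49, 2: 0.51}, "age": {1: 0.40, 2: 0.60}}
weights, iterations = rake_sketch(columns, targets)
```

Each pass adjusts one variable at a time, which disturbs the margins of the others slightly. Repeating the passes shrinks that disturbance until all margins match their targets within the tolerance.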

License

MIT
