Super fast rust-powered RIM (raking) survey weighting with narwhals - supports polars and pandas

rimpy

Super fast, Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

Python 3.12+

Features

  • 🚀 Fast: Rust-powered engine with pure Python/NumPy fallback
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals and numpy
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy; check out their amazing work if you have more complex weighting needs

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
# (quotes keep shells such as zsh from expanding the brackets)
pip install "rimpy[polars]"  # For polars support
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically — no Rust toolchain needed.

Quick Start

import polars as pl
import rimpy

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rimpy.rake(df, targets)
print(weighted["weight"])

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rimpy.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # handle nulls (weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rimpy.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
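
The scaling step can be pictured as a single multiplication. The sketch below is an assumption about the arithmetic, not rimpy's actual code: `scale_to_total` and the `raked` mask are hypothetical names, and it assumes the raked rows alone are scaled to sum to the total.

```python
def scale_to_total(weights, raked, total):
    """Scale the weights of raked rows so that they sum to `total`.

    weights: list of floats, one per row
    raked:   list of bools, False for rows that were excluded from raking
    """
    raked_sum = sum(w for w, r in zip(weights, raked) if r)
    factor = total / raked_sum
    # Excluded rows keep their weight (1.0) and are not scaled.
    return [w * factor if r else w for w, r in zip(weights, raked)]


weights = [0.8, 1.2, 1.0, 1.0]       # the last row was excluded (null)
raked = [True, True, True, False]
scaled = scale_to_total(weights, raked, total=300)
print(scaled)
```

Because every raked row is multiplied by the same factor, the weighted proportions reached by raking are unchanged.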

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rimpy.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rimpy.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rimpy.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rimpy.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # 90.0%
print(result.group_results["DE"].iterations)  # 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rimpy.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rimpy.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
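
Steps (2) and (3) amount to two uniform rescalings. The following is a minimal sketch of that arithmetic, assuming the weights have already been raked within each group. `apply_group_totals` is a hypothetical helper for illustration, not part of rimpy.

```python
def apply_group_totals(weights, groups, group_totals, total=None):
    """weights: per-row weights already raked within each group (step 1).
    groups: per-row group label.
    group_totals: {label: share}, on any scale (40/60 or 0.4/0.6).
    """
    grand = sum(weights)
    share_sum = sum(group_totals.values())
    group_sum = {}
    for w, g in zip(weights, groups):
        group_sum[g] = group_sum.get(g, 0.0) + w
    # Step 2: rescale each group so it holds its target share of the base.
    factor = {
        g: (group_totals[g] / share_sum) * grand / group_sum[g]
        for g in group_sum
    }
    out = [w * factor[g] for w, g in zip(weights, groups)]
    # Step 3: optionally rescale everything to the absolute total.
    if total is not None:
        k = total / sum(out)
        out = [w * k for w in out]
    return out


weights = [1.0, 1.0, 1.0, 1.0, 1.0]
groups = ["North", "North", "North", "South", "South"]
out = apply_group_totals(weights, groups, {"North": 40, "South": 60}, total=10_000)
north = sum(w for w, g in zip(out, groups) if g == "North")
south = sum(w for w, g in zip(out, groups) if g == "South")
print(round(north), round(south))  # 4000 6000
```

Each step multiplies all rows of a group by one constant, so the within-group distributions produced by step (1) are preserved.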

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rimpy.weight_summary(df, "weight")

# By country
summary = rimpy.weight_summary(df, "weight", by="country")

Returns DataFrame with:

Column           Description
n                Sample size
effective_n      Effective sample size after weighting
efficiency_pct   Weighting efficiency (0-100%)
weight_mean      Mean weight (should be ~1.0)
weight_std       Standard deviation of weights
weight_median    Median weight
weight_min       Minimum weight
weight_max       Maximum weight
weight_ratio     Ratio of max to min weight
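
For reference, these statistics can be computed by hand. The sketch below assumes the standard Kish definition of effective sample size, (Σw)² / Σw², and a sample standard deviation. Whether rimpy uses exactly these definitions is an assumption, and `summarize_weights` is a hypothetical name.

```python
import statistics


def summarize_weights(weights):
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    effective_n = total * total / sum_sq  # Kish effective sample size
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_mean": total / n,
        "weight_std": statistics.stdev(weights),  # sample std (assumption)
        "weight_median": statistics.median(weights),
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


stats = summarize_weights([0.5, 1.0, 1.0, 1.5])
print(stats["effective_n"], stats["efficiency_pct"])
```

Equal weights give an efficiency of 100%. The more the weights vary, the lower the effective sample size.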

validate_targets(df, targets)

Check targets for errors before weighting.

report = rimpy.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
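
To illustrate the kind of checks such a report might contain, here is a rough sketch over plain Python lists. The specific rules, and which of them count as errors or warnings, are assumptions made for illustration. `check_targets` is a hypothetical helper, not rimpy's implementation.

```python
def check_targets(columns, targets, tol=0.01):
    """columns: {column name: list of values}; targets: dict-format targets."""
    report = {"errors": [], "warnings": []}
    for var, dist in targets.items():
        if var not in columns:
            report["errors"].append(f"{var}: column not found in data")
            continue
        observed = {v for v in columns[var] if v is not None}
        # A target code with no rows cannot be weighted up to its target.
        for code in dist:
            if code not in observed:
                report["errors"].append(f"{var}: target code {code!r} has no rows")
        # Data codes without a target are flagged but not fatal (assumption).
        for code in sorted(observed - set(dist)):
            report["warnings"].append(f"{var}: data code {code!r} has no target")
        total = sum(dist.values())
        if abs(total - 100) > tol and abs(total - 1) > tol:
            report["warnings"].append(f"{var}: targets sum to {total}, not 1 or 100")
    return report


columns = {"gender": [1, 1, 2, 3], "age": [1, 2, 2, 1]}
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 50, 3: 10},
    "region": {1: 100},
}
report = check_targets(columns, targets)
print(report["errors"])
print(report["warnings"])
```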

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rimpy.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rimpy.load_schemes("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rimpy.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

scheme_key   target_var   target_code   target_pct
20230001     gender       1             49.85
20230001     gender       2             49.85
20230001     gender       3             0.3
20230001     smoker       1             21
20230001     smoker       2             79
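
Conceptually, each row of the long table becomes one entry in a nested dict of the shape that rake_by_scheme() accepts. A minimal sketch of that mapping, using plain tuples instead of a file (`schemes_from_long` is a hypothetical name, not the library's loader):

```python
def schemes_from_long(rows):
    """rows: iterable of (scheme_key, target_var, target_code, target_pct)."""
    schemes = {}
    for key, var, code, pct in rows:
        schemes.setdefault(key, {}).setdefault(var, {})[code] = pct
    return schemes


rows = [
    (20230001, "gender", 1, 49.85),
    (20230001, "gender", 2, 49.85),
    (20230001, "gender", 3, 0.3),
    (20230001, "smoker", 1, 21),
    (20230001, "smoker", 2, 79),
]
schemes = schemes_from_long(rows)
print(schemes)
# {20230001: {'gender': {1: 49.85, 2: 49.85, 3: 0.3}, 'smoker': {1: 21, 2: 79}}}
```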

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rimpy.load_schemes_wide("targets.xlsx")
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

target_var   target_code   20230001   20240001   20230002
gender       1             49.85      49.9       49.9
gender       2             49.85      49.9       49.9
gender       3             0.3        0.2        0.2
smoker       1             21         9          10
smoker       2             79         91         90
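
In the wide layout, every column after the first two holds one scheme. The sketch below shows how such a table maps to the nested dict, assuming the first two columns are always the variable and the code. `schemes_from_wide` is a hypothetical name for illustration.

```python
def schemes_from_wide(header, rows):
    """header: column names; the first two are target_var and target_code,
    the remaining ones are scheme keys."""
    keys = header[2:]
    schemes = {key: {} for key in keys}
    for var, code, *values in rows:
        for key, pct in zip(keys, values):
            schemes[key].setdefault(var, {})[code] = pct
    return schemes


header = ["target_var", "target_code", 20230001, 20240001, 20230002]
rows = [
    ("gender", 1, 49.85, 49.9, 49.9),
    ("gender", 2, 49.85, 49.9, 49.9),
    ("gender", 3, 0.3, 0.2, 0.2),
    ("smoker", 1, 21, 9, 10),
    ("smoker", 2, 79, 91, 90),
]
schemes = schemes_from_wide(header, rows)
print(schemes[20240001])
# {'gender': {1: 49.9, 2: 49.9, 3: 0.2}, 'smoker': {1: 9, 2: 91}}
```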

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects which scale is used.
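
How rimpy detects the scale is not described here. One simple way to make the two scales equivalent, shown purely as an illustrative assumption, is to divide each variable's targets by their sum (`normalize_targets` is a hypothetical helper):

```python
def normalize_targets(targets):
    """Return dict-format targets rescaled so each variable sums to 1."""
    normalized = {}
    for var, dist in targets.items():
        total = sum(dist.values())
        normalized[var] = {code: value / total for code, value in dist.items()}
    return normalized


as_percent = normalize_targets({"gender": {1: 49, 2: 51}})
as_proportion = normalize_targets({"gender": {1: 0.49, 2: 0.51}})
print(as_percent)
print(as_proportion)
```

Both calls describe the same distribution, so the results agree up to floating-point rounding.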

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rimpy.convert_from_weightipy(weightipy_targets)
weighted = rimpy.rake_by_scheme(df, schemes, by="country_code")
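
Structurally, the conversion merges each scheme's list of single-variable dicts into one dict. The following sketch shows that idea on the example above. It is an assumption about what the conversion does, and `from_weightipy` is a hypothetical stand-in, not the library function.

```python
def from_weightipy(weightipy_targets):
    """Merge each scheme's list of {variable: targets} dicts into one dict."""
    schemes = {}
    for key, var_list in weightipy_targets.items():
        scheme = {}
        for item in var_list:
            scheme.update(item)
        schemes[key] = scheme
    return schemes


weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
converted = from_weightipy(weightipy_targets)
print(converted)
# {20230001: {'gender': {1: 49.95, 2: 49.95, 3: 0.1}, 'age': {1: 32, 2: 37, 3: 31}}}
```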

Performance

rimpy uses a Rust engine (via PyO3) for the core raking loop, with automatic fallback to pure Python/NumPy on unsupported platforms. Benchmark on synthetic survey data:

Scenario                               Python/NumPy   Rust      Speedup
Small survey (n=500, 3 vars)           0.25 ms        0.03 ms   8.7x
Medium survey (n=5,000, 5 vars)        1.48 ms        0.44 ms   3.3x
Large survey (n=50,000, 5 vars)        8.66 ms        5.83 ms   1.5x
25 countries × 5,000 each (parallel)   38.40 ms       5.72 ms   6.7x

The biggest gains come from grouped raking (multi-country surveys), where Rayon parallelizes across groups.

How It Works

rimpy implements iterative proportional fitting (IPF/raking):

  1. Preprocessing: Uses narwhals for backend-agnostic DataFrame operations
  2. Index caching: Pre-computes row indices for each target category
  3. Iteration: Rust engine with zero-allocation inner loop (falls back to NumPy if needed)
  4. Parallel groups: Multi-country raking runs across CPU cores via Rayon
  5. Output: Returns DataFrame in same format as input (polars in → polars out)
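
The core of steps 2 and 3 can be sketched in a few lines of pure Python. This is a simplified illustration of iterative proportional fitting, not rimpy's engine. `rake_sketch` is a hypothetical name, the convergence test (largest margin error in percentage points) is an assumption, and the sketch assumes percentage targets and that every target code occurs in the data.

```python
def rake_sketch(columns, targets, max_iterations=1000, convergence_threshold=0.01):
    """columns: {var: list of codes}; targets: {var: {code: percent}}."""
    n = len(next(iter(columns.values())))
    weights = [1.0] * n
    # Index caching: row positions for every (variable, code) pair.
    index = {
        var: {code: [i for i, v in enumerate(columns[var]) if v == code]
              for code in dist}
        for var, dist in targets.items()
    }
    for iteration in range(1, max_iterations + 1):
        # Adjust each variable in turn so its margins match the targets.
        for var, dist in targets.items():
            total = sum(weights)
            for code, pct in dist.items():
                rows = index[var][code]
                current = sum(weights[i] for i in rows)
                factor = (pct / 100.0) * total / current
                for i in rows:
                    weights[i] *= factor
        # Stop once every weighted margin is within the threshold.
        total = sum(weights)
        worst = max(
            abs(100.0 * sum(weights[i] for i in index[var][code]) / total - pct)
            for var, dist in targets.items()
            for code, pct in dist.items()
        )
        if worst < convergence_threshold:
            return weights, iteration
    return weights, max_iterations


columns = {"gender": [1, 1, 1, 2, 2], "age": [1, 2, 2, 1, 2]}
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
weights, iterations = rake_sketch(columns, targets)
print(iterations, [round(w, 3) for w in weights])
```

Matching one variable's margins disturbs the others slightly, which is why the loop repeats until all margins agree with their targets at once.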

License

MIT
