rimpy

Super fast rust-powered RIM (raking) survey weighting - supports both polars and pandas via Narwhals.

PyPI License: MIT Python 3.12+ Rust

Features

  • 🚀 Fast: Rust-powered Arrow engine with zero Python objects in the data path
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals (+ pyarrow for pandas users)
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy. Check out their amazing work if you have more complex weighting needs

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install "rimpy[polars]"  # For polars support (quotes keep shells like zsh from expanding the brackets)
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically — no Rust toolchain needed.

Quick Start

import polars as pl
import rimpy as rim

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rim.rake(df, targets)
print(weighted["weight"])

Architecture

rimpy uses a three-layer Rust design:

Python API  →  Narwhals (backend-agnostic DataFrames)
                  │
                  ▼  Arrow PyCapsule
              Binding Layer (PyO3)
                  │
                  ▼
              Arrow Middleware (language-agnostic)
                  │
                  ▼
              RIM Engine (pure Rust)

The bottom two layers have zero Python dependencies — they can be reused by R, Julia, or any language with Arrow FFI support.

How It Works

df (polars/pandas) → narwhals → Arrow → RIM engine → Arrow → narwhals → df with weights
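
For intuition about what the "RIM engine" stage computes, here is a minimal pure-Python sketch of classic raking (iterative proportional fitting). It is not rimpy's Rust implementation: the function name `rake_sketch`, the dict-of-lists input, and the stopping rule are assumptions made for illustration only.

```python
def rake_sketch(columns, targets, max_iterations=1000, tol=1e-9):
    """Illustrative raking (iterative proportional fitting) in pure Python.

    columns: dict of variable name -> list of category codes (one per row)
    targets: dict of variable name -> {code: target share}
    Returns one weight per row; the weights sum to the number of rows.
    Every code present in the data must have a target (no validation here).
    """
    n = len(next(iter(columns.values())))
    weights = [1.0] * n
    for _ in range(max_iterations):
        max_change = 0.0
        for var, shares in targets.items():
            codes = columns[var]
            # Current weighted total for each category of this variable
            totals = {}
            for code, w in zip(codes, weights):
                totals[code] = totals.get(code, 0.0) + w
            grand = sum(totals.values())
            share_sum = sum(shares.values())
            # Factor that moves each category onto its target share
            factors = {
                code: (shares[code] / share_sum) * grand / totals[code]
                for code in totals
            }
            for i, code in enumerate(codes):
                new_w = weights[i] * factors[code]
                max_change = max(max_change, abs(new_w - weights[i]))
                weights[i] = new_w
        # Stop once a full pass over all variables barely moves any weight
        if max_change < tol:
            break
    return weights


# Same toy data and targets as the Quick Start
columns = {"gender": [1, 1, 1, 2, 2], "age": [1, 2, 2, 1, 2]}
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
weights = rake_sketch(columns, targets)
```

Each pass rescales the weights one variable at a time, so fixing one margin slightly disturbs the others; repeating the passes until the weights stop changing is what makes all margins agree with their targets at once.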

Performance

Benchmark on synthetic survey data (polars backend), zero Python objects in the hot path:

Scenario                                   Time
Small survey (n=1,000, 3 vars)             0.17 ms
Medium survey (n=10,000, 3 vars)           0.67 ms
Large survey (n=100,000, 3 vars)           10.60 ms
Very large survey (n=1,000,000, 3 vars)    126.14 ms
Grouped raking (n=100,000, 10 groups)      14.34 ms

Grouped raking uses Rayon to parallelize across groups.

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rim.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls from raking (they keep weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rim.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
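
The scaling rule can be sketched in plain Python. This is an illustration, not rimpy's code: `scale_to_total` and the `included` mask are hypothetical names, and the sketch assumes that `total` refers to the weighted sum of the raked rows only.

```python
def scale_to_total(weights, included, total):
    """Scale the weights of raked rows so they sum to `total`.

    weights: raked weights, one per row
    included: True for rows that took part in raking, False for excluded rows
    Excluded rows are returned with weight 1.0 and are not scaled.
    """
    base = sum(w for w, keep in zip(weights, included) if keep)
    factor = total / base
    return [w * factor if keep else 1.0 for w, keep in zip(weights, included)]


raked = [0.8, 1.2, 1.0, 1.0]
included = [True, True, True, False]  # last row had a null and was excluded
scaled = scale_to_total(raked, included, total=300)
```

Because every raked row is multiplied by the same factor, the relative weights (and therefore the weighted proportions) are unchanged; only the weighted base moves.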

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rim.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rim.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rim.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # e.g. 90.0
print(result.group_results["DE"].iterations)  # e.g. 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rim.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rim.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
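
Steps (2) and (3) can be sketched in plain Python, taking the within-group weights from step (1) as given. The helper name `apply_group_totals` and the default of scaling to the number of rows when `total` is omitted are assumptions for this illustration, not documented rimpy behavior.

```python
def apply_group_totals(weights, groups, group_totals, total=None):
    """Rescale each group to its share of the base, then to `total`.

    weights: within-group raked weights (step 1 is assumed already done)
    groups: group label for each row
    group_totals: {label: share of the overall weighted base}
    total: overall weighted base; defaults to the row count (an assumption)
    """
    if total is None:
        total = float(len(weights))
    share_sum = sum(group_totals.values())
    # Current weighted size of each group
    group_sums = {}
    for w, g in zip(weights, groups):
        group_sums[g] = group_sums.get(g, 0.0) + w
    # One factor per group: target size divided by current size
    factors = {
        g: (group_totals[g] / share_sum) * total / group_sums[g]
        for g in group_sums
    }
    return [w * factors[g] for w, g in zip(weights, groups)]


within = [1.0, 1.0, 1.0, 0.5, 1.5]
regions = ["North", "North", "North", "South", "South"]
adjusted = apply_group_totals(
    within, regions, {"North": 40, "South": 60}, total=10_000
)
north = sum(w for w, g in zip(adjusted, regions) if g == "North")
south = sum(w for w, g in zip(adjusted, regions) if g == "South")
```

Since each group is multiplied by a single constant, the age and gender proportions achieved inside each region in step (1) are preserved.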

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rim.weight_summary(df, "weight")

# By country
summary = rim.weight_summary(df, "weight", by="country")

Returns DataFrame with:

Column          Description
n               Sample size
effective_n     Effective sample size after weighting
efficiency_pct  Weighting efficiency (0-100%)
weight_mean     Mean weight (should be ~1.0)
weight_std      Standard deviation of weights
weight_median   Median weight
weight_min      Minimum weight
weight_max      Maximum weight
weight_ratio    Ratio of max to min weight
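
As a rough guide to how these columns relate, the sketch below computes most of them from a list of weights. It assumes the standard Kish formula for effective sample size, (sum of weights)² / (sum of squared weights); whether rimpy uses exactly this definition is an assumption, and `weight_stats` is a hypothetical helper. `weight_std` is left out because the source does not say whether it is a sample or population standard deviation.

```python
import statistics


def weight_stats(weights):
    """Summary statistics for a list of weights (illustrative only)."""
    n = len(weights)
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    effective_n = total * total / sum_sq  # Kish effective sample size
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_mean": total / n,
        "weight_median": statistics.median(weights),
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


stats = weight_stats([0.5, 1.0, 1.5, 1.0])
```

With equal weights the effective sample size equals n and the efficiency is 100%; the more the weights vary, the lower both become.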

validate_targets(df, targets)

Check targets for errors before weighting.

report = rim.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
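
The source does not list which checks `validate_targets` runs. The sketch below shows the kind of checks such a report could contain; the helper `validate_sketch`, the specific checks, and the split between errors and warnings are all assumptions for illustration.

```python
import math


def validate_sketch(columns, targets):
    """Return {"errors": [...], "warnings": [...]} for a targets dict.

    columns: dict of variable name -> list of category codes (one per row)
    targets: dict of variable name -> {code: target share}
    """
    report = {"errors": [], "warnings": []}
    for var, shares in targets.items():
        if var not in columns:
            report["errors"].append(f"{var}: column not found in data")
            continue
        observed = {code for code in columns[var] if code is not None}
        # A target for a category nobody belongs to can never be met
        for code in set(shares) - observed:
            report["errors"].append(f"{var}: target code {code} has no rows")
        # Rows whose category has no target
        for code in observed - set(shares):
            report["warnings"].append(f"{var}: data code {code} has no target")
        total = sum(shares.values())
        sums_ok = math.isclose(total, 100.0, abs_tol=0.01) or math.isclose(
            total, 1.0, abs_tol=1e-4
        )
        if not sums_ok:
            report["warnings"].append(f"{var}: targets sum to {total}")
    return report


columns = {"gender": [1, 1, 2, 3], "age": [1, 2, 2, 1]}
targets = {
    "gender": {1: 49, 2: 51},   # code 3 appears in the data but has no target
    "region": {1: 50, 2: 50},   # column missing from the data
    "age": {1: 40, 2: 50},      # sums to 90
}
report = validate_sketch(columns, targets)
```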

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rim.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rim.load_schemes("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rim.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

scheme_key  target_var  target_code  target_pct
20230001    gender      1            49.85
20230001    gender      2            49.85
20230001    gender      3            0.3
20230001    smoker      1            21
20230001    smoker      2            79
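
To make the mapping concrete, the sketch below turns rows in this long format into the nested dict shape that `rake_by_scheme()` is shown accepting elsewhere in this README. It does not read files, and `schemes_from_long` is a hypothetical helper; that `load_schemes()` returns exactly this structure is an assumption.

```python
def schemes_from_long(rows):
    """Build {scheme_key: {variable: {code: pct}}} from long-format rows.

    rows: iterable of (scheme_key, target_var, target_code, target_pct)
    """
    schemes = {}
    for key, var, code, pct in rows:
        schemes.setdefault(key, {}).setdefault(var, {})[code] = pct
    return schemes


rows = [
    (20230001, "gender", 1, 49.85),
    (20230001, "gender", 2, 49.85),
    (20230001, "gender", 3, 0.3),
    (20230001, "smoker", 1, 21),
    (20230001, "smoker", 2, 79),
]
schemes = schemes_from_long(rows)
```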

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rim.load_schemes_wide("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

target_var  target_code  20230001  20240001  20230002
gender      1            49.85     49.9      49.9
gender      2            49.85     49.9      49.9
gender      3            0.3       0.2       0.2
smoker      1            21        9         10
smoker      2            79        91        90
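
In the wide layout each scheme is a column rather than a block of rows. The sketch below shows how such a table could be unpivoted into one nested dict per scheme key; `schemes_from_wide` is a hypothetical helper working on in-memory lists, and the assumption that the first two columns are always the variable and the code comes from the example above.

```python
def schemes_from_wide(header, rows):
    """Build {scheme_key: {variable: {code: pct}}} from a wide table.

    header: column names; the first two are the variable and code columns,
            every remaining column is a scheme key
    rows: one list per table row, in the same column order as `header`
    """
    keys = header[2:]
    schemes = {key: {} for key in keys}
    for var, code, *values in rows:
        for key, pct in zip(keys, values):
            schemes[key].setdefault(var, {})[code] = pct
    return schemes


header = ["target_var", "target_code", 20230001, 20240001, 20230002]
rows = [
    ["gender", 1, 49.85, 49.9, 49.9],
    ["gender", 2, 49.85, 49.9, 49.9],
    ["gender", 3, 0.3, 0.2, 0.2],
    ["smoker", 1, 21, 9, 10],
    ["smoker", 2, 79, 91, 90],
]
schemes = schemes_from_wide(header, rows)
```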

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects.
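
The detection rule itself is not documented here. One plausible approach, shown below as a sketch under that assumption, is to look at what the values of a variable sum to; `to_proportions` and its tolerances are hypothetical, not rimpy's implementation.

```python
import math


def to_proportions(shares):
    """Normalize one variable's targets to proportions in the range 0-1.

    Values summing to about 1 are treated as proportions, values summing
    to about 100 as percentages; anything else is rejected as ambiguous.
    """
    total = sum(shares.values())
    if math.isclose(total, 1.0, abs_tol=1e-6):
        scale = 1.0
    elif math.isclose(total, 100.0, abs_tol=1e-4):
        scale = 100.0
    else:
        raise ValueError(f"targets sum to {total}, expected about 1 or 100")
    return {code: value / scale for code, value in shares.items()}


from_percent = to_proportions({1: 49, 2: 51})
from_proportion = to_proportions({1: 0.4, 2: 0.6})
```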

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rim.convert_from_weightipy(weightipy_targets)
weighted = rim.rake_by_scheme(df, schemes, by="country_code")
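
Structurally, the conversion amounts to merging each scheme's list of single-variable dicts into one dict. The sketch below illustrates that reshaping; `convert_sketch` is a hypothetical stand-in, and the assumption is that `convert_from_weightipy()` produces the same nested shape used by the other scheme examples in this README.

```python
def convert_sketch(weightipy_targets):
    """Merge each scheme's list of {variable: {code: pct}} dicts into one dict."""
    schemes = {}
    for key, var_list in weightipy_targets.items():
        merged = {}
        for entry in var_list:
            merged.update(entry)
        schemes[key] = merged
    return schemes


weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
schemes = convert_sketch(weightipy_targets)
```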

License

MIT
