rimpy

Super-fast, Rust-powered RIM (raking) survey weighting that supports both polars and pandas via Narwhals.

PyPI · License: MIT · Python 3.12+ · Rust

Features

  • 🚀 Fast: Rust-powered Arrow engine with zero Python objects in the data path
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals (+ pyarrow for pandas users)
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy; check out their amazing work if you have more complex weighting needs

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install "rimpy[polars]"  # For polars support (quotes keep shells like zsh from expanding the brackets)
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux, Windows, and macOS (arm64) on Python 3.12–3.14. The Rust engine is included automatically — no Rust toolchain needed.

Quick Start

import polars as pl
import rimpy as rim

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rim.rake(df, targets)
print(weighted["weight"])
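
For intuition about what raking computes, here is a minimal pure-Python sketch of iterative proportional fitting applied to the Quick Start data. It is not rimpy's Rust engine: the function name `rake_sketch`, its `tolerance` parameter, and the stopping rule are assumptions made for illustration, and the sketch assumes every code in the data has a target.

```python
def rake_sketch(rows, targets, max_iterations=1000, tolerance=1e-9):
    """Illustrative raking loop; targets are percentages per variable."""
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iterations):
        largest_adjustment = 0.0
        for var, dist in targets.items():
            # Current weighted count of each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Factor that moves each category onto its target share of n.
            factors = {code: (n * pct / 100.0) / current[code]
                       for code, pct in dist.items()}
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
            largest_adjustment = max(
                largest_adjustment,
                max(abs(f - 1.0) for f in factors.values()),
            )
        # Stop once a full pass over all variables barely changes anything.
        if largest_adjustment < tolerance:
            break
    return weights


rows = [
    {"gender": 1, "age": 1},
    {"gender": 1, "age": 2},
    {"gender": 1, "age": 2},
    {"gender": 2, "age": 1},
    {"gender": 2, "age": 2},
]
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
weights = rake_sketch(rows, targets)
gender_1_share = sum(w for r, w in zip(rows, weights) if r["gender"] == 1) / sum(weights)
age_1_share = sum(w for r, w in zip(rows, weights) if r["age"] == 1) / sum(weights)
```

After the loop, the weighted share of gender 1 and of age 1 should sit at their targets (49% and 40%) while the weights still sum to the sample size.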

Architecture

rimpy uses a three-layer Rust design:

Python API  →  Narwhals (backend-agnostic DataFrames)
                  │
                  ▼  Arrow PyCapsule
              Binding Layer (PyO3)
                  │
                  ▼
              Arrow Middleware (language-agnostic)
                  │
                  ▼
              RIM Engine (pure Rust)

The bottom two layers have zero Python dependencies — they can be reused by R, Julia, or any language with Arrow FFI support.

How It Works

df (polars/pandas) → narwhals → Arrow → RIM engine → Arrow → narwhals → df with weights

Performance

Benchmarks on synthetic survey data (polars backend), with zero Python objects in the hot path:

| Scenario | Time |
| --- | --- |
| Small survey (n=1,000, 3 vars) | 0.17 ms |
| Medium survey (n=10,000, 3 vars) | 0.67 ms |
| Large survey (n=100,000, 3 vars) | 10.60 ms |
| Very large survey (n=1,000,000, 3 vars) | 126.14 ms |
| Grouped raking (n=100,000, 10 groups) | 14.34 ms |

Grouped raking uses Rayon to parallelize across groups.

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rim.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # dict of target proportions
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls from raking (they keep weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rim.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
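
The interaction between total and excluded rows can be sketched in plain Python. This follows one reading of the text above, namely that only the raked rows are rescaled so that their weighted sum equals total while excluded rows stay at 1.0. `scale_to_total` and the `included` mask are hypothetical names, not part of rimpy.

```python
def scale_to_total(weights, included, total):
    """Rescale raked rows so their weights sum to `total`; leave others untouched."""
    raked_sum = sum(w for w, keep in zip(weights, included) if keep)
    factor = total / raked_sum
    return [w * factor if keep else w for w, keep in zip(weights, included)]


# Four raked rows plus one row that was excluded (e.g. a null) and kept at 1.0.
weights = [0.8, 1.2, 1.1, 0.9, 1.0]
included = [True, True, True, True, False]
scaled = scale_to_total(weights, included, total=50_000)
raked_total = sum(w for w, keep in zip(scaled, included) if keep)
```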

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rim.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)
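
Conceptually, grouped raking partitions the rows by the `by` key and weights each partition independently against the same targets. The sketch below shows that idea in pure Python for a single target variable, where raking reduces to a closed-form cell weight. `weight_within_groups` is a hypothetical helper and says nothing about how rimpy implements grouping internally.

```python
from collections import defaultdict


def weight_within_groups(rows, var, target_pct, by):
    """Weight `var` to `target_pct` separately inside each `by` group."""
    # Count rows per (group, category) cell and per group.
    cell_counts = defaultdict(int)
    group_sizes = defaultdict(int)
    for row in rows:
        cell_counts[(row[by], row[var])] += 1
        group_sizes[row[by]] += 1
    weights = []
    for row in rows:
        # Desired weighted count of this category within the row's own group.
        desired = group_sizes[row[by]] * target_pct[row[var]] / 100.0
        weights.append(desired / cell_counts[(row[by], row[var])])
    return weights


rows = [
    {"country": "US", "gender": 1},
    {"country": "US", "gender": 1},
    {"country": "US", "gender": 1},
    {"country": "US", "gender": 2},
    {"country": "UK", "gender": 1},
    {"country": "UK", "gender": 2},
]
weights = weight_within_groups(rows, "gender", {1: 50, 2: 50}, by="country")
```

In this sketch each group's weights sum to that group's own sample size, so the US rows are rebalanced without affecting the UK rows.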

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rim.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rim.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # e.g. 90.0 (percent)
print(result.group_results["DE"].iterations)  # e.g. 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rim.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rim.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
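
Steps (2) and (3) can be sketched in pure Python, starting from weights that have already been raked within each group. The helper name `apply_group_totals` is hypothetical, and the sketch assumes that, without total, the overall weighted base stays equal to the current sum of weights; rimpy's internals may differ.

```python
def apply_group_totals(weights, groups, group_totals, total=None):
    """Rescale per-group weight sums to the given percentages, then to `total`."""
    base = sum(weights)
    group_sums = {}
    for w, g in zip(weights, groups):
        group_sums[g] = group_sums.get(g, 0.0) + w
    # Step 2: each group's weighted sum becomes its percentage of the base.
    factors = {g: (base * pct / 100.0) / group_sums[g]
               for g, pct in group_totals.items()}
    adjusted = [w * factors[g] for w, g in zip(weights, groups)]
    # Step 3: optionally project the whole sample to an absolute total.
    if total is not None:
        scale = total / sum(adjusted)
        adjusted = [w * scale for w in adjusted]
    return adjusted


groups = ["North", "North", "North", "South", "South"]
weights = [0.9, 1.1, 1.0, 1.2, 0.8]  # already raked within each region
projected = apply_group_totals(
    weights, groups, {"North": 40, "South": 60}, total=10_000
)
north = sum(w for w, g in zip(projected, groups) if g == "North")
south = sum(w for w, g in zip(projected, groups) if g == "South")
```

Here the North rows end up carrying about 4,000 of the weighted base and the South rows about 6,000, matching the split described above.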

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rim.weight_summary(df, "weight")

# By country
summary = rim.weight_summary(df, "weight", by="country")

Returns a DataFrame with the following columns:

| Column | Description |
| --- | --- |
| n | Sample size |
| effective_n | Effective sample size after weighting |
| efficiency_pct | Weighting efficiency (0-100%) |
| weight_mean | Mean weight (should be ~1.0) |
| weight_std | Standard deviation of weights |
| weight_median | Median weight |
| weight_min | Minimum weight |
| weight_max | Maximum weight |
| weight_ratio | Ratio of max to min weight |
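
Several of these columns can be reproduced by hand. The sketch below assumes the common Kish definition of effective sample size, (Σw)² / Σw². The text above does not state which formula rimpy uses, so treat this as an illustration rather than a specification. `summarize_weights` is a hypothetical name and covers only a subset of the columns.

```python
def summarize_weights(weights):
    """Compute a few weight diagnostics from a plain list of weights."""
    n = len(weights)
    total = sum(weights)
    # Kish effective sample size: squared sum over sum of squares (assumed).
    effective_n = total ** 2 / sum(w * w for w in weights)
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100.0 * effective_n / n,
        "weight_mean": total / n,
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


stats = summarize_weights([0.5, 1.5, 0.5, 1.5])
```

With equal weights the efficiency is 100%; the more the weights spread out, the further the effective sample size falls below n.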

validate_targets(df, targets)

Check targets for errors before weighting.

report = rim.validate_targets(df, targets)
print(report["errors"])    # Critical issues (will crash)
print(report["warnings"])  # Non-critical issues (informational)
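
The kinds of checks such a report might contain can be sketched as below. This is a hypothetical stand-in written in plain Python, not rimpy's validation logic: the specific checks, the messages, and the function name `check_targets_sketch` are all assumptions. Only the errors/warnings report shape follows the example above.

```python
def check_targets_sketch(rows, targets, tolerance=0.01):
    """Return {'errors': [...], 'warnings': [...]} for dict-rows and percentage targets."""
    report = {"errors": [], "warnings": []}
    for var, dist in targets.items():
        # A target variable that is absent from the data cannot be weighted.
        if any(var not in row for row in rows):
            report["errors"].append(f"{var}: column missing from data")
            continue
        if abs(sum(dist.values()) - 100.0) > tolerance:
            report["errors"].append(
                f"{var}: targets sum to {sum(dist.values())}, not 100"
            )
        observed = {row[var] for row in rows}
        # Targets for codes that never occur cannot be met.
        for code in sorted(set(dist) - observed):
            report["errors"].append(f"{var}: target code {code} has no rows")
        # Codes without a target are flagged but treated as non-critical here.
        for code in sorted(observed - set(dist)):
            report["warnings"].append(f"{var}: code {code} in data has no target")
    return report


rows = [{"gender": 1}, {"gender": 2}, {"gender": 3}]
report = check_targets_sketch(rows, {"gender": {1: 49, 2: 50}, "age": {1: 100}})
```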

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rim.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rim.load_schemes("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rim.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

| scheme_key | target_var | target_code | target_pct |
| --- | --- | --- | --- |
| 20230001 | gender | 1 | 49.85 |
| 20230001 | gender | 2 | 49.85 |
| 20230001 | gender | 3 | 0.3 |
| 20230001 | smoker | 1 | 21 |
| 20230001 | smoker | 2 | 79 |
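
The long format maps naturally onto the nested dict passed to rake_by_scheme(), with one inner dict per scheme key and variable. A minimal pure-Python sketch of that reshaping, using the rows from the table above, follows. `long_rows_to_schemes` is a hypothetical helper, not the loader itself, and the key types (integers here) are an assumption.

```python
def long_rows_to_schemes(rows):
    """Group (scheme_key, target_var, target_code, target_pct) rows into nested dicts."""
    schemes = {}
    for scheme_key, target_var, target_code, target_pct in rows:
        by_var = schemes.setdefault(scheme_key, {})
        by_var.setdefault(target_var, {})[target_code] = target_pct
    return schemes


rows = [
    (20230001, "gender", 1, 49.85),
    (20230001, "gender", 2, 49.85),
    (20230001, "gender", 3, 0.3),
    (20230001, "smoker", 1, 21),
    (20230001, "smoker", 2, 79),
]
schemes = long_rows_to_schemes(rows)
```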

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rim.load_schemes_wide("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

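| target_var | target_code | 20230001 | 20240001 | 20230002 |
| --- | --- | --- | --- | --- |
| gender | 1 | 49.85 | 49.9 | 49.9 |
| gender | 2 | 49.85 | 49.9 | 49.9 |
| gender | 3 | 0.3 | 0.2 | 0.2 |
| smoker | 1 | 21 | 9 | 10 |
| smoker | 2 | 79 | 91 | 90 |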
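
In the wide layout each column after target_var and target_code holds one scheme. The following plain-Python sketch shows how such a table could be unpivoted into one nested dict per scheme key. `wide_rows_to_schemes` is a hypothetical helper for illustration and is not how the loader is implemented.

```python
def wide_rows_to_schemes(header, rows):
    """Turn a wide table (one column per scheme key) into nested dicts."""
    scheme_keys = header[2:]  # columns after target_var and target_code
    schemes = {key: {} for key in scheme_keys}
    for target_var, target_code, *values in rows:
        for key, pct in zip(scheme_keys, values):
            schemes[key].setdefault(target_var, {})[target_code] = pct
    return schemes


header = ["target_var", "target_code", 20230001, 20240001, 20230002]
rows = [
    ("gender", 1, 49.85, 49.9, 49.9),
    ("gender", 2, 49.85, 49.9, 49.9),
    ("gender", 3, 0.3, 0.2, 0.2),
    ("smoker", 1, 21, 9, 10),
    ("smoker", 2, 79, 91, 90),
]
schemes = wide_rows_to_schemes(header, rows)
```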

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects which one is used.
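
One way such auto-detection could work is to look at what each variable's values sum to. The sketch below is an assumption for illustration only, since the text does not describe rimpy's actual heuristic. `to_proportions` and its tolerances are hypothetical.

```python
import math


def to_proportions(targets):
    """Return {var: {code: proportion}} whether inputs sum to 1 or to 100."""
    normalized = {}
    for var, dist in targets.items():
        total = sum(dist.values())
        if math.isclose(total, 1.0, abs_tol=1e-6):
            scale = 1.0    # values are already proportions
        elif math.isclose(total, 100.0, abs_tol=1e-4):
            scale = 100.0  # values are percentages
        else:
            raise ValueError(f"{var}: targets sum to {total}, expected 1 or 100")
        normalized[var] = {code: value / scale for code, value in dist.items()}
    return normalized


from_percent = to_proportions({"gender": {1: 49, 2: 51}})
from_proportion = to_proportions({"gender": {1: 0.49, 2: 0.51}})
```

Both calls yield the same proportions, and a distribution that sums to neither 1 nor 100 raises an error instead of being guessed at.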

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rim.convert_from_weightipy(weightipy_targets)
weighted = rim.rake_by_scheme(df, schemes, by="country_code")
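
Judging from the two formats shown, the conversion amounts to merging each scheme's list of single-variable dicts into one dict. A minimal sketch of that reshaping in plain Python is given below. `merge_weightipy_targets` is a hypothetical name, and this is not the library's implementation.

```python
def merge_weightipy_targets(weightipy_targets):
    """Merge each scheme's list of {var: {code: pct}} dicts into one dict per scheme."""
    schemes = {}
    for scheme_key, var_list in weightipy_targets.items():
        merged = {}
        for entry in var_list:
            merged.update(entry)  # each entry holds exactly one variable
        schemes[scheme_key] = merged
    return schemes


weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
schemes = merge_weightipy_targets(weightipy_targets)
```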

License

MIT
