Super fast Rust-powered RIM (raking) survey weighting with Narwhals; supports polars and pandas

rimpy

Super fast, Rust-powered RIM (raking) survey weighting. Supports both polars and pandas via Narwhals.

PyPI License: MIT Python 3.12+ Rust

Features

  • 🚀 Fast: Rust-powered Arrow engine with zero Python objects in the data path
  • 🔄 Backend agnostic: Works with both polars and pandas DataFrames via Narwhals
  • 📦 Lightweight: Only depends on narwhals (+ pyarrow for pandas users)
  • 🎯 Simple API: One function call to weight your data
  • Inspiration: Inspired by weightipy. Check out their amazing work if you have more complex weighting needs.

Installation

pip install rimpy

# Or with uv
uv add rimpy

# With optional dependencies
pip install "rimpy[polars]"  # For polars support (quotes keep shells like zsh from expanding the brackets)
pip install "rimpy[all]"     # For both polars and pandas

Pre-built wheels are available for Linux (x86-64, glibc 2.34+), Windows (x86-64), and macOS (arm64) on Python 3.12–3.14. The Rust engine is bundled in the wheels, so no Rust toolchain is needed.

Quick Start

import polars as pl
import rimpy as rim

# Your survey data (works with pandas too!)
df = pl.DataFrame({
    "gender": [1, 1, 1, 2, 2],
    "age": [1, 2, 2, 1, 2],
})

# Define targets (percentages that should sum to 100)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 40, 2: 60},
}

# Apply weights - returns same type as input
weighted = rim.rake(df, targets)
print(weighted["weight"])

Architecture

Below the Python API, rimpy's Rust core is split into three layers:

Python API  →  Narwhals (backend-agnostic DataFrames)
                  │
                  ▼  Arrow PyCapsule
              Binding Layer (PyO3)
                  │
                  ▼
              Arrow Middleware (language-agnostic)
                  │
                  ▼
              RIM Engine (pure Rust)

The bottom two layers have no Python dependencies, so they can be reused from R, Julia, or any other language with Arrow FFI support.

How It Works

df (polars/pandas) → narwhals → Arrow → RIM engine → Arrow → narwhals → df with weights
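For intuition, the "RIM engine" step in this pipeline can be sketched in pure Python as classic iterative proportional fitting: for each variable in turn, every row's weight is multiplied by (target share ÷ current weighted share) of its category, and passes repeat until the margins stop moving. This is a textbook illustration, not rimpy's Rust implementation. The name `rake_sketch`, its convergence rule (largest deviation of any adjustment factor from 1.0), and the final normalization to a mean weight of 1.0 are assumptions made for the example.

```python
def rake_sketch(rows, targets, max_iterations=1000, tol=1e-6):
    """Hypothetical IPF/raking sketch on a list of dict rows.

    targets maps variable -> {code: percentage}, percentages summing to 100.
    Assumes every target code occurs in the data and every data code has a target.
    Returns one weight per row, normalized to a mean of 1.0.
    """
    n = len(rows)
    weights = [1.0] * n
    for _ in range(max_iterations):
        max_change = 0.0
        for var, target in targets.items():
            # Current weighted total of each category of this variable
            totals = {code: 0.0 for code in target}
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            grand = sum(totals.values())
            # Factor that moves each category's weighted share onto its target share
            factors = {
                code: (target[code] / 100) / (totals[code] / grand) for code in target
            }
            for i, row in enumerate(rows):
                weights[i] *= factors[row[var]]
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
        if max_change < tol:  # every margin already matched its target
            break
    mean = sum(weights) / n
    return [w / mean for w in weights]


# The Quick Start data and targets, as plain Python rows
rows = [{"gender": g, "age": a} for g, a in zip([1, 1, 1, 2, 2], [1, 2, 2, 1, 2])]
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 60}}
w = rake_sketch(rows, targets)
print([round(x, 3) for x in w])
```

Raking only needs the marginal targets, not the full joint distribution, which is why each pass touches one variable at a time.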

Performance

Benchmarks on synthetic survey data (polars backend, zero Python objects in the hot path):

| Scenario | Time |
|---|---|
| Small survey (n=1,000, 3 vars) | 0.17 ms |
| Medium survey (n=10,000, 3 vars) | 0.67 ms |
| Large survey (n=100,000, 3 vars) | 10.60 ms |
| Very large survey (n=1,000,000, 3 vars) | 126.14 ms |
| Grouped raking (n=100,000, 10 groups) | 14.34 ms |

Grouped raking uses Rayon to parallelize across groups.

API Reference

rake(df, targets, **options)

Apply RIM weights to a DataFrame.

weighted = rim.rake(
    df,                          # polars or pandas DataFrame
    targets,                     # target proportions or percentages (dict or list format)
    max_iterations=1000,         # max iterations before stopping
    convergence_threshold=0.01,  # convergence criterion
    min_cap=None,                # minimum weight (optional)
    max_cap=None,                # maximum weight (optional)
    weight_column="weight",      # name for weight column
    drop_nulls=True,             # exclude rows with nulls in target variables (they keep weight=1.0)
    total=None,                  # scale weighted sum to this value (optional)
    cap_correction=True,         # small epsilon on caps to prevent boundary oscillation
)

Controlled Total Base

Scale weights so the weighted sum equals a target population size:

# 500 respondents projected to a population of 50,000
weighted = rim.rake(df, targets, total=50_000)
weighted["weight"].sum()  # ≈ 50,000

Rows excluded from raking (e.g., nulls with drop_nulls=True) keep weight=1.0 and are not scaled.
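The scaling itself is a single multiplication. A minimal sketch, assuming `total` is applied after raking by multiplying every raked weight by `total / sum(raked weights)`, and that excluded rows stay at 1.0 and are left out of that sum. The helper `scale_to_total` and the use of `None` to mark excluded rows are hypothetical conventions for this example; whether rimpy's reported sum also counts the unscaled 1.0 weights is not stated above, so check it against your own output.

```python
def scale_to_total(weights, total):
    """Hypothetical helper: scale raked weights so they sum to `total`.

    None marks a row excluded from raking (e.g. a null dropped with
    drop_nulls=True); such rows get weight 1.0 and are not scaled.
    """
    raked_sum = sum(w for w in weights if w is not None)
    factor = total / raked_sum
    return [1.0 if w is None else w * factor for w in weights]


# 4 raked respondents (mean weight 1.0) plus one excluded row
scaled = scale_to_total([0.8, 1.2, 1.1, 0.9, None], total=50_000)
print(scaled)  # raked rows now sum to 50,000; the excluded row stays at 1.0
```

Because every raked weight gets the same factor, the weighted proportions are unchanged; only the base moves.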

rake_with_diagnostics(df, targets, **options)

Same as rake() but also returns diagnostics.

weighted, result = rim.rake_with_diagnostics(df, targets)

print(result.converged)      # True/False
print(result.iterations)     # Number of iterations
print(result.efficiency)     # Weighting efficiency (0-100%)
print(result.weight_min)     # Minimum weight
print(result.weight_max)     # Maximum weight
print(result.weight_ratio)   # Max/min ratio
print(result.summary())      # Dict of all stats

rake_by(df, targets, by, **options)

Apply weights separately within groups (same targets for all groups).

# Weight gender/age within each country
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",  # or by=["country", "region"]
)

# With controlled total across all groups
weighted = rim.rake_by(
    df,
    targets={"gender": {1: 50, 2: 50}, "age": {1: 30, 2: 40, 3: 30}},
    by="country",
    total=50_000,
)

rake_by_scheme(df, schemes, by, **options)

Apply different weighting schemes to different groups. Perfect for multi-country surveys!

# Each country can weight by DIFFERENT variables
country_schemes = {
    "US": {
        "gender": {1: 49, 2: 51},
        "age": {1: 20, 2: 30, 3: 30, 4: 20},
        "region": {1: 25, 2: 25, 3: 25, 4: 25},  # US weights by region
    },
    "UK": {
        "gender": {1: 49, 2: 51},
        "age": {1: 18, 2: 32, 3: 28, 4: 22},
        # UK doesn't weight by region or education
    },
    "DE": {
        "gender": {1: 48, 2: 52},
        "age": {1: 15, 2: 28, 3: 32, 4: 25},
        "education": {1: 30, 2: 40, 3: 30},  # Germany weights by education
    },
}

weighted = rim.rake_by_scheme(df, country_schemes, by="country")

# With diagnostics
weighted, result = rim.rake_by_scheme_with_diagnostics(df, country_schemes, by="country")
print(result.group_results["US"].efficiency)  # 90.0%
print(result.group_results["DE"].iterations)  # 15

Nested Weighting with group_totals

Weight within groups AND adjust group sizes to global targets:

# Weight age/gender within regions, then adjust region sizes
weighted = rim.rake_by_scheme(
    df,
    schemes={
        "North": {"age": {1: 15, 2: 85}, "gender": {1: 50, 2: 50}},
        "South": {"age": {1: 10, 2: 90}, "gender": {1: 48, 2: 52}},
    },
    by="region",
    group_totals={"North": 40, "South": 60},  # North=40%, South=60% of total
)

Combine with total to also control the absolute weighted base:

# Same proportions, but project to population of 10,000
weighted = rim.rake_by_scheme(
    df,
    schemes={...},
    by="region",
    group_totals={"North": 40, "South": 60},
    total=10_000,  # North≈4,000 + South≈6,000
)

The order of operations is: (1) rake within each group → (2) apply group_totals → (3) scale to total.
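Steps (2) and (3) can be illustrated with plain arithmetic on weights that have already been raked within each group. A minimal sketch, assuming step (2) rescales each group's weights so that its share of the weighted base equals its `group_totals` percentage, and step (3) multiplies all weights by one factor so the grand sum equals `total`. The helper name `apply_group_totals` is hypothetical, not part of rimpy.

```python
def apply_group_totals(group_weights, group_totals, total=None):
    """Hypothetical sketch of steps (2) and (3).

    group_weights: {group: [weights already raked within that group]}
    group_totals:  {group: percentage of the overall weighted base}
    """
    n = sum(len(ws) for ws in group_weights.values())
    # Step (2): give each group its target share of the current base (n rows)
    adjusted = {}
    for group, ws in group_weights.items():
        target_sum = group_totals[group] / 100 * n
        factor = target_sum / sum(ws)
        adjusted[group] = [w * factor for w in ws]
    # Step (3): optionally scale the whole base to `total`
    if total is not None:
        grand = sum(sum(ws) for ws in adjusted.values())
        adjusted = {g: [w * total / grand for w in ws] for g, ws in adjusted.items()}
    return adjusted


# North has 3 respondents and South 2, each already raked to a mean weight of 1.0
raked = {"North": [0.9, 1.0, 1.1], "South": [0.8, 1.2]}
result = apply_group_totals(raked, {"North": 40, "South": 60}, total=10_000)
print({g: round(sum(ws)) for g, ws in result.items()})  # {'North': 4000, 'South': 6000}
```

Each step multiplies a whole group (or the whole sample) by a constant, so the within-group proportions from step (1) survive steps (2) and (3).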

weight_summary(df, weight_col, by=None)

Summarize weight diagnostics, optionally by group.

# Overall summary
summary = rim.weight_summary(df, "weight")

# By country
summary = rim.weight_summary(df, "weight", by="country")

Returns a DataFrame with the following columns:

| Column | Description |
|---|---|
| n | Sample size |
| effective_n | Effective sample size after weighting |
| efficiency_pct | Weighting efficiency (0-100%) |
| weight_mean | Mean weight (about 1.0 unless total or group_totals rescaled the weights) |
| weight_std | Standard deviation of weights |
| weight_median | Median weight |
| weight_min | Minimum weight |
| weight_max | Maximum weight |
| weight_ratio | Ratio of max to min weight |
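These columns can be reproduced from a plain list of weights. A minimal sketch, assuming effective_n is Kish's approximation (Σw)² / Σw² and efficiency_pct is effective_n / n × 100, which are the usual definitions; rimpy's exact formulas (for example, sample vs population standard deviation) may differ. `weight_summary_sketch` is a hypothetical name.

```python
import statistics


def weight_summary_sketch(weights):
    """Hypothetical weight_summary-style statistics for at least two positive weights."""
    n = len(weights)
    # Kish's effective sample size: (sum of weights)^2 / sum of squared weights
    effective_n = sum(weights) ** 2 / sum(w * w for w in weights)
    return {
        "n": n,
        "effective_n": effective_n,
        "efficiency_pct": 100 * effective_n / n,
        "weight_mean": statistics.fmean(weights),
        "weight_std": statistics.stdev(weights),  # sample SD; an assumption
        "weight_median": statistics.median(weights),
        "weight_min": min(weights),
        "weight_max": max(weights),
        "weight_ratio": max(weights) / min(weights),
    }


s = weight_summary_sketch([0.5, 1.0, 1.0, 1.5])
print(s)
```

Efficiency is 100% only when all weights are equal; the more the weights spread out, the smaller the effective sample size.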

validate_targets(df, targets)

Check targets for errors before weighting.

report = rim.validate_targets(df, targets)
print(report["errors"])    # Critical issues (raking will fail)
print(report["warnings"])  # Non-critical issues (informational)
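The kinds of problems worth catching before raking can be illustrated with a few plain checks. This is a hypothetical sketch, not the checks `validate_targets` actually performs: it treats a target variable missing from the data, or a target category with no respondents, as an error (either makes a raking factor undefined), and treats data codes without a target, or targets not summing to 100, as warnings. `check_targets` is an invented name, and it assumes percentage targets.

```python
def check_targets(rows, targets, tol=0.01):
    """Hypothetical pre-raking checks for dict rows and percentage targets."""
    report = {"errors": [], "warnings": []}
    for var, target in targets.items():
        if not rows or var not in rows[0]:
            report["errors"].append(f"{var}: column not found in data")
            continue
        observed = {row[var] for row in rows}
        for code in target:
            if code not in observed:
                report["errors"].append(f"{var}={code}: target set but no respondents")
        for code in observed - set(target):
            report["warnings"].append(f"{var}={code}: present in data but has no target")
        total = sum(target.values())
        if abs(total - 100) > tol:
            report["warnings"].append(f"{var}: targets sum to {total}, not 100")
    return report


rows = [{"gender": 1, "age": 1}, {"gender": 2, "age": 2}]
targets = {"gender": {1: 49, 2: 51}, "age": {1: 40, 2: 50, 3: 10}, "region": {1: 100}}
report = check_targets(rows, targets)
print(report["errors"])    # age=3 has no respondents; region is missing
print(report["warnings"])  # none: every variable sums to 100
```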

validate_schemes(df, schemes, by)

Check schemes for errors before weighting with rake_by_scheme().

report = rim.validate_schemes(df, schemes, by="country")
print(report["_global"]["errors"])
print(report["US"]["warnings"])

Loading Schemes from Files

load_schemes(source, **options)

Load weighting schemes from a long-format table.

schemes = rim.load_schemes("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

# Custom column names
schemes = rim.load_schemes(
    "targets.xlsx",
    key_col="country_id",
    var_col="variable",
    code_col="code",
    target_col="pct",
    sheet_name="Wave1",
)

Expected input format:

| scheme_key | target_var | target_code | target_pct |
|---|---|---|---|
| 20230001 | gender | 1 | 49.85 |
| 20230001 | gender | 2 | 49.85 |
| 20230001 | gender | 3 | 0.3 |
| 20230001 | smoker | 1 | 21 |
| 20230001 | smoker | 2 | 79 |
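The long layout maps onto the nested scheme dict one row at a time. A minimal sketch using the standard csv module on a CSV with the default column names shown above; it is an illustration of the data shape, not `load_schemes` itself (which reads files such as .xlsx). Converting keys and codes to int is an assumption, and `long_to_schemes` is a hypothetical name.

```python
import csv
import io

LONG_CSV = """scheme_key,target_var,target_code,target_pct
20230001,gender,1,49.85
20230001,gender,2,49.85
20230001,gender,3,0.3
20230001,smoker,1,21
20230001,smoker,2,79
"""


def long_to_schemes(text):
    """Build {scheme_key: {target_var: {target_code: target_pct}}} from long rows."""
    schemes = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = int(row["scheme_key"])
        code = int(row["target_code"])
        variable_targets = schemes.setdefault(key, {}).setdefault(row["target_var"], {})
        variable_targets[code] = float(row["target_pct"])
    return schemes


print(long_to_schemes(LONG_CSV))
# {20230001: {'gender': {1: 49.85, 2: 49.85, 3: 0.3}, 'smoker': {1: 21.0, 2: 79.0}}}
```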

load_schemes_wide(source, **options)

Load weighting schemes from a wide-format table.

schemes = rim.load_schemes_wide("targets.xlsx")
weighted = rim.rake_by_scheme(df, schemes, by="country_code")

Expected input format:

| target_var | target_code | 20230001 | 20240001 | 20230002 |
|---|---|---|---|---|
| gender | 1 | 49.85 | 49.9 | 49.9 |
| gender | 2 | 49.85 | 49.9 | 49.9 |
| gender | 3 | 0.3 | 0.2 | 0.2 |
| smoker | 1 | 21 | 9 | 10 |
| smoker | 2 | 79 | 91 | 90 |
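In the wide layout, every column after target_var and target_code is one scheme. A minimal sketch of reshaping such a table (as CSV text) into the nested scheme dict with the standard csv module; it illustrates the data shape and is not `load_schemes_wide`'s code. The int conversion of keys and codes and the name `wide_to_schemes` are assumptions.

```python
import csv
import io

WIDE_CSV = """target_var,target_code,20230001,20240001,20230002
gender,1,49.85,49.9,49.9
gender,2,49.85,49.9,49.9
gender,3,0.3,0.2,0.2
smoker,1,21,9,10
smoker,2,79,91,90
"""


def wide_to_schemes(text):
    """Build {scheme_key: {target_var: {target_code: pct}}} from a wide table."""
    reader = csv.DictReader(io.StringIO(text))
    # Every column other than the two label columns is a scheme key
    keys = [c for c in reader.fieldnames if c not in ("target_var", "target_code")]
    schemes = {int(k): {} for k in keys}
    for row in reader:
        code = int(row["target_code"])
        for k in keys:
            schemes[int(k)].setdefault(row["target_var"], {})[code] = float(row[k])
    return schemes


schemes = wide_to_schemes(WIDE_CSV)
print(schemes[20240001]["smoker"])  # {1: 9.0, 2: 91.0}
```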

Target Formats

rimpy accepts targets in two formats:

# Dict format (preferred)
targets = {
    "gender": {1: 49, 2: 51},
    "age": {1: 20, 2: 30, 3: 30, 4: 20},
}

# List format (weightipy-compatible)
targets = [
    {"gender": {1: 49, 2: 51}},
    {"age": {1: 20, 2: 30, 3: 30, 4: 20}},
]

Values can be proportions (0-1) or percentages (0-100). rimpy auto-detects.
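One plausible way to auto-detect the scale is to look at what each variable's targets sum to. This is an assumed rule for illustration, not necessarily rimpy's detection logic: targets summing to about 1 are read as proportions, targets summing to about 100 as percentages, and anything else is rejected. `to_proportions` is a hypothetical helper.

```python
import math


def to_proportions(target, rel_tol=1e-3):
    """Normalize one variable's targets ({code: value}) to proportions summing to 1."""
    total = sum(target.values())
    if math.isclose(total, 1.0, rel_tol=rel_tol):
        scale = 1.0    # already proportions
    elif math.isclose(total, 100.0, rel_tol=rel_tol):
        scale = 100.0  # percentages
    else:
        raise ValueError(f"targets sum to {total}; expected about 1 or 100")
    return {code: value / scale for code, value in target.items()}


print(to_proportions({1: 49, 2: 51}))    # {1: 0.49, 2: 0.51}
print(to_proportions({1: 0.2, 2: 0.8}))  # {1: 0.2, 2: 0.8}
```

Deciding per variable, rather than from individual values, avoids misreading a small percentage such as 0.3 as a proportion.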

Converting from weightipy

# weightipy format
weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}

# Convert to rimpy format
schemes = rim.convert_from_weightipy(weightipy_targets)
weighted = rim.rake_by_scheme(df, schemes, by="country_code")
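Structurally, the conversion flattens each scheme's list of single-variable dicts into one dict. A minimal sketch, assuming that flattening is all that input shaped like the example above needs; `weightipy_to_schemes` is a hypothetical name, and the real `convert_from_weightipy` may also validate or normalize values.

```python
def weightipy_to_schemes(weightipy_targets):
    """Flatten {key: [{var: {code: pct}}, ...]} into {key: {var: {code: pct}}}."""
    schemes = {}
    for key, entries in weightipy_targets.items():
        merged = {}
        for entry in entries:
            for var, target in entry.items():
                if var in merged:
                    raise ValueError(f"scheme {key}: variable {var!r} listed twice")
                merged[var] = dict(target)
        schemes[key] = merged
    return schemes


weightipy_targets = {
    20230001: [
        {"gender": {1: 49.95, 2: 49.95, 3: 0.1}},
        {"age": {1: 32, 2: 37, 3: 31}},
    ],
}
print(weightipy_to_schemes(weightipy_targets))
# {20230001: {'gender': {1: 49.95, 2: 49.95, 3: 0.1}, 'age': {1: 32, 2: 37, 3: 31}}}
```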

License

MIT


Download files

Download the file for your platform.

Source Distribution

rimpy-0.2.0.tar.gz (44.6 kB)

Built Distributions

  • rimpy-0.2.0-cp314-cp314-win_amd64.whl (832.6 kB): CPython 3.14, Windows x86-64
  • rimpy-0.2.0-cp314-cp314-manylinux_2_34_x86_64.whl (878.4 kB): CPython 3.14, manylinux (glibc 2.34+) x86-64
  • rimpy-0.2.0-cp314-cp314-macosx_11_0_arm64.whl (756.1 kB): CPython 3.14, macOS 11.0+ ARM64
  • rimpy-0.2.0-cp313-cp313-win_amd64.whl (811.1 kB): CPython 3.13, Windows x86-64
  • rimpy-0.2.0-cp313-cp313-manylinux_2_34_x86_64.whl (878.4 kB): CPython 3.13, manylinux (glibc 2.34+) x86-64
  • rimpy-0.2.0-cp313-cp313-macosx_11_0_arm64.whl (756.0 kB): CPython 3.13, macOS 11.0+ ARM64
  • rimpy-0.2.0-cp312-cp312-win_amd64.whl (832.9 kB): CPython 3.12, Windows x86-64
  • rimpy-0.2.0-cp312-cp312-manylinux_2_34_x86_64.whl (878.9 kB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • rimpy-0.2.0-cp312-cp312-macosx_11_0_arm64.whl (756.1 kB): CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file rimpy-0.2.0.tar.gz.

File metadata

  • Download URL: rimpy-0.2.0.tar.gz
  • Size: 44.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rimpy-0.2.0.tar.gz
Algorithm Hash digest
SHA256 af27969fc701f7efdc0d060772c20d63c5e8aa4070fb0efaeb50e4e355234b9d
MD5 ba7ca2b51cae43cdab4e2a35723df81e
BLAKE2b-256 7b60c4e79d890008ca2f251da71ffb01f850b490c90ecaf2de7c2bb4b242fbb5

Provenance

The following attestation bundles were made for rimpy-0.2.0.tar.gz:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp314-cp314-win_amd64.whl.

File metadata

  • Download URL: rimpy-0.2.0-cp314-cp314-win_amd64.whl
  • Size: 832.6 kB
  • Tags: CPython 3.14, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rimpy-0.2.0-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 2a080bdc190fe39b0b164b28c7dfa254b9dc1548878ee89ff4b631bf51f59c2b
MD5 e9bc89db1b7ef3e8793536a8130e5314
BLAKE2b-256 1e856ed1b3966a96d9557a17dacadf56a45680e6bf267c9bc88651af9619e0ed

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp314-cp314-win_amd64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp314-cp314-manylinux_2_34_x86_64.whl.

File hashes

Hashes for rimpy-0.2.0-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 0d4ad6a43d180cf7583b3c8604278985da311b91ea09bb9eb64261b9d078f1a2
MD5 2372f49209d0bdb471ae4364ad28dc2d
BLAKE2b-256 0b7e2601edfecf27b6bcdb78762d6dccc07461e17ee69fea4d129f1818a02fd8

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp314-cp314-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for rimpy-0.2.0-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 dd999dcbdaee204f383e2439feff09384e7f5f5b77855c7a2daa82ee9644cf76
MD5 407129f8cec60eec6093917f118d9f7e
BLAKE2b-256 a664fda82541bf20fa8526824280a712e09b6ce74588c21f2536b546e38fb878

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp314-cp314-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp313-cp313-win_amd64.whl.

File metadata

  • Download URL: rimpy-0.2.0-cp313-cp313-win_amd64.whl
  • Size: 811.1 kB
  • Tags: CPython 3.13, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rimpy-0.2.0-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 e46c7381fee52480dd8a6f959c4c367d2978d869476c8c74998a8bbe029cd99c
MD5 3fbac63b827f0b560bd30301a735a5a7
BLAKE2b-256 c930f629ae22717beb7d1914ae9062faf91e987c970b9bb567611afada4522cc

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp313-cp313-win_amd64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for rimpy-0.2.0-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 b7074bebb68760ccb773287b44c53a972123ebdb101703dfba736e0c6a901bab
MD5 2de003ee784c695b0eac4594a1b95779
BLAKE2b-256 bf1162215c1b0050a300446cd212c37ed9146f0a77ef405d2b28cc085b936ec1

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp313-cp313-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for rimpy-0.2.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2ef635762d439b318d1cf607a0cb322503fb9b015c5be2b436b7adfd6a8ea49c
MD5 e31ab6323b37887497bb5e4fd8b3a366
BLAKE2b-256 b7d68f1b2d8eab3e61098707283c67f371603230524a0f7aec28703e5c408124

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: rimpy-0.2.0-cp312-cp312-win_amd64.whl
  • Size: 832.9 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rimpy-0.2.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 d2c61539bf0d5e67296a62dcdd9ea412297bed71b2902ac66095e35ae8e25ae9
MD5 a0adf7d03a9920232a4cb7d2b11b05d5
BLAKE2b-256 51b1aa2bb9c04d4ec3d989fc9ca8d158e88a59e29d31dad30418a3ac5d1b8920

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp312-cp312-win_amd64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for rimpy-0.2.0-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 886c03310f5f85f50e93f374aa83e1cc700e61695c9545ef40b75fc38bf4aeb5
MD5 2cd6940b455352b9d763d7be8ad0d473
BLAKE2b-256 5812fef4de8d3bdd85ed7a1a70433b94b919cf0db1c5cdeafd40ae59f4a331fb

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp312-cp312-manylinux_2_34_x86_64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rimpy-0.2.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for rimpy-0.2.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 33cb452173cfacd9812980bc0335d1b5d8a590c73d0480085512064ab76881ce
MD5 0e83e03ce1e47f0c49ff56941c95ff08
BLAKE2b-256 fd29ee93319d3722346c525d0894dec789b350b8201484e45a4e32dd9cfe185d

Provenance

The following attestation bundles were made for rimpy-0.2.0-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: release.yml on albertxli/rimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
