Streaming survey raking via SGD and MWU
onlinerake
Real-time survey weighting for streaming data.
The Problem
You're collecting survey responses or observational data one record at a time. Your sample doesn't match population demographics—too many young respondents, too few from certain regions. Traditional weighting methods (raking/IPF) require reprocessing the entire dataset whenever a new response arrives.
onlinerake updates weights incrementally as each observation streams in, keeping weighted margins aligned with population targets in real time.
When to Use This
- Online surveys where responses arrive continuously
- A/B tests that need demographic balance during collection
- Passive data collection (app usage, sensor data) requiring real-time calibration
- Any streaming scenario where batch reweighting is too slow or impractical
Quick Start
pip install onlinerake
from onlinerake import OnlineRakingSGD, Targets
# Define population targets (proportion with indicator = 1)
targets = Targets(
    female=0.51,       # 51% female in population
    college=0.32,      # 32% college educated
    age_65_plus=0.17,  # 17% age 65+
)
# Create raker
raker = OnlineRakingSGD(targets, learning_rate=5.0)
# Process observations as they arrive
for response in survey_stream:
    raker.partial_fit(response)
# Check current state anytime
print(f"Weighted margins: {raker.margins}")
print(f"Effective sample size: {raker.effective_sample_size:.0f}")
# Get final weights
weights = raker.weights[:raker.n_obs]
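To make the two printed quantities concrete, here is a minimal sketch of how weighted margins and an effective sample size can be computed from a weight vector. It assumes observations are dicts of 0/1 indicators and uses Kish's formula, ESS = (Σw)² / Σw². The helpers `weighted_margins` and `kish_ess` are illustrative names, not part of onlinerake, and the package's internals may differ.

```python
def weighted_margins(observations, weights):
    """Weighted share of 1s for each binary feature (hypothetical helper)."""
    total = sum(weights)
    features = observations[0].keys()
    return {
        f: sum(w * obs[f] for obs, w in zip(observations, weights)) / total
        for f in features
    }


def kish_ess(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Toy data: three of four respondents are female.
observations = [
    {"female": 1, "college": 0},
    {"female": 1, "college": 1},
    {"female": 1, "college": 0},
    {"female": 0, "college": 1},
]
equal = [1.0, 1.0, 1.0, 1.0]
reweighted = [0.5, 0.5, 0.5, 1.5]  # up-weight the lone male respondent

print(weighted_margins(observations, equal))       # female share 0.75
print(weighted_margins(observations, reweighted))  # female share 0.5
print(kish_ess(equal), kish_ess(reweighted))       # 4.0 and 3.0
```

Moving the female share from 0.75 to 0.5 costs one effective respondent here, which is the trade-off the effective sample size is meant to expose.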
Which Algorithm?
| Use Case | Algorithm | Learning Rate |
|---|---|---|
| Most cases | OnlineRakingSGD | 5.0 |
| Smoother weights, higher ESS | OnlineRakingSGD | 2.0-5.0 |
| IPF-like multiplicative updates | OnlineRakingMWU | 0.5-1.0 |
| Starting from unequal base weights | OnlineRakingMWU | 0.5-1.0 |
Recommendation: Start with OnlineRakingSGD(targets, learning_rate=5.0). Compared with OnlineRakingMWU, it converges faster, maintains a higher effective sample size, and handles most scenarios well.
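The difference between the two update styles can be illustrated with a small self-contained sketch. It is not onlinerake's implementation: it assumes a single binary feature, a squared-error loss on the weighted margin, a new observation entering at weight 1, and a few gradient steps over all weights per arrival. The functions `margin`, `update`, and `run_stream` are hypothetical; the defaults for the weight bounds and steps per observation mirror the values listed under Key Parameters.

```python
import math


def margin(xs, ws):
    """Weighted share of 1s for a single binary feature."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)


def update(xs, ws, target, lr, multiplicative, w_min=0.1, w_max=10.0):
    """One gradient step on the loss (margin - target)**2 over all weights."""
    m, total = margin(xs, ws), sum(ws)
    new_ws = []
    for x, w in zip(xs, ws):
        grad = 2.0 * (m - target) * (x - m) / total  # d loss / d w_i
        if multiplicative:
            w = w * math.exp(-lr * grad)  # MWU-style: rescale the weight
        else:
            w = w - lr * grad  # SGD-style: shift the weight
        new_ws.append(min(max(w, w_min), w_max))  # keep weights bounded
    return new_ws


def run_stream(stream, target, lr, multiplicative, n_steps=3):
    """Feed observations one at a time; each new one starts at weight 1."""
    xs, ws = [], []
    for x in stream:
        xs.append(x)
        ws.append(1.0)
        for _ in range(n_steps):
            ws = update(xs, ws, target, lr, multiplicative)
    return margin(xs, ws)


# Biased toy stream: 70% of arrivals have the indicator set, target is 50%.
stream = [1 if i % 10 < 7 else 0 for i in range(200)]
target = 0.5
raw_error = abs(sum(stream) / len(stream) - target)
sgd_error = abs(run_stream(stream, target, lr=5.0, multiplicative=False) - target)
mwu_error = abs(run_stream(stream, target, lr=1.0, multiplicative=True) - target)
print(f"unweighted={raw_error:.3f} sgd={sgd_error:.3f} mwu={mwu_error:.3f}")
```

The additive step moves every weight by an amount that does not depend on its current size, while the multiplicative step scales it, so weights stay positive and keep their relative ordering within a group. That scaling behavior is why the multiplicative form resembles IPF and suits unequal base weights.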
Performance
In simulation studies across linear drift, sudden shift, and oscillating bias scenarios:
| Method | Margin Error Reduction | Effective Sample Size |
|---|---|---|
| SGD | 72-80% | 225-280 (of 300) |
| MWU | 47-52% | 175-276 (of 300) |
| Unweighted | baseline | 300 |
SGD consistently outperforms MWU on margin accuracy while maintaining comparable effective sample sizes.
Features
Continuous Covariates (v1.3.0)
Target means instead of proportions:
targets = Targets(
    age=(42.0, "mean"),     # Target mean age = 42
    income=(55000, "mean"), # Target mean income = $55,000
    female=0.51,            # Binary: 51% female
)
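A proportion is the weighted mean of a 0/1 indicator, so both kinds of target can be scored with the same quantity. The sketch below shows this with plain Python; `weighted_mean` and `relative_gap` are illustrative helpers, and dividing the gap by the target is only one possible way to put features on a common scale, not something this README states the package does.

```python
def weighted_mean(values, weights):
    """Weighted mean; for 0/1 indicators this is the weighted proportion."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def relative_gap(achieved, target):
    """Gap as a fraction of the target (one possible normalization, assumed)."""
    return (achieved - target) / target


ages = [25, 30, 60, 70]
female = [1, 1, 0, 1]
weights = [0.5, 0.5, 1.5, 1.5]  # older respondents up-weighted

mean_age = weighted_mean(ages, weights)      # 55.625
share_female = weighted_mean(female, weights)  # 0.625

print(mean_age, relative_gap(mean_age, 42.0))
print(share_female, relative_gap(share_female, 0.51))
```

Without some rescaling, a gap of a few thousand dollars in mean income would dominate a gap of a few percentage points in a proportion, so comparing raw differences across features can mislead.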
Learning Rate Schedules
For theoretical convergence guarantees:
from onlinerake import OnlineRakingSGD, Targets, PolynomialDecayLR
from onlinerake.convergence import verify_robbins_monro
schedule = PolynomialDecayLR(initial_lr=10.0, power=0.6)
raker = OnlineRakingSGD(targets, learning_rate=schedule)
# Verify Robbins-Monro conditions (analytical for known schedules)
result = verify_robbins_monro(schedule)
print(result.condition_1_satisfied) # True: Σ η_t = ∞
print(result.condition_2_satisfied) # True: Σ η_t² < ∞
For known schedule types, verify_robbins_monro() verifies both conditions analytically, based on mathematical proofs rather than numerical checks.
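As an illustration of what an analytical check can look like, the sketch below applies the p-series test to a polynomially decaying step size. It assumes the schedule has the form η_t = initial_lr / t^power, which may not match how PolynomialDecayLR is defined; both function names are hypothetical.

```python
def polynomial_lr(t, initial_lr=10.0, power=0.6):
    """Assumed schedule form: eta_t = initial_lr / t**power for t = 1, 2, ..."""
    return initial_lr / t**power


def robbins_monro_for_power(power):
    """Analytic p-series test for eta_t proportional to t**(-power).

    sum eta_t diverges iff power <= 1; sum eta_t**2 converges iff power > 0.5.
    Returns (condition_1, condition_2).
    """
    return power <= 1.0, power > 0.5


print(robbins_monro_for_power(0.6))  # (True, True)
print(robbins_monro_for_power(0.4))  # (True, False): squared steps sum to infinity
print(robbins_monro_for_power(1.5))  # (False, True): steps shrink too fast
```

Under this assumed form, both conditions hold exactly when the power lies in (0.5, 1], which is consistent with the choice of power=0.6 above.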
Diagnostics
from onlinerake import check_target_feasibility, compute_design_effect
# Check if targets are achievable with your data
feasibility = check_target_feasibility(raker)
print(f"Feasible: {feasibility.is_feasible}")
# Measure weighting efficiency
deff = compute_design_effect(raker)
print(f"Design effect: {deff:.2f}")
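For intuition about what these diagnostics measure, here is a hedged sketch of two standard calculations. The first gives the range of weighted proportions reachable for a single binary feature when weights are bounded, and the second is the Kish design effect. Both helpers are illustrative, check only one feature at a time, and are not claimed to match what check_target_feasibility or compute_design_effect do internally.

```python
def achievable_range(n_ones, n_zeros, w_min=0.1, w_max=10.0):
    """Range of weighted proportions reachable for one binary feature
    when every weight must stay within [w_min, w_max]."""
    if n_ones == 0:
        return 0.0, 0.0
    if n_zeros == 0:
        return 1.0, 1.0
    lowest = n_ones * w_min / (n_ones * w_min + n_zeros * w_max)
    highest = n_ones * w_max / (n_ones * w_max + n_zeros * w_min)
    return lowest, highest


def is_feasible(target, n_ones, n_zeros, **bounds):
    """True if the target lies inside the reachable range."""
    low, high = achievable_range(n_ones, n_zeros, **bounds)
    return low <= target <= high


def design_effect(weights):
    """Kish design effect: n * sum(w^2) / (sum w)^2, equal to n / ESS."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


print(is_feasible(0.51, n_ones=90, n_zeros=10))  # True
print(is_feasible(0.51, n_ones=100, n_zeros=0))  # False: no zeros to up-weight
print(design_effect([0.5, 0.5, 0.5, 1.5]))       # 4 * 3 / 9
```

A target can pass this per-feature check and still be unreachable jointly with other targets, so a full feasibility test has to consider all margins together.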
Batch Comparison
Compare streaming results against traditional IPF:
from onlinerake import BatchIPF
batch_raker = BatchIPF(targets)
batch_raker.fit(all_observations)
print(f"Online loss: {raker.loss:.6f}")
print(f"Batch loss: {batch_raker.loss:.6f}")
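For readers unfamiliar with the batch baseline, the following is a minimal sketch of classic IPF over binary margins, written in plain Python. It is not the BatchIPF class: the functions `ipf` and `margin_loss` are hypothetical, the data are a toy sample, and the code assumes every feature has both 0s and 1s in the sample.

```python
def ipf(observations, targets, n_iter=100):
    """Classic raking over binary margins: rescale the 1-group and the
    0-group of each feature in turn until the margins settle."""
    weights = [1.0] * len(observations)
    for _ in range(n_iter):
        for feature, target in targets.items():
            total = sum(weights)
            ones = sum(w for obs, w in zip(observations, weights) if obs[feature])
            up = target * total / ones  # factor for the 1-group
            down = (1.0 - target) * total / (total - ones)  # factor for the 0-group
            weights = [
                w * (up if obs[feature] else down)
                for obs, w in zip(observations, weights)
            ]
    return weights


def margin_loss(observations, weights, targets):
    """Sum of squared gaps between weighted margins and targets."""
    total = sum(weights)
    loss = 0.0
    for feature, target in targets.items():
        share = sum(w * obs[feature] for obs, w in zip(observations, weights)) / total
        loss += (share - target) ** 2
    return loss


# Toy sample with every female x college cell populated.
cells = {(1, 1): 3, (1, 0): 2, (0, 1): 1, (0, 0): 4}
observations = [
    {"female": f, "college": c}
    for (f, c), count in cells.items()
    for _ in range(count)
]
targets = {"female": 0.51, "college": 0.32}

weights = ipf(observations, targets)
print(f"Unweighted loss: {margin_loss(observations, [1.0] * 10, targets):.6f}")
print(f"IPF loss: {margin_loss(observations, weights, targets):.6f}")
```

Each pass touches every observation for every feature, which is the reprocessing cost that motivates the streaming updates.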
API Reference
Core Classes
Targets(**features) - Define population margins
- Binary features: female=0.51 (proportion with indicator = 1)
- Continuous features: age=(42.0, "mean") (target mean)
OnlineRakingSGD(targets, learning_rate=5.0) - SGD-based streaming raker
- .partial_fit(obs) - Process one observation
- .margins - Current weighted margins (dict)
- .loss - Current squared-error loss
- .weights - Weight array (use [:raker.n_obs] to slice)
- .effective_sample_size - ESS accounting for weight variation
- .converged - Whether loss is below tolerance
OnlineRakingMWU(targets, learning_rate=1.0) - Multiplicative weights raker
- Same API as OnlineRakingSGD
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| learning_rate | 5.0 (SGD), 1.0 (MWU) | Step size for updates |
| min_weight | 0.1 | Minimum allowed weight |
| max_weight | 10.0 | Maximum allowed weight |
| n_steps | 3 | Gradient steps per observation |
| convergence_tol | 1e-6 | Loss threshold for convergence |
Installation
pip install onlinerake
Development install:
git clone https://github.com/finite-sample/onlinerake.git
cd onlinerake
pip install -e ".[docs]"
Testing
pytest tests/ -v
Examples
See examples/ for complete worked examples:
- real_survey_example.py - Basic survey weighting
- ab_test_calibration.py - Balancing treatment/control groups
- ad_targeting_calibration.py - Real-time ad delivery calibration
- recommendation_balancing.py - Content recommendation fairness
Interactive notebooks in docs/notebooks/:
- 01_getting_started.ipynb - Visual introduction
- 02_performance_comparison.ipynb - Algorithm benchmarking
- 03_advanced_diagnostics.ipynb - Convergence and diagnostics
Citation
If you use this package in research, please cite:
@software{onlinerake,
author = {Sood, Gaurav},
title = {onlinerake: Streaming Survey Raking},
url = {https://github.com/finite-sample/onlinerake},
version = {1.3.0},
year = {2026}
}
License
MIT