
Pick the fewest items to label for unbiased OLS on shares


fewlab: fewest items to label for most efficient unbiased OLS on shares


Problem: You have usage data (users × items) and want to understand how user traits relate to item preferences. But you can't afford to label every item. This tool tells you which items to label first to get the most accurate analysis.

When You Need This

You have:

  • A usage matrix: rows are users, columns are items (websites, products, apps)
  • User features you want to analyze (demographics, behavior patterns)
  • Limited budget to label items (safe/unsafe, brand affiliation, category)

You want to run a regression to understand relationships between user features and item traits, but labeling is expensive. Random sampling wastes budget on items that don't affect your analysis.

How It Works

The tool identifies items that most influence your regression coefficients. It prioritizes items that:

  1. Are used by many people
  2. Show different usage patterns across your user segments
  3. Would most change your conclusions if mislabeled

Think of it as "statistical leverage": some items matter far more than others for pinning down user-trait relationships, so those are the ones worth labeling first.
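To make the leverage idea concrete, here is a toy numpy sketch (an illustrative assumption about the setup, not fewlab's internal code): if each user's outcome is a share-weighted sum of item labels, then each item's pull on the OLS coefficients is one column of a fixed matrix, and the column norm is a natural priority score.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_feats = 200, 50, 3

X = rng.normal(size=(n_users, n_feats))            # user features
usage = rng.poisson(1.0, size=(n_users, n_items))  # usage counts
shares = usage / np.maximum(usage.sum(axis=1, keepdims=True), 1)  # row shares

# If each user's outcome is y_i = sum_j shares[i, j] * label_j, then
# OLS gives beta = (X'X)^{-1} X' @ shares @ label, so item j's pull on
# the coefficients is column j of G = (X'X)^{-1} X' @ shares.
G = np.linalg.solve(X.T @ X, X.T @ shares)         # shape (n_feats, n_items)
influence = np.linalg.norm(G, axis=0)              # one leverage score per item

top10 = np.argsort(influence)[::-1][:10]           # highest-leverage items
```

An item scores high when it is widely used, its usage varies across user segments, and a wrong label would therefore shift the fitted coefficients the most, which matches the three criteria above.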

Quick Start

from fewlab import Design
import pandas as pd

# Your data: user features and item usage
user_features = pd.DataFrame(...)  # User characteristics
item_usage = pd.DataFrame(...)     # Usage counts per user-item

# Create design (caches expensive computations)
design = Design(item_usage, user_features)

# Get top 100 items to label
priority_items = design.select(budget=100)

# Send priority_items to your labeling team
print(f"Label these items first: {priority_items}")

Advanced Usage

from fewlab import Design
import pandas as pd

# item_usage and user_features as defined in Quick Start
# Create design with automatic ridge detection
design = Design(item_usage, user_features, ridge="auto")

# Multiple selection strategies
deterministic_items = design.select(budget=100, method="deterministic")
greedy_items = design.select(budget=100, method="greedy")

# Probabilistic sampling methods
balanced_sample = design.sample(budget=100, method="balanced", seed=42)
hybrid_sample = design.sample(budget=100, method="core_plus_tail", tail_frac=0.3)
adaptive_sample = design.sample(budget=100, method="adaptive")

# Get inclusion probabilities
pi_aopt = design.inclusion_probabilities(budget=100, method="aopt")
pi_rowse = design.inclusion_probabilities(budget=100, method="row_se", eps2=0.01)

# Complete workflow: select, weight, estimate
selected = design.select(budget=50)
labels = pd.Series([1, 0, 1, ...], index=selected)  # Your labels
weights = design.calibrate_weights(selected)
estimates = design.estimate(selected, labels)

# Access diagnostics
print(f"Condition number: {design.diagnostics['condition_number']:.2e}")
print(f"Influence weights: {design.influence_weights.head()}")

What You Get

Primary Interface:

  • Design: Object-oriented API that caches expensive computations and provides unified access to all methods

Selection Methods:

  • .select(method="deterministic"): Batch A-optimal top-budget items (fastest)
  • .select(method="greedy"): Sequential greedy A-optimal selection
  • .sample(method="balanced"): Balanced probabilistic sampling
  • .sample(method="core_plus_tail"): Hybrid deterministic + probabilistic
  • .sample(method="adaptive"): Data-driven hybrid with automatic parameters

Probability Methods:

  • .inclusion_probabilities(method="aopt"): A-optimal square-root rule
  • .inclusion_probabilities(method="row_se"): Row-wise standard error minimization
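The square-root rule can be sketched as follows. This is a generic reading of A-optimal inclusion probabilities (pi proportional to the square root of each item's influence score, capped at 1 and rescaled to sum to the budget), not necessarily fewlab's exact implementation; `aopt_probabilities` and the gamma-distributed scores are hypothetical.

```python
import numpy as np

def aopt_probabilities(scores: np.ndarray, budget: int) -> np.ndarray:
    """Square-root rule: pi_j proportional to sqrt(score_j), rescaled so
    sum(pi) == budget, with probabilities capped at 1 and the surplus
    redistributed over the uncapped items."""
    pi = np.sqrt(np.maximum(scores, 0.0))
    pi = pi * budget / pi.sum()
    while (pi > 1).any():
        capped = pi >= 1
        pi[capped] = 1.0                 # heaviest items are taken for sure
        pi[~capped] *= (budget - capped.sum()) / pi[~capped].sum()
    return pi

probs = aopt_probabilities(np.random.default_rng(3).gamma(2.0, size=50), budget=10)
```

The capping loop terminates because each pass fixes at least one more item at probability 1, and the result is a valid sampling design with expected sample size equal to the budget.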

Complete Workflow:

  • .calibrate_weights(): GREG-style weight calibration
  • .estimate(): Calibrated Horvitz-Thompson estimation
  • .diagnostics: Comprehensive design diagnostics

All methods leverage cached influence computations for efficiency and provide consistent, structured results.

Practical Considerations

Choosing budget: Start with 10-20% of items. You can always label more if needed.

Validation: Compare regression stability with different budget values. When coefficients stop changing significantly, you have enough labels.
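One way to run that stability check, sketched here with plain numpy on simulated data (the influence ranking and mean-imputation of unlabeled items are illustrative assumptions standing in for fewlab's own scoring and estimation):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, n_feats = 300, 80, 2
X = rng.normal(size=(n_users, n_feats))
usage = rng.poisson(1.0, size=(n_users, n_items))
shares = usage / np.maximum(usage.sum(axis=1, keepdims=True), 1)
true_labels = rng.integers(0, 2, size=n_items).astype(float)  # pretend ground truth

# Rank items by influence: column norms of (X'X)^{-1} X' @ shares.
G = np.linalg.solve(X.T @ X, X.T @ shares)
order = np.argsort(np.linalg.norm(G, axis=0))[::-1]

# Re-fit as the labeled budget grows; unlabeled items are imputed
# with the mean of the labels seen so far.
betas = []
for budget in (10, 20, 40, 80):
    seen = order[:budget]
    labels = np.full(n_items, true_labels[seen].mean())
    labels[seen] = true_labels[seen]
    betas.append(np.linalg.lstsq(X, shares @ labels, rcond=None)[0])

# Distance of each fit from the full-label fit; once this flattens out
# near zero, additional labels barely move the conclusions.
drift = [float(np.linalg.norm(b - betas[-1])) for b in betas]
```

In practice you would run the same sweep with design.select at each budget and stop labeling once successive coefficient vectors agree to within your reporting precision.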

Performance: The Design class caches expensive influence computations, making multiple method calls efficient.

Limitations:

  • Works best when usage patterns correlate with user features
  • Assumes item labels are binary (has trait / doesn't have trait)
  • Most effective for sparse usage matrices

Advanced: Ensuring Unbiased Estimates

The deterministic priority list minimizes estimated variance, but it is not a probability sample, so estimates built on it are not formally design-unbiased. If you need formal statistical guarantees, add a small random sample on top of the priority list; this is the idea behind the hybrid core_plus_tail sampler. See the statistical details for more.
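A minimal numpy sketch of that hybrid: spend most of the budget on the top-ranked items and the rest on a uniform draw from the remainder, so every item keeps a nonzero inclusion probability. The 80/20 split and the gamma-distributed stand-in scores are illustrative assumptions, not fewlab's defaults.

```python
import numpy as np

rng = np.random.default_rng(42)
n_items, budget = 100, 30
scores = rng.gamma(2.0, size=n_items)        # stand-in influence scores

# Deterministic core: the top-scored 80% of the budget.
core_k = int(0.8 * budget)
core = np.argsort(scores)[::-1][:core_k]

# Random tail: a uniform draw from the remaining items, which keeps
# every item's inclusion probability strictly positive (needed for
# design-unbiased Horvitz-Thompson-style estimators).
rest = np.setdiff1d(np.arange(n_items), core)
tail = rng.choice(rest, size=budget - core_k, replace=False)

selected = np.concatenate([core, tail])
```

The tail items get known, nonzero inclusion probabilities, so their labels can be reweighted to correct any bias the deterministic core would otherwise introduce.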

Installation

pip install fewlab

Requirements: Python 3.12-3.14, numpy ≥1.23, pandas ≥1.5

Development:

pip install -e ".[dev]"  # Includes testing, linting, pre-commit hooks
pip install -e ".[docs]" # Includes documentation building

What's New in v1.0.0

  • 🎯 Object-Oriented API: New Design class caches expensive computations and provides unified interface
  • 🚀 Performance: Eliminate redundant influence computations across multiple method calls
  • 📊 Structured Results: Typed result classes replace loose tuples for better API consistency
  • 🔧 Standardized Parameters: All functions take a budget parameter (previously K); no backward-compatibility alias is kept
  • 📈 Comprehensive Diagnostics: Automatic condition number monitoring and ridge selection
  • 🧪 Enhanced Testing: Full test coverage for new Design class and edge cases
  • 🐍 Modern Python: Requires Python 3.12-3.14, uses latest type annotations
  • 🛡️ Robust Validation: Enhanced input validation with helpful error messages

Development

To contribute to this project, install dependencies and set up pre-commit hooks:

uv sync --all-groups
uv run pre-commit install

License

MIT
