Transparent pandas performance optimization via Numba-accelerated parallel operations
unlockedpd
Unlock pandas performance with zero code changes.
unlockedpd is a drop-in performance booster for pandas. In the benchmarks below it speeds up rolling, expanding, EWM, and cumulative operations by roughly 3-15x. Import unlockedpd after pandas and your existing code runs faster.
import pandas as pd
import unlockedpd # That's it. Your pandas code is now faster.
df = pd.DataFrame(...)
df.rolling(20).mean() # 5x faster!
df.expanding().max() # 15x faster!
df.ewm(span=20).mean() # 4.8x faster!
Why unlockedpd?
| Library | Speedup | pandas Compatible | Setup Required |
|---|---|---|---|
| unlockedpd | 8.7x avg | 100% | pip install |
| Polars | 5-10x | 0% (new API) | Learn new API |
| Modin | ~4x | 95% | Ray/Dask cluster |
Key advantages:
- Zero code changes: Works with your existing pandas code
- No infrastructure: No Ray, no Dask, no distributed setup
- No new API to learn: It's still pandas
- Automatic fallback: Falls back to pandas for unsupported cases
Benchmarks
Tested on a 64-core machine with a 0.8GB DataFrame (10,000 rows x 10,000 columns):
Rolling Operations (8.4x average)
| Operation | pandas | unlockedpd | Speedup |
|---|---|---|---|
| rolling(20).mean() | 1.96s | 0.39s | 5.0x |
| rolling(20).sum() | 1.78s | 0.18s | 9.7x |
| rolling(20).std() | 2.51s | 0.40s | 6.3x |
| rolling(20).var() | 2.36s | 0.40s | 5.9x |
| rolling(20).min() | 3.30s | 0.28s | 11.6x |
| rolling(20).max() | 3.36s | 0.29s | 11.6x |
Expanding Operations (10.7x average)
| Operation | pandas | unlockedpd | Speedup |
|---|---|---|---|
| expanding().mean() | 1.55s | 0.20s | 7.9x |
| expanding().sum() | 1.46s | 0.18s | 8.3x |
| expanding().std() | 1.89s | 0.20s | 9.6x |
| expanding().var() | 1.65s | 0.18s | 9.1x |
| expanding().min() | 2.61s | 0.18s | 14.3x |
| expanding().max() | 2.69s | 0.18s | 15.1x |
EWM Operations (5.3x average)
| Operation | pandas | unlockedpd | Speedup |
|---|---|---|---|
| ewm(span=20).mean() | 1.18s | 0.25s | 4.8x |
| ewm(span=20).std() | 1.51s | 0.37s | 4.0x |
| ewm(span=20).var() | 1.31s | 0.19s | 7.1x |
Cumulative Operations (3.2x average)
| Operation | pandas | unlockedpd | Speedup |
|---|---|---|---|
| cumsum() | 0.59s | 0.19s | 3.2x |
| cummin() | 0.58s | 0.18s | 3.2x |
| cummax() | 0.58s | 0.19s | 3.1x |
Other Operations
| Operation | Speedup |
|---|---|
| pct_change() | 11x |
| rank(axis=1) | 8-10x |
| rank(axis=0) | 1.4-1.5x |
| diff() | 1.0-1.7x |
| shift() | 1.0-1.5x |
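To make the Speedup columns above concrete, here is a minimal timing sketch. The helper names `best_of` and `speedup` are hypothetical and are not part of unlockedpd or of its benchmark suite; the sketch only shows how a "pandas time divided by unlockedpd time" ratio can be measured.

```python
import time


def best_of(fn, repeats=3):
    """Return the fastest wall-clock time (in seconds) over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, optimized_seconds):
    """Speedup as reported in the tables: baseline time / optimized time."""
    return baseline_seconds / optimized_seconds


# Reproduces the rolling(20).mean() row from the published timings.
print(f"{speedup(1.96, 0.39):.1f}x")
```

One way to collect both timings in a single process would be to call `best_of(lambda: df.rolling(20).mean())` once with `unlockedpd.config.enabled = False` and once with it set to `True`, using the switch described under Configuration.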
Installation
pip install unlockedpd
Requirements:
- Python 3.9+
- pandas >= 1.5
- numba >= 0.56
- numpy >= 1.21
Usage
Basic Usage
import numpy as np
import pandas as pd
import unlockedpd # Import after pandas
# Your existing code works unchanged
df = pd.DataFrame(np.random.randn(10000, 1000))
result = df.rolling(20).mean() # Automatically optimized!
Configuration
import unlockedpd
# Disable optimizations temporarily
unlockedpd.config.enabled = False
# Set thread count (default: min(cpu_count, 32))
unlockedpd.config.num_threads = 16
# Enable warnings when falling back to pandas
unlockedpd.config.warn_on_fallback = True
# Set minimum elements for parallel execution
unlockedpd.config.parallel_threshold = 500_000
Environment Variables
export UNLOCKEDPD_ENABLED=false
export UNLOCKEDPD_NUM_THREADS=16
export UNLOCKEDPD_WARN_ON_FALLBACK=true
export UNLOCKEDPD_PARALLEL_THRESHOLD=500000
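As an illustration of how variables like these can map onto the config attributes shown earlier, here is a small sketch. It is not unlockedpd's actual loader: the `Config` dataclass, `load_config`, the accepted truthy strings, and the defaults for `warn_on_fallback` and `parallel_threshold` are all assumptions made for the example. Only the variable names and the `min(cpu_count, 32)` thread default come from this README.

```python
import os
from dataclasses import dataclass, field

# Assumption: which strings count as "true" is a guess, not documented behavior.
TRUTHY = {"1", "true", "yes", "on"}


def default_threads():
    # The README states the default thread count is min(cpu_count, 32).
    return min(os.cpu_count() or 1, 32)


@dataclass
class Config:
    enabled: bool = True
    num_threads: int = field(default_factory=default_threads)
    warn_on_fallback: bool = False      # assumed default
    parallel_threshold: int = 500_000   # assumed default


def load_config(env=None, prefix="UNLOCKEDPD_"):
    """Build a Config, overriding defaults with any matching environment variables."""
    env = os.environ if env is None else env
    cfg = Config()
    raw = env.get(prefix + "ENABLED")
    if raw is not None:
        cfg.enabled = raw.strip().lower() in TRUTHY
    raw = env.get(prefix + "NUM_THREADS")
    if raw is not None:
        cfg.num_threads = int(raw)
    raw = env.get(prefix + "WARN_ON_FALLBACK")
    if raw is not None:
        cfg.warn_on_fallback = raw.strip().lower() in TRUTHY
    raw = env.get(prefix + "PARALLEL_THRESHOLD")
    if raw is not None:
        cfg.parallel_threshold = int(raw)
    return cfg


# Passing a plain dict keeps the example independent of the real environment.
print(load_config({"UNLOCKEDPD_ENABLED": "false", "UNLOCKEDPD_NUM_THREADS": "16"}))
```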
Temporarily Disable
from unlockedpd import _PatchRegistry
with _PatchRegistry.temporarily_unpatched():
    # Uses original pandas here
    result = df.rolling(20).mean()
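For readers curious how a "temporarily unpatched" context manager can work in principle, here is a minimal monkeypatching sketch. It is an assumption about the general technique, not unlockedpd's `_PatchRegistry`: the `PatchRegistry` class, the toy `Frame` class, and `fast_total` are hypothetical, and the sketch uses a registry instance where the snippet above calls the method on the class.

```python
from contextlib import contextmanager


class PatchRegistry:
    """Hypothetical registry: swaps methods on a class and can restore the originals."""

    def __init__(self):
        self._patches = []  # entries of (owner, name, original, replacement)

    def patch(self, owner, name, replacement):
        original = getattr(owner, name)
        self._patches.append((owner, name, original, replacement))
        setattr(owner, name, replacement)

    @contextmanager
    def temporarily_unpatched(self):
        # Put the originals back for the duration of the with-block ...
        for owner, name, original, _ in self._patches:
            setattr(owner, name, original)
        try:
            yield
        finally:
            # ... and reinstall the replacements afterwards, even on error.
            for owner, name, _, replacement in self._patches:
                setattr(owner, name, replacement)


class Frame:
    """Toy stand-in for a class whose method gets accelerated."""

    def total(self, xs):
        return sum(xs)


calls = []


def fast_total(self, xs):
    calls.append("fast")  # record that the patched path ran
    return sum(xs)


registry = PatchRegistry()
registry.patch(Frame, "total", fast_total)

frame = Frame()
frame.total([1, 2])                      # patched path
with registry.temporarily_unpatched():
    frame.total([1, 2])                  # original path
frame.total([1, 2])                      # patched path again
print(calls)
```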
How It Works
unlockedpd achieves its speedups through:
- Numba JIT compilation: Operations are compiled to optimized machine code
- nogil=True: Releases Python's GIL during computation
- ThreadPoolExecutor: Achieves true parallelism across CPU cores
- Column-wise chunking: Distributes work efficiently across threads
The key insight: @njit(nogil=True) + ThreadPoolExecutor combines Numba's fast compiled loops with true multi-threaded parallelism.
┌─────────────────────────────────────────────────────────────┐
│ ThreadPoolExecutor │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Thread 1│ │ Thread 2│ │ Thread 3│ ... │Thread 32│ │
│ │ Cols 0-k│ │Cols k-2k│ │Cols 2k..│ │Cols ..N │ │
│ │ (nogil) │ │ (nogil) │ │ (nogil) │ │ (nogil) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────┘
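The pattern in the diagram can be sketched in a few lines. This is an illustrative example of `@njit(nogil=True)` combined with `ThreadPoolExecutor` and column-wise chunking, not unlockedpd's actual kernels: the names `rolling_mean_block` and `parallel_rolling_mean` are hypothetical, the kernel ignores NaN handling and `min_periods`, and the `try/except` only exists so the sketch still runs (slowly, without real parallelism) when Numba is not installed.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: a decorator that does nothing.
    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap


@njit(nogil=True)
def rolling_mean_block(values, out, window, col_start, col_end):
    """Fill out[:, col_start:col_end] with a running-sum rolling mean."""
    n_rows = values.shape[0]
    for j in range(col_start, col_end):
        acc = 0.0
        for i in range(n_rows):
            acc += values[i, j]
            if i >= window:
                acc -= values[i - window, j]  # drop the value leaving the window
            if i >= window - 1:
                out[i, j] = acc / window


def parallel_rolling_mean(values, window, num_threads=4):
    """Split the columns into contiguous chunks and process each chunk in a thread."""
    values = np.ascontiguousarray(values, dtype=np.float64)
    out = np.full(values.shape, np.nan)  # rows before a full window stay NaN
    n_cols = values.shape[1]
    n_chunks = min(num_threads, n_cols)
    bounds = np.linspace(0, n_cols, n_chunks + 1).astype(np.int64)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(rolling_mean_block, values, out, window, int(lo), int(hi))
            for lo, hi in zip(bounds[:-1], bounds[1:])
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread
    return out


data = np.arange(12.0).reshape(4, 3)
print(parallel_rolling_mean(data, window=2, num_threads=2))
```

Because each thread writes to a disjoint set of columns in `out`, no locking is needed, and with `nogil=True` the compiled loops can run on several cores at once.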
What's Optimized
Fully optimized (5-15x faster):
- rolling().mean(), sum(), std(), var(), min(), max(), count(), skew(), kurt(), median(), quantile()
- expanding().mean(), sum(), std(), var(), min(), max(), count(), skew(), kurt()
- ewm().mean(), std(), var()
- cumsum(), cumprod(), cummin(), cummax()
- rank() (both axis=0 and axis=1)
- pct_change(), diff(), shift()
- rolling().corr(), rolling().cov() (pairwise)
Passes through to pandas (unchanged):
- rolling().apply() (custom functions)
- Series operations (optimizations target DataFrames)
- Non-numeric columns (auto-fallback)
Compatibility
unlockedpd is designed for 100% pandas compatibility:
- Drop-in replacement: No code changes required
- Automatic fallback: If optimization fails, falls back to pandas
- Type preservation: Returns same types as pandas
- Index preservation: Maintains DataFrame/Series indices
- NaN handling: Correctly handles missing values
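The automatic-fallback idea can be illustrated with a small dispatcher. This is a sketch of the general pattern under stated assumptions, not unlockedpd's implementation: `with_fallback`, the dtype check, and the toy `fast_cumsum`/`slow_cumsum` functions are hypothetical, and the `warn_on_fallback` argument only mirrors the config option of the same name.

```python
import itertools
import warnings

import numpy as np


def with_fallback(fast_fn, slow_fn, warn_on_fallback=False):
    """Try fast_fn on numeric input; otherwise (or on error) use slow_fn."""

    def dispatch(values):
        arr = np.asarray(values)
        if arr.dtype.kind in "iuf":  # signed int, unsigned int, or float
            try:
                return fast_fn(arr)
            except Exception as exc:
                reason = f"fast path failed: {exc!r}"
        else:
            reason = f"unsupported dtype {arr.dtype}"
        if warn_on_fallback:
            warnings.warn(f"falling back ({reason})", RuntimeWarning, stacklevel=2)
        return slow_fn(values)

    return dispatch


def fast_cumsum(arr):
    return np.cumsum(arr)  # stands in for an optimized kernel


def slow_cumsum(values):
    return list(itertools.accumulate(values))  # stands in for the original method


cumsum = with_fallback(fast_cumsum, slow_cumsum, warn_on_fallback=True)
print(cumsum([1, 2, 3]))        # numeric input takes the fast path
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(cumsum(["a", "b"]))   # strings take the fallback path
```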
Comparison with Alternatives
vs Polars
| Aspect | unlockedpd | Polars |
|---|---|---|
| Speedup | 8.7x avg | 5-10x |
| API | pandas (unchanged) | New API to learn |
| Code changes | None | Rewrite required |
| Ecosystem | pandas ecosystem | Polars ecosystem |
vs Modin
| Aspect | unlockedpd | Modin |
|---|---|---|
| Speedup | 8.7x avg | ~4x (general) |
| Rolling ops | 8.4x optimized | Not optimized |
| Infrastructure | None | Ray/Dask cluster |
| Memory | Low overhead | Partitioning overhead |
vs Vanilla Numba
| Aspect | unlockedpd | Manual Numba |
|---|---|---|
| Usage | import unlockedpd | Write custom kernels |
| GIL handling | Automatic (nogil=True) | Manual |
| Parallelization | Automatic ThreadPool | Manual implementation |
Running Benchmarks
# Clone the repo
git clone https://github.com/Yeachan-Heo/unlockedpd
cd unlockedpd
# Install with dev dependencies
pip install -e ".[dev]"
# Run benchmarks
pytest benchmarks/ -v
Contributing
Contributions are welcome! Areas of interest:
- Additional operation optimizations
- Performance improvements
- Documentation and examples
- Bug reports and fixes
Changelog
v0.2.2 (2026-01-21)
100% Pandas Compatibility Fixes:
- Fixed pct_change() division by zero - Now correctly returns inf/-inf when dividing by zero (previously returned NaN)
- Fixed rolling skew() and kurt() - Corrected bias correction formulas to match pandas exactly
- Fixed expanding skew() and kurt() - Corrected bias correction formulas to match pandas exactly
- Fixed zero variance handling - Rolling/expanding skew returns 0.0, kurt returns -3.0 for constant data (matching pandas)
- Fixed EWM mean() formula - Corrected weight decay calculation to match pandas exactly
- Fixed EWM std() and var() - Corrected bias correction formula and first value handling (returns NaN for first observation)
All operations now pass strict pandas compatibility tests (rtol=1e-10) for edge cases including:
- All-NaN columns
- Zero variance (constant data)
- Near-zero variance (numerical precision edge cases)
- Division by zero in pct_change
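A check of this kind can be written with NumPy's testing helpers. The sketch below is an assumption about what a strict comparison at rtol=1e-10 might look like, not the project's test suite; `assert_matches_reference` is a hypothetical helper name.

```python
import numpy as np


def assert_matches_reference(result, reference, rtol=1e-10):
    """Raise AssertionError unless result matches reference to a relative 1e-10.

    NaNs in the same positions count as equal, and atol=0 keeps the check
    purely relative.
    """
    np.testing.assert_allclose(result, reference, rtol=rtol, atol=0.0, equal_nan=True)


# All-NaN column: passes because NaN positions line up.
all_nan = np.full(5, np.nan)
assert_matches_reference(all_nan, all_nan.copy())

# Division-by-zero style output: matching infinities compare equal.
assert_matches_reference(np.array([np.nan, np.inf]), np.array([np.nan, np.inf]))

# A relative error of about 1e-8 is far outside the tolerance and is rejected.
try:
    assert_matches_reference(np.array([1.0 + 1e-8]), np.array([1.0]))
except AssertionError:
    print("mismatch detected")
```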
v0.2.1 (2026-01-20)
Critical Bug Fix:
- Fixed pct_change() NaN handling to match pandas default behavior
  - Previous versions treated fill_method=None as default, causing 5x more NaN values
  - Now correctly defaults to fill_method='pad' (forward fill before computing), matching pandas
  - This fix resolves "Weights are all zero" errors in downstream applications using unlockedpd
API:
- pct_change(fill_method='pad') - Default, matches pandas behavior (forward fills NaN before computing)
- pct_change(fill_method=None) - No fill, fastest option (4.8x vs pandas), use when data has no NaN
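The difference between the two fill_method settings, and the inf result for division by zero noted under v0.2.2, can be seen in a small NumPy sketch. `pct_change_1d` is a hypothetical helper written for illustration on a 1-D array; it is not the library's kernel.

```python
import numpy as np


def pct_change_1d(values, fill_method="pad"):
    """Percentage change between consecutive elements of a 1-D array."""
    x = np.array(values, dtype=np.float64)  # copy so the input is not modified
    if fill_method == "pad":
        # Forward fill: replace each NaN with the last non-NaN value seen so far.
        last = np.nan
        for i in range(x.size):
            if np.isnan(x[i]):
                x[i] = last
            else:
                last = x[i]
    out = np.full(x.size, np.nan)  # the first element has no predecessor
    with np.errstate(divide="ignore", invalid="ignore"):
        out[1:] = x[1:] / x[:-1] - 1.0  # dividing by zero yields inf or -inf
    return out


data = [1.0, np.nan, 2.0, 0.0, 3.0]
print(pct_change_1d(data, fill_method="pad"))
print(pct_change_1d(data, fill_method=None))
```

With forward fill, only the first element is NaN; without it, every change that touches the missing value is NaN as well, which is why the older default produced many more NaN values.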
v0.2.0 (2026-01-20)
- Major performance improvements across all operations
- Added EWM, expanding, cumulative, and pairwise operations
- Improved parallel dispatch and memory layout optimization
v0.1.0 (2026-01-19)
- Initial release with rolling, rank, and transform operations
License
MIT License - see LICENSE for details.
Acknowledgments
Built with:
How This Project Was Built
This entire project was built using oh-my-claude-sisyphus, an advanced Claude Code harness that enables autonomous, iterative development with specialized AI agents. The codebase, benchmarks, documentation, and optimizations were all generated through the sisyphus workflow orchestration system.
Key oh-my-claude-sisyphus features used:
- Ralph-Plan: Iterative planning with Prometheus (planner), Oracle (advisor), and Momus (reviewer) agents
- Ultrawork Mode: Parallel agent execution for maximum throughput
- Sisyphus-Junior: Focused task execution for implementation work
unlockedpd - Because your pandas code deserves to be fast.