High-performance Pythonic backtesting engine with Apache Parquet storage
Zipline Refresh
A high-performance Pythonic backtesting engine for algorithmic trading strategies
Zipline is a Pythonic event-driven system for backtesting, originally developed by Quantopian. This Refresh fork modernizes the storage layer, eliminates legacy dependencies, and delivers significant performance improvements.
What's New in Refresh
Storage: bcolz → Apache Parquet
The legacy bcolz storage layer has been fully replaced with Apache Parquet via PyArrow:
| | bcolz (legacy) | Parquet (new) |
|---|---|---|
| Format | Custom binary + Cython | Standard columnar, zstd compressed |
| Daily bars | One ctable per field | Single .parquet file per bundle |
| Minute bars | Fixed-stride padding for early closes | Actual trading minutes only |
| Dependencies | bcolz (unmaintained, build issues) | pyarrow (actively maintained) |
| Data types | uint32 (lossy for prices) | float64 (full precision) |
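The data-type row can be illustrated with a minimal sketch of fixed-point storage. It assumes the legacy format scaled prices by 1000 before casting to uint32, as upstream Zipline's bcolz writers did; `encode_uint32`, `decode_uint32`, and `roundtrip_float64` are hypothetical helpers, not part of this package.

```python
import struct

PRICE_SCALE = 1000        # assumed fixed-point scale (three decimal places)
UINT32_MAX = 2**32 - 1


def encode_uint32(price: float) -> int:
    """Scale a price to an integer and check that it fits in 32 unsigned bits."""
    scaled = int(round(price * PRICE_SCALE))
    if not 0 <= scaled <= UINT32_MAX:
        raise OverflowError(f"{price} does not fit in uint32 at scale {PRICE_SCALE}")
    return scaled


def decode_uint32(stored: int) -> float:
    """Undo the fixed-point scaling."""
    return stored / PRICE_SCALE


def roundtrip_float64(price: float) -> float:
    """Pack and unpack as an IEEE 754 double, as a float64 column would store it."""
    return struct.unpack("<d", struct.pack("<d", price))[0]


price = 0.12345
print(decode_uint32(encode_uint32(price)))  # 0.123
print(roundtrip_float64(price))             # 0.12345
```

Under this assumption, sub-penny prices lose digits beyond the third decimal, and any price above roughly 4.29 million overflows the integer range. A float64 column avoids both limits.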
Performance Optimizations
Profiling-driven optimizations on the backtest hot path (50 assets, 780 bars/session):
| Optimization | Speedup | Detail |
|---|---|---|
| Lazy per-field loading | 3.2x single field | Load only requested OHLCV fields instead of all 5 |
| NumPy int64 searchsorted | 40x per lookup | Replace DatetimeIndex.get_loc() with np.searchsorted on int64 arrays |
| Vectorized last-traded | 17x | np.flatnonzero instead of Python loop for get_last_traded_dt |
| Batch resample aggregation | 5x lifetimes | Vectorized _lifetimes_map and batch load_raw_arrays in DailyHistoryAggregator |
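The searchsorted and last-traded rows above can be sketched with plain NumPy. This is an illustration of the two techniques, not the engine's actual reader code: `minute_position` and `last_traded_index` are hypothetical names, the timestamps are made up, and "traded" is assumed to mean a bar with nonzero volume.

```python
import numpy as np

MINUTE_NS = 60 * 1_000_000_000  # one minute in nanoseconds

# Hypothetical index: 780 consecutive one-minute bars as int64 epoch nanoseconds.
start_ns = int(np.datetime64("2024-01-02T14:31", "ns").astype(np.int64))
minutes_ns = start_ns + np.arange(780, dtype=np.int64) * MINUTE_NS


def minute_position(index_ns: np.ndarray, dt_ns: int) -> int:
    """Binary-search a sorted int64 array for an exact timestamp match."""
    pos = int(np.searchsorted(index_ns, dt_ns))
    if pos == len(index_ns) or index_ns[pos] != dt_ns:
        raise KeyError(dt_ns)
    return pos


def last_traded_index(volumes: np.ndarray, upto: int) -> int:
    """Index of the last bar at or before `upto` with nonzero volume, else -1."""
    traded = np.flatnonzero(volumes[: upto + 1])
    return int(traded[-1]) if traded.size else -1


print(minute_position(minutes_ns, int(minutes_ns[389])))  # 389

volumes = np.array([0, 120, 0, 0, 75, 0], dtype=np.uint32)
print(last_traded_index(volumes, 3))  # 1
print(last_traded_index(volumes, 5))  # 4
print(last_traded_index(volumes, 0))  # -1
```

Both helpers work on raw integer arrays, so no pandas Timestamp objects are created inside the per-bar loop.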
Net result: pandas DatetimeIndex overhead fell from 46% to 6.5% of hot-path time, and the backtest data layer is about 1.8x faster overall (0.44s → 0.24s).
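The idea behind lazy per-field loading in the table above can be sketched in a few lines. This assumes a loader callback that reads one column at a time; `LazyBars` and `fake_loader` are hypothetical stand-ins, not classes from this package.

```python
from typing import Callable, Dict, List

FIELDS = ("open", "high", "low", "close", "volume")


class LazyBars:
    """Load each OHLCV column on first access and cache it afterwards."""

    def __init__(self, load_column: Callable[[str], List[float]]):
        self._load_column = load_column
        self._cache: Dict[str, List[float]] = {}

    def get(self, field: str) -> List[float]:
        if field not in FIELDS:
            raise KeyError(field)
        if field not in self._cache:
            # Only the requested column is read; the other four stay untouched.
            self._cache[field] = self._load_column(field)
        return self._cache[field]


loaded: List[str] = []


def fake_loader(field: str) -> List[float]:
    """Stand-in for a column read; records which fields were requested."""
    loaded.append(field)
    return [1.0, 2.0, 3.0]


bars = LazyBars(fake_loader)
bars.get("close")
bars.get("close")
print(loaded)  # ['close']
```

A strategy that only looks at close prices pays for one column read instead of five, and repeated lookups hit the cache.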
Benchmark: Time breakdown (50 assets x 780 bars)
Before optimization:
pandas DatetimeIndex 46.0% ██████████████████████████████████████████████
get_value (reader) 13.0% █████████████
memoize/lazyval 10.0% ██████████
After optimization:
pandas DatetimeIndex 6.5% ██████
get_value (reader) 26.7% ██████████████████████████
memoize/lazyval 12.9% ████████████
numpy operations 12.2% ████████████
Total hot-path time: 0.44s → 0.24s (1.8x faster)
Per-bar latency: 0.6ms → 0.3ms
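The summary figures can be cross-checked with a little arithmetic. This sketch assumes the per-bar latency is the total hot-path time divided by the 780 bars, and that the percentage shares apply to those same totals; neither assumption is stated in the benchmark itself.

```python
BARS = 780                       # bars in the benchmark
before_s, after_s = 0.44, 0.24   # total hot-path seconds, as reported

speedup = before_s / after_s
per_bar_before_ms = before_s / BARS * 1000
per_bar_after_ms = after_s / BARS * 1000

# Absolute time spent in pandas DatetimeIndex, from the reported shares.
dti_before_s = 0.46 * before_s
dti_after_s = 0.065 * after_s

print(f"speedup: {speedup:.1f}x")                                           # speedup: 1.8x
print(f"per bar: {per_bar_before_ms:.1f} ms -> {per_bar_after_ms:.1f} ms")  # per bar: 0.6 ms -> 0.3 ms
print(f"DatetimeIndex: {dti_before_s:.3f}s -> {dti_after_s:.3f}s "
      f"({dti_before_s / dti_after_s:.0f}x less)")                          # DatetimeIndex: 0.202s -> 0.016s (13x less)
```

Read this way, the DatetimeIndex cost drops by about 13x in absolute terms, while the shares of the remaining components grow mainly because the total shrank.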
Features
- Event-Driven Architecture: Realistic simulation with proper order lifecycle, slippage, and commission models
- Pipeline API: Factor-based screening with 20+ built-in technical factors (RSI, MACD, Bollinger, Ichimoku, etc.) and easy CustomFactor extensibility
- Factor Composition: rank(), zscore(), demean(), winsorize(), top(N) with groupby for sector-neutral strategies
- PyData Integration: pandas DataFrames in/out, compatible with matplotlib, scipy, statsmodels, scikit-learn
- Multi-Country Support: 42 country domains with proper trading calendars via exchange_calendars
- Minute & Daily Resolution: Full minute-level backtesting with proper market open/close handling
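To make the "sector-neutral" idea in the Factor Composition item concrete, here is a plain-Python sketch of a grouped z-score. It illustrates the concept only and is not the Pipeline implementation: `zscore_by_group` is a hypothetical helper, the tickers and sectors are made up, and returning 0.0 for a constant group is a choice made for this sketch.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def zscore_by_group(values: dict, groups: dict) -> dict:
    """Z-score each asset's value against the other assets in its group."""
    members = defaultdict(list)
    for asset, group in groups.items():
        members[group].append(asset)

    out = {}
    for assets in members.values():
        vals = [values[a] for a in assets]
        mean, std = fmean(vals), pstdev(vals)
        for a in assets:
            # A constant group has no dispersion; report 0.0 rather than divide by zero.
            out[a] = (values[a] - mean) / std if std else 0.0
    return out


rsi = {"AAA": 30.0, "BBB": 70.0, "CCC": 45.0, "DDD": 55.0}
sector = {"AAA": "tech", "BBB": "tech", "CCC": "energy", "DDD": "energy"}
print(zscore_by_group(rsi, sector))
# {'AAA': -1.0, 'BBB': 1.0, 'CCC': -1.0, 'DDD': 1.0}
```

Although the tech values are far more spread out than the energy values, each asset ends up scored relative to its own sector, so a long/short book built on these scores carries no net sector tilt.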
Installation
Zipline supports Python >= 3.10 and is compatible with current versions of NumFOCUS libraries.
Using pip
pip install zipline-refresh
From source
git clone https://github.com/teleclaws/zipline-refresh.git
cd zipline-refresh
pip install -e .
See the documentation for detailed instructions.
Quickstart
Example 1: RSI Long/Short Pipeline Strategy
Use the Pipeline API to rank stocks by RSI and build a long/short portfolio — rebalanced daily:
from zipline.api import attach_pipeline, order_target_percent, pipeline_output, schedule_function
from zipline.finance import commission, slippage
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import RSI


def make_pipeline():
    rsi = RSI()
    # Boolean columns: the three highest and three lowest RSI values.
    return Pipeline(
        columns={"longs": rsi.top(3), "shorts": rsi.bottom(3)},
    )


def initialize(context):
    attach_pipeline(make_pipeline(), "my_pipeline")
    # No date rule given, so rebalance runs every trading day.
    schedule_function(rebalance)
    context.set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.0))
    context.set_slippage(slippage.VolumeShareSlippage())


def before_trading_start(context, data):
    context.pipeline_data = pipeline_output("my_pipeline")


def rebalance(context, data):
    pipeline_data = context.pipeline_data
    longs = pipeline_data.index[pipeline_data.longs]
    shorts = pipeline_data.index[pipeline_data.shorts]

    for asset in longs:
        order_target_percent(asset, 1.0 / 3.0)
    for asset in shorts:
        order_target_percent(asset, -1.0 / 3.0)

    # Close positions that dropped out of both lists.
    for asset in context.portfolio.positions:
        if asset not in longs and asset not in shorts and data.can_trade(asset):
            order_target_percent(asset, 0)
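The weights implied by the rebalance logic above can be checked without running a backtest. The sketch below is a standalone helper, not part of the Zipline API: `target_weights` is a hypothetical name, the tickers are made up, and it divides by the actual list length rather than the hard-coded 3 used in the example.

```python
def target_weights(longs, shorts, held):
    """Map each asset to a target portfolio weight, mirroring the rebalance loop."""
    weights = {}
    for asset in longs:
        weights[asset] = 1.0 / len(longs)
    for asset in shorts:
        weights[asset] = -1.0 / len(shorts)
    for asset in held:
        # Anything held but no longer selected is closed out.
        weights.setdefault(asset, 0.0)
    return weights


weights = target_weights(
    longs=["AAA", "BBB", "CCC"],
    shorts=["XXX", "YYY", "ZZZ"],
    held=["AAA", "QQQ"],
)
gross = sum(abs(w) for w in weights.values())
net = sum(weights.values())
print(weights["AAA"], weights["XXX"], weights["QQQ"])
print(f"gross={gross:.2f} net={abs(net):.2f}")
```

With three longs and three shorts at one third each, gross exposure comes to about 2.0 and net exposure to about zero, so the example strategy is dollar-neutral but runs at 2x gross leverage.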
Example 2: Multi-Factor Ranking
Combine multiple factors with ranking and normalization:
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume, Returns, RSI


def make_pipeline():
    # Factor definitions
    momentum = Returns(window_length=20).rank()
    mean_reversion = -Returns(window_length=5).rank()
    rsi_signal = RSI().rank()

    # Composite score (equal-weighted)
    composite = (momentum + mean_reversion + rsi_signal).rank()

    # Liquidity filter
    liquid = AverageDollarVolume(window_length=30).top(100)

    return Pipeline(
        columns={
            "score": composite,
            "longs": composite.top(10, mask=liquid),
            "shorts": composite.bottom(10, mask=liquid),
        },
        screen=liquid,
    )
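The composite in Example 2 is a sum of ranks that is ranked again. A plain-Python sketch of that arithmetic follows; it assumes ascending ordinal ranks starting at 1, and `rank`, `composite_score`, and the sample values are hypothetical. Note that the unary minus in the example applies to the ranked factor, so the 5-day return rank is subtracted.

```python
def rank(values: dict) -> dict:
    """Ascending ordinal rank starting at 1; ties are broken by key order."""
    ordered = sorted(values, key=lambda k: (values[k], k))
    return {asset: i + 1 for i, asset in enumerate(ordered)}


def composite_score(ret_20d: dict, ret_5d: dict, rsi: dict) -> dict:
    """Equal-weighted rank composite: momentum minus short-term return plus RSI."""
    momentum = rank(ret_20d)
    reversal = rank(ret_5d)
    rsi_rank = rank(rsi)
    raw = {a: momentum[a] - reversal[a] + rsi_rank[a] for a in ret_20d}
    return rank(raw)


ret_20d = {"AAA": 0.10, "BBB": 0.02, "CCC": -0.05}
ret_5d = {"AAA": 0.03, "BBB": -0.02, "CCC": 0.01}
rsi = {"AAA": 60.0, "BBB": 40.0, "CCC": 50.0}
print(composite_score(ret_20d, ret_5d, rsi))
# {'CCC': 1, 'BBB': 2, 'AAA': 3}
```

Ranking each input first puts the three signals on the same scale before they are added, which is why the example can combine returns and RSI without further normalization.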
Data Ingestion
Zipline supports CSV-based data bundles for any market:
# In ~/.zipline/extension.py
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

register(
    "my-data",
    csvdir_equities(["daily"], "/path/to/csv/dir"),
    calendar_name="XNYS",
)

# Ingest and run
zipline ingest -b my-data
zipline run -f strategy.py --start 2020-1-1 --end 2024-1-1 -o results.pickle --no-benchmark -b my-data
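A CSV directory for the bundle above can be prepared with the standard library. This sketch assumes the layout documented for upstream Zipline's csvdir bundle (one file per symbol under a `daily/` subdirectory, with the columns listed in `COLUMNS`); check this fork's documentation before relying on it. `write_symbol_csv` and the sample rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Column names assumed from upstream Zipline's csvdir documentation.
COLUMNS = ["date", "open", "high", "low", "close", "volume", "dividend", "split"]


def write_symbol_csv(root: Path, symbol: str, rows: list) -> Path:
    """Write one symbol's daily bars to <root>/daily/<symbol>.csv."""
    daily_dir = root / "daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    path = daily_dir / f"{symbol}.csv"
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = [
    {"date": "2020-01-02", "open": 10.0, "high": 10.5, "low": 9.8,
     "close": 10.2, "volume": 1000, "dividend": 0.0, "split": 1.0},
    {"date": "2020-01-03", "open": 10.2, "high": 10.4, "low": 10.0,
     "close": 10.1, "volume": 1200, "dividend": 0.0, "split": 1.0},
]

root = Path(tempfile.mkdtemp())
path = write_symbol_csv(root, "AAA", rows)
print(path.read_text().splitlines()[0])
# date,open,high,low,close,volume,dividend,split
```

The resulting `root` directory is what would be passed as the second argument to `csvdir_equities` in place of `/path/to/csv/dir`.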
More examples in the examples directory.
Compatibility Notes
Release 3.05 — Compatible with NumPy 2.0 (requires pandas >= 2.2.2)
Release 3.0 — Updated to pandas >= 2.0 and SQLAlchemy > 2.0
Release 2.4 — Updated to exchange_calendars >= 4.2
Contributing
This project is sponsored by Kavout.
Found a bug or have a suggestion? Open an issue.
License
Apache 2.0. See LICENSE for details.