High-performance Pythonic backtesting engine with Apache Parquet storage

Zipline Refresh

A high-performance Pythonic backtesting engine for algorithmic trading strategies


Zipline is a Pythonic event-driven system for backtesting, originally developed by Quantopian. This Refresh fork modernizes the storage layer, eliminates legacy dependencies, and delivers significant performance improvements.


What's New in Refresh

Phase 1: bcolz → Apache Parquet

The legacy bcolz storage layer has been fully replaced with Apache Parquet via PyArrow:

| Aspect | bcolz (legacy) | Parquet (new) |
|---|---|---|
| Format | Custom binary + Cython | Standard columnar, zstd compressed |
| Daily bars | One ctable per field | Single .parquet file per bundle |
| Minute bars | Fixed-stride padding + Cython position math | Actual trading minutes only, no padding |
| Dependencies | bcolz (unmaintained, build failures on Python 3.12+) | pyarrow (actively maintained) |
| Data types | uint32 (lossy for prices > $42,949) | float64 (full precision) |
| Interoperability | Proprietary format | Standard Parquet, readable by pandas, Spark, DuckDB |
| Compression | None / blosc | zstd (2-5x smaller on disk) |
| Early close handling | Complex Cython exclusion logic | Eliminated; only real trading minutes stored |
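
The "Data types" row is easier to see with the arithmetic written out. The sketch below is not the legacy writer's code: it assumes prices were stored as scaled integers in a uint32, and the scale factor `SCALE` is a hypothetical value picked only because it reproduces the $42,949 ceiling quoted in the table.

```python
# Sketch of fixed-point price storage in an unsigned 32-bit integer.
# SCALE is an ASSUMED value, chosen only so the ceiling matches the table.
UINT32_MAX = 2**32 - 1
SCALE = 100_000  # hypothetical: stored_int = round(price * SCALE)


def encode_uint32(price: float) -> int:
    """Encode a price as a scaled integer, saturating at the uint32 ceiling."""
    return min(round(price * SCALE), UINT32_MAX)


def decode_uint32(stored: int) -> float:
    """Recover the price from its scaled integer form."""
    return stored / SCALE


# Largest price that survives an encode/decode round trip.
max_price = UINT32_MAX / SCALE

for price in (187.45, 42_000.0, 65_000.0):
    restored = decode_uint32(encode_uint32(price))
    status = "ok" if abs(restored - price) < 1 / SCALE else "LOSSY"
    print(f"{price:>10.2f} -> {restored:>12.5f}  {status}")
```

A float64 column has no such ceiling at realistic price levels, which is the point of the "full precision" entry.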

Phase 2: Profiling-Driven Hot Path Optimization

Systematic profiling (50 assets, 780 bars/session) identified and eliminated bottlenecks across the entire data layer:

| Optimization | Speedup | Detail |
|---|---|---|
| bcolz → Parquet migration | N/A | Eliminated unmaintained dependency, Cython position math, uint32 truncation |
| Lazy per-field loading | 3.2x single field | Load only requested OHLCV fields instead of all 5 at once |
| Vectorized lifetimes | 5x | Replace per-sid Python loop with single pd.DataFrame construction |
| Batch resample aggregation | 5x | Batch load_raw_arrays in DailyHistoryAggregator instead of per-field calls |
| NumPy int64 searchsorted | 40x per lookup | Replace DatetimeIndex.get_loc() (~4.3µs) with np.searchsorted on int64 (~0.1µs) |
| Vectorized last-traded | 17x | np.flatnonzero on volume array instead of Python backward scan |

Net result: pandas DatetimeIndex overhead fell from 46% to 6.5% of hot-path time, and per-bar latency dropped from 0.6 ms to 0.3 ms.
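
The int64 lookup named in the optimization table can be sketched with NumPy alone. This is a minimal illustration of the technique, not the fork's reader code: the session times and the helper name `minute_position` are made up for the example.

```python
import numpy as np

# Hypothetical session: 390 one-minute bars starting at 14:30 UTC.
minutes = np.arange(
    np.datetime64("2024-01-02T14:30"),
    np.datetime64("2024-01-02T21:00"),
    np.timedelta64(1, "m"),
).astype("datetime64[ns]")

# Reinterpret the timestamps as nanoseconds since the epoch, once, up front.
minutes_i8 = minutes.view(np.int64)


def minute_position(ts: np.datetime64) -> int:
    """Return the index of ts in the session, raising KeyError if absent."""
    key = ts.astype("datetime64[ns]").astype(np.int64)
    pos = int(np.searchsorted(minutes_i8, key))
    # searchsorted returns an insertion point, so confirm an exact match.
    if pos == len(minutes_i8) or minutes_i8[pos] != key:
        raise KeyError(ts)
    return pos


print(minute_position(np.datetime64("2024-01-02T15:00")))  # 30
```

The binary search runs on a plain integer array, so each lookup avoids the index-object overhead that the profile attributes to DatetimeIndex.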

Benchmark details (50 assets x 780 bars)
Before (bcolz baseline → initial Parquet):
  pandas DatetimeIndex             46.0%  ██████████████████████████████████████████████
  get_value (reader)               13.0%  █████████████
  memoize/lazyval                  10.0%  ██████████

After (fully optimized Parquet):
  pandas DatetimeIndex              6.5%  ██████
  get_value (reader)               26.7%  ██████████████████████████
  memoize/lazyval                  12.9%  ████████████
  numpy operations                 12.2%  ████████████

Total hot-path time: 0.44s → 0.24s (1.8x faster)
Per-bar latency: 0.6ms → 0.3ms

Micro-benchmarks (500 sids x 1000 days):

  • Single field load: 65.5ms → 20.6ms (3.2x)
  • get_last_traded_dt: 3.4ms → 0.2ms (17x)
  • _lifetimes_map: 5.5ms → 1.1ms (5x)
  • Sequential get_value: 68.5ms → 23.1ms (3.0x)
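
The get_last_traded_dt speedup comes from replacing a backward Python scan with np.flatnonzero on the volume array. A minimal sketch of that idea follows; the function name, the -1 sentinel, and the sample volumes are assumptions for illustration, not the fork's API.

```python
import numpy as np


def last_traded_index(volume: np.ndarray, upto: int) -> int:
    """Index of the latest bar at or before `upto` with nonzero volume, else -1."""
    traded = np.flatnonzero(volume[: upto + 1])
    return int(traded[-1]) if traded.size else -1


def last_traded_index_loop(volume: np.ndarray, upto: int) -> int:
    """Reference version: the per-element backward scan being replaced."""
    for i in range(upto, -1, -1):
        if volume[i] != 0:
            return i
    return -1


volume = np.array([0, 120, 0, 0, 300, 0, 0], dtype=np.uint64)
print(last_traded_index(volume, 6))  # 4
print(last_traded_index(volume, 3))  # 1
print(last_traded_index(volume, 0))  # -1
```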

Features

  • Event-Driven Architecture — Realistic simulation with proper order lifecycle, slippage, and commission models
  • Pipeline API — Factor-based screening with 20+ built-in technical factors (RSI, MACD, Bollinger, Ichimoku, etc.) and easy CustomFactor extensibility
  • Factor Composition: rank(), zscore(), demean(), winsorize(), top(N) with groupby for sector-neutral strategies
  • PyData Integration — pandas DataFrames in/out, compatible with matplotlib, scipy, statsmodels, scikit-learn
  • Multi-Country Support — 42 country domains with proper trading calendars via exchange_calendars
  • Minute & Daily Resolution — Full minute-level backtesting with proper market open/close handling
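
To make the "sector-neutral" idea in the Factor Composition bullet concrete, here is a NumPy sketch of what a grouped demean and a z-score compute over one day's cross-section. It is not Zipline's implementation: the helper names are illustrative, and using the population standard deviation (NumPy's default) is an assumption.

```python
import numpy as np


def demean_by_group(values: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Subtract each group's mean so every group averages to zero."""
    out = np.empty(len(values), dtype=float)
    for label in np.unique(groups):
        mask = groups == label
        out[mask] = values[mask] - values[mask].mean()
    return out


def zscore(values: np.ndarray) -> np.ndarray:
    """Center on the mean and scale by the (population) standard deviation."""
    return (values - values.mean()) / values.std()


# Hypothetical factor values for six assets in two sectors.
factor = np.array([1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
sector = np.array([0, 0, 0, 1, 1, 1])

neutral = demean_by_group(factor, sector)
print(neutral)
print(zscore(factor).round(3))
```

After the grouped demean, a high score means "high relative to sector peers", so a long/short book built from it carries no net sector bet.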

Installation

Zipline supports Python >= 3.10 and is compatible with current versions of NumFOCUS libraries.

Using pip

pip install zipline-refresh

From source

git clone https://github.com/teleclaws/zipline-refresh.git
cd zipline-refresh
pip install -e .

See the documentation for detailed instructions.

Quickstart

Example 1: RSI Long/Short Pipeline Strategy

Use the Pipeline API to rank stocks by RSI and build a long/short portfolio — rebalanced daily:

from zipline.api import attach_pipeline, order_target_percent, pipeline_output, schedule_function
from zipline.finance import commission, slippage
from zipline.pipeline import Pipeline
from zipline.pipeline.factors import RSI


def make_pipeline():
    rsi = RSI()
    return Pipeline(
        columns={"longs": rsi.top(3), "shorts": rsi.bottom(3)},
    )


def initialize(context):
    attach_pipeline(make_pipeline(), "my_pipeline")
    schedule_function(rebalance)
    context.set_commission(commission.PerShare(cost=0.001, min_trade_cost=1.0))
    context.set_slippage(slippage.VolumeShareSlippage())


def before_trading_start(context, data):
    context.pipeline_data = pipeline_output("my_pipeline")


def rebalance(context, data):
    pipeline_data = context.pipeline_data
    longs = pipeline_data.index[pipeline_data.longs]
    shorts = pipeline_data.index[pipeline_data.shorts]

    for asset in longs:
        order_target_percent(asset, 1.0 / 3.0)
    for asset in shorts:
        order_target_percent(asset, -1.0 / 3.0)

    for asset in context.portfolio.positions:
        if asset not in longs and asset not in shorts and data.can_trade(asset):
            order_target_percent(asset, 0)
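
The weights that rebalance requests can be worked out without running a backtest. The sketch below mirrors the three loops above in plain Python; the ticker strings are hypothetical stand-ins for Zipline Asset objects, and the data.can_trade check is left out for brevity.

```python
LONG_WEIGHT = 1.0 / 3.0
SHORT_WEIGHT = -1.0 / 3.0


def target_weights(longs, shorts, held):
    """Map each asset to the portfolio weight the rebalance would request."""
    weights = {asset: 0.0 for asset in held}  # anything held defaults to an exit
    weights.update({asset: LONG_WEIGHT for asset in longs})
    weights.update({asset: SHORT_WEIGHT for asset in shorts})
    return weights


# Hypothetical tickers standing in for Zipline Asset objects.
weights = target_weights(
    longs=["AAA", "BBB", "CCC"],
    shorts=["XXX", "YYY", "ZZZ"],
    held=["AAA", "OLD"],
)
gross = sum(abs(w) for w in weights.values())
print(weights["OLD"], f"gross exposure = {gross:.2f}")
```

With three longs and three shorts at one third each, gross exposure comes to 2.0 times portfolio value while net exposure is roughly zero, so the example is dollar-neutral but leveraged.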

Example 2: Multi-Factor Ranking

Combine multiple factors with ranking and normalization:

from zipline.pipeline import Pipeline
from zipline.pipeline.factors import AverageDollarVolume, Returns, RSI


def make_pipeline():
    # Factor definitions
    momentum = Returns(window_length=20).rank()
    mean_reversion = -Returns(window_length=5).rank()
    rsi_signal = RSI().rank()

    # Composite score (equal-weighted)
    composite = (momentum + mean_reversion + rsi_signal).rank()

    # Liquidity filter
    liquid = AverageDollarVolume(window_length=30).top(100)

    return Pipeline(
        columns={
            "score": composite,
            "longs": composite.top(10, mask=liquid),
            "shorts": composite.bottom(10, mask=liquid),
        },
        screen=liquid,
    )
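
The composite score above is a rank of summed ranks, with the short-term return rank negated. A NumPy sketch of that arithmetic on four made-up assets is shown below; the `rank` helper is a simplified ordinal rank written for this example, not the Pipeline method.

```python
import numpy as np


def rank(values: np.ndarray) -> np.ndarray:
    """Ordinal ranks starting at 1 (ties broken by position, a simplification)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


# Hypothetical inputs for four assets.
ret_20d = np.array([0.10, -0.02, 0.05, 0.01])
ret_5d = np.array([0.03, -0.04, 0.00, 0.02])
rsi = np.array([40.0, 30.0, 55.0, 70.0])

momentum = rank(ret_20d)         # higher 20-day return ranks higher
mean_reversion = -rank(ret_5d)   # higher 5-day return is penalized
rsi_signal = rank(rsi)

composite = rank(momentum + mean_reversion + rsi_signal)
print(composite)                 # [2. 1. 4. 3.]
print(int(np.argmax(composite)))  # 2
```

Ranking each input first puts returns and RSI on the same scale before they are summed, which is why the example calls the result equal-weighted.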

Data Ingestion

Zipline supports CSV-based data bundles for any market:

# In ~/.zipline/extension.py
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

register(
    "my-data",
    csvdir_equities(["daily"], "/path/to/csv/dir"),
    calendar_name="XNYS",
)

# Ingest and run
zipline ingest -b my-data
zipline run -f strategy.py --start 2020-1-1 --end 2024-1-1 -o results.pickle --no-benchmark -b my-data

More examples in the examples directory.
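
For the CSV bundle registered above, the input directory has to exist before ingesting. The sketch below writes one file in the layout that upstream Zipline's csvdir bundle documents (a `daily/` subfolder, one `<SYMBOL>.csv` per asset, and the column set shown). Whether this fork keeps exactly that layout is an assumption, and the symbol and prices are invented.

```python
import csv
import tempfile
from pathlib import Path

# Column set documented for upstream Zipline's csvdir bundle (assumed unchanged here).
COLUMNS = ["date", "open", "high", "low", "close", "volume", "dividend", "split"]


def write_symbol_csv(root: Path, symbol: str, rows: list) -> Path:
    """Write rows to <root>/daily/<symbol>.csv and return the file path."""
    daily_dir = root / "daily"
    daily_dir.mkdir(parents=True, exist_ok=True)
    path = daily_dir / f"{symbol}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Invented sample data for a hypothetical symbol.
rows = [
    {"date": "2020-01-02", "open": 100.0, "high": 101.5, "low": 99.2,
     "close": 101.0, "volume": 150000, "dividend": 0.0, "split": 1.0},
    {"date": "2020-01-03", "open": 101.0, "high": 102.0, "low": 100.1,
     "close": 100.4, "volume": 132000, "dividend": 0.0, "split": 1.0},
]

root = Path(tempfile.mkdtemp())
path = write_symbol_csv(root, "DEMO", rows)
print(path.read_text().splitlines()[0])
```

The directory passed to csvdir_equities would then be the `root` folder, not the `daily` subfolder.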

Compatibility Notes

Release 3.05 — Compatible with NumPy 2.0 (requires pandas >= 2.2.2)

Release 3.0 — Updated to pandas >= 2.0 and SQLAlchemy > 2.0

Release 2.4 — Updated to exchange_calendars >= 4.2
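
A quick way to check an environment against these floors is to compare installed versions with the standard library. This is a rough sketch: the version parsing is deliberately naive (pre-release suffixes are ignored), and the `MINIMUMS` mapping only restates two of the notes above.

```python
from importlib.metadata import PackageNotFoundError, version

# Floors restated from the notes above; running NumPy 2.0 raises pandas to 2.2.2.
MINIMUMS = {"pandas": "2.0", "exchange_calendars": "4.2"}


def as_tuple(ver: str) -> tuple:
    """Turn '2.2.2' into (2, 2, 2); non-numeric parts such as '0rc1' are dropped."""
    return tuple(int(part) for part in ver.split(".") if part.isdigit())


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum."""
    return as_tuple(installed) >= as_tuple(minimum)


def report(minimums: dict) -> dict:
    """Map each package name to 'ok', 'too old', or 'missing'."""
    status = {}
    for name, minimum in minimums.items():
        try:
            ok = meets_minimum(version(name), minimum)
            status[name] = "ok" if ok else "too old"
        except PackageNotFoundError:
            status[name] = "missing"
    return status


print(report(MINIMUMS))
```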

Contributing

This project is sponsored by Kavout.

Found a bug or have a suggestion? Open an issue.

License

Apache 2.0. See LICENSE for details.
