Skip to main content

Zipline extension to provide bundles of data from Norgate Data into the Zipline algorithmic trading library for the Python programming language

Project description

alt text alt text

Integrates financial market data provided by Norgate Data with Zipline, a Pythonic algorithmic trading library for backtesting.

Key features of this extension

  • Simple bundle creation
  • Survivorship bias-free bundles
  • Incorporates time series data such as historical index membership and dividend yield into Zipline's Pipeline mechanism
  • No modifications to the Zipline code base (except to fix problems with installation and obsolete calls that crash Zipline)

Requirements

  • Zipline 2.0.0 and above (based upon the Zipline Reloaded fork led by Stefan Jansen, which originates from the Quantopian-developed Zipline which has become abandonware. We recommend the latest release of Zipline Reloaded (currently v2.2) and associated packages (such as exchange-calendars) - there are too many quirks and workarounds for issues with older versions of Zipline to continue to maintain backwards compatibility.
  • Python 3.8 or 3.9 (3.8 if you use Pyfolio)
  • Microsoft Windows
  • An active Norgate Data subscription
  • Norgate Data Updater software installed and running
  • Writable local user folder named .norgatedata (or defined in environment variable NORGATEDATA_ROOT) - defaults to C:\Users\Your username\.norgatedata
  • Python packages: Pandas, Numpy, Logbook

Note: The "Norgate Data Updater" application (NDU) is a Windows-only application. NDU must be running for this Python package to work.

How to install Zipline using Anaconda/Miniconda

Most people have problems installing Zipline because they attempt to install it into their base environment. The solution is simple: Create a separate virtual environment that only has the necessary Python pacakges you require. If you want to experiment then just create a new environment.

Firstly, install either Anaconda (graphical environment) or Miniconda (cut-down command-line-based). These instructions relate to Windows only.

How to install Zipline Reloaded and PyFolio, and Zipline-NorgateData

Zipline can be difficult to install if you install pre-requisities in the wrong order, plus there's a few compatibility quirks, and ugly warning messages with some of the newer packages that can be avoided. Here's how we did it here at Norgate:

Install the latest 64 bit MiniConda or Anaconda Distribution.

If you have ANY other running instances of Anaconda prompt/jupyter etc., ensure sure they are all shut down.

Start an Anaconda (base) prompt, create an environment and install the appropriate versions of packages:

conda create -y -n zip38 python=3.8
conda activate zip38
conda install -y -c conda-forge mamba
mamba install -y -c ml4t -c conda-forge -c ranaroussi pandas==1.3.5 numpy==1.22.4 zipline-reloaded pyfolio-reloaded 
mamba install -y -c conda-forge jupyter
pip install norgatedata zipline-norgatedata
if not exist %HOMEPATH%\.zipline mkdir %HOMEPATH%\.zipline
if not exist %HOMEPATH%\.zipline\extension.py copy /b NUL > %HOMEPATH%\.zipline\extension.py

Note: Mamba is used to install zipline-reloaded, because the Conda package manager becomes confused with so many dependencies required. Mamba is also about 10 times quicker than Conda.

Further Note: There are some incompabilities in Panas beyond 1.4 and above, and Numpy beyond 1.30 and above (as at 4 Jul 2022). The above instructions force a specific version.

Important Note: You will also need to make fixes directly to the some source code regarding trading calendars before 1990 (Equities) and 2000 (Futures).

Upgrades of Zipline-NorgateData

To receive upgrades/updates

pip install zipline-norgatedata --upgrade

Exchange Calendar Issues that require patching

Norgate Data has developed the following patches. Please make sure you implement all of them (most support messages we get are from people that have forgotten to patch all of the files).

Patch to increase backtesting calendar limits for futures trading

Zipline will only backtest according to the calendar within the exchange_calendars package and has some nonsensical defaults for futures.

With some easy patches you can extend backtesting Futures from 2000 to 1970.

  • Navigate to the exact path of your Python environment installation (e.g. For Miniconda: C:\Users\\miniconda3\envs\zip38 For Anaconda: C:\Users\\Anaconda3\envs\zip38 )
  • Then navigate to Lib\site-packages\exchange_calendars (i.e. full path for an environment named zip35 would be "C:\Users\\[miniconda3 or Anaconda3]\envs\zip38\Lib\site-packages\exchange_calendars")
  • Edit the file us_futures_calendar.py and find the following lines (around line 50):
                 return Timestamp("2000-01-01", tz=UTC)

change this to:

                 return Timestamp("1970-01-01", tz=UTC)

Backtest Assumptions

  • Stocks are automatically set an auto_close_date of the last quoted date
  • Futures are automatically set an auto_close_date to the earlier of following: 2 days prior to last trading date (for cash settled futures, and physically delivered futures that only allow delivery after the last trading date), or 2 trading days prior to first notice date for futures that have a first notice date prior to the last trading date.

Bundle Creation

Navigate to your Zipline local settings folder. This is typically located at c:\users\\.zipline

Add the following lines at the top of your Zipline local settings file - extension.py (: Note: This is NOT the extension.py file inside the Anaconda3\envs\\lib\site-packages\zipline

from norgatedata import StockPriceAdjustmentType
from zipline_norgatedata import (
    register_norgatedata_equities_bundle,
    register_norgatedata_futures_bundle )

Then create as many bundles definitions as you desire. These bundles will use either a given symbol list, one or more watchlists from your Norgate Data Watchlist Library and (for futures markets) all contracts belonging to a given set of futures market session symbols.

Here are some examples with varying parameters. You should adapt these to your requirements.

register_norgatedata_equities_bundle has the following default parameters: stock_price_adjustment_setting = StockPriceAdjustmentType.TOTALRETURN, end_session = 'now', calendar_name = 'NYSE', excluded_symbol_list = None,

register_norgatedata_futures_bundle has the following default parameters: end_session = 'now', calendar_name = 'us_futures', excluded_symbol_list = None,

# EQUITIES BUNDLES

# Single stock bundle - AAPL from 1990 though 2018
register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-aapl',
    symbol_list = ['AAPL','$SPXTR',], 
    start_session = '1990-01-01',
    end_session = '2020-12-01'
)

# FANG stocks (Facebook, Amazon, Netflix, Google) - 2012-05-18 until now
register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-fang',
    symbol_list = ['FB','AMZN','NFLX','GOOGL','$SPXTR',], 
    start_session = '2012-05-18',  # This is that FB first traded
)

# A small set of selected ETFs
register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-selected-etfs',
    symbol_list = ['SPY','GLD','USO','$SPXTR',],
    start_session = '2006-04-10', # This is the USO first trading date
)

# S&P 500 Bundle for backtesting including all current & past constituents back to 1990
# and the S&P 500 Total Return index (useful for benchmarking and/or index trend filtering)
# (around 1800 securities)
register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-sp500',
    symbol_list = ['$SPXTR'],
    watchlists = ['S&P 500 Current & Past'],
    start_session = '1990-01-01',
)

# Russell 3000 bundle containing all ccurrent & past constituents back to 1990
# and the Russell 3000 Total Return Index (useful for benchmarking and/or index trend filtering)
# (about 11000 securities)

register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-russell3000',
    watchlists = ['Russell 3000 Current & Past'],
    symbol_list = ['$RUATR'],
    start_session = '1990-01-01' ,
)

# And now a watchlist excluding a given list of symbols

register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-russell3000',
    watchlists = ['Russell 3000 Current & Past'],
    symbol_list = ['$RUATR'],
    start_session = '1990-01-01' ,
    excluded_symbol_list = ['TSLA','AMZN','FB','NFLX','GOOGL',]
)

# FUTURES BUNDLES

# Example bundle for all of the individual contracts from three futures markets:
# E-mini S&P 500, E-mini Nasdaq 100, E-mini Russell 2000,
# with $SPXTR added for benchmark reference
register_norgatedata_futures_bundle(
    bundlename = 'norgatedata-selected-index-futures',
    session_symbols = ['ES','NQ','RTY'],
    symbol_list = ['$SPXTR']
    start_session = '2000-01-01',
)


# Bundle of futures used in Andreas Clenow's Trading Evolved book
# (contains 6000+ individual futures contracts/deliveries)
bundlename = 'norgatedata-tradingevolved-futures'
symbol_list = ['$SPXTR',]
session_symbols = [
	'6A', # AUD
	'6B', # GBP
	'6C', # CAD
	'6E', # EUR
	'DX', # USDX
	'6J', # JPY
	'6N', # NZD
	'6S', # CHF
	'LBS', # Lumber
	'ZC', # Corn
	'CT', # Cotton
	'GF', # Feeder Cattle
	'KC', # Coffee
	'LRC', # Robusta Coffee
	'LSU', # White Sugar
	'ZO', # Oats
	'ZS', # Soybeans
	'SB', # Sugar
	'ZM', # Soybean Meal
	'ZW', # Wheat
	'CL', # Crude Oil
	'GC', # Gold
	'HG', # Copper
	'HO', # NY Harbor ULSD 
	'GAS', # Gas Oil
	'NG', # Henry Hub Natural Gas
	'PA', # Palladium
	'PL', # Platinum
	'RB', # RBOB Gasoline
	'SI', # Silver
	'ES', # E-mini S&P 500
	'NKD', # Nikkei 225 Dollar
	'NQ', # E-mini Nasdaq-100
	'STW', # MSCI Taiwan 
	'VX', # Cboe Volatility Index
	'YM', # E-mini Dow
	'GE', # Eurodollar
	'ZF', # 5-Year US T-Note
	'ZT', # 2-Year US T-Note
	'ZN', # 10-Year US T-Note
	'ZB', # 30-Year US T-Bond        
]
start_session = '2000-01-01',

register_norgatedata_futures_bundle(bundlename,start_session,session_symbols = session_symbols, symbol_list = symbol_list )

To ingest a bundle:

zipline ingest -b <bundlename>

Benchmark against a symbol

To benchmark against an index, you should use add set_benchmark within the intialize function.

 def initialize(context):
    set_benchmark(symbol('$SPXTR')) # Note: $SPXTR must be included in the bundle
    # ...

Pipelines - accessing timeseries data

Timeseries data has been exposed into Zipline's Pipeline interface. During a backtest, the Pipelines will be calculated against all securities in the bundle.

The following Filter (i.e. boolean) pipelines are available:

The following Factor (i.e. float) pipelines are available:

To incorporate these into your trading model, you need to import the relevant packages/methods:

from zipline.pipeline import Pipeline
from zipline_norgatedata.pipelines import (
    NorgateDataIndexConstituent, NorgateDataDividendYield )
from zipline.api import order_target_percent, set_benchmark

It is recommended you put your pipeline construction in its own function:

def make_pipeline():
   indexconstituent = NorgateDataIndexConstituent('S&P 1500')
   divyield = NorgateDataDividendYield()
   return Pipeline(
       columns={
            'NorgateDataIndexConstituent':indexconstituent,
            'NorgateDividendYield':divyield },
       screen = indexconstituent)

Incorporate this into your trading system by attaching it to your initialize method. Note, for better efficiency, use chunks=9999 or however many bars you are likely to need.
This will save unnecessary access to the Norgate Data database.

 def initialize(context):
    set_benchmark(symbol('$SPXTR')) # Note: $SPXTR must be included in the bundle
    attach_pipeline(make_pipeline(), 'norgatedata_pipeline', chunks=9999,eager=True)
    # ...

Now you can access the contents of the pipeline in before_trading_start and/or handle_data by using Zipline's pipline_output method. You can exit positions not already in the

def before_trading_start(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    # ... your code here ...
    # For example, you could 

def handle_data(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    current_constituents = context.pipeline_data.index

    # ... your code here ...

    # Exit positions not in the index today
    for asset in context.portfolio.positions:   
        if asset not in current_constituents:
            order_target_percent(asset,0.0)

    # ... your code here ...

Note: Access to historical index constituents requires a Norgate Data Stocks subscription at the Platinum or Diamond level.

Worked example backtesting S&P 500 Constituents back to 1990

This example comprises a backtest on the S&P 500, with a basic trend filter that is applied on the S&P 500 index ($SPX). The total return version of the index is also ingested ($SPXTR) for comparison purposes.

Note: This requires a Norgate Data US Stocks subscription at the Platinum or Diamond level.

Create a bundle definition in extensions.py as follows:

from zipline_norgatedata import register_norgatedata_equities_bundle

register_norgatedata_equities_bundle(
    bundlename = 'norgatedata-sp500-backtest',
    symbol_list = ['$SPX','$SPXTR',],
    watchlists = ['S&P 500 Current & Past',],
    start_session = '1990-01-01',
)

Now, ingest that bundle into zipline:

zipline ingest -b norgatedata-sp500-backtest

Inside your trading system file, you'd incorporate the following code snippets:

from zipline.pipeline import Pipeline
from zipline_norgatedata.pipelines import (
    NorgateDataIndexConstituent, 
    NorgateDataDividendYield)

...

def make_pipeline():
    indexconstituent = NorgateDataIndexConstituent('S&P 500')
    return Pipeline(
        columns={
             'NorgateDataIndexConstituent':indexconstituent,
        },
        screen = indexconstituent)

 def initialize(context):
    set_benchmark(symbol('$SPXTR')) # Note: $SPXTR must be included in the bundle
    attach_pipeline(make_pipeline(), 'norgatedata_pipeline', chunks=9999,eager=True)
    # ... your code here ...

def before_trading_start(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    # ... your code here ...

def handle_data(context, data):
    context.pipeline_data = pipeline_output('norgatedata_pipeline')
    current_constituents = context.pipeline_data.index

    # ... your code here ...

    # Exit positions not in the index today
    for asset in context.portfolio.positions:   
        if asset not in context.assets:
            order_target_percent(asset,0.0)

    # ...

Worked example backtesting E-Mini S&P 500 futures

This example created a continuous contract of the E-Mini S&P 500 futures that trade on CME on volume.

Create a bundle definition in extensions.py as follows:

from zipline_norgatedata import register_norgatedata_futures_bundle

bundlename = 'norgatedata-es-futures'
session_symbols = ['ES',]
symbol_list = ['$SPXTR',],
start_session = '2000-01-01'
register_norgatedata_futures_bundle(bundlename,start_session,session_symbols = session_symbols )

Now, ingest that bundle into zipline:

zipline ingest -b norgatedata-es-futures

Inside your trading system file, you'd incorporate the following code snippets:

 def initialize(context):
    set_benchmark(symbol('$SPXTR')) # Note: $SPXTR must be included in the bundle
    # Obtain market(s)s directly from the bundle
    af =  context.asset_finder
    markets = set([]) # a set eliminates dupes
    allcontracts = af.retrieve_futures_contracts(af.futures_sids)
    for contract in allcontracts:
        markets.add(allcontracts[contract].root_symbol)

    markets = list(markets)
    markets.sort()

    # Make a list of all continuations
    context.universe = [
        continuous_future(market, offset=0, roll='volume', adjustment='mul')
            for market in markets
    ]
    # ... your code here ...

def handle_data(context, data):
    # Get continuation data
    hist = data.history(
        context.universe, 
        fields=['close','volume'], 
        frequency='1d', 
        bar_count=250,  # Adjust to whatever lookback period you need
    )

    # Now use hist in your calculations 

    # Make a dictionary of open positions, based on the root symbol
    open_pos = {
        pos.root_symbol: pos 
        for pos in context.portfolio.positions
    } 

    contracts_to_trade = 5

    for continuation in context.universe:
        # ...
        contract = data.current(continuation, 'contract')
        # ...


        # Add your condtions here to determine if there is an entry then...
        order_target(contract,  contracts_to_trade)

        # Add your conditions to determine if there is an exit of a position then...
        order_target(contract, -1 * contracts_to_trade)

    # Finally, if there are open positions check for rolls
    if len(open_pos) > 0:   
        roll_futures(context, data)           

Metadata

The following fields are available in the metadata dataframe: start_date, end_date, ac_date, symbol, asset_name, exchange, exchange_full, asset_type, norgate_data_symbol, norgate_data_assetid.

Norgate Data Futures Market Session symbols

To obtain just the futures market sessions symbols, you can use the norgatedata package and adapt the following code:

import norgatedata
for session_symbol in norgatedata.futures_market_session_symbols():
    print (session_symbol + " " + norgatedata.futures_market_session_name(session_symbol)) 

Zipline Futures root symbols

To show the translated 2 character root symbols for each futures market session, and a description of each market you can run a tiny script (or adapt this):

import zipline_norgatedata
root_symbols_dict = zipline_norgatedata.zipline_futures_root_symbols_dict()
print (root_symbols_dict)

Zipline Limitations/Quirks

  • Zipline is hard-coded to handle equities data from 1990 onwards only
  • Zipline is hard-coded handle futures data from 2000 onwards.
  • Zipline has unnecessarily complicated futures contracts by restricting symbols to 2 characters. This is not a conventional followed by exchanges. We hope they see the light and allow variable futures root symbol lengths (up to 5 characters). In the meantime, you can get a list of futures market sessions covered and translated to their 2 character limit with: zipline_futures_root_symbols()
  • Zipline doesn't define all futures markets and doesn't provide any runtime extensibility in this area - you will need to add them to <your_environment>\lib\site-packages\zipline\finance\constants.py if they are not defined. Be sure to backup this file as it will be overwritten any time you update zipline.
  • Zipline assumes that there are bars for every day of trading. If a security doesn't trade for a given day (e.g. it was halted/suspended, or simply nobody wanted to trade it), it will be padded with the previous close repeated in the OHLC fields, with volume set to zero. Consider how this might affect your trading calculations.
  • Index volumes cannot be accurately ingested due to Zipline trying to convert large volumes to UINTs which are out-of-bounds for UINT32. Index volumes will be divided by 1000.
  • Any stock whose adjusted volume exceeds the upper bound of UINT32 will be set to the maximum UINT32 value (4294967295). This only occurs for stocks with a lot of splits and/or very large special dsitributions.
  • Some stocks have adjusted volume values that fall below the boundaries used by winsorize_uint32 (e.g. volume of 8.225255e-05). You'll see a warning when those stocks are ingested "UserWarning: Ignoring 12911 values because they are out of bounds for uint32". These are There's not much we can do here. For now, just ignore those warnings.
  • Ingestion times could be improved significantly with multiprocessing (this requires Zipline enhancements)

Testing on ASX data

By default, run_algorithm uses the 'NYSE' trading calendar. To backtest other markets, you need to specify the calendar. For the ASX, the calendar name is XASX.

At the top of your algorithm:

from trading_calendars import get_calendar

In the run_algorithm call, add a trading_calendar= line, for example:

results = run_algorithm(
    start=start, end=end, 
    initialize=initialize, analyze=analyze, 
    handle_data=handle_data, 
    capital_base=10000, 
    trading_calendar=get_calendar('XASX'),
    data_frequency = 'daily', 
    bundle='norgatedata-spasx200',
)

Books/publications that use Zipline, adapted for Norgate Data use

We have adapted the Python code in the following books to use Norgate Data.
Trading Evoled: Anyone can Build Killer Trading Strategies in Python.

Source code compatible with Zipline (Reloaded) v2 in Jupyter notebook format, can be downloaded here: http://norgatedata.com/book-examples/trading-evolved/NorgateDataTradingEvolvedExamples.zipline220.zip

If there are other book/publications that use Zipline and worth adding here, let us know.

FAQs

During a backtest I receive an error ValueError: 'Time Period' is not in list. How do I fix this?

This can occur when the items in the bundle do not match the latest data in the Norgate Data database. For stocks, if there are symbol changes within the database then the bundle will have the old symbol but the Norgate database will have the new symbol. For Futures, there may have been additional futures contracts listed since your previous ingestion and the roll-over algorithm is trying to roll into them.

The solution is simple: Ingest the bundle with fresh data.

Change log

Released versions and release dates can be seen here: https://pypi.org/project/zipline-norgatedata/#history

The CHANGES.TXT within the package details the changes.

Installing older versions

Older versions of Zipline-NorgateData can be installed easily using pip. For example, to install v2.0.17:

pip install zipline-norgatedata==2.0.17

Support

For support on Norgate Data or usage of the zipline-norgatedata extension: Norgate Data support

Please put separate issues in separate emails, as this ensures each issue is separately ticketed and tracked.

For Zipline coding/usage issues, join the Zipline Google Group. For bug reports on Zipline Reloaded, report them on Stefan Jansen's Zipline Reloaded Github

Thanks

Thanks to:

  • Andreas Clenow for his pioneering work in documenting Zipline bundles in his latest book Trading Evolved: Anyone can Build Killer Trading Strategies in Python. We used many of the techniques described in the book to build our bundle code. There are many excellent examples of how to implement various trading systems including trend following, counter trend following, momentum, curve trading and combining multiple trading systems together.
  • Norgate Data alpha and beta testers. Without your persistence we wouldn't have implemented half of the features.
  • The team at Quantopian for developing and open sourcing Zipline
  • Continued development efforts on Zipline since Quantopian ceased, from Stefan Jansen, Mehdi Bounouar, Allan Coppola and Shlomi Kushchi.

Project details


Release history Release notifications | RSS feed

This version

2.2.7

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

zipline_norgatedata-2.2.7-py3-none-any.whl (45.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page