A set of tools useful in exploring statistical arbitrage
Project description
StatArbTools
StatArbTools is a Python library primarily for determining if a pair of time series are cointegrated. It also includes tools for generating an array of log returns from a price array, looking for a linear relationship, and creating a potentially stationary distribution.
Installation
Use the package manager pip to install StatArbTools
pip install StatArbTools
You must also have numpy, scikitlearn, statsmodels, and matplot lib installed
pip install numpy
pip install sklearn
pip install statsmodels
pip install matplotlib
Usage
import StatArbTools StatArbTools.gen_log_returns(numpy_time_series_1, numpy_time_series_2) # returns numpy arrays of the log returns for each time series StatArbTools.gen_linear_relationship(numpy_log_returns_1, numpy_log_returns_2) # returns the coefficient from a linear regression between the two log returns arrays StatArbTools.gen_stationary_distr(numpy_log_returns_1, numpy_log_returns_2, coefficient) # returns the linear combination of the two log returns arrays based on a linear regression coefficient StatArbTools.test_stationarity(numpy_time_series_1, numpy_time_series_2) # returns "Rejected null hypothesis" if the null hypothesis of an Augmented Dickey Fuller test is rejected and "Failed to reject" otherwise. It also returns the pvalue of the ADF test. StatArbTools.find_pairs(ticker_data_dict) # returns a list of lists of the results of StatArbTools.test_stationarity for every possible pair of tickers in the dictionary StatArbTools.portfolio_builder(ticker_data_dict, results) # returns a list of lists of every pair that passed the Augmented Dickey Fuller test and the profitability rating of the pair StatArbTools.profitability_rating(stationary_distr) # returns a general metric of how profitable a pairs trade would be (note: this is not in dollars, just a means of ranking based on volatility) StatArbTools.plot(stationary_distribution) # plots the passed distribution
gen_log_returns takes a numpy array of a time series as a parameter and returns a numpy array of the natural log of the price returns
gen_linear_relationship takes two numpy arrays of time series as parameters and returns the coefficient of the least squares regression between the two. For the statistical arbitrage use case this is used on the log returns of the two stocks to allow for stationarity testing
gen_stationary_distr takes two time series and the coefficient of a linear regression between the two (time series must be entered in same order as they were into the linear regression) and returns the linear combination of the time series that would potentially result in a stationary distribution if the two were cointegrated
test_stationarity takes two time series as parameters, constructs log return arrays, determines the linear relationship, constructs a potentially stationary distribution, and then uses an Augmented Dickey Fuller test at the 1% significane level to check if the distribution is indeed stationary. If it is it will return "Rejected null hypothesis" and the pvalue of the test. If it does not pass the 1% threshold it will return "Failed to reject" and the pvalue.
find_pairs takes a dictionary with tickers (ie 'aapl', 'adsk', etc) as the keys and a numpy time series array of each ticker's raw prices as the values. It then constructs every possible pairing of tickers and tests them for cointegration using test_stationarity It will then return an list of lists where each list contains the ticker pair, the null hypothesis result, and the pvalue of the test for each pair.
portfolio_builder takes a dictionary with tickers(ie 'aapl', 'adsk', etc) as the keys and a numpy time series array of each ticker's raw prices as the values. It also takes the results array from the find_pairs method. It then looks at every pair that rejected the null hypothesis, puts the stationary distribution that results from them through profitability_rating to get a metric to rank based off of. It then constructs a list of lists with the pairs and their ratings and returns that list of lists sorted by the rating
profitability_rating takes a stationary distribution numpy array as an argument, computes the mean and standard deviation of the array and establishes bounds for a 95% confidence interval around the mean. It then constructs a list with each price in the time series in which the price surpasses this confidence interval. It then returns the mean of those prices multiplied by the number of those prices. It is not an absolute measure of money made but it gives a method of ranking pairs.
plot takes a time series and does a simple matplotlib.pyplot plot. This is primarily useful for visualizing the stationarity of a distribution.
Contributing
For changes, please open an issue first to discuss what you would like to change.
License
Project details
Release history Release notifications
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size  File type  Python version  Upload date  Hashes 

Filename, size StatArbTools0.0.3py3noneany.whl (3.8 kB)  File type Wheel  Python version py3  Upload date  Hashes View hashes 
Filename, size StatArbTools0.0.3.tar.gz (3.0 kB)  File type Source  Python version None  Upload date  Hashes View hashes 
Hashes for StatArbTools0.0.3py3noneany.whl
Algorithm  Hash digest  

SHA256  ae8cfdd924d1ea4b0a0c0ec6e72d87772b8bcb008242871d0efcbe3c9175b27c 

MD5  7513aa6375551cabfff057699e30fa02 

BLAKE2256  973cfb86dd2c8028e4f365bacd251443880c558e404fdd42b7bf72048a8865c2 