generate alpha factors
Project description
This programme is to automatically generate alpha factors and filter relatively good factors with back-testing methods. Time consuming parts are optimized with numba package.
Dependencies
python >= 3.5
pandas >= 0.22.0
numpy >= 1.14.0
RNWS >= 0.2.1
numba >= 0.38.0
single_factor_model>=0.3.0
IPython 5.1.0
empyrical
alphalens
Note: It is best to use the latest version of llvmlite in order to make numba work properly. Otherwise it may couse a kernel-dies situation.
Example
load packages and read in data
from alpha_factory import generator_class,get_memory_use_pct,clean
from RNWS import read
import numpy as np
import pandas as pd
start=20180101
end=20180331
factor_path='.'
frame_path='.'
df=pd.read_csv(frame_path+'/frames.csv')
## read in data
re=read.read_df('./re',file_pattern='re',start=start,end=end)
cap=read.read_df('./cap',file_pattern='cap',header=0,dat_col='cap',start=start,end=end)
open_price,close,vwap,adj,high,low,volume,sus=read.read_df('./mkt_data',file_pattern='mkt',start=start,end=end,header=0,dat_col=['open','close','vwap','adjfactor','high','low','volume','sus'])
ind1,ind2,ind3=read.read_df('./ind',file_pattern='ind',start=start,end=end,header=0,dat_col=['level1','level2','level3'])
inx_weight=read.read_df('./ZZ800_weight','Stk_ZZ800',start=start,end=end,header=None,inx_col=1,dat_col=3)
Note:frames contains columns as: df_name,equation,dependency,type, where type includes df,cap,group. In this case frames.csv have df_name: re,cap,open_price,close,vwap,high,low,volume,ind1,ind2,ind3.
You can also read data by using pd.read_csv directly depending on how you store your data.
start to generate
parms={'re':close.mul(adj).pct_change()
,'cap':cap
,'open_price':open_price
,'close':close
,'vwap':vwap
,'high':high
,'low':low
,'volume':volume
,'ind1':ind1
,'ind2':ind2
,'ind3':ind3}
with generator_class(df,factor_path,**parms) as gen:
gen.generator(batch_size=3,name_start='a')
gen.generator(batch_size=3,name_start='a')
gen.output_df(path=frame_path+'/frames_new.csv')
continue to generate with existing frames and factors
with generator_class(df,factor_path,**parms) as gen:
gen.reload_df(path=frame_path+'/frames_new.csv')
gen.reload_factors(align=True)
clean()
for i in range(5):
gen.generator(batch_size=2,name_start='a')
print('step %d memory usage:\t %.1f%% \n'%(i,get_memory_use_pct()))
if get_memory_use_pct()>80:
break
gen.output_df(path=frame_path+'/frames_new2.csv')
Note: It is very important to align all factors and initial dataframes before generating.
you can also choose how to store your factors by setting store_method
backtesting with stratified sampling approach and ic-ir meansure after generation
data_box_param={'ind':ind1
,'price':vwap*adjfactor
,'sus':sus
,'ind_weight':inx_weight
,'path':'./databox'
}
back_test_param={'sharpe_ratio_thresh':3
,'n':5
,'out_path':'.'
,'back_end':'loky'
,'n_jobs':6
,'detail_root_path':None
,'double_side_cost':0.003
,'rf':0.03
}
icir_param={'ir_thresh':0.4
,'out_path':'.'
,'back_end':'loky'
,'n_jobs':6
}
with generator_class(df,factor_path,**parms) as gen:
for i in range(5):
gen.generator(batch_size=2,name_start='a')
gen.output_df(path=frame_path+'/frames_new.csv')
gen.getOrCreate_databox(**data_box_param)
gen.back_test(**back_test_param)
gen.icir(**icir_param)
clean()
if get_memory_use_pct()>90:
print('Memory exceeded')
break
generate script of factors
from alpha_factory import write_file
import pandas as pd
df2=pd.read_csv(frame_path+'/frames_new.csv')
write_file(df2,'script.py')
locate a factor
from alpha_factory.utilise import get_factor_path
factor_name='a0'
path=get_factor_path(factor_path,factor_name)
only when storage_method='byTime'
use your own functions
To use your own functions you need to append your code in class functions from basic_functions.py in the sourse file and also append the corresponding names in functions.csv from data file in the sourse file.
After that you can set debug=True in generator function to check if there is any bug from all those functions. If indeed there is, a new embeded ipython would be activated to help you find out what is going on in the loop.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for alpha_factory-0.3.4-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 128017036c3518b944400bfb12067b0e88a620ee27d2c1c4888b3fe4e43e6153 |
|
MD5 | 45a7190ff9ec30ca35c6393ab4abf66b |
|
BLAKE2b-256 | b2e42267fb4b34c7e01136458cfe92fc6f7593bca655d94c733ff6cbe9117164 |