factor model
Project description
This project merges multiple alpha factors into a single combined factor using machine learning techniques.
Dependencies
python 3.5
pandas 0.22.0
numpy 1.14.3
pickle
sklearn 0.19.1
databox
Example
Preprocessing data
First, create a databox object from the original factors and market information. More details can be found in the databox project.
from databox import databox
db=databox()\
.load_indestry(ind)\
.load_indexWeight(ind_weight)\
.load_suspend(sus)\
.load_adjPrice(price)\
.set_lag(freq='d',day_lag=1)
for fac_name,fac_df in factors_dictionary.items():
    db.add_factor(fac_name,fac_df)
db.align_data()\
.factor_ind_neutral()\
.factor_size_neutral()
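The neutralization steps above run inside databox, and their exact procedure is not described here. As a rough illustration of the general idea only, the sketch below assumes that industry neutralization means subtracting the industry mean from each stock's factor value within a single cross-section. The function name `industry_neutralize` and the toy data are hypothetical and are not part of databox.

```python
import pandas as pd

def industry_neutralize(factor: pd.Series, industry: pd.Series) -> pd.Series:
    """Subtract each stock's industry mean from its factor value (one date).

    Hypothetical helper for illustration; not the databox implementation.
    """
    return factor - factor.groupby(industry).transform("mean")

# Toy cross-section: four stocks in two industries.
factor = pd.Series([1.0, 3.0, 2.0, 6.0], index=["A", "B", "C", "D"])
industry = pd.Series(["bank", "bank", "tech", "tech"], index=factor.index)

neutral = industry_neutralize(factor, industry)
print(neutral)
```

After this step the factor averages to zero within every industry, so the remaining signal compares a stock only against its industry peers.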
Then customize your data for model training
sp=sample_pipeline()\
.set_fw_return_n(1)\
.set_sample_n(1)\
.factor_rank()\
.factor_zscore()\
.fw_return_ind_neutral()\
.fw_return_rank()\
.fw_return_I(thresh=2000)
Note that all returns are multiplied by 100, i.e. expressed in percent, for better modeling.
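To make the ranking and z-scoring steps more concrete, here is a minimal pandas sketch of a cross-sectional rank followed by a z-score on a toy factor table. It assumes that each step is applied column by column within one date; the actual behaviour of `factor_rank` and `factor_zscore` may differ, and the column names are made up for the example.

```python
import pandas as pd

# Toy cross-section: rows are stocks, columns are factors (hypothetical names).
factors = pd.DataFrame(
    {"value": [0.5, 2.0, -1.0, 10.0], "momentum": [3.0, 1.0, 2.0, 4.0]},
    index=["A", "B", "C", "D"],
)

# Rank each factor across stocks; ranking removes the effect of outliers
# such as the 10.0 in the "value" column.
ranked = factors.rank()

# Standardize the ranks to zero mean and unit (population) standard deviation.
zscored = (ranked - ranked.mean()) / ranked.std(ddof=0)
print(zscored)
```

Ranking before standardizing puts every factor on the same scale, regardless of its original units or distribution.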
Now create the training and test samples:
X_train,y_train=sp.train_set(db)
X_test,y_test=sp.test_set(db)
X_test_all=sp.test_X(db)
Modeling
Classification Method
from sklearn.ensemble import RandomForestClassifier
clf=RandomForestClassifier()
tn,tt,ml=clf_model(clf,X_train,y_train,X_test,y_test)
Here y can be either a 0/1 label or a float; the results tn (train) and tt (test) differ depending on which is used. If clf is a tree-based model, ml is the feature importance. If clf is a linear model, ml is the coefficients.
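The distinction between feature importance and coefficients maps onto two standard scikit-learn attributes, `feature_importances_` and `coef_`. The sketch below shows one way such a summary could be extracted from a fitted estimator. The helper `model_summary` and the synthetic data are assumptions for illustration; this is not the code used by `clf_model`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def model_summary(model):
    """Return per-feature importances or coefficients of a fitted model.

    Hypothetical helper; returns None if the model exposes neither attribute.
    """
    if hasattr(model, "feature_importances_"):  # tree-based models
        return model.feature_importances_
    if hasattr(model, "coef_"):  # linear models
        return np.ravel(model.coef_)
    return None

# Synthetic data in which only the first feature drives the label.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)

rf_summary = model_summary(
    RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
)
lr_summary = model_summary(LogisticRegression().fit(X, y))
print(rf_summary, lr_summary)
```

Feature importances are non-negative and sum to one, while coefficients are signed, so the two kinds of summary are not directly comparable across model families.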
We can also create a model by combining several models.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
clf1=RandomForestClassifier()
clf2=LogisticRegression()
clf3=SVC()
from multi_factor_model import combine_clf_models
CLF=combine_clf_models()\
.add_clf('rf',clf1)\
.add_clf('lr',clf2)\
.add_clf('svc',clf3,weight=2)  # default weight is 1
tn,tt,ml=clf_model(CLF,X_train,y_train,X_test,y_test)
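How `combine_clf_models` merges its members is not documented here. One common way to combine weighted classifiers is a weighted average of their predicted class probabilities (soft voting), sketched below with plain scikit-learn on synthetic data. Treat it as an assumption about the general technique, not as the package's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic binary classification data.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# (model, weight) pairs mirroring the example above: SVC counts twice.
# SVC needs probability=True to provide predict_proba.
models = [
    (RandomForestClassifier(n_estimators=20, random_state=0), 1.0),
    (LogisticRegression(), 1.0),
    (SVC(probability=True, random_state=0), 2.0),
]

# Weighted average of the class probabilities of all fitted models.
weighted_sum = np.zeros((X.shape[0], 2))
total_weight = 0.0
for model, weight in models:
    model.fit(X, y)
    weighted_sum += weight * model.predict_proba(X)
    total_weight += weight
combined_proba = weighted_sum / total_weight
```

Because every member returns probabilities that sum to one per row, the weighted average is again a valid probability distribution, and its second column can serve as a combined score.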
Regression Method
Same as the classification method, with reg_model replacing clf_model and combine_reg_models replacing combine_clf_models.
Combined Factor
import pandas as pd
value=CLF.predict_proba(X_test_all)
factor=pd.Series(value[:,1],index=X_test_all.index)