factor model

Project description

This project merges multiple alpha factors into a single combined factor using machine learning techniques.

Dependencies

  • python 3.5

  • pandas 0.22.0

  • numpy 1.14.3

  • pickle

  • sklearn 0.19.1

  • databox

Example

Preprocessing data

First, create a databox object holding the original factors and the market information. More details can be found in the databox project.

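from databox import databox

# ind, ind_weight, sus and price are the industry, index-weight, suspension
# and adjusted-price inputs that you have loaded beforehand
db = databox()\
    .load_indestry(ind)\
    .load_indexWeight(ind_weight)\
    .load_suspend(sus)\
    .load_adjPrice(price)\
    .set_lag(freq='d', day_lag=1)

# register every original factor under its name
for fac_name, fac_df in factors_dictionary.items():
    db.add_factor(fac_name, fac_df)

# align all inputs, then neutralize the factors by industry and by size
db.align_data()\
  .factor_ind_neutral()\
  .factor_size_neutral()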

Then customize your data for model training:

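# assumption: sample_pipeline is exported by multi_factor_model,
# in the same way as combine_clf_models
from multi_factor_model import sample_pipeline

sp = sample_pipeline()\
    .set_fw_return_n(1)\
    .set_sample_n(1)\
    .factor_rank()\
    .factor_zscore()\
    .fw_return_ind_neutral()\
    .fw_return_rank()\
    .fw_return_I(thresh=2000)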

Note that all returns are multiplied by 100 for better modeling.
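As an illustration of the scaling mentioned above, the following is a minimal sketch of an n-day forward return multiplied by 100. It is not taken from the package: the `prices` frame and the `forward_return` helper are hypothetical, and the formula (price n days ahead divided by today's price, minus one) is an assumption about how the forward return is defined.

```python
import pandas as pd

# Hypothetical adjusted close prices: rows are dates, columns are stocks.
prices = pd.DataFrame(
    {"A": [10.0, 11.0, 12.1], "B": [20.0, 19.0, 19.0]},
    index=pd.date_range("2018-01-01", periods=3),
)


def forward_return(prices, n=1):
    """Hypothetical helper: n-day forward return, multiplied by 100."""
    # shift(-n) moves the price observed n rows later onto the current row
    return (prices.shift(-n) / prices - 1.0) * 100


fw = forward_return(prices, n=1)
print(fw)
```

The last `n` rows are NaN because no future price is available for them, so such rows cannot be used as training targets.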

Options:

  • set_fw_return_n sets the number of days used to calculate the forward return;

  • set_sample_n sets how many days are used in one sample;

  • factor_rank ranks all factors within each sample;

  • factor_zscore normalizes the factors within each sample;

  • fw_return_ind_neutral neutralizes returns by industry. If the portfolio has an industry constraint, this is likely to improve the training result;

  • fw_return_rank converts returns to their rank within each sample;

  • fw_return_I converts returns to 0 or 1, indicating whether the return value is greater than or equal to the threshold.
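To make the options above more concrete, here is a rough pandas sketch of the same kinds of transformations applied to a single cross-section. It does not use the package's internals: the `sample` frame, its column names and the threshold of 0 are made up for illustration, and the package may implement these steps differently.

```python
import pandas as pd

# Hypothetical one-day cross-section: one row per stock.
sample = pd.DataFrame(
    {
        "factor": [0.5, 1.5, 2.5, 3.5],
        "industry": ["bank", "bank", "tech", "tech"],
        "fw_return": [1.0, 3.0, -2.0, 4.0],  # already multiplied by 100
    },
    index=["s1", "s2", "s3", "s4"],
)

# Rank the factor within the sample (1 = smallest value).
factor_rank = sample["factor"].rank()

# Z-score the factor within the sample (sample standard deviation).
factor_z = (sample["factor"] - sample["factor"].mean()) / sample["factor"].std()

# Industry-neutral return: subtract the mean return of the stock's industry.
industry_mean = sample.groupby("industry")["fw_return"].transform("mean")
ret_neutral = sample["fw_return"] - industry_mean

# Rank of the neutralized return within the sample.
ret_rank = ret_neutral.rank()

# 0/1 label: 1 if the value is greater than or equal to the threshold.
thresh = 0.0  # hypothetical threshold
ret_label = (ret_neutral >= thresh).astype(int)
```

Demeaning by group is only one common way to neutralize by industry; a regression on industry dummies is another.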

Now create the samples:

X_train,y_train=sp.train_set(db)
X_test,y_test=sp.test_set(db)
X_test_all=sp.test_X(db)

Modeling

Classification Method

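from sklearn.ensemble import RandomForestClassifier
# assumption: clf_model is exported by multi_factor_model,
# in the same way as combine_clf_models
from multi_factor_model import clf_model

clf = RandomForestClassifier()
tn, tt, ml = clf_model(clf, X_train, y_train, X_test, y_test)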

Here y can be either 0/1 labels or floats, and the results tn (train) and tt (test) differ accordingly. If clf is a tree-based model, ml holds the feature importances. If clf is a linear model, ml holds the coefficients.
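The distinction between feature importances and coefficients can be sketched with plain scikit-learn. The `model_summary` helper below is hypothetical and is not the package's `clf_model`; it only shows which fitted attribute each family of estimators exposes, using synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data: only the first of three features drives the label.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)


def model_summary(model):
    """Hypothetical helper: per-feature summary of a fitted estimator."""
    if hasattr(model, "feature_importances_"):  # tree-based models
        return model.feature_importances_
    if hasattr(model, "coef_"):  # linear models
        return np.ravel(model.coef_)
    return None


rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
lr = LogisticRegression().fit(X, y)

rf_summary = model_summary(rf)  # importances, non-negative, sum to 1
lr_summary = model_summary(lr)  # signed coefficients
```

Importances are non-negative and only describe how much a feature is used, whereas coefficients also carry a sign, so the two should not be compared directly.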

We can also create a model by combining several models.

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from multi_factor_model import combine_clf_models

clf1 = RandomForestClassifier()
clf2 = LogisticRegression()
clf3 = SVC()

CLF = combine_clf_models()\
    .add_clf('rf', clf1)\
    .add_clf('lr', clf2)\
    .add_clf('svc', clf3, weight=2)  # default weight is 1
tn, tt, ml = clf_model(CLF, X_train, y_train, X_test, y_test)
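For comparison, a similar weighted combination can be sketched with scikit-learn's own `VotingClassifier`. This is a stand-in technique, not the implementation behind `combine_clf_models`, and the data is synthetic. With soft voting the predicted class probabilities are averaged using the given weights, which requires `SVC` to be created with `probability=True`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Synthetic data: the label depends on the first two of four features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("lr", LogisticRegression()),
        # probability=True is needed so that SVC provides predict_proba
        ("svc", SVC(probability=True, random_state=0)),
    ],
    voting="soft",      # average predicted probabilities
    weights=[1, 1, 2],  # the SVC counts twice as much as the others
)
ensemble.fit(X, y)

# One row per sample, one column per class.
proba = ensemble.predict_proba(X)
```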

Regression Method

Same as the classification method, with reg_model replacing clf_model and combine_reg_models replacing combine_clf_models.

Combined Factor

import pandas as pd
value=CLF.predict_proba(X_test_all)
factor=pd.Series(value[:,1],index=X_test_all.index)
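The snippet above takes column 1 of the predicted probabilities. In scikit-learn the columns of `predict_proba` follow the order of `classes_`, so with 0/1 labels column 1 is the probability of class 1. The sketch below shows this with a plain `LogisticRegression` on synthetic data; the stock names and factor names are made up, and it assumes that the combined model follows the same column convention.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic factor exposures for 100 hypothetical stocks.
rng = np.random.RandomState(0)
X = pd.DataFrame(
    rng.normal(size=(100, 2)),
    columns=["f1", "f2"],
    index=["stock_%d" % i for i in range(100)],
)
y = (X["f1"] > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# Look up the column of the positive class instead of hard-coding it.
pos_col = list(clf.classes_).index(1)

# The probability of the positive class becomes the combined factor.
factor = pd.Series(
    clf.predict_proba(X)[:, pos_col],
    index=X.index,
    name="combined_factor",
)
```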
