skfolio
Portfolio optimization built on top of scikit-learn
skfolio is a Python library for portfolio optimization built on top of scikit-learn. It provides a unified interface and sklearn compatible tools to build, tune and cross-validate portfolio models. It is distributed under the 3-Clause BSD license.
Important links
Documentation: https://skfolio.github.io/skfolio/
Tutorials: https://skfolio.github.io/skfolio/auto_examples/index.html
GitHub Repo: https://github.com/skfolio/skfolio
Installation
The easiest way to install skfolio is using pip:
pip install -U skfolio
or conda:
conda install -c conda-forge skfolio
Dependencies
skfolio requires:
python (>= 3.10)
numpy (>= 1.23.4)
scipy (>= 1.8.0)
pandas (>= 1.4.1)
cvxpy (>= 1.4.1)
scikit-learn (>= 1.3.2)
joblib (>= 1.3.2)
plotly (>= 5.15.0)
Key Concepts
Since the development of modern portfolio theory by Markowitz (1952), mean-variance optimization (MVO) has received considerable attention. Unfortunately, it faces a number of shortcomings, including high sensitivity to the input parameters (expected returns and covariance), weight concentration, high turnover and poor out-of-sample performance. It is well known that naive allocations (1/N, inverse-vol, …) tend to outperform MVO out-of-sample (DeMiguel, 2007).
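This sensitivity is easy to reproduce: with correlated assets, a small change in expected returns can flip unconstrained MVO weights entirely, while the naive 1/N allocation is unaffected. A minimal numpy sketch with toy numbers (not skfolio code):

```python
import numpy as np

# Two highly correlated assets: exactly the regime where MVO is unstable.
corr = 0.95
vols = np.array([0.20, 0.22])
cov = np.outer(vols, vols) * np.array([[1.0, corr], [corr, 1.0]])
mu = np.array([0.08, 0.09])

def mvo_weights(mu, cov):
    """Unconstrained mean-variance weights w ∝ Σ⁻¹μ, rescaled to sum to 1."""
    w = np.linalg.solve(cov, mu)
    return w / w.sum()

w_base = mvo_weights(mu, cov)
# Shift the first expected return by just 1 percentage point...
w_shift = mvo_weights(mu + np.array([0.01, 0.0]), cov)
# ...and the weights swing sharply, while 1/N stays at [0.5, 0.5].
print(w_base, w_shift)
```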
Numerous approaches have been developed to alleviate these shortcomings (shrinkage, Bayesian approaches, additional constraints, regularization, uncertainty sets, higher moments, coherent risk measures, left-tail risk optimization, distributionally robust optimization, factor models, risk parity, hierarchical clustering, ensemble methods, …).
Given this large number of methods, and the fact that they can be composed together, there is a need for a unified framework to perform model selection, validation and parameter tuning while reducing the risk of data leakage and overfitting. This framework is built on scikit-learn's API.
Available models
The current release contains:
- Optimization estimators:
- Naive:
Equal-Weighted
Inverse-Volatility
Random (Dirichlet)
- Convex:
Mean-Risk
Risk Budgeting
Maximum Diversification
Distributionally Robust CVaR
- Clustering:
Hierarchical Risk Parity
Hierarchical Equal Risk Contribution
Nested Clusters Optimization
- Ensemble methods:
Stacking Optimization
- Moment estimators:
- Expected Returns:
Empirical
Exponentially Weighted
Equilibrium
Shrinkage (James-Stein, Bayes-Stein, …)
- Covariance:
Empirical
Gerber
Denoising
Detoning
Exponentially Weighted
Ledoit-Wolf
Oracle Approximating Shrinkage
Shrunk Covariance
Graphical Lasso CV
- Distance estimators:
Pearson Distance
Kendall Distance
Spearman Distance
Covariance Distance (based on any of the above covariance estimators)
Distance Correlation
Variation of Information
- Prior estimators:
Empirical
Black & Litterman
Factor Model
- Uncertainty Set estimators:
- On Expected Returns:
Empirical
Circular Bootstrap
- On Covariance:
Empirical
Circular bootstrap
- Pre-Selection transformers:
Non-Dominated Selection
Select K Extremes (Best or Worst)
Drop Highly Correlated Assets
- Cross-Validation and Model Selection:
Compatible with all sklearn methods (KFold, …)
Walk Forward
Combinatorial Purged Cross-validation
- Hyper-Parameter Tuning:
Compatible with all sklearn methods (GridSearchCV, RandomizedSearchCV, …)
- Risk Measures:
Variance
Semi-Variance
Mean Absolute Deviation
First Lower Partial Moment
CVaR (Conditional Value at Risk)
EVaR (Entropic Value at Risk)
Worst Realization
CDaR (Conditional Drawdown at Risk)
Maximum Drawdown
Average Drawdown
EDaR (Entropic Drawdown at Risk)
Ulcer Index
Gini Mean Difference
Value at Risk
Drawdown at Risk
Entropic Risk Measure
Fourth Central Moment
Fourth Lower Partial Moment
Skew
Kurtosis
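As a quick illustration of two of these measures, here is a minimal numpy sketch using standard textbook definitions (skfolio's implementations may differ in details such as quantile interpolation):

```python
import numpy as np

rng = np.random.default_rng(42)
returns = rng.normal(0.0005, 0.01, size=1000)  # simulated daily returns

# CVaR(95%): average loss in the worst 5% of outcomes (beyond the VaR).
alpha = 0.95
var = -np.quantile(returns, 1 - alpha)   # Value at Risk, expressed as a positive loss
cvar = -returns[returns <= -var].mean()  # mean of the tail losses

# Maximum Drawdown: worst peak-to-trough decline of the cumulative wealth curve.
wealth = np.cumprod(1 + returns)
max_drawdown = np.max(1 - wealth / np.maximum.accumulate(wealth))

print(f"VaR(95%)={var:.4f}  CVaR(95%)={cvar:.4f}  MaxDD={max_drawdown:.4f}")
```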
Quickstart
The code snippets below are designed to introduce skfolio’s functionality so you can start using it quickly.
Preparing the data
from sklearn import set_config
from sklearn.model_selection import (
GridSearchCV,
KFold,
RandomizedSearchCV,
train_test_split,
)
from sklearn.pipeline import Pipeline
from scipy.stats import loguniform
from skfolio import RatioMeasure, RiskMeasure
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.model_selection import (
CombinatorialPurgedCV,
WalkForward,
cross_val_predict,
)
from skfolio.moments import (
DenoiseCovariance,
DetoneCovariance,
EWMu,
GerberCovariance,
ShrunkMu,
)
from skfolio.optimization import (
MeanRisk,
NestedClustersOptimization,
ObjectiveFunction,
RiskBudgeting,
)
from skfolio.pre_selection import SelectKExtremes
from skfolio.preprocessing import prices_to_returns
from skfolio.prior import BlackLitterman, EmpiricalPrior, FactorModel
from skfolio.uncertainty_set import BootstrapMuUncertaintySet
prices = load_sp500_dataset()
X = prices_to_returns(prices)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
Minimum Variance
model = MeanRisk()
# Fit on training set
model.fit(X_train)
print(model.weights_)
# Predict on test set
portfolio = model.predict(X_test)
print(portfolio.annualized_sharpe_ratio)
print(portfolio.summary())
Maximum Sortino Ratio
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
risk_measure=RiskMeasure.SEMI_VARIANCE,
)
Denoised Covariance & Shrunk Expected Returns
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
prior_estimator=EmpiricalPrior(
mu_estimator=ShrunkMu(), covariance_estimator=DenoiseCovariance()
),
)
Uncertainty Set on Expected Returns
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
mu_uncertainty_set_estimator=BootstrapMuUncertaintySet(),
)
Weight Constraints & Transaction Costs
model = MeanRisk(
min_weights={"AAPL": 0.10, "JPM": 0.05},
max_weights=0.8,
transaction_costs={"AAPL": 0.0001, "RRC": 0.0002},
groups=[
["Equity"] * 3 + ["Fund"] * 5 + ["Bond"] * 12,
["US"] * 2 + ["Europe"] * 8 + ["Japan"] * 10,
],
linear_constraints=[
"Equity <= 0.5 * Bond",
"US >= 0.1",
"Europe >= 0.5 * Fund",
"Japan <= 1",
],
)
model.fit(X_train)
Risk Parity on CVaR
model = RiskBudgeting(risk_measure=RiskMeasure.CVAR)
Risk Parity & Gerber Covariance
model = RiskBudgeting(
prior_estimator=EmpiricalPrior(covariance_estimator=GerberCovariance())
)
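Risk budgeting equalizes (or targets) each asset's fractional risk contribution w_i (Σw)_i / (wᵀΣw). A quick numpy check with toy numbers: with a diagonal covariance, the inverse-volatility portfolio is exact risk parity (not skfolio code, just the underlying identity):

```python
import numpy as np

sigma = np.diag([0.04, 0.09, 0.16])   # diagonal covariance: no correlations
w = 1 / np.sqrt(np.diag(sigma))       # inverse-volatility weights...
w /= w.sum()                          # ...normalized to sum to 1

port_var = w @ sigma @ w
rc = w * (sigma @ w) / port_var       # fractional risk contributions
print(rc)                             # each contribution is exactly 1/3
```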
Nested Cluster Optimization with cross-validation and parallelization
model = NestedClustersOptimization(
inner_estimator=MeanRisk(risk_measure=RiskMeasure.CVAR),
outer_estimator=RiskBudgeting(risk_measure=RiskMeasure.VARIANCE),
cv=KFold(),
n_jobs=-1,
)
Randomized Search of the L2 Norm
randomized_search = RandomizedSearchCV(
estimator=MeanRisk(),
cv=WalkForward(train_size=255, test_size=60),
param_distributions={
"l2_coef": loguniform(1e-3, 1e-1),
},
)
randomized_search.fit(X_train)
best_model = randomized_search.best_estimator_
print(best_model.weights_)
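For intuition, walk-forward validation slides a fixed training window forward through time and tests on the next block. A minimal index-generation sketch (an illustration of the idea only, not skfolio's WalkForward implementation):

```python
# Slice a time series of length T into successive (train, test) index windows.
T, train_size, test_size = 1000, 255, 60

splits = []
start = 0
while start + train_size + test_size <= T:
    train = list(range(start, start + train_size))
    test = list(range(start + train_size, start + train_size + test_size))
    splits.append((train, test))
    start += test_size  # advance by one test block: test sets never overlap

print(len(splits))  # 12 folds for these sizes
```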
Grid Search on embedded parameters
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
risk_measure=RiskMeasure.VARIANCE,
prior_estimator=EmpiricalPrior(mu_estimator=EWMu(alpha=0.2)),
)
print(model.get_params(deep=True))
gs = GridSearchCV(
estimator=model,
cv=KFold(n_splits=5, shuffle=False),
n_jobs=-1,
param_grid={
"risk_measure": [
RiskMeasure.VARIANCE,
RiskMeasure.CVAR,
RiskMeasure.CDAR,
],
"prior_estimator__mu_estimator__alpha": [0.05, 0.1, 0.2, 0.5],
},
)
gs.fit(X)
best_model = gs.best_estimator_
print(best_model.weights_)
Black & Litterman Model
views = ["AAPL - BBY == 0.03", "CVX - KO == 0.04", "MSFT == 0.06"]
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
prior_estimator=BlackLitterman(views=views),
)
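Under the hood, Black & Litterman blends an equilibrium prior with the views. A minimal numpy sketch of the standard posterior formula, with hypothetical numbers encoding one relative and one absolute view (skfolio's estimator handles view parsing and sensible defaults for τ and Ω):

```python
import numpy as np

# Hypothetical prior for 3 assets: equilibrium returns and covariance.
pi = np.array([0.04, 0.05, 0.06])
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
tau = 0.05  # prior uncertainty scaling

# Views: "asset0 - asset1 == 0.03" (relative) and "asset2 == 0.06" (absolute).
P = np.array([[1.0, -1.0, 0.0],
              [0.0,  0.0, 1.0]])
q = np.array([0.03, 0.06])
omega = np.diag(np.diag(P @ (tau * sigma) @ P.T))  # common default for view uncertainty

# Black-Litterman posterior expected returns.
inv_ts = np.linalg.inv(tau * sigma)
inv_om = np.linalg.inv(omega)
mu_bl = np.linalg.solve(inv_ts + P.T @ inv_om @ P, inv_ts @ pi + P.T @ inv_om @ q)
```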
Factor Model
factor_prices = load_factors_dataset()
X, y = prices_to_returns(prices, factor_prices)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
model = MeanRisk(prior_estimator=FactorModel())
model.fit(X_train, y_train)
print(model.weights_)
portfolio = model.predict(X_test)
print(portfolio.calmar_ratio)
print(portfolio.summary())
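For intuition, a factor model estimates loadings by regression on the factor returns, then derives the asset moments from the factor moments plus idiosyncratic variance. A minimal numpy sketch on simulated data (an illustration of the idea, not skfolio's FactorModel internals):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_assets, n_factors = 500, 10, 3

F = rng.normal(0.0003, 0.01, size=(T, n_factors))     # factor returns
B = rng.normal(1.0, 0.3, size=(n_assets, n_factors))  # true loadings
eps = rng.normal(0, 0.005, size=(T, n_assets))        # idiosyncratic noise
X = F @ B.T + eps                                     # asset returns

# OLS loadings per asset, then factor-implied moments.
B_hat = np.linalg.lstsq(F, X, rcond=None)[0].T        # (n_assets, n_factors)
mu = B_hat @ F.mean(axis=0)                           # implied expected returns
resid = X - F @ B_hat.T
sigma = B_hat @ np.cov(F, rowvar=False) @ B_hat.T + np.diag(resid.var(axis=0))
```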
Factor Model & Covariance Detoning
model = MeanRisk(
prior_estimator=FactorModel(
factor_prior_estimator=EmpiricalPrior(covariance_estimator=DetoneCovariance())
)
)
Black & Litterman Factor Model
factor_views = ["MTUM - QUAL == 0.03", "SIZE - TLT == 0.04", "VLUE == 0.06"]
model = MeanRisk(
objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
prior_estimator=FactorModel(
factor_prior_estimator=BlackLitterman(views=factor_views),
),
)
Pre-Selection Pipeline
set_config(transform_output="pandas")
model = Pipeline(
[
("pre_selection", SelectKExtremes(k=10, highest=True)),
("optimization", MeanRisk()),
]
)
model.fit(X_train)
portfolio = model.predict(X_test)
K-fold Cross-Validation
model = MeanRisk()
mmp = cross_val_predict(model, X_test, cv=KFold(n_splits=5))
# mmp is the predicted MultiPeriodPortfolio object composed of 5 Portfolios (1 per testing fold)
mmp.plot_cumulative_returns()
print(mmp.summary())
Combinatorial Purged Cross-Validation
model = MeanRisk()
cv = CombinatorialPurgedCV(n_folds=10, n_test_folds=2)
print(cv.get_summary(X_train))
population = cross_val_predict(model, X_train, cv=cv)
population.plot_distribution(
measure_list=[RatioMeasure.SHARPE_RATIO, RatioMeasure.SORTINO_RATIO]
)
population.plot_cumulative_returns()
print(population.summary())
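As a sanity check on what CombinatorialPurgedCV generates, the split and path counts follow from simple combinatorics (assuming the standard combinatorial construction; the exact fields reported by get_summary may differ):

```python
from math import comb

n_folds, n_test_folds = 10, 2

# Number of train/test splits: all ways to choose the test folds.
n_splits = comb(n_folds, n_test_folds)        # C(10, 2) = 45

# Each fold lands in a test set in C(n-1, k-1) splits, so the
# recombined backtest paths number k/n * C(n, k).
n_paths = n_test_folds * n_splits // n_folds  # 9

print(n_splits, n_paths)
```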
Citation
If you use skfolio in a scientific publication, we would appreciate citations:
Bibtex entry:
@misc{skfolio,
  author = {Hugo Delatte},
  title = {skfolio},
  year = {2023},
  url = {https://github.com/skfolio/skfolio}
}