Sparse factor model estimation with sign-constrained LASSO, prior-centered regularisation, and hierarchical clustering group LASSO (HCGL)

🚀 Sparse Factor Model Estimation: factorlasso

The factorlasso package implements sign-constrained LASSO, prior-centered regularisation, and hierarchical clustering group LASSO (HCGL) for sparse multi-output factor model estimation, with integrated factor covariance assembly.

The Problem

In many applications (portfolio construction, genomics, macro-econometrics) you need to estimate a factor model:

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

where $Y_t \in \mathbb{R}^{N}$ are response variables (asset returns, gene expressions), $X_t \in \mathbb{R}^{M}$ are factors, $\beta \in \mathbb{R}^{N \times M}$ are sparse factor loadings, and $\alpha \in \mathbb{R}^{N}$ is the intercept.

In practice, you face several challenges that standard LASSO packages don't handle:

Domain knowledge constrains coefficient signs — equity assets should have non-negative equity beta; government bonds should not load on commodity factors. Standard LASSO ignores this.
You have prior estimates and want to shrink toward them, not toward zero: the penalty should be $\|\beta - \beta_0\|_1$, not $\|\beta\|_1$.
Variables have different history lengths — some assets start trading later than others. Dropping rows with any NaN discards valid data for all other variables.
You need a consistent covariance matrix — the factor covariance $\Sigma_y = \beta \Sigma_x \beta^\top + D$ must use the same $\beta$ from estimation, not a separate estimate.
Data is non-stationary — recent observations should carry more weight (EWMA weighting).

factorlasso solves all five in a single fit() call. The implementation follows scikit-learn conventions (fit / predict / score / coef_ / intercept_).

The methodology is based on the Hierarchical Clustering Group LASSO (HCGL) framework introduced in:

Sepp A., Ossa I., Kastenholz M. (2026), "Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios", The Journal of Portfolio Management, 52(4), 86–120.

and the Capital Market Assumptions framework in the companion paper:

Sepp A., Hansen E., Kastenholz M. (2026), "Capital Market Assumptions and Strategic Asset Allocation Using Multi-Asset Tradable Factors", Under revision at the Journal of Portfolio Management.

Installation

Install using

pip install factorlasso

Upgrade using

pip install --upgrade factorlasso

Clone using

git clone https://github.com/ArturSepp/factorlasso.git

Core dependencies: numpy, pandas, scipy, cvxpy, openpyxl

Table of Contents

Quick Start
Convention: Paper vs Code
Sign Constraints
Prior-Centred Regularisation
Hierarchical Clustering Group LASSO (HCGL)
NaN-Aware Estimation
Factor Covariance Assembly
API Summary
Estimation Methods
Applications
Illustration: multi-asset factor model with HCGL
Related Packages
References
Citation
Disclaimer

Quick Start

import numpy as np, pandas as pd
from factorlasso import LassoModel, LassoModelType

# Simulate Y_t = β X_t + noise  (code uses row-major: Y = X @ β' + noise)
np.random.seed(42)
T, M, N = 200, 3, 5
X = pd.DataFrame(np.random.randn(T, M), columns=['f0', 'f1', 'f2'])
beta_true = np.array([[1, 0, .5], [0, 1, 0], [.3, 0, 0], [0, .8, .2], [1, .5, 0]])
Y = pd.DataFrame(X.values @ beta_true.T + .1*np.random.randn(T, N),
                  columns=[f'y{i}' for i in range(N)])

# Fit sparse factor model
model = LassoModel(model_type=LassoModelType.LASSO, reg_lambda=1e-4)
model.fit(x=X, y=Y)
print(model.coef_.round(2))       # β (N × M)
print(model.intercept_.round(4))  # α (N,)

# Predict and score (scikit-learn compatible)
y_hat = model.predict(X)  # Ŷ_t = α + β X_t  (code: X @ β' + α)
r2 = model.score(X, Y)    # mean R² across response variables

Convention: Paper vs Code

The factor model in the paper uses column vectors:

$$Y_t = \alpha + \beta \, X_t + \varepsilon_t, \qquad \beta \in \mathbb{R}^{N \times M}$$

where $Y_t \in \mathbb{R}^{N \times 1}$ and $X_t \in \mathbb{R}^{M \times 1}$.

In Python, pandas DataFrames store observations as rows. The code works with the row-major equivalent:

Symbol	Paper (column-vector)	Code (row-major, pandas)
$Y$	$(N \times T)$	`y`: DataFrame $(T \times N)$
$X$	$(M \times T)$	`x`: DataFrame $(T \times M)$
$\beta$	$(N \times M)$	`coef_`: DataFrame $(N \times M)$ — same as paper
$\alpha$	$(N \times 1)$	`intercept_`: Series $(N,)$

The coefficient matrix coef_ is stored in the paper convention $(N \times M)$. The prediction Y = X @ β' + α in code is the row-major form of the paper's Y_t = α + β X_t.
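As a quick check of the convention, the numpy sketch below (independent of factorlasso; all array names are illustrative) computes $\alpha + \beta X_t$ one column vector at a time, as in the paper, and compares the result with the row-major product used in code:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, N = 4, 3, 2
X = rng.standard_normal((T, M))      # row-major: one observation per row (T × M)
beta = rng.standard_normal((N, M))   # paper convention: N × M
alpha = rng.standard_normal(N)       # intercept, shape (N,)

# Paper form: Y_t = alpha + beta @ X_t for each column vector X_t, stacked as N × T
Y_paper = np.column_stack([alpha + beta @ X[t] for t in range(T)])

# Code form: Y = X @ beta' + alpha, one row per observation (T × N)
Y_code = X @ beta.T + alpha

# The row-major result is the transpose of the column-vector result
assert np.allclose(Y_code, Y_paper.T)
```

The only difference is orientation, which is why `coef_` can keep the paper's $(N \times M)$ shape while `x` and `y` keep observations as rows.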

Sign Constraints

Enforce domain knowledge on coefficient signs using a constraint matrix where 1 = non-negative, -1 = non-positive, 0 = constrained to zero, NaN = free:

signs = pd.DataFrame([[1, np.nan, 1], [np.nan, 1, 0], [1, 0, np.nan],
                       [np.nan, 1, 1], [1, 1, np.nan]],
                      index=Y.columns, columns=X.columns)

model = LassoModel(reg_lambda=1e-4, factors_beta_loading_signs=signs)
model.fit(x=X, y=Y)
# All constrained coefficients satisfy their sign requirements by construction
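One way to read the sign matrix is as a set of per-coefficient box constraints. The sketch below is an assumption about the semantics described above, not factorlasso's internal code: it turns the encoding (1, -1, 0, NaN) into lower and upper bound matrices and projects a candidate $\beta$ onto them. The helper names `sign_bounds` and `project` are hypothetical.

```python
import numpy as np
import pandas as pd

def sign_bounds(signs: pd.DataFrame) -> tuple[np.ndarray, np.ndarray]:
    """Map 1 -> [0, inf), -1 -> (-inf, 0], 0 -> [0, 0], NaN -> (-inf, inf)."""
    s = signs.to_numpy(dtype=float)
    lower = np.full(s.shape, -np.inf)
    upper = np.full(s.shape, np.inf)
    lower[s == 1] = 0.0     # non-negative (NaN == 1 is False, so free entries are untouched)
    upper[s == -1] = 0.0    # non-positive
    lower[s == 0] = 0.0     # fixed at zero
    upper[s == 0] = 0.0
    return lower, upper

def project(beta: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> np.ndarray:
    """Euclidean projection of beta onto the box [lower, upper]."""
    return np.clip(beta, lower, upper)

signs = pd.DataFrame([[1, np.nan, -1], [np.nan, 1, 0]])
lower, upper = sign_bounds(signs)
beta = np.array([[-0.2, -0.3, 0.4], [0.5, -0.1, 0.7]])
projected = project(beta, lower, upper)
# projected -> [[0, -0.3, 0], [0.5, 0, 0]]
```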

Prior-Centred Regularisation

Shrink toward a non-zero prior instead of zero. When you have prior estimates $\beta_0$ (e.g., from a previous estimation period or a theoretical model), the penalty becomes $\|\beta - \beta_0\|_1$ instead of $\|\beta\|_1$:

beta_prior = pd.DataFrame(beta_true, index=Y.columns, columns=X.columns)
model = LassoModel(reg_lambda=1e-2, factors_beta_prior=beta_prior)
model.fit(x=X, y=Y)  # shrinks toward beta_prior instead of zero
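The effect of centring the penalty at $\beta_0$ shows up in its proximal operator: soft-thresholding is applied to the deviation $\beta - \beta_0$ rather than to $\beta$. The sketch below solves a single-response problem $\min_b \frac{1}{2T}\|y - Xb\|_2^2 + \lambda\|b - b_0\|_1$ with a plain proximal-gradient (ISTA) loop. It only illustrates the objective; factorlasso solves its problems through CVXPY (default solver `'CLARABEL'`), and the helper names and scaling here are assumptions.

```python
import numpy as np

def prox_prior_l1(v: np.ndarray, b0: np.ndarray, thresh: float) -> np.ndarray:
    """argmin_b 0.5*||b - v||^2 + thresh*||b - b0||_1, i.e. soft-thresholding around b0."""
    d = v - b0
    return b0 + np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)

def prior_lasso_ista(X, y, b0, lam, n_iter=2000):
    """Proximal gradient for (1/2T)*||y - X b||^2 + lam*||b - b0||_1."""
    T = X.shape[0]
    step = T / np.linalg.norm(X, 2) ** 2   # 1/L, with L = sigma_max(X)^2 / T
    b = b0.copy()
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / T       # gradient of the least-squares term
        b = prox_prior_l1(b - step * grad, b0, step * lam)
    return b

rng = np.random.default_rng(42)
T, M = 200, 3
X = rng.standard_normal((T, M))
b_true = np.array([1.0, 0.0, 0.5])
y = X @ b_true + 0.1 * rng.standard_normal(T)
b0 = np.array([0.8, 0.2, 0.5])             # hypothetical prior

b_zero = prior_lasso_ista(X, y, np.zeros(M), lam=10.0)  # heavy penalty: collapses to 0
b_prior = prior_lasso_ista(X, y, b0, lam=10.0)          # heavy penalty: collapses to b0
b_small = prior_lasso_ista(X, y, b0, lam=1e-4)          # light penalty: close to least squares
```

Sign constraints fit the same loop: clipping the output of `prox_prior_l1` to each coefficient's bounds gives the constrained proximal step, because for a scalar convex penalty the prox over an interval is the projection of the unconstrained prox.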

Hierarchical Clustering Group LASSO (HCGL)

Automatically discover group structure among response variables via hierarchical clustering on their correlation matrix (Ward's method), then apply Group LASSO with group-adaptive penalties:

model = LassoModel(
    model_type=LassoModelType.GROUP_LASSO_CLUSTERS,
    reg_lambda=1e-5, span=52,
)
model.fit(x=X, y=Y)
print(model.clusters_)  # auto-discovered groups
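The clustering step can be sketched with SciPy. The distance $d_{ij} = \sqrt{2(1-\rho_{ij})}$ is an assumption (the text only says Ward's method on the correlation matrix); it is convenient because, up to a constant factor, it equals the Euclidean distance between standardized series, which is the setting Ward linkage is designed for. The cluster count `n_clusters` is likewise a hypothetical choice, not factorlasso's rule for cutting the tree.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def correlation_clusters(y: pd.DataFrame, n_clusters: int) -> pd.Series:
    """Ward clustering of the columns of y using d_ij = sqrt(2 * (1 - rho_ij))."""
    corr = y.corr().to_numpy()
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))  # clip rounding below zero
    np.fill_diagonal(dist, 0.0)                               # exact zeros on the diagonal
    z = linkage(squareform(dist, checks=False), method="ward")
    labels = fcluster(z, t=n_clusters, criterion="maxclust")
    return pd.Series(labels, index=y.columns, name="cluster")

rng = np.random.default_rng(1)
T = 500
f = rng.standard_normal((T, 2))                               # two latent drivers
y = pd.DataFrame({
    "a1": f[:, 0] + 0.3 * rng.standard_normal(T),
    "a2": f[:, 0] + 0.3 * rng.standard_normal(T),
    "b1": f[:, 1] + 0.3 * rng.standard_normal(T),
    "b2": f[:, 1] + 0.3 * rng.standard_normal(T),
})
clusters = correlation_clusters(y, n_clusters=2)              # groups {a1, a2} and {b1, b2}
```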

NaN-Aware Estimation

Variables with different history lengths are handled naturally. Instead of dropping any row containing a NaN (which discards valid observations for all other variables), factorlasso applies a binary validity mask that zeros out the contribution of missing observations per variable while preserving all available data:

Y_with_gaps = Y.copy()
Y_with_gaps.iloc[:50, 3] = np.nan   # variable y3 starts 50 periods later
Y_with_gaps.iloc[:100, 4] = np.nan  # variable y4 starts 100 periods later

model = LassoModel(reg_lambda=1e-4)
model.fit(x=X, y=Y_with_gaps)
# All 5 variables estimated using their full available history
# No data discarded for y0, y1, y2 despite gaps in y3, y4
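The masking idea can be sketched as per-observation weights: each response column is fitted on the rows where it is observed, so a late-starting column does not shorten the sample of the others. This numpy sketch uses unpenalised least squares, one variable at a time, purely to illustrate the mask; it is not factorlasso's implementation, and `masked_least_squares` is a hypothetical helper.

```python
import numpy as np

def masked_least_squares(X: np.ndarray, Y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit each column of Y on X using only its non-NaN rows.

    Returns beta (N x M) and the number of observations used per column.
    """
    mask = ~np.isnan(Y)                      # T x N validity mask
    Y0 = np.where(mask, Y, 0.0)              # zero-fill so NaNs cannot propagate
    N, M = Y.shape[1], X.shape[1]
    beta = np.empty((N, M))
    for j in range(N):
        w = mask[:, j].astype(float)         # 1 = observed, 0 = missing
        Xw = X * w[:, None]                  # zero out rows where y_j is missing
        beta[j] = np.linalg.solve(Xw.T @ X, Xw.T @ Y0[:, j])  # X'WX b = X'Wy
    return beta, mask.sum(axis=0)

rng = np.random.default_rng(42)
T, M, N = 200, 3, 5
X = rng.standard_normal((T, M))
beta_true = np.array([[1, 0, .5], [0, 1, 0], [.3, 0, 0], [0, .8, .2], [1, .5, 0]])
Y = X @ beta_true.T + 0.1 * rng.standard_normal((T, N))
Y[:50, 3] = np.nan                           # y3 starts 50 periods later
Y[:100, 4] = np.nan                          # y4 starts 100 periods later

beta_hat, n_obs = masked_least_squares(X, Y)
# n_obs -> [200, 200, 200, 150, 100]
n_complete = int((~np.isnan(Y)).all(axis=1).sum())  # rows left after a row-wise dropna: 100
```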

Factor Covariance Assembly

After estimation, assemble the factor covariance decomposition $\Sigma_y = \beta \Sigma_x \beta^\top + D$, where $\beta$ is the same matrix produced by the LASSO estimation, so the covariance is consistent with the fitted loadings by construction:

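from factorlasso import CurrentFactorCovarData, VarianceColumns

# factor_covariance (Σ_x, M × M) and diagnostics_df (residual variances D) are
# user-supplied placeholders; β is taken directly from the fitted model
sigma_y = CurrentFactorCovarData(
    x_covar=factor_covariance,   # Σ_x (M × M)
    y_betas=model.coef_,          # β (N × M) from estimation
    y_variances=diagnostics_df,   # residual variances D
).get_y_covar()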
# sigma_y is (N × N) positive semi-definite by construction
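The assembly itself is a short matrix identity. The numpy sketch below is independent of `CurrentFactorCovarData` (whose exact input formats are not documented here); it builds $\Sigma_y = \beta \Sigma_x \beta^\top + D$ with diagonal $D$ and checks symmetry and positive semi-definiteness. All input values are made up for illustration.

```python
import numpy as np

def assemble_factor_covariance(beta, sigma_x, resid_var):
    """Sigma_y = beta @ Sigma_x @ beta.T + diag(resid_var)."""
    beta = np.asarray(beta, dtype=float)             # N x M loadings
    sigma_x = np.asarray(sigma_x, dtype=float)       # M x M factor covariance
    D = np.diag(np.asarray(resid_var, dtype=float))  # N x N idiosyncratic variances
    return beta @ sigma_x @ beta.T + D

beta = np.array([[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.3, 0.0, 0.0]])
sigma_x = np.array([[0.04, 0.01, 0.00],
                    [0.01, 0.09, 0.02],
                    [0.00, 0.02, 0.16]])
resid_var = np.array([0.01, 0.02, 0.005])

sigma_y = assemble_factor_covariance(beta, sigma_x, resid_var)
eigvals = np.linalg.eigvalsh(sigma_y)                # all >= 0 for a PSD matrix
```

Since $\beta \Sigma_x \beta^\top$ is positive semi-definite whenever $\Sigma_x$ is, and $D$ has a non-negative diagonal, $\Sigma_y$ inherits positive semi-definiteness; with strictly positive residual variances it is positive definite.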

API Summary

The API follows scikit-learn conventions: fit / predict / score.

Method	Description
`model.fit(x, y)`	Estimate α, β — returns `self`
`model.predict(x)`	Return Ŷ_t = α + β X_t (row-major: `X @ β' + α`)
`model.score(x, y)`	Return mean R²

Fitted attribute	Shape	Description
`coef_`	(N, M)	Factor loadings β
`intercept_`	(N,)	Intercept α
`estimated_betas`	(N, M)	Alias for `coef_` (backward compat)
`clusters_`	(N,)	HCGL cluster labels
`estimation_result_`	—	Full diagnostics (r2, ss_res, ss_total)
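`score` returns the mean R² across response variables, and `estimation_result_` reports `r2`, `ss_res`, and `ss_total`. A plain, unweighted version of that statistic can be computed as below for cross-checking; whether factorlasso applies EWMA weights or demeaning inside `score` is not stated here, so treat this `mean_r2` helper as a hypothetical approximation.

```python
import numpy as np

def mean_r2(Y, Y_hat):
    """Average of R^2_j = 1 - SS_res_j / SS_tot_j over columns, skipping NaN rows per column."""
    Y = np.asarray(Y, dtype=float)
    Y_hat = np.asarray(Y_hat, dtype=float)
    r2 = []
    for j in range(Y.shape[1]):
        ok = ~np.isnan(Y[:, j])                  # observed rows for this variable
        y, yh = Y[ok, j], Y_hat[ok, j]
        ss_res = np.sum((y - yh) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r2.append(1.0 - ss_res / ss_tot)
    return float(np.mean(r2)), np.array(r2)

Y = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
Y_hat = np.column_stack([Y[:, 0], np.full(3, 4.0)])  # perfect fit, then column mean only
mean_val, per_column = mean_r2(Y, Y_hat)             # per_column -> [1.0, 0.0]
```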

Parameter	Type	Default	Description
`model_type`	`LassoModelType`	`LASSO`	Estimation method
`reg_lambda`	`float`	`1e-5`	Regularisation strength
`span`	`int`	`None`	EWMA span for observation weighting
`factors_beta_loading_signs`	`DataFrame`	`None`	Sign constraint matrix (N × M)
`factors_beta_prior`	`DataFrame`	`None`	Prior β₀ matrix (N × M)
`group_data`	`Series`	`None`	Group labels (required for `GROUP_LASSO`)
`demean`	`bool`	`True`	Subtract (rolling) mean before estimation
`solver`	`str`	`'CLARABEL'`	CVXPY solver name
`warmup_period`	`int`	`12`	Min observations before including a variable
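The `span` parameter is documented only as an EWMA span. Assuming the pandas convention $\alpha = 2/(\text{span} + 1)$ (an assumption about factorlasso), observation weights decay geometrically with age, as in this sketch, which checks the weights against `pandas.Series.ewm`. Note that the span is not the half-life: with `span=52` the weight halves after roughly 18 observations.

```python
import numpy as np
import pandas as pd

def ewma_weights(n_obs: int, span: int) -> np.ndarray:
    """Weights (1 - alpha)^age with alpha = 2 / (span + 1); the newest observation is last."""
    alpha = 2.0 / (span + 1.0)
    ages = np.arange(n_obs - 1, -1, -1)      # age 0 = most recent row
    return (1.0 - alpha) ** ages

x = pd.Series(np.random.default_rng(0).standard_normal(300))
w = ewma_weights(len(x), span=52)
weighted_mean = np.sum(w * x.to_numpy()) / np.sum(w)
pandas_mean = x.ewm(span=52, adjust=True).mean().iloc[-1]   # same weighting in pandas
half_life = np.log(0.5) / np.log(1.0 - 2.0 / 53.0)          # about 18 observations
```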

Estimation Methods

Method	`LassoModelType`	Penalty
LASSO	`LASSO`	$\lambda\|\beta - \beta_0\|_1$
Group LASSO	`GROUP_LASSO`	$\sum_g \lambda\sqrt{|g|}\,\|\beta_g - \beta_{0,g}\|_2$
HCGL	`GROUP_LASSO_CLUSTERS`	Same as Group LASSO with auto-clustering

All methods support sign constraints, prior-centered shrinkage, EWMA weighting, and NaN-aware estimation.
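In a proximal-gradient view, a Group LASSO term acts through block soft-thresholding: each group of deviations from the prior is either set exactly to the prior or shrunk toward it by a common factor. The sketch below assumes the common $\sqrt{|g|}$ group-size weighting and a flat coefficient vector; how factorlasso forms groups over the $N \times M$ loading matrix is not specified in this section, and the package solves the problem with CVXPY rather than with this hypothetical `prox_group_prior` helper.

```python
import numpy as np

def prox_group_prior(v, b0, groups, thresh):
    """Block soft-thresholding of v - b0 within each group.

    Solves argmin_b 0.5*||b - v||^2 + sum_g thresh*sqrt(|g|)*||b_g - b0_g||_2.
    """
    v = np.asarray(v, dtype=float)
    b0 = np.asarray(b0, dtype=float)
    out = b0.copy()                           # groups below threshold stay at the prior
    for g in np.unique(groups):
        idx = groups == g
        d = v[idx] - b0[idx]
        norm = np.linalg.norm(d)
        t = thresh * np.sqrt(idx.sum())       # group-size-adjusted threshold
        if norm > t:
            out[idx] = b0[idx] + (1.0 - t / norm) * d
    return out

v = np.array([3.0, 4.0, 0.1, -0.1])
b0 = np.zeros(4)
groups = np.array([0, 0, 1, 1])
out = prox_group_prior(v, b0, groups, thresh=1.0)
# group 1 is zeroed as a block; group 0 keeps its direction with norm 5 - sqrt(2)
```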

Applications

The methodology is domain-agnostic. Examples are provided for:

examples/finance_factor_model.py — Multi-asset factor models with sign-constrained betas and consistent covariance estimation
examples/genomics_factor_model.py — Gene expression driven by pathway activity factors with biological sign priors

The same estimation problem (sparse factor loadings with sign priors and consistent covariance) appears in macro-econometrics, signal processing, and multi-task learning.

Illustration: multi-asset factor model with HCGL

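from factorlasso import LassoModel, LassoModelType

# Placeholders: factor_returns (T × M) and asset_returns (T × N) are return DataFrames;
# sign_matrix and prior_betas are (N × M) DataFrames indexed by asset, with factor columns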

model = LassoModel(
    model_type=LassoModelType.GROUP_LASSO_CLUSTERS,
    reg_lambda=1e-5,
    span=52,                                 # EWMA span of 52 (about one year of weekly data)
    factors_beta_loading_signs=sign_matrix,   # domain-knowledge constraints
    factors_beta_prior=prior_betas,           # shrink toward prior, not zero
)
model.fit(x=factor_returns, y=asset_returns)

# Inspect results
print(model.coef_)           # sparse factor loadings (N × M)
print(model.intercept_)      # intercept α (N,)
print(model.clusters_)       # auto-discovered asset groups
print(model.score(factor_returns, asset_returns))  # mean R²

Related Packages

Package	Key Difference
scikit-learn `Lasso`	Only global non-negativity (`positive=True`), no per-coefficient sign constraints, no multi-output Group LASSO
skglm	No sign constraints, no prior-centered shrinkage
abess	Best-subset selection (L0), not L1/Group L2
group-lasso	No sign constraints, no EWMA, no prior-centered

factorlasso is the only package that combines sign-constrained penalised regression, prior-centered shrinkage, HCGL clustering, NaN-aware estimation, and integrated factor covariance assembly.

References

Sepp A., Ossa I., Kastenholz M. (2026), "Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios", The Journal of Portfolio Management, 52(4), 86–120.
Sepp A., Hansen E., Kastenholz M. (2026), "Capital Market Assumptions and Strategic Asset Allocation Using Multi-Asset Tradable Factors", Under revision at the Journal of Portfolio Management.

Citation

If you use factorlasso in your research, please cite the software and the underlying papers:

@software{sepp2026factorlasso,
  author = {Sepp, Artur},
  title = {factorlasso: Sparse Factor Model Estimation with Constrained LASSO in Python},
  year = {2026},
  url = {https://github.com/ArturSepp/factorlasso}
}

@article{seppossa2026,
  author = {Sepp, Artur and Ossa, Ivan and Kastenholz, Mika},
  title = {Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios},
  journal = {The Journal of Portfolio Management},
  volume = {52},
  number = {4},
  pages = {86--120},
  year = {2026}
}

@article{sepphansen2026,
  author = {Sepp, Artur and Hansen, Emilie and Kastenholz, Mika},
  title = {Capital Market Assumptions and Strategic Asset Allocation Using Multi-Asset Tradable Factors},
  journal = {Under revision at the Journal of Portfolio Management},
  year = {2026}
}

Disclaimer

The factorlasso package is distributed FREE & WITHOUT ANY WARRANTY under the MIT License.

See LICENSE for details.

Please report any bugs or suggestions by opening an issue.

Release history

0.3.8

May 22, 2026

0.3.7

May 22, 2026

0.3.6

May 22, 2026

0.3.5

May 22, 2026

0.3.4

May 22, 2026

0.3.3

Apr 25, 2026

0.3.2

Apr 21, 2026

0.3.1

Apr 20, 2026

0.3.0

Apr 19, 2026

0.2.2

Apr 18, 2026

0.2.1

Apr 18, 2026

0.1.12

Apr 17, 2026

0.1.11

Apr 16, 2026

0.1.10

Apr 16, 2026

This version

0.1.9

Apr 14, 2026

0.1.8

Apr 14, 2026

0.1.7

Apr 2, 2026

0.1.6

Apr 2, 2026

0.1.5

Mar 24, 2026

0.1.3

Mar 22, 2026

0.1.2

Mar 22, 2026

0.1.0

Mar 22, 2026

File details

Details for the file factorlasso-0.1.9.tar.gz.

File metadata

Download URL: factorlasso-0.1.9.tar.gz
Upload date: Apr 14, 2026
Size: 46.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for factorlasso-0.1.9.tar.gz
Algorithm	Hash digest
SHA256	`2e20d3c17a48d9a4a77b48784b2141b20929e1f70872679a8b5aa22e60701f3d`
MD5	`99d45d5fd1c4e75c5a262b2904ddcc33`
BLAKE2b-256	`eadcdedb47d0e87e021828cf3a08b57f9874f2856f23174b416173cf32974727`

File details

Details for the file factorlasso-0.1.9-py3-none-any.whl.

File metadata

Download URL: factorlasso-0.1.9-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 26.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for factorlasso-0.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6b519a88a7273f46007403f45915c885f2ff3b96c7c136b8a2b0ae433481372a`
MD5	`cfc37bbd22aeef8775f03e180eb515f2`
BLAKE2b-256	`69106252253e3e851be436c4b0ab4f10012a67258a6544cdb6b59cde67f07a05`
