
Sparse factor model estimation with sign-constrained LASSO, prior-centered regularisation, and hierarchical group LASSO (HCGL)


factorlasso




Overview

factorlasso solves the sparse multi-output regression problem

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

where $\beta$ is the $(N \times M)$ matrix of factor loadings, $\alpha$ is the $(N \times 1)$ intercept, $Y_t$ is $(N \times 1)$, and $X_t$ is $(M \times 1)$, with support for:

  • Sign constraints on individual coefficients (non-negative, non-positive, zero, or free)
  • Prior-centered regularisation — penalise $|\beta - \beta_0|$ instead of $|\beta|$, shrinking toward domain-specific priors
  • Group structure — Group LASSO with user-defined groups or automatic hierarchical clustering (HCGL)
  • EWMA-weighted observations — exponential decay for non-stationary data
  • NaN-aware estimation — validity masking handles variables with different observation lengths
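A minimal numpy sketch of the per-variable estimation problem may help make these features concrete. It fits one response variable under box bounds derived from sign constraints, an L1 penalty centred on a prior, EWMA observation weights and a NaN validity mask. The package lists cvxpy as a dependency and its actual solver is not shown here; this sketch swaps in plain proximal gradient descent (ISTA) instead. The helpers ewma_weights and fit_response are hypothetical, and the weight normalisation and the decay convention alpha = 2 / (span + 1) are assumptions.

```python
import numpy as np


def ewma_weights(n_obs, span):
    """Exponentially decaying observation weights; the newest row gets weight 1.
    Assumption: pandas-style decay alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1.0)
    age = np.arange(n_obs)[::-1]  # 0 for the last row, n_obs - 1 for the first
    return (1.0 - alpha) ** age


def fit_response(x, y, weights, lam, prior, lower, upper, n_iter=3000):
    """Hypothetical ISTA sketch for ONE response variable:

        min_{a,b} 1/2 * sum_t w_t (y_t - a - x_t @ b)^2 + lam * ||b - prior||_1
        s.t.      lower <= b <= upper,   with the weights w_t normalised to sum to 1.
    """
    valid = ~np.isnan(y)  # validity mask: drop rows where this response is missing
    x, y, w = x[valid], y[valid], weights[valid]
    w = w / w.sum()
    x_bar, y_bar = w @ x, w @ y  # weighted means
    xc, yc = x - x_bar, y - y_bar  # centring removes the unpenalised intercept
    step = 1.0 / np.linalg.eigvalsh(xc.T @ (w[:, None] * xc)).max()  # 1 / Lipschitz const.
    b = np.clip(prior, lower, upper).astype(float)
    for _ in range(n_iter):
        grad = xc.T @ (w * (xc @ b - yc))  # gradient of the weighted squared loss
        dev = b - step * grad - prior
        # prox of lam * ||b - prior||_1: soft-threshold the deviation from the prior ...
        b = prior + np.sign(dev) * np.maximum(np.abs(dev) - step * lam, 0.0)
        # ... then clip to the sign box (exact, since the prox is separable per coefficient)
        b = np.clip(b, lower, upper)
    return y_bar - x_bar @ b, b  # intercept a, loadings b


rng = np.random.default_rng(0)
T, M = 200, 3
X = rng.standard_normal((T, M))
beta_row = np.array([1.0, 0.0, 0.5])
y = X @ beta_row + 0.1 * rng.standard_normal(T)
y[:20] = np.nan  # this variable has a shorter history
a, b = fit_response(X, y, ewma_weights(T, span=52), lam=1e-4, prior=np.zeros(M),
                    lower=np.zeros(M), upper=np.full(M, np.inf))  # all loadings >= 0
print(round(float(a), 3), b.round(2))
```

Because the L1 penalty is elementwise and each response has its own validity mask, a LASSO-type problem of this form splits into N independent row fits; a group penalty whose groups span several responses couples the rows instead.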

After estimation, factorlasso assembles the consistent factor covariance decomposition

$$\Sigma_y = \beta\,\Sigma_x\,\beta^\top + D$$

where $\Sigma_x$ is the factor covariance and $D$ is diagonal idiosyncratic variance.
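The decomposition itself is a single matrix product. The sketch below uses a hypothetical helper, assemble_y_covar, to build $\Sigma_y$ from a loading matrix, a factor covariance and a vector of idiosyncratic variances; it is an illustration of the formula, not the package's CurrentFactorCovarData, whose internals are not shown here.

```python
import numpy as np


def assemble_y_covar(beta, sigma_x, idio_var):
    """Hypothetical helper: Sigma_y = beta @ Sigma_x @ beta.T + D, with D = diag(idio_var)."""
    beta = np.asarray(beta, dtype=float)
    sigma_y = beta @ np.asarray(sigma_x, dtype=float) @ beta.T + np.diag(idio_var)
    return 0.5 * (sigma_y + sigma_y.T)  # symmetrise to remove floating-point asymmetry


beta = np.array([[1.0, 0.0],
                 [0.5, 0.5]])    # N = 2 responses, M = 2 factors
sigma_x = np.array([[1.0, 0.0],
                    [0.0, 4.0]])  # factor covariance
idio_var = np.array([0.1, 0.2])   # idiosyncratic variances (diagonal of D)
sigma_y = assemble_y_covar(beta, sigma_x, idio_var)  # [[1.1, 0.5], [0.5, 1.45]] up to rounding
```

Since D is diagonal, all cross-variable covariance comes from the factors, and $\Sigma_y$ is positive semidefinite whenever $\Sigma_x$ is positive semidefinite and the idiosyncratic variances are non-negative.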

To the author's knowledge, no other Python package combines sign-constrained penalised regression with prior-centered shrinkage and integrated factor covariance assembly.

Installation

pip install factorlasso

Quick Start

import numpy as np, pandas as pd
from factorlasso import LassoModel, LassoModelType

# Simulate: Y = X @ beta_true.T + noise
np.random.seed(42)
T, M, N = 200, 3, 5
X = pd.DataFrame(np.random.randn(T, M), columns=['f0', 'f1', 'f2'])
beta_true = np.array([[1, 0, .5], [0, 1, 0], [.3, 0, 0], [0, .8, .2], [1, .5, 0]])
Y = pd.DataFrame(X.values @ beta_true.T + .1*np.random.randn(T, N),
                  columns=[f'y{i}' for i in range(N)])

model = LassoModel(model_type=LassoModelType.LASSO, reg_lambda=1e-4)
model.fit(x=X, y=Y)
print(model.coef_.round(2))       # β (N × M)
print(model.intercept_.round(4))  # α (N,)

Predict and Score (scikit-learn compatible)

y_hat = model.predict(X)  # Ŷ = α + X β'
r2 = model.score(X, Y)    # mean R² across response variables
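For reference, a numpy sketch of what these two calls compute, assuming score averages the per-column R² with equal weights (as scikit-learn's r2_score does with multioutput='uniform_average'). The helpers predict and mean_r2 are hypothetical, and how factorlasso's score treats NaN values is not covered here.

```python
import numpy as np


def predict(x, intercept, coef):
    """Y_hat = alpha + X beta'; x is (T, M), coef is (N, M), intercept is (N,)."""
    return intercept + x @ coef.T


def mean_r2(y, y_hat):
    """Equal-weight average of the per-column R^2 = 1 - SS_res / SS_tot."""
    ss_res = ((y - y_hat) ** 2).sum(axis=0)
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))


x = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
coef = np.array([[1.0, 2.0], [0.0, -1.0]])
intercept = np.array([0.5, 0.0])
y = predict(x, intercept, coef)                                 # noise-free responses
perfect = mean_r2(y, y)                                         # perfect fit
baseline = mean_r2(y, np.tile(y.mean(axis=0), (x.shape[0], 1)))  # column-mean forecast
```

A perfect fit scores 1, and predicting each column's mean scores 0.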

Sign Constraints

# 1 = non-negative, -1 = non-positive, 0 = zero, NaN = free
signs = pd.DataFrame([[1, np.nan, 1], [np.nan, 1, 0], [1, 0, np.nan],
                       [np.nan, 1, 1], [1, 1, np.nan]],
                      index=Y.columns, columns=X.columns)

model = LassoModel(reg_lambda=1e-4, factors_beta_loading_signs=signs)
model.fit(x=X, y=Y)
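One natural way to enforce this sign convention is to translate it into elementwise bounds, lower ≤ β ≤ upper, which a solver can then respect, for example by projection. The helper signs_to_bounds below is a hypothetical sketch, not part of factorlasso.

```python
import numpy as np
import pandas as pd


def signs_to_bounds(signs: pd.DataFrame):
    """Hypothetical helper: map 1 / -1 / 0 / NaN to elementwise (lower, upper) bounds."""
    s = signs.to_numpy(dtype=float)
    lower = np.where((s == 1) | (s == 0), 0.0, -np.inf)  # non-negative or zero
    upper = np.where((s == -1) | (s == 0), 0.0, np.inf)  # non-positive or zero
    return lower, upper  # NaN compares False everywhere, so it stays free


signs = pd.DataFrame([[1, np.nan], [-1, 0]], index=['y0', 'y1'], columns=['f0', 'f1'])
lower, upper = signs_to_bounds(signs)
# Projecting an unconstrained estimate onto the box enforces the signs
projected = np.clip(np.array([[-0.3, 2.0], [0.4, 0.7]]), lower, upper)
```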

Prior-Centered Regularisation

beta_prior = pd.DataFrame(beta_true, index=Y.columns, columns=X.columns)
model = LassoModel(reg_lambda=1e-2, factors_beta_prior=beta_prior)
model.fit(x=X, y=Y)  # shrinks toward beta_prior instead of zero
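Prior-centered shrinkage reduces to an ordinary LASSO by a change of variables: with $\delta = \beta - \beta_0$, the model becomes $Y_t - \beta_0 X_t = \alpha + \delta X_t + \varepsilon_t$ and the penalty becomes $\lambda\lVert\delta\rVert_1$. The sketch below illustrates this for one response with a plain coordinate-descent LASSO on data without an intercept; the helpers lasso_cd and prior_centered_lasso are hypothetical stand-ins, not the package's solver.

```python
import numpy as np


def lasso_cd(x, y, lam, n_iter=500):
    """Plain LASSO by cyclic coordinate descent (no intercept):
        min_d 1/(2T) * ||y - x @ d||^2 + lam * ||d||_1"""
    t, m = x.shape
    d = np.zeros(m)
    col_sq = (x ** 2).sum(axis=0) / t
    r = y.copy()  # residual y - x @ d
    for _ in range(n_iter):
        for j in range(m):
            r += x[:, j] * d[j]  # take coordinate j out of the fit
            rho = x[:, j] @ r / t
            d[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= x[:, j] * d[j]  # put the updated coordinate back
    return d


def prior_centered_lasso(x, y, lam, beta0):
    """Shrink toward beta0: fit d = beta - beta0 on the shifted target y - x @ beta0."""
    return beta0 + lasso_cd(x, y - x @ beta0, lam)


rng = np.random.default_rng(1)
x = rng.standard_normal((200, 3))
beta_true = np.array([1.0, 0.0, 0.5])
y = x @ beta_true + 0.1 * rng.standard_normal(200)
beta0 = np.array([0.9, 0.0, 0.6])
beta_strong = prior_centered_lasso(x, y, lam=1e3, beta0=beta0)  # heavy shrinkage
beta_weak = prior_centered_lasso(x, y, lam=1e-6, beta0=beta0)   # almost none
```

With a very large λ every deviation is thresholded to zero and the estimate equals the prior exactly; with λ near zero it approaches ordinary least squares.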

Hierarchical Clustering Group LASSO (HCGL)

model = LassoModel(
    model_type=LassoModelType.GROUP_LASSO_CLUSTERS,
    reg_lambda=1e-5, span=52,
)
model.fit(x=X, y=Y)
print(model.clusters_)  # auto-discovered groups (HCGL cluster labels)
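The page does not spell out how HCGL forms its clusters. Since the cluster labels have one entry per response variable, one plausible sketch is to cluster the response columns hierarchically on a correlation distance and cut the tree at a fixed height. The distance 1 − correlation, average linkage, the cut-off max_distance and the helper name cluster_responses are all assumptions; the sketch uses scipy, which is already a dependency.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_responses(y, max_distance=0.5, method="average"):
    """Hypothetical helper: label response columns by hierarchical clustering on 1 - corr."""
    dist = 1.0 - np.corrcoef(y, rowvar=False)  # correlation distance between columns
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method=method)
    return fcluster(tree, t=max_distance, criterion="distance")


rng = np.random.default_rng(2)
f = rng.standard_normal((300, 2))
# Three responses driven by factor 0, two driven by factor 1
y = np.column_stack([f[:, 0] + 0.1 * rng.standard_normal(300) for _ in range(3)]
                    + [f[:, 1] + 0.1 * rng.standard_normal(300) for _ in range(2)])
labels = cluster_responses(y)
```

Each resulting cluster could then define the groups used by the Group LASSO penalty; how factorlasso maps clusters to groups is not stated here.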

Factor Covariance Assembly

from factorlasso import CurrentFactorCovarData, VarianceColumns
from factorlasso.ewm_utils import compute_ewm_covar

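# Assemble Sigma_y = beta @ Sigma_x @ beta.T + D
# factor_covariance and diagnostics_df are not defined in this snippet and must be supplied:
#   factor_covariance: (M x M) factor covariance Sigma_x (presumably what compute_ewm_covar is imported for)
#   diagnostics_df:    idiosyncratic variances of the N response variables, which form the diagonal D
sigma_y = CurrentFactorCovarData(
    x_covar=factor_covariance,
    y_betas=model.coef_,       # fitted loadings beta (N x M)
    y_variances=diagnostics_df,
).get_y_covar()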
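The snippet imports compute_ewm_covar without showing its signature, so rather than guess at it, here is a stand-alone numpy sketch of an EWMA estimate of the factor covariance $\Sigma_x$. The helper ewm_covariance is hypothetical; it uses the decay alpha = 2 / (span + 1) and applies no bias correction, both of which are assumptions.

```python
import numpy as np


def ewm_covariance(x, span):
    """Hypothetical helper: weighted (biased) EWMA covariance of x (T x M, oldest row first)."""
    alpha = 2.0 / (span + 1.0)
    w = (1.0 - alpha) ** np.arange(x.shape[0])[::-1]  # newest row has weight 1
    w = w / w.sum()
    xc = x - w @ x                                     # remove the weighted mean
    return (w[:, None] * xc).T @ xc                    # sum_t w_t * xc_t xc_t'


rng = np.random.default_rng(3)
factors = rng.standard_normal((500, 3))
sigma_x = ewm_covariance(factors, span=52)
```

pandas' DataFrame.ewm(span=...).cov() is an alternative; it applies a bias correction by default, so its values differ slightly from this estimator.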

API Summary

The API follows scikit-learn conventions: fit / predict / score.

| Method | Description |
|---|---|
| model.fit(x, y) | Estimate α, β; returns self |
| model.predict(x) | Return Ŷ = α + X β' |
| model.score(x, y) | Return mean R² across response variables |

| Fitted attribute | Shape | Description |
|---|---|---|
| coef_ | (N, M) | Factor loadings β |
| intercept_ | (N,) | Intercept α |
| estimated_betas | (N, M) | Alias for coef_ (backward compatibility) |
| clusters_ | (N,) | HCGL cluster labels |

Estimation Methods

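| Method | LassoModelType | Penalty |
|---|---|---|
| LASSO | LASSO | $\lambda \lVert \beta - \beta_0 \rVert_1$ |
| Group LASSO | GROUP_LASSO | $\sum_g \lambda \sqrt{\lvert g \rvert}\, \lVert \beta_g - \beta_{0,g} \rVert_2$ |
| HCGL | GROUP_LASSO_CLUSTERS | Same as Group LASSO, with groups found by hierarchical clustering |

Here $\beta_g$ and $\beta_{0,g}$ are the coefficients and prior values in group $g$, and $\lvert g \rvert$ is the number of coefficients in the group.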

All methods support sign constraints, prior-centered shrinkage, EWMA weighting, and NaN-aware estimation.
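Group penalties of this form are typically handled through their proximal operator, block soft-thresholding: each group's deviation from the prior is shrunk as a unit and set exactly to zero when its Euclidean norm falls below $\lambda\sqrt{\lvert g\rvert}$. The helper group_prox below is a hypothetical sketch of that operator, assuming the penalty $\sum_g \lambda\sqrt{\lvert g\rvert}\,\lVert\beta_g - \beta_{0,g}\rVert_2$ with disjoint groups; it is not taken from the package.

```python
import numpy as np


def group_prox(beta, prior, groups, lam):
    """Hypothetical helper: prox of sum_g lam * sqrt(|g|) * ||beta_g - prior_g||_2.
    groups: list of disjoint index lists into the flattened coefficient vector."""
    out = np.asarray(beta, dtype=float).copy()
    for idx in groups:
        dev = out[idx] - prior[idx]
        norm = np.linalg.norm(dev)
        thresh = lam * np.sqrt(len(idx))
        scale = max(1.0 - thresh / norm, 0.0) if norm > 0 else 0.0
        out[idx] = prior[idx] + scale * dev  # shrink the whole group toward its prior
    return out


beta = np.array([3.0, 4.0, 0.1])
prior = np.zeros(3)
shrunk = group_prox(beta, prior, groups=[[0, 1], [2]], lam=1 / np.sqrt(2))
# group [0, 1]: norm 5, threshold 1 -> scaled by 0.8; group [2]: below threshold -> zeroed
```

Inside a proximal gradient loop the operator would be called with lam scaled by the step size.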

Applications

The methodology is domain-agnostic. Examples are provided for:

The same estimation problem (sparse factor loadings with sign priors and consistent covariance) appears in macro-econometrics, signal processing, and multi-task learning.

Dependencies

Only standard scientific Python packages are required:

  • numpy ≥ 1.22
  • pandas ≥ 1.4
  • scipy ≥ 1.9
  • cvxpy ≥ 1.3

Related Packages

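| Package | Key difference |
|---|---|
| scikit-learn Lasso | No per-coefficient sign constraints (only a global positive=True option), no multi-output Group LASSO with user-defined groups |
| skglm | No sign constraints, no prior-centered shrinkage |
| abess | Best-subset selection (L0), not L1 / Group L2 |
| group-lasso | No sign constraints, no EWMA weighting, no prior-centered shrinkage |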

To the author's knowledge, factorlasso is the only package that combines sign-constrained penalised regression, prior-centered shrinkage, HCGL clustering, and integrated factor covariance assembly.

References

Sepp A., Ossa I., Kastenholz M. (2026), "Robust Optimization of Strategic and Tactical Asset Allocation for Multi-Asset Portfolios", The Journal of Portfolio Management, 52(4), 86–120.

Citation

@software{sepp2026factorlasso,
  author = {Sepp, Artur},
  title = {factorlasso: Sparse Factor Model Estimation with Constrained LASSO in Python},
  year = {2026},
  url = {https://github.com/ArturSepp/factorlasso}
}

License

MIT — see LICENSE.


Download files


Source Distribution

factorlasso-0.1.0.tar.gz (25.4 kB)

Built Distribution


factorlasso-0.1.0-py3-none-any.whl (20.1 kB)

File details

Details for the file factorlasso-0.1.0.tar.gz.

File metadata

  • Download URL: factorlasso-0.1.0.tar.gz
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for factorlasso-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 68848d3ec2e30c0c8a07bd85cd26f589236df724636ec8f66bc4ad0acdd72841 |
| MD5 | be0f6249c1817920b2f6614b7903071e |
| BLAKE2b-256 | f42814620224b18c838be86913f6e0a437e2bf76dadcb3234669af07d78e879a |


File details

Details for the file factorlasso-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: factorlasso-0.1.0-py3-none-any.whl
  • Size: 20.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for factorlasso-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef0f94bb54297098924b8b5404a9a571bf8db107e1968aba54b87997c46cc2d7 |
| MD5 | 4fdb93e0d2c9d2a533962da0a0e5dac0 |
| BLAKE2b-256 | 745f8e5106874aa2173a92dcf5840ec2775f99ebab804f940a98cfe87c8c630a |

