
Correlation-aware portfolio optimization and analytics for Python.


Basanos




Basanos computes correlation-adjusted risk positions from price data and expected-return signals. It estimates time-varying EWMA correlations, applies shrinkage towards the identity matrix, and solves a normalized linear system per timestamp to produce stable, scale-invariant positions — implementing a first hurdle for expected returns.


Features

  • Correlation-Aware Optimization — EWMA correlation estimation with shrinkage towards identity
  • Dynamic Risk Management — Volatility-normalized positions with configurable clipping and variance scaling
  • Portfolio Analytics — Sharpe, VaR, CVaR, drawdown, skew, kurtosis, and more
  • Performance Attribution — Tilt/timing decomposition to isolate allocation vs. selection effects
  • Interactive Visualizations — Plotly dashboards for NAV, drawdown, lead/lag analysis, and correlation heatmaps
  • Polars-Native — Built on Polars DataFrames for high-performance, memory-efficient computation

Installation

pip install basanos

Or with uv:

uv add basanos

Quick Start

Portfolio Optimization

import numpy as np
import polars as pl
from basanos.math import BasanosConfig, BasanosEngine

n_days = 100
dates = pl.date_range(
    pl.date(2023, 1, 1),
    pl.date(2023, 1, 1) + pl.duration(days=n_days - 1),
    eager=True,
)
rng = np.random.default_rng(42)

prices = pl.DataFrame({
    "date": dates,
    "AAPL":  100.0 + np.cumsum(rng.normal(0, 1.0, n_days)),
    "GOOGL": 150.0 + np.cumsum(rng.normal(0, 1.2, n_days)),
})

# Expected-return signals in [-1, 1] (e.g. from a forecasting model)
mu = pl.DataFrame({
    "date": dates,
    "AAPL":  np.tanh(rng.normal(0, 0.5, n_days)),
    "GOOGL": np.tanh(rng.normal(0, 0.5, n_days)),
})

cfg = BasanosConfig(
    vola=16,    # EWMA lookback for volatility (days)
    corr=32,    # EWMA lookback for correlation (days, must be >= vola)
    clip=3.5,   # Clipping threshold for vol-adjusted returns
    shrink=0.5, # Retention weight λ in [0, 1] (1 = raw EWMA corr, 0 = identity)
    aum=1e6,    # Assets under management
)

engine    = BasanosEngine(prices=prices, mu=mu, cfg=cfg)
positions = engine.cash_position  # pl.DataFrame of optimized cash positions
portfolio = engine.portfolio      # Portfolio object for analytics

Portfolio Analytics

import numpy as np
import polars as pl
from basanos.analytics import Portfolio

n_days = 60
dates = pl.date_range(
    pl.date(2023, 1, 1),
    pl.date(2023, 1, 1) + pl.duration(days=n_days - 1),
    eager=True,
)
rng = np.random.default_rng(42)

prices = pl.DataFrame({
    "date": dates,
    "AAPL":  100.0 * np.cumprod(1 + rng.normal(0.001, 0.020, n_days)),
    "GOOGL": 150.0 * np.cumprod(1 + rng.normal(0.001, 0.025, n_days)),
})

positions = pl.DataFrame({
    "date": dates,
    "AAPL":  np.full(n_days, 10_000.0),
    "GOOGL": np.full(n_days, 15_000.0),
})

portfolio = Portfolio.from_cash_position(prices=prices, cash_position=positions, aum=1e6)

# Performance metrics
nav      = portfolio.nav_accumulated   # Cumulative additive NAV
returns  = portfolio.returns           # Daily returns scaled by AUM
drawdown = portfolio.drawdown          # Distance from high-water mark

# Statistics
stats  = portfolio.stats
sharpe = stats.sharpe()["returns"]
vol    = stats.volatility()["returns"]

Visualizations

fig = portfolio.plots.snapshot()                          # NAV + drawdown dashboard
fig = portfolio.plots.lead_lag_ir_plot(start=-10, end=20) # Sharpe across position lags
fig = portfolio.plots.lagged_performance_plot(lags=[0, 1, 2, 3, 4])
fig = portfolio.plots.correlation_heatmap()
# fig.show()

How It Works

The optimizer implements a three-step pipeline per timestamp:

  1. Volatility adjustment — Log returns are normalized by an EWMA volatility estimate and clipped at cfg.clip standard deviations to limit the influence of outliers.

  2. Correlation estimation — An EWMA correlation matrix is computed from the vol-adjusted returns using a lookback of cfg.corr days. The matrix is shrunk toward the identity matrix with retention weight cfg.shrink (λ):

    C_shrunk = λ · C_ewma + (1 − λ) · I
    

    where λ = cfg.shrink. λ = 1.0 uses the raw EWMA matrix; λ = 0.0 replaces it with the identity (treating all assets as uncorrelated). See Shrinkage Methodology below for guidance on choosing λ.

  3. Position solving — For each timestamp, the system C_shrunk · x = mu is solved for x (the risk position vector). The solution is normalized by the inverse-matrix norm of mu, making positions scale-invariant with respect to signal magnitude. Positions are further scaled by a running profit-variance estimate to adapt risk dynamically.
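A minimal NumPy sketch of step 3 for a single timestamp, assuming that the "inverse-matrix norm" of mu means √(muᵀ C_shrunk⁻¹ mu). The helper names shrink_to_identity and risk_position are hypothetical, not the basanos API, and the running profit-variance scaling is deliberately left out.

```python
import numpy as np


def shrink_to_identity(corr: np.ndarray, lam: float) -> np.ndarray:
    """Convex shrinkage lam * C + (1 - lam) * I, with lam the retention weight."""
    return lam * corr + (1.0 - lam) * np.eye(corr.shape[0])


def risk_position(corr: np.ndarray, mu: np.ndarray, lam: float) -> np.ndarray:
    """Solve C_shrunk x = mu, then divide by sqrt(mu^T C_shrunk^{-1} mu)."""
    c = shrink_to_identity(corr, lam)
    x = np.linalg.solve(c, mu)  # x = C_shrunk^{-1} mu, without forming an inverse
    norm = np.sqrt(mu @ x)      # assumed inverse-matrix norm of mu
    return x / norm             # undefined when mu is all zeros


corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])
mu = np.array([0.5, -0.2])

x_small = risk_position(corr, mu, lam=0.5)
x_large = risk_position(corr, 10.0 * mu, lam=0.5)  # same signal, 10x magnitude
```

Scaling mu by 10 scales both the solution and the norm by 10, so x_small and x_large coincide. Under this normalization the position also has unit risk, xᵀ C_shrunk x = 1.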

Cash positions are obtained by dividing risk positions by per-asset EWMA volatility.
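The volatility side of the pipeline (step 1 plus the cash conversion) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the basanos implementation: the decay alpha = 2 / (lookback + 1), the zero-mean variance recursion, and dividing each return by the previous row's volatility are all choices made for this sketch.

```python
import numpy as np


def ewma_vol(returns: np.ndarray, lookback: int) -> np.ndarray:
    """Recursive zero-mean EWMA volatility, applied to every column at once."""
    alpha = 2.0 / (lookback + 1.0)  # assumed span-style decay
    var = np.empty_like(returns)
    var[0] = returns[0] ** 2
    for t in range(1, len(returns)):
        var[t] = (1.0 - alpha) * var[t - 1] + alpha * returns[t] ** 2
    return np.sqrt(var)


rng = np.random.default_rng(0)
prices = 100.0 * np.cumprod(1.0 + rng.normal(0.0, 0.01, size=(250, 3)), axis=0)
log_ret = np.diff(np.log(prices), axis=0)  # shape (249, 3)

vola = ewma_vol(log_ret, lookback=16)
# Vol-adjusted returns, clipped at +/- 3.5 (cfg.clip in the Quick Start)
ret_adj = np.clip(log_ret[1:] / vola[:-1], -3.5, 3.5)

risk = np.array([0.4, -0.1, 0.2])  # hypothetical risk positions for the last row
cash = risk / vola[-1]             # cash position = risk position / EWMA vol
```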

Shrinkage Methodology

Why shrink?

Sample correlation matrices estimated from T observations of n assets are poorly conditioned when n is large relative to T — the classical curse of dimensionality. The Marchenko–Pastur law shows that extreme eigenvalues of the sample matrix are severely biased (small eigenvalues are deflated, large ones are inflated), making the matrix difficult to invert reliably. Linear shrinkage toward the identity corrects this by pulling all eigenvalues toward a common value, improving the numerical condition of the matrix and reducing out-of-sample estimation error.
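A quick NumPy experiment illustrates the bias. For truly uncorrelated assets every population eigenvalue equals 1, yet the sample eigenvalues spread across roughly the Marchenko–Pastur interval [(1 − √q)², (1 + √q)²] with q = n / T. The sizes n = 50 and T = 100 are arbitrary choices for the demo.

```python
import numpy as np

n, T = 50, 100  # concentration ratio q = n / T = 0.5
q = n / T
rng = np.random.default_rng(1)
x = rng.standard_normal((T, n))  # independent assets: true correlation = I
sample_corr = np.corrcoef(x, rowvar=False)
eig = np.linalg.eigvalsh(sample_corr)

# Marchenko-Pastur support for the eigenvalues of a pure-noise sample matrix
mp_low, mp_high = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
print(f"sample eigenvalues: {eig.min():.3f} .. {eig.max():.3f}")
print(f"MP bounds:          {mp_low:.3f} .. {mp_high:.3f}")
print(f"condition number:   {eig.max() / eig.min():.1f}  (true matrix: 1.0)")
```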

Basanos uses convex linear shrinkage (Ledoit & Wolf, 2004):

C_shrunk = λ · C_ewma + (1 − λ) · I_n

This is a special case of the general Ledoit–Wolf framework where the shrinkage target is the identity matrix and the retention weight λ is treated as a user-controlled hyperparameter. Unlike the analytically optimal Ledoit–Wolf or Oracle Approximating Shrinkage (OAS) estimators, Basanos uses a fixed λ — appropriate for regularising a linear solver rather than estimating a covariance matrix, where practical stability often matters more than minimum Frobenius loss.
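Because C_shrunk shares its eigenvectors with C_ewma, each eigenvalue e_i simply maps to λ·e_i + (1 − λ). The NumPy sketch below uses a noisy sample correlation matrix as a stand-in for C_ewma, checks that mapping, and shows the condition number falling as λ decreases.

```python
import numpy as np


def shrink(corr: np.ndarray, lam: float) -> np.ndarray:
    """lam * C + (1 - lam) * I, with lam the retention weight."""
    return lam * corr + (1.0 - lam) * np.eye(corr.shape[0])


rng = np.random.default_rng(1)
sample_corr = np.corrcoef(rng.standard_normal((100, 50)), rowvar=False)
eig = np.linalg.eigvalsh(sample_corr)  # ascending order

conds = {}
for lam in (1.0, 0.7, 0.5, 0.3):
    shrunk_eig = np.linalg.eigvalsh(shrink(sample_corr, lam))
    # Same eigenvectors; every eigenvalue is pulled linearly toward 1
    assert np.allclose(shrunk_eig, lam * eig + (1.0 - lam))
    conds[lam] = shrunk_eig.max() / shrunk_eig.min()
    print(f"lambda = {lam:.1f}  condition number = {conds[lam]:.1f}")
```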

How to choose cfg.shrink (= λ)

The key quantity is the concentration ratio n / T, where n = number of assets and T = cfg.corr (the EWMA lookback).

| Regime | n / T ratio | Suggested λ | Rationale |
|---|---|---|---|
| Many assets, short lookback | > 0.5 | 0.3 – 0.5 | High noise; strong regularisation |
| Moderate assets and lookback | 0.1 – 0.5 | 0.5 – 0.7 | Balanced |
| Few assets, long lookback | < 0.1 | 0.7 – 0.9 | Well-conditioned sample; light regularisation |

A useful heuristic starting point is λ ≈ 1 − n / (2·T) (where n = number of assets and T = cfg.corr), which approximates the Ledoit–Wolf optimal intensity. Always validate on held-out data.
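The heuristic is easy to compute up front. In this sketch the helper names are hypothetical, and clipping the result to [0, 1] is an assumption added here, since 1 − n / (2·T) turns negative once n > 2·T.

```python
def concentration_ratio(n_assets: int, corr_lookback: int) -> float:
    """n / T, with T = cfg.corr (the EWMA correlation lookback)."""
    return n_assets / corr_lookback


def heuristic_shrink(n_assets: int, corr_lookback: int) -> float:
    """Starting point lam ~ 1 - n / (2 T), clipped to the valid range [0, 1]."""
    lam = 1.0 - n_assets / (2.0 * corr_lookback)
    return min(1.0, max(0.0, lam))  # clipping is an assumption of this sketch


print(concentration_ratio(20, 32), heuristic_shrink(20, 32))  # 0.625 0.6875
```

The value is only a starting point for cfg.shrink; as noted above, validate it on held-out data.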

Sensitivity notes:

  • Below λ ≈ 0.3 the off-diagonal correlations are so heavily damped that the optimizer behaves almost as if all assets were independent.
  • Above λ ≈ 0.8 the matrix can become nearly singular when the lookback is short relative to the number of assets (e.g., n > 10 with corr < 50), leading to numerically unstable positions.
  • Shrinkage is most influential in the range λ ∈ [0.3, 0.8].

Interactive demonstration

The book/marimo/notebooks/shrinkage_guide.py notebook shows the empirical effect of different shrinkage levels on portfolio Sharpe ratio and position stability for a realistic synthetic dataset.

References

  • Ledoit, O., & Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2), 365–411. https://doi.org/10.1016/S0047-259X(03)00096-4
  • Chen, Y., Wiesel, A., Eldar, Y. C., & Hero, A. O. (2010). Shrinkage algorithms for MMSE covariance estimation. IEEE Transactions on Signal Processing, 58(10), 5016–5029. https://doi.org/10.1109/TSP.2010.2053029
  • Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proceedings of the Third Berkeley Symposium, 1, 197–206.

Performance Characteristics

TL;DR: by the memory formula below, the optimizer is practical on a 16 GB workstation for up to about 200 assets with 10 years of daily data, or about 250 assets with 5 years. Beyond those limits, memory or compute time becomes the bottleneck.

Computational complexity

Let N = number of assets and T = number of timestamps.

| Step | Complexity | Bottleneck |
|---|---|---|
| Vol-adjustment (ret_adj, vola) | O(T·N) | EWMA per asset; scales linearly |
| EWM correlation (cor) | O(T·N²) | lfilter over all N² pairs in parallel |
| Linear solve per row (cash_position) | O(N³) × T solves | Cholesky/LU decomposition per timestamp |

For most practical portfolio sizes (N ≤ 200) the correlation step dominates. At very large N (≥ 500) the per-solve cost O(N³) can also become significant.

Memory usage

_ewm_corr_numpy allocates roughly 14 float64 arrays of shape (T, N, N) simultaneously at peak (input sequences fed to scipy.signal.lfilter, the IIR filter outputs, the five EWM component arrays, and the result tensor):

Peak RAM ≈ 14 × 8 × T × N²  bytes  ≈  112 × T × N²  bytes
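The estimate is simple enough to evaluate before running the optimizer. This helper (hypothetical, stdlib only) just encodes the formula above, reporting decimal gigabytes as the table below does.

```python
def peak_corr_bytes(n_assets: int, n_rows: int, n_arrays: int = 14) -> int:
    """Approximate peak RAM of the (T, N, N) float64 EWM-correlation buffers."""
    return n_arrays * 8 * n_rows * n_assets ** 2


for n, t in [(50, 252), (100, 1260), (200, 2520)]:
    gb = peak_corr_bytes(n, t) / 1e9
    print(f"N={n:4d}  T={t:5d}  ~{gb:.2f} GB")
```

Note that cfg.corr does not appear in the formula: only the number of assets and the number of rows drive the peak.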

Practical working sizes:

| N (assets) | T (daily rows) | Approx. history | Peak memory |
|---|---|---|---|
| 50 | 252 | ~1 year | ~70 MB |
| 100 | 252 | ~1 year | ~280 MB |
| 100 | 1 260 | ~5 years | ~1.4 GB |
| 100 | 2 520 | ~10 years | ~2.8 GB |
| 200 | 1 260 | ~5 years | ~5.6 GB |
| 200 | 2 520 | ~10 years | ~11 GB |
| 500 | 2 520 | ~10 years | ~70 GB ⚠ |
| 1 000 | 2 520 | ~10 years | ~280 GB ⛔ |

Practical limits

| Zone | Condition | Guidance |
|---|---|---|
| ✅ Comfortable | N ≤ 150, T ≤ 1 260 (~5 yr daily) | Runs on an 8 GB laptop in seconds |
| ⚠ Feasible with care | N ≤ 250, T ≤ 2 520 (~10 yr daily) | Requires ~11 GB (N = 200) to ~18 GB (N = 250) of RAM at T = 2 520; plan for 10–60 s wall time |
| 🔴 Impractical | N > 500 or T > 5 000 | Peak memory exceeds 16 GB; consider mitigation strategies below |
| ⛔ Not supported | N > 1 000 with multi-year history | Solve cost and memory are prohibitive on commodity hardware |

Note on cfg.corr — this is the EWM lookback window, not the total dataset length. Even if you have 10 years of prices, keeping cfg.corr short (e.g., 63 days) does not reduce the peak memory cost of _ewm_corr_numpy: the function always allocates the full (T, N, N) tensor regardless of the lookback value. To limit memory, reduce T itself (the number of rows passed in, for example by trimming old prices) rather than adjusting cfg.corr.

Mitigation strategies

When you hit memory or performance limits:

  1. Reduce the asset universe — keep only the most liquid or relevant assets; pre-filter with univariate signal strength before running the optimizer.
  2. Shorten the price history: _ewm_corr_numpy processes every row, so trim older data to the minimum needed for the EWM warm-up (cfg.corr rows).
  3. Decrease cfg.shrink toward 0.0 — stronger identity shrinkage reduces the sensitivity of the solve to noisy off-diagonal entries, allowing a shorter effective lookback without instability.
  4. Process in rolling windows — run the optimizer on overlapping windows (e.g., 1-year chunks) and stitch results; correlation estimates will differ slightly at window boundaries but memory stays bounded.
  5. Use cor_tensor instead of cor — returns a single (T, N, N) NumPy array rather than a Python dict, avoiding Python object overhead for large T.
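Strategy 4 needs chunk boundaries that overlap by enough rows for the EWM estimates to warm up. Below is a pure-Python sketch of that index bookkeeping; rolling_windows is a hypothetical helper, and the warm-up length (a cfg.corr-sized 64 rows here) is an assumption to tune.

```python
from typing import Iterator, Tuple


def rolling_windows(n_rows: int, window: int, warmup: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (load_start, keep_start, end) row indices for overlapping chunks.

    Rows [load_start, end) are fed to the optimizer; only rows
    [keep_start, end) are kept, so the extra warm-up rows just prime the EWMs.
    """
    keep_start = 0
    while keep_start < n_rows:
        end = min(keep_start + window, n_rows)
        load_start = max(0, keep_start - warmup)
        yield load_start, keep_start, end
        keep_start = end


chunks = list(rolling_windows(n_rows=1000, window=252, warmup=64))
# [(0, 0, 252), (188, 252, 504), (440, 504, 756), (692, 756, 1000)]
```

For each chunk you would run the engine on the price and mu rows [load_start, end), keep the last end − keep_start output rows, and concatenate. The first chunk has no earlier data to warm up on, so its early rows carry the usual start-up noise.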

Benchmark data

Measured on a GitHub Actions runner (AMD EPYC 7763, 4 vCPUs, Python 3.12):

| Dataset | cor time | cash_position time |
|---|---|---|
| 5 assets, 252 rows (~1 yr) | 1.2 ms | 56 ms |
| 5 assets, 1 260 rows (~5 yr) | 5.4 ms | 222 ms |
| 20 assets, 252 rows (~1 yr) | 13.6 ms | |

See BENCHMARKS.md for full results and regression baselines.

API Reference

basanos.math

from basanos.math import BasanosConfig, BasanosEngine

| Class | Description |
|---|---|
| BasanosConfig | Immutable configuration (Pydantic model) |
| BasanosEngine | Core optimizer; produces positions and a Portfolio |

BasanosEngine properties

| Property | Returns | Description |
|---|---|---|
| assets | list[str] | Numeric asset column names |
| ret_adj | pl.DataFrame | Vol-adjusted, clipped log returns |
| vola | pl.DataFrame | Per-asset EWMA volatility |
| cor | dict[date, np.ndarray] | EWMA correlation matrices keyed by date |
| cash_position | pl.DataFrame | Optimized cash positions |
| portfolio | Portfolio | Ready-to-use portfolio for analytics |

basanos.analytics

from basanos.analytics import Portfolio

| Class | Description |
|---|---|
| Portfolio | Central data model for P&L, NAV, and attribution |
| Stats | Statistical risk/return metrics |
| Plots | Plotly-based interactive visualizations |

Portfolio properties

| Property | Description |
|---|---|
| profits | Per-asset daily P&L |
| profit | Aggregate daily portfolio profit |
| nav_accumulated | Cumulative additive NAV |
| nav_compounded | Compounded NAV |
| returns | Daily returns scaled by AUM |
| monthly | Monthly compounded returns |
| highwater | Running NAV maximum |
| drawdown | Drawdown from high-water mark |
| tilt | Static allocation (average position) |
| timing | Dynamic timing (deviation from average) |
| stats | Stats instance |
| plots | Plots instance |
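The P&L-related properties can be illustrated with a few lines of NumPy. This is a sketch of the definitions in the table, not the Portfolio implementation: applying each row's cash position to the next row's price return, starting the additive NAV at the AUM, and measuring drawdown in currency units are all assumptions.

```python
import numpy as np

aum = 1e6
prices = np.array([[100.0, 50.0],
                   [101.0, 49.0],
                   [102.0, 50.0],
                   [100.0, 51.0]])
cash = np.array([[10_000.0, 5_000.0],
                 [12_000.0, 5_000.0],
                 [8_000.0, 6_000.0],
                 [8_000.0, 6_000.0]])

asset_ret = prices[1:] / prices[:-1] - 1.0
# Assumption: the position held at row t-1 earns the return from t-1 to t
profits = np.vstack([np.zeros(2), cash[:-1] * asset_ret])  # per-asset daily P&L
profit = profits.sum(axis=1)                                # aggregate daily profit
returns = profit / aum                                      # returns scaled by AUM
nav_accumulated = aum + np.cumsum(profit)                   # cumulative additive NAV
highwater = np.maximum.accumulate(nav_accumulated)          # running NAV maximum
drawdown = highwater - nav_accumulated                      # distance below high-water mark
tilt = cash.mean(axis=0)                                    # static average position
timing = cash - tilt                                        # deviation from the average
```

By construction tilt + timing reproduces the original positions, which is what lets the attribution split P&L into allocation and timing parts.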

Stats methods

| Method | Description |
|---|---|
| sharpe(periods) | Annualized Sharpe ratio |
| volatility(periods, annualize) | Standard deviation of returns |
| skew() | Skewness |
| kurtosis() | Excess kurtosis |
| value_at_risk(alpha, sigma) | Parametric VaR |
| conditional_value_at_risk(alpha, sigma) | Expected shortfall (CVaR) |
| avg_return() | Mean return (zeros excluded) |
| avg_win() | Mean positive return |
| avg_loss() | Mean negative return |
| best() | Maximum single-period return |
| worst() | Minimum single-period return |
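For orientation, textbook versions of a few of these metrics are sketched below using only the standard library. They are not the basanos implementations: the exact conventions of the Stats methods (for example how the sigma argument of value_at_risk is used) are not documented here, and these helpers assume normally distributed returns for VaR and CVaR and a zero risk-free rate for Sharpe.

```python
import math
from statistics import NormalDist, mean, stdev


def sharpe(returns: list[float], periods: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate."""
    return mean(returns) / stdev(returns) * math.sqrt(periods)


def parametric_var(returns: list[float], alpha: float = 0.95) -> float:
    """Normal VaR expressed as the (1 - alpha) return quantile; negative = loss."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return mean(returns) + z * stdev(returns)


def parametric_cvar(returns: list[float], alpha: float = 0.95) -> float:
    """Normal expected shortfall: mean return within the worst (1 - alpha) tail."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return mean(returns) - stdev(returns) * NormalDist().pdf(z) / (1.0 - alpha)


daily = [0.004, -0.012, 0.007, 0.001, -0.003, 0.009, -0.006, 0.002]
print(sharpe(daily), parametric_var(daily), parametric_cvar(daily))
```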

Configuration Reference

| Parameter | Type | Constraint | Description |
|---|---|---|---|
| vola | int | > 0 | EWMA lookback for volatility (days) |
| corr | int | >= vola | EWMA lookback for correlation (days) |
| clip | float | > 0 | Clipping threshold for vol-adjusted returns |
| shrink | float | [0, 1] | Retention weight λ: 1 = raw EWMA correlation (no shrinkage), 0 = identity |
| aum | float | > 0 | Assets under management for position scaling |

from basanos.math import BasanosConfig

# Conservative — longer lookbacks, stronger shrinkage (lower λ)
conservative = BasanosConfig(vola=32, corr=64, clip=3.0, shrink=0.3, aum=1e6)

# Responsive — shorter lookbacks, lighter shrinkage (higher λ)
responsive   = BasanosConfig(vola=8,  corr=16, clip=4.0, shrink=0.7, aum=1e6)
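BasanosConfig enforces the constraints in the table above through Pydantic. To make those rules concrete without depending on basanos, here is a plain frozen dataclass that mirrors them; ConfigSketch is hypothetical and not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in mirroring the documented BasanosConfig constraints."""

    vola: int
    corr: int
    clip: float
    shrink: float
    aum: float

    def __post_init__(self) -> None:
        if self.vola <= 0:
            raise ValueError("vola must be > 0")
        if self.corr < self.vola:
            raise ValueError("corr must be >= vola")
        if self.clip <= 0:
            raise ValueError("clip must be > 0")
        if not 0.0 <= self.shrink <= 1.0:
            raise ValueError("shrink must lie in [0, 1]")
        if self.aum <= 0:
            raise ValueError("aum must be > 0")


ok = ConfigSketch(vola=16, corr=32, clip=3.5, shrink=0.5, aum=1e6)

try:
    ConfigSketch(vola=32, corr=16, clip=3.5, shrink=0.5, aum=1e6)  # corr < vola
except ValueError as err:
    print(err)  # corr must be >= vola
```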

Development

git clone https://github.com/Jebel-Quant/basanos.git
cd basanos
uv sync

| Command | Action |
|---|---|
| make test | Run the test suite |
| make fmt | Format and lint with ruff |
| make typecheck | Static type checking |
| make deptry | Audit declared dependencies |

Before submitting a PR, ensure all checks pass:

make fmt && make test && make typecheck

License

Basanos is released under the MIT License. See LICENSE for details.


Download files


Source Distribution

basanos-0.2.3.tar.gz (332.6 kB)

Built Distribution


basanos-0.2.3-py3-none-any.whl (42.8 kB)

File details

Details for the file basanos-0.2.3.tar.gz.

File metadata

  • Download URL: basanos-0.2.3.tar.gz
  • Size: 332.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for basanos-0.2.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85f5fd22d767533cba9cb6bd4df215fda0ad6d3f4f484e9ca4142009278635eb |
| MD5 | 63345687b94b3b7b8a8569abd9ec0522 |
| BLAKE2b-256 | f65ed20d033f943cd77fbcd5ead10266945a5c316222d1a23033333d983d7a81 |


Provenance

The following attestation bundles were made for basanos-0.2.3.tar.gz:

Publisher: rhiza_release.yml on Jebel-Quant/basanos

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file basanos-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: basanos-0.2.3-py3-none-any.whl
  • Size: 42.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for basanos-0.2.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e2876200364d24737b4e642cc6212071fed89fd8ad3b4ccc887acfb58e42fed |
| MD5 | 79b660c9e9d9e40af9638a5a7b7086bc |
| BLAKE2b-256 | ebb500ef58986ded24a02e7b8e5e7339bca052649848d6c525c65102ccd97c5d |


Provenance

The following attestation bundles were made for basanos-0.2.3-py3-none-any.whl:

Publisher: rhiza_release.yml on Jebel-Quant/basanos

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
