
Institutional-grade hierarchical portfolio optimization (HRP, HERC, NCO) with data loading, metrics, and backtesting — by Anagatam Technologies

Project description

🌳 Canopy

The Institutional Hierarchical Portfolio Optimization Engine
HRP · HERC · NCO — Three algorithms. One facade. Zero matrix inversions.

Documentation · PyPI · Wiki · Release Notes · Disclaimer



Canopy is an open-source, institutional-grade Python library for hierarchical portfolio allocation. It implements three algorithms — HRP, HERC, and NCO — with four covariance estimators, four risk measures, walk-forward backtesting, and a full compliance audit trail.

One facade. One import. One line to optimal weights.

from canopy.MasterCanopy import MasterCanopy

weights = MasterCanopy(method='HRP', cov_estimator='ledoit_wolf').cluster(returns).allocate()

[!NOTE] Canopy Pro — featuring next-generation hierarchical algorithms (HRCP, HERC-DRL, Spectral NCO, Bayesian HRP), 12+ risk measures, real-time streaming covariance, and enterprise support — is under active development. 📩 Sign up for early access →



Why Canopy?

| What | Why it matters |
|---|---|
| 🏗️ Three algorithms, one facade | HRP, HERC, NCO — each with distinct risk-return properties. Switch with one parameter. |
| 📐 Four covariance estimators | Ledoit-Wolf, Marchenko-Pastur denoising, EWMA, detoning. The covariance is the portfolio. |
| 📊 Four risk measures | Variance, CVaR, CDaR, MAD — HERC allocates across clusters using the measure you choose. |
| 🔍 Full audit trail | ISO 8601 timestamped. Export as JSON. MiFID II / SEC Rule 15c3-5 compliant traceability. |
| 🧪 Zero matrix inversion | HRP never inverts Σ. Stable even when condition number > 10⁸. |
| ⚡ Fast | HRP: 11 ms, HERC: 17 ms, NCO: 46 ms on 20 assets. Pure NumPy/SciPy. |

Quick Start

pip install canopy-optimizer

import yfinance as yf
from canopy.MasterCanopy import MasterCanopy

# Fetch → Optimize → Allocate
data = yf.download(['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'JPM'], start='2020-01-01')
returns = data['Close'].pct_change().dropna()

opt = MasterCanopy(method='HRP', cov_estimator='ledoit_wolf')
weights = opt.cluster(returns).allocate()
print(weights)
# AAPL     0.1824
# MSFT     0.2016
# GOOGL    0.1953
# AMZN     0.1892
# JPM      0.2315

Examples

📊 DataLoader — Zero-Boilerplate Data Pipeline

from canopy.data import DataLoader
from canopy.MasterCanopy import MasterCanopy

# One-liner: fetch Nifty stocks + benchmark
returns, nifty = DataLoader.yfinance(
    ['RELIANCE.NS', 'TCS.NS', 'HDFCBANK.NS', 'INFY.NS', 'ICICIBANK.NS',
     'SBIN.NS', 'BHARTIARTL.NS', 'KOTAKBANK.NS', 'LT.NS', 'ITC.NS'],
    start='2021-01-01',
    benchmark='^NSEI'   # Auto-fetches Nifty 50
)

opt = MasterCanopy(method='HRP')
weights = opt.cluster(returns).allocate()
print(weights)

🎯 HERC — Tail-Risk-Aware Allocation with CVaR

opt = MasterCanopy(
    method='HERC',
    cov_estimator='denoised',    # Marchenko-Pastur denoising
    risk_measure='cvar',         # CVaR for tail-risk-aware allocation
    detone=True,                 # Remove market mode for better clustering
    min_weight=0.01,             # UCITS-compliant floor
    max_weight=0.10              # UCITS-compliant ceiling
)
weights = opt.cluster(returns).allocate()

🔬 NCO — Full Audit Trail for Compliance

opt = MasterCanopy(
    method='NCO',
    cov_estimator='ledoit_wolf',
    max_k=8,                     # Up to 8 clusters
)
weights = opt.cluster(returns).allocate()

# Institutional-grade audit
print(opt.summary())             # Human-readable report
audit = opt.tojson()             # Machine-readable JSON for compliance
diag = opt.diagnostics()         # Eigenvalue + condition number analysis
print(f"Condition Number: {diag['covariance']['condition_number']:.0f}")

📈 Full Backtest + Performance Metrics

from canopy.data import DataLoader
from canopy.MasterCanopy import MasterCanopy
from canopy.metrics import PortfolioMetrics
from canopy.backtest import BacktestEngine

# Load data
returns, sp500 = DataLoader.yfinance(
    ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'NVDA', 'META',
     'JPM', 'JNJ', 'PG', 'KO'],
    start='2020-01-01',
    benchmark='^GSPC'
)

# Walk-forward backtest with monthly rebalancing
engine = BacktestEngine(
    optimizer=MasterCanopy(method='HRP', cov_estimator='ledoit_wolf'),
    frequency='monthly',
    lookback=252,
)
result = engine.run(returns)
print(result.summary())

# Performance analytics
opt = MasterCanopy(method='HRP', cov_estimator='ledoit_wolf')
weights = opt.cluster(returns).allocate()

pm = PortfolioMetrics(returns, weights, benchmark=sp500)
print(f"Sharpe Ratio      : {pm.sharpe():.3f}")
print(f"Sortino Ratio     : {pm.sortino():.3f}")
print(f"Max Drawdown      : {pm.maxdrawdown():.2%}")
print(f"CVaR (5%)         : {pm.cvar():.4f}")
print(f"Information Ratio : {pm.informationratio():.3f}")
print(pm.report())              # Full formatted report

Algorithms

Canopy implements three hierarchical allocation algorithms. Each solves the portfolio construction problem differently:

Allocation Comparison

| Algorithm | Method | Key Property | Speed |
|---|---|---|---|
| HRP | Recursive bisection under inverse-variance risk parity; no Σ⁻¹ required | Maximum stability | 11 ms |
| HERC | Two-stage: inter-cluster risk parity + intra-cluster inverse-variance; 4 risk measures | Cluster-aware diversification | 17 ms |
| NCO | Tikhonov-regularized nested optimization: (Σₖ + λI)⁻¹ · 1 | Lowest tail risk | 46 ms |
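
The HRP row above can be made concrete. The sketch below is a minimal textbook rendering of López de Prado's recursive bisection — not Canopy's internal code — which clusters on correlation distance, sorts the assets, then repeatedly splits the sorted list in half, allocating each half inversely to its cluster variance. Note that no matrix is ever inverted:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def hrp_weights(cov: np.ndarray) -> np.ndarray:
    """Textbook HRP: quasi-diagonalize via clustering, then recursive bisection."""
    # Correlation distance: d_ij = sqrt((1 - rho_ij) / 2)
    std = np.sqrt(np.diag(cov))
    corr = cov / np.outer(std, std)
    dist = np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, 1.0))
    np.fill_diagonal(dist, 0.0)
    order = leaves_list(linkage(squareform(dist, checks=False), method='single'))

    w = np.ones(len(order))
    clusters = [list(order)]          # start from the full quasi-diagonal ordering
    while clusters:
        # split every multi-asset cluster in half, keeping halves adjacent
        clusters = [c[i:j] for c in clusters
                    for i, j in ((0, len(c) // 2), (len(c) // 2, len(c)))
                    if len(c) > 1]
        for left, right in zip(clusters[::2], clusters[1::2]):
            def cluster_var(idx):
                sub = cov[np.ix_(idx, idx)]
                ivp = 1.0 / np.diag(sub)          # inverse-variance portfolio
                ivp /= ivp.sum()
                return ivp @ sub @ ivp
            alpha = 1.0 - cluster_var(left) / (cluster_var(left) + cluster_var(right))
            w[left] *= alpha                      # no matrix inversion anywhere
            w[right] *= 1.0 - alpha
    return w
```

The split factors at each node sum to one, so the final weights sum to one by construction.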

Cumulative Returns: India — Canopy vs Nifty 50

India Cumulative Returns

Cumulative Returns: US — Canopy vs S&P 500

US Cumulative Returns


Covariance Estimators

The covariance matrix is the single most important input. Canopy provides four institutional-grade estimators:

| Estimator | Formula | When to Use |
|---|---|---|
| Sample | Σ̂ = (1/T)·Rᵀ·R | Baseline; T/N > 10 |
| Ledoit-Wolf | Σ_LW = α·F + (1−α)·Σ̂ | Default; reduces estimation error ~40% |
| Denoised (Marchenko-Pastur) | Clip eigenvalues below λ₊ = σ²(1+√(N/T))² | High noise; N/T > 0.5 |
| EWMA | Σₜ = λ·Σₜ₋₁ + (1−λ)·rₜ·rₜᵀ | Regime-adaptive |
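
As an illustration of the denoised row, here is a minimal sketch of Marchenko-Pastur eigenvalue clipping — an assumption of how such an estimator is typically built (following Lopez de Prado, 2020), not Canopy's own CovarianceEngine:

```python
import numpy as np

def denoise_corr(corr: np.ndarray, T: int) -> np.ndarray:
    """Clip 'noise' eigenvalues below the Marchenko-Pastur edge and rebuild.

    Assumes unit-variance (standardized) returns, so sigma^2 ≈ 1. Eigenvalues
    below lambda_+ are replaced by their average, preserving the trace.
    """
    N = corr.shape[0]
    lam_plus = (1.0 + np.sqrt(N / T)) ** 2        # MP upper edge with sigma^2 = 1
    vals, vecs = np.linalg.eigh(corr)             # ascending eigenvalues
    noise = vals < lam_plus
    if noise.any():
        vals[noise] = vals[noise].mean()          # flatten the noise bulk
    denoised = (vecs * vals) @ vecs.T             # V diag(vals) V'
    d = np.sqrt(np.diag(denoised))
    return denoised / np.outer(d, d)              # rescale to unit diagonal
```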

Detoning (Lopez de Prado, 2020): Removes the market mode (first eigenvector) before clustering for more discriminative sector-level grouping.
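
A minimal sketch of what detoning does — `detone` here is a hypothetical helper for illustration, not Canopy's API:

```python
import numpy as np

def detone(corr: np.ndarray, n_market: int = 1) -> np.ndarray:
    """Remove the top 'market' eigenvector(s) from a correlation matrix so that
    clustering sees sector structure rather than the common market mode."""
    vals, vecs = np.linalg.eigh(corr)                  # ascending order
    keep = slice(0, corr.shape[0] - n_market)          # drop the largest modes
    detoned = (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T
    d = np.sqrt(np.diag(detoned))
    return detoned / np.outer(d, d)                    # back to unit diagonal
```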


Risk Measures & Portfolio Modes

HERC Inter-Cluster Risk Allocation

| Measure | Use Case |
|---|---|
| Variance | Classic Raffinot (2017); symmetric risk budgeting |
| CVaR | Tail risk; allocates away from crash-prone clusters |
| CDaR | Drawdown risk; penalizes deep underwater periods |
| MAD | Robust to outliers; no squared deviations |
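
For intuition, two of the non-variance measures reduce to a few lines of NumPy. This is a hedged sketch of the standard definitions; Canopy's internal implementations may differ in sign convention or quantile interpolation:

```python
import numpy as np

def cvar(returns: np.ndarray, alpha: float = 0.05) -> float:
    """Conditional Value-at-Risk: mean return in the worst alpha tail
    (a negative number for loss-making tails)."""
    cutoff = np.quantile(returns, alpha)          # the VaR level
    return returns[returns <= cutoff].mean()

def mad(returns: np.ndarray) -> float:
    """Mean absolute deviation: robust dispersion with no squared terms."""
    return np.abs(returns - returns.mean()).mean()
```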

Portfolio Modes

| Mode | Constraint | Use Case |
|---|---|---|
| long_only | wᵢ ≥ 0 | Mutual funds, ETFs, pensions, UCITS |
| long_short | Σwᵢ = 1 | Hedge funds, 130/30 strategies |
| market_neutral | Σwᵢ = 0 | Statistical arbitrage |

Dendrogram & Cluster Analysis

Canopy builds a full hierarchical clustering tree using 7 linkage methods (Ward, Single, Complete, Average, Weighted, Centroid, Median) with optional optimal leaf ordering (Bar-Joseph et al., 2001):

Dendrogram

The dendrogram reveals the correlation structure of the asset universe. Strongly correlated assets cluster together at low distances, while uncorrelated assets are separated at higher distances.
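
The tree-building step described above maps directly onto SciPy primitives. This self-contained sketch on synthetic data is illustrative only; Canopy's ClusterEngine presumably builds on the same functions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, optimal_leaf_ordering, leaves_list
from scipy.spatial.distance import squareform

# Correlation distance -> Ward linkage -> Bar-Joseph optimal leaf ordering
rng = np.random.default_rng(42)
returns = rng.standard_normal((250, 10))            # 250 days x 10 synthetic assets
corr = np.corrcoef(returns, rowvar=False)
dist = np.sqrt(np.clip((1.0 - corr) / 2.0, 0.0, 1.0))
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)

Z = linkage(condensed, method='ward')
Z_ordered = optimal_leaf_ordering(Z, condensed)     # Bar-Joseph et al. (2001)
print(leaves_list(Z_ordered))                       # display order for the dendrogram
```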

Risk Decomposition

Canopy decomposes portfolio risk to show each asset's marginal contribution to total variance:

Risk Contribution

Equal Risk Contribution (the gold dashed line at 5% for N=20) is the theoretical target. HRP with denoised covariance achieves near-equal risk contribution without any explicit optimization constraint.
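
The decomposition in the chart follows from the standard Euler identity for portfolio variance: RCᵢ = wᵢ·(Σw)ᵢ / (wᵀΣw), and the fractions sum to one. A minimal sketch (not Canopy's internal code):

```python
import numpy as np

def risk_contributions(w: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Fractional risk contributions. Equal risk contribution means
    RC_i = 1/N for every asset."""
    marginal = cov @ w          # proportional to d(variance)/dw_i
    total = w @ marginal        # portfolio variance w' Sigma w
    return w * marginal / total
```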


📦 New in v3.0

DataLoader

Zero-boilerplate data pipeline. Fetch from Yahoo Finance, CSV, Parquet, or DataFrame.

from canopy.data import DataLoader

returns, nifty = DataLoader.yfinance(
    ['RELIANCE.NS', 'TCS.NS', 'HDFCBANK.NS', 'INFY.NS'],
    start='2021-01-01',
    benchmark='^NSEI'
)

returns = DataLoader.csv('prices.csv')
returns = DataLoader.parquet('bloomberg.parquet')

PortfolioMetrics

Math separated from logic. Pure functions + comprehensive reporting class.

from canopy.metrics import PortfolioMetrics

pm = PortfolioMetrics(returns, weights, benchmark=nifty)
print(pm.sharpe())           # Annualized Sharpe Ratio
print(pm.maxdrawdown())      # Maximum peak-to-trough decline
print(pm.cvar())             # Conditional Value-at-Risk
print(pm.report())           # Full formatted report

BacktestEngine

Walk-forward rebalancing. Daily, weekly, monthly, quarterly, annual frequencies.

from canopy.backtest import BacktestEngine
from canopy.MasterCanopy import MasterCanopy

engine = BacktestEngine(
    optimizer=MasterCanopy(method='HRP', cov_estimator='ledoit_wolf'),
    frequency='monthly',
    lookback=252,
)
result = engine.run(returns)
print(result.summary())
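
The engine's mechanics can be sketched generically. This illustrates walk-forward rebalancing in general, not BacktestEngine's actual implementation; `weight_fn` is a stand-in for any optimizer:

```python
import numpy as np

def walk_forward(returns: np.ndarray, weight_fn, lookback: int = 252, step: int = 21):
    """At each rebalance date, fit weights on the trailing `lookback` window only,
    then hold them for the next `step` days (out of sample)."""
    T = returns.shape[0]
    out = []
    for t in range(lookback, T, step):
        w = weight_fn(returns[t - lookback:t])      # fit on past data only
        out.append(returns[t:t + step] @ w)         # realize out-of-sample returns
    return np.concatenate(out)

# usage: equal weight as a stand-in optimizer on synthetic daily returns
rng = np.random.default_rng(0)
r = rng.standard_normal((600, 5)) * 0.01
pnl = walk_forward(r, lambda win: np.full(win.shape[1], 1 / win.shape[1]))
```

The key property is that no weight ever sees the returns it is evaluated on, which is what makes the backtest honest.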

📖 Full API Reference on ReadTheDocs →


Performance Benchmarks

Benchmarked on 20 global assets (US + India), 5 years of daily data (2020–2025):

| Method | Cov Estimator | Sharpe | Sortino | CVaR 95% | Max DD | Speed |
|---|---|---|---|---|---|---|
| HRP | Denoised | 0.83 | 0.95 | −2.27% | −30.5% | 11 ms |
| HRP | Ledoit-Wolf | 0.79 | 0.91 | −2.29% | −31.0% | 11 ms |
| HERC | LW + CVaR | 0.70 | 0.81 | −2.35% | −31.8% | 17 ms |
| NCO | Ledoit-Wolf | 0.68 | 0.79 | −2.19% | −23.2% | 46 ms |

Feature Matrix

| Feature | Status |
|---|---|
| HRP, HERC, NCO Allocation | ✅ |
| 4 Covariance Estimators (Sample, Ledoit-Wolf, Denoised, EWMA) | ✅ |
| 4 Risk Measures (Variance, CVaR, CDaR, MAD) | ✅ |
| Correlation Matrix Detoning | ✅ |
| Weight Constraints (min/max) | ✅ |
| 3 Portfolio Modes | ✅ |
| DataLoader (yfinance, CSV, Parquet) | ✅ |
| PortfolioMetrics (Sharpe, Sortino, CVaR, Calmar, IR) | ✅ |
| Walk-Forward BacktestEngine | ✅ |
| ISO 8601 Audit Trail + JSON Export | ✅ |
| 9 Interactive Plotly Charts | ✅ |
| 7 Linkage Methods + Optimal Leaf Ordering | ✅ |
| Block Bootstrap Confidence Intervals | ✅ |
| 29 Tests Passing (0.84 s) | ✅ |

Architecture

canopy/
├── MasterCanopy.py              ← Facade (v3.0.0)
├── core/
│   ├── CovarianceEngine.py      ← Ledoit-Wolf, Denoised, EWMA, Detoning
│   └── ClusterEngine.py         ← 7 Linkage Methods, 4 Distance Metrics
├── optimizers/
│   ├── HRP.py                   ← Vectorized Recursive Bisection
│   ├── HERC.py                  ← 4 Risk Measures (Var, CVaR, CDaR, MAD)
│   └── NCO.py                   ← Tikhonov-Regularized Nested Optimization
├── data/
│   └── loader.py                ← DataLoader (.yfinance, .csv, .parquet)
├── metrics/
│   └── performance.py           ← Sharpe, Sortino, MaxDD, CVaR, Calmar, IR
├── backtest/
│   └── engine.py                ← Walk-Forward BacktestEngine
├── viz/ChartEngine.py           ← 9 Interactive Plotly Charts
├── tests/test_canopy.py         ← 29 Tests (all passing)
└── docs/                        ← Sphinx + ReadTheDocs

Design Principles

  1. Fail fast, fail loud. Inputs validated at construction time, not compute time.
  2. Zero matrix inversion for HRP. Stable even for near-singular covariance matrices.
  3. Audit everything. Every step timestamped. Export JSON for compliance.
  4. Modular kernel. core/ (math), optimizers/ (allocation), viz/ (charts), data/ (loading), metrics/ (analytics), backtest/ (simulation).
  5. Fluent API. opt.cluster(returns).allocate() — one chain, readable, Pythonic.

Installation

pip install canopy-optimizer

With data loading:

pip install canopy-optimizer[data]

From source:

git clone https://github.com/Anagatam/Canopy.git
cd Canopy && pip install -e .[dev]

Requirements: Python ≥ 3.10 · NumPy · Pandas · SciPy · scikit-learn · Plotly · NetworkX


Testing

pytest tests/test_canopy.py -v          # Run tests
pytest tests/test_canopy.py -v --cov    # With coverage

29/29 tests passing in 0.84 seconds.


📚 Documentation

| Resource | Link |
|---|---|
| ReadTheDocs | canopy-institutional-hierarchical-optimization-engine.readthedocs.io |
| PyPI | pypi.org/project/canopy-optimizer |
| GitHub Wiki | github.com/Anagatam/Canopy/wiki |
| API Reference | docs/api_reference.md |
| Algorithms | docs/algorithms.md |
| Linkage Methods | docs/linkage_methods.md |
| Diagnostics | docs/diagnostics.md |

🔮 Canopy Pro

Canopy Pro is our advanced premium engine designed for institutional portfolio managers, financial analysts, and investment advisors who need cutting-edge capabilities beyond the open-source edition. It builds on Canopy's proven foundation with next-generation algorithms, expanded risk analytics, and enterprise-grade integrations.

🧬 Next-Generation Algorithms

| Algorithm | What It Does |
|---|---|
| HRCP (Hierarchical Risk Contribution Parity) | Achieves exact risk budgets (< 0.01% tolerance) through iterative scaling within the hierarchical tree. Designed for Basel III/IV risk parity mandates. |
| HERC-DRL (Deep Reinforcement Learning) | A policy-gradient agent dynamically reweights clusters based on rolling covariance features. Trained on 20+ years of crisis data (GFC, COVID, SVB). |
| Spectral NCO | Combines spectral graph theory with persistent homology (topological data analysis) to capture higher-order asset dependencies beyond pairwise correlations. |
| Bayesian HRP | Integrates Black-Litterman posterior returns into the hierarchical tree, enabling view-consistent allocation for portfolio managers with conviction ideas. |

📐 Advanced Covariance

| Estimator | What It Delivers |
|---|---|
| DCC-GARCH | Time-varying correlations that capture regime shifts during market stress — critical for VaR models under Basel III. |
| Factor Models (Barra-style) | Handles universes of 500+ assets via factor-based decomposition (style + industry), reducing dimensionality from N² to K². |
| Realized Kernels | Reconstructs covariance from intraday tick data with microstructure noise removal. Purpose-built for HFT and intraday rebalancing. |

📊 12+ Risk Measures

| Measure | What It Captures |
|---|---|
| EVaR (Entropic VaR) | Coherent and convex. Tighter tail bound than CVaR; preferred for robust optimization. |
| RLVaR (Relativistic VaR) | Handles heavy-tailed (non-Gaussian) return distributions. Based on Kaniadakis entropy. |
| EDaR (Entropic Drawdown-at-Risk) | Drawdown-aware tail risk for multi-horizon institutional portfolios. |
| Tail Gini | Measures inequality in the return tail — a more nuanced view of concentration risk than VaR alone. |
| Omega Ratio | Uses the full return distribution, not just the left tail, for a complete risk-return picture. |

🔌 Enterprise Integration

| Integration | Use Case |
|---|---|
| Bloomberg B-PIPE / SAPI | Direct market data ingestion from Bloomberg Terminal |
| Refinitiv Eikon | Alternative data source for firms on Refinitiv |
| MOSEK Solver | Convex optimization backend for constrained Pro algorithms |
| Real-Time Streaming | WebSocket-based covariance updates for intraday rebalancing |

Feature Comparison

| Capability | Canopy (Open Source) | Canopy Pro |
|---|---|---|
| Algorithms | 3 (HRP, HERC, NCO) | 7+ (HRCP, HERC-DRL, Spectral NCO, Bayesian HRP) |
| Covariance Estimators | 4 | 8+ (DCC-GARCH, Factor Models, Realized Kernels) |
| Risk Measures | 4 | 12+ (EVaR, RLVaR, EDaR, Tail Gini, Omega) |
| Portfolio Modes | 3 | 6+ (Risk Budgeting, Black-Litterman, Regime-Aware) |
| Data Sources | yfinance, CSV | Bloomberg, Refinitiv, streaming |
| Backtesting | Walk-forward | Walk-forward + Monte Carlo + Stress Testing |
| Support | Community | Priority SLA + Dedicated Engineering |

Interested in Canopy Pro? 📩 Sign up for early access →

Built specifically for institutional portfolio managers, financial analysts, and investment advisors.


⚖️ License & Disclaimer

Apache License 2.0 — Copyright © 2026 Anagatam Technologies. All rights reserved.

[!CAUTION] Not investment advice. Canopy is a mathematical software library for educational and research purposes only. It does not provide financial recommendations or trading signals. Consult a licensed financial professional before making investment decisions. See DISCLAIMER.md for SEC, SEBI, and global regulatory compliance.


Built with precision for the institutional quantitative finance community.

📖 Docs · 📦 PyPI · 📚 Wiki · 📰 Wikipedia · 🐛 Issues · ⚖️ Disclaimer



Download files


Source Distribution

canopy_optimizer-3.0.2.tar.gz (64.9 kB view details)

Uploaded Source

Built Distribution


canopy_optimizer-3.0.2-py3-none-any.whl (62.2 kB view details)

Uploaded Python 3

File details

Details for the file canopy_optimizer-3.0.2.tar.gz.

File metadata

  • Download URL: canopy_optimizer-3.0.2.tar.gz
  • Upload date:
  • Size: 64.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for canopy_optimizer-3.0.2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 66c6860a98beba5543e74eabada3b8d8e0cdae770c2b5b9199387235b3b01d03 |
| MD5 | e6b8efa497d2017cbd333753a1e71f54 |
| BLAKE2b-256 | f97ba8d3c6f86f1185f7a765910289cad8fc1155622a7d5e437c69cd20aad689 |


File details

Details for the file canopy_optimizer-3.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for canopy_optimizer-3.0.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 31f6c130ed88599e97859d8e6614a4b708d26a24bbf303675ed8f776704715e0 |
| MD5 | 40539f3b5120da0336d5ec33eb36c708 |
| BLAKE2b-256 | d05986660ff0db05f6b9288db9ccacca57b44f83674cc12c7d51ea24ee333832 |

