Highly Adaptive Principal Components

HAPC: Highly Adaptive Principal Components

A fast and flexible machine learning library for nonparametric high-dimensional regression and classification with guarantees.

Documentation

Installation

Prerequisites

  • Python 3.8+
  • C++17 compiler (g++, clang, or MSVC), source builds only
  • CMake 3.15+, source builds only
  • Eigen3, source builds only

Quick Install

pip install hapc

Prebuilt wheels are published for Linux (manylinux2014, x86_64), macOS (Intel + Apple Silicon) and Windows, for CPython 3.8–3.12. No compiler, CMake or Eigen is needed when a wheel is available.

Linux / HPC clusters

The Linux wheels use the manylinux2014 baseline (glibc 2.17), so pip install hapc works out of the box on HPC login/compute nodes — no conda toolchain, devtoolset, or sysroot setup required:

pip install hapc

If you must build from the source distribution (niche architecture, very old Python, or an air-gapped node), provide a C++17 compiler and either let CMake fetch Eigen automatically (needs network) or install Eigen and let find_package(Eigen3) find it:

# with conda compilers (recommended on HPC)
conda install -c conda-forge cxx-compiler cmake eigen
pip install hapc --no-binary hapc

Install from GitHub (latest development version)

pip install git+https://github.com/meixide/hapc.git

Or with editable install for development:

git clone https://github.com/meixide/hapc.git
cd hapc
pip install -e .

Install build dependencies

If installation fails, you may need to install build dependencies:

macOS:

brew install cmake eigen

Ubuntu/Debian:

sudo apt-get install cmake libeigen3-dev build-essential

Windows:

pip install cmake
# Install Visual Studio Build Tools or use conda
conda install -c conda-forge eigen

Quick Start

import numpy as np
from hapc.single import single_pcghal
from hapc.cv import pcghal_cv

# Generate sample data
X = np.random.randn(100, 5)
Y = X[:, 0] + 0.5 * X[:, 1] + np.random.randn(100) * 0.1

# Single fit with fixed lambda
result = single_pcghal(X, Y, maxdeg=2, npc=5, single_lambda=0.01)
print(f"Risk: {result.optimizer_output.risk:.6f}")

# Cross-validation to select lambda
lambdas = np.logspace(-4, 0, 10)
cv_result = pcghal_cv(X, Y, maxdeg=2, npc=5, lambdas=lambdas, nfolds=5)
print(f"Best lambda: {cv_result.best_lambda:.6f}")

# Make predictions
X_test = np.random.randn(20, 5)
result = single_pcghal(X, Y, maxdeg=2, npc=5, single_lambda=0.01, predict=X_test)
print(f"Predictions: {result.predictions}")

Usage

Regression

from hapc.single import single_pcghal

result = single_pcghal(
    X, Y,
    maxdeg=2,        # Maximum degree of interactions
    npc=10,          # Number of principal components
    single_lambda=0.01,
    predict=X_test   # Optional: test data for predictions
)
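To get a feel for what maxdeg controls, the sketch below counts the covariate subsets that are allowed to interact. It assumes maxdeg bounds the size of those subsets, as in HAL-type bases; how hapc builds its basis internally is not described here, and interaction_subsets is a hypothetical helper, not part of the package.

```python
from itertools import combinations
from math import comb


def interaction_subsets(p: int, maxdeg: int):
    """List every non-empty subset of the p covariate indices with size <= maxdeg."""
    return [
        subset
        for degree in range(1, maxdeg + 1)
        for subset in combinations(range(p), degree)
    ]


subsets = interaction_subsets(p=5, maxdeg=2)
print(len(subsets))  # 5 main effects + 10 pairs = 15
assert len(subsets) == comb(5, 1) + comb(5, 2)
```

The count grows quickly with maxdeg, which is one reason to keep it small when p is large.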

Classification

from hapc.single import single_pcghal

result = single_pcghal(
    X, Y_binary,
    maxdeg=2,
    npc=10,
    single_lambda=0.01,
    predict=X_test
)

Cross-Validation

from hapc.cv import pcghal_cv

cv_result = pcghal_cv(
    X, Y,
    maxdeg=2,
    npc=10,
    lambdas=np.logspace(-4, 0, 20),
    nfolds=5
)
print(cv_result.best_lambda)
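The selection step can be illustrated without hapc. The sketch below runs K-fold cross-validation over a log-spaced grid for a toy one-feature ridge fit; it is a stand-in for the mechanism only, not the estimator pcghal_cv uses, and ridge_slope and cv_select_lambda are hypothetical names.

```python
import random


def ridge_slope(x, y, lam):
    """Closed-form one-feature ridge fit without intercept: sum(xy) / (sum(x^2) + lam)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def cv_select_lambda(x, y, lambdas, nfolds=5, seed=0):
    """Return (best_lambda, mean held-out squared error for each lambda)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::nfolds] for k in range(nfolds)]  # disjoint, near-equal folds
    errors = []
    for lam in lambdas:
        fold_mse = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            beta = ridge_slope([x[i] for i in train], [y[i] for i in train], lam)
            fold_mse.append(
                sum((y[i] - beta * x[i]) ** 2 for i in held_out) / len(held_out)
            )
        errors.append(sum(fold_mse) / nfolds)
    best = min(range(len(lambdas)), key=errors.__getitem__)
    return lambdas[best], errors


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(100)]
y = [2.0 * xi + rng.gauss(0, 0.1) for xi in x]
lambdas = [10 ** (-4 + 4 * i / 9) for i in range(10)]  # same values as np.logspace(-4, 0, 10)
best_lambda, errors = cv_select_lambda(x, y, lambdas)
print(best_lambda, min(errors))
```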

Average Treatment Effect (ATE)

Estimate the ATE E[Y(1)] − E[Y(0)] with HAPC nuisance models and a doubly-robust (AIPW) efficient influence function. ate_hapc returns a point estimate and a (1 − alpha) Wald confidence interval.

from hapc import ate_hapc

# W: covariates (n, p); A: binary treatment in {0,1} or {-1,+1}; Y: outcome
res = ate_hapc(W, Y, A, alpha=0.05, method="undersmooth")
print(res.estimate, res.lower, res.upper)
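For orientation, the textbook AIPW estimator and its Wald interval can be written in a few lines once the nuisance predictions are available. This is a sketch under assumptions: treatment is coded 0/1, the nuisance predictions (mu1, mu0, ps) are supplied by the caller, and the variance is the sample variance of the influence-function scores. Whether ate_hapc computes its interval in exactly this way is not stated here; aipw_ate is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def aipw_ate(y, a, mu1, mu0, ps, alpha=0.05):
    """AIPW point estimate and (1 - alpha) Wald interval from fitted nuisances.

    y: outcomes; a: treatment coded 0/1; mu1, mu0: predicted outcomes under
    treatment and control; ps: predicted propensity P(A = 1 | W).
    """
    scores = [
        m1 - m0 + ai * (yi - m1) / e - (1 - ai) * (yi - m0) / (1 - e)
        for yi, ai, m1, m0, e in zip(y, a, mu1, mu0, ps)
    ]
    estimate = mean(scores)
    se = stdev(scores) / sqrt(len(scores))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, estimate - z * se, estimate + z * se


# Tiny hand-made example with constant nuisance predictions
y = [1, 0, 2, 1]
a = [1, 0, 1, 0]
est, lower, upper = aipw_ate(y, a, mu1=[1.5] * 4, mu0=[0.5] * 4, ps=[0.5] * 4)
print(est, lower, upper)
```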

Two bias-control strategies are available through method:

  • method="undersmooth" (default): single-sample estimator. The outcome model is undersmoothed (λ pushed below the CV-optimal value) until the absolute empirical mean of the influence function falls within σ / (√n · log n). This requires the full PC basis (npcs = n, the default) and a λ grid that reaches small λ (default log_lambda_out_min = -10); otherwise the gate never reaches the low-bias regime and ate_hapc emits a warning. Pass report_undersmoothing=True to print the |mean(EIF)|-vs-λ path.
  • method="crossfit": DML-style K-fold cross-fitting (cf_folds, default 5, stratified by treatment). Both nuisances are fit on the training folds and the influence function is evaluated out-of-fold, giving honest point estimates and coverage without undersmoothing. Recommended under good overlap.
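Stratifying the folds by treatment means each fold holds a near-equal share of treated and control units. One simple way to do this is sketched below; it is an illustration of the idea, not the fold assignment ate_hapc performs, and stratified_folds is a hypothetical helper.

```python
import random
from collections import Counter


def stratified_folds(a, n_folds=5, seed=0):
    """Assign each unit a fold id so every fold gets a near-equal share of each arm."""
    rng = random.Random(seed)
    fold_of = [None] * len(a)
    for arm in sorted(set(a)):
        members = [i for i, ai in enumerate(a) if ai == arm]
        rng.shuffle(members)
        # Deal the shuffled members of this arm round-robin across folds
        for position, i in enumerate(members):
            fold_of[i] = position % n_folds
    return fold_of


a = [1] * 20 + [0] * 30
folds = stratified_folds(a, n_folds=5)
print(Counter(zip(folds, a)))  # (fold, arm) -> number of units
```

For cross-fitting, the nuisance models for fold k would then be trained on all units whose fold id differs from k and evaluated on the units in fold k.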

Discrete-time survival (family = "logit-hazard")

Fit a discrete-time logistic hazard model with HAPC. You supply only the observed right-censored data — baseline covariates X, the observed time T = min(T_event, C), and the event indicator Delta = 1(T_event <= C) — and the wrapper performs the person-period expansion (one row per subject-per-interval-at-risk, hazard label = 1 at the event interval), prepends the visit time as the first HAL covariate, and cross-validates the binomial fit.
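The person-period expansion described above can be sketched in plain Python. This is an illustration under assumptions (an ascending time_grid, observed times that lie on the grid), not the wrapper's actual code; person_period is a hypothetical helper.

```python
def person_period(X, T, Delta, time_grid):
    """Expand right-censored data into one row per subject per interval at risk.

    Returns (ids, rows, labels): rows are [t, *x] with the visit time as the
    first covariate; labels are 1 only in the interval where the event occurs.
    Assumes time_grid is sorted in ascending order.
    """
    ids, rows, labels = [], [], []
    for i, (x, t_obs, delta) in enumerate(zip(X, T, Delta)):
        for t in time_grid:
            if t > t_obs:
                break  # no longer at risk after the observed time
            ids.append(i)
            rows.append([t, *x])
            labels.append(1 if (delta == 1 and t == t_obs) else 0)
    return ids, rows, labels


X = [[0.2], [1.5], [-0.3]]
T = [2, 3, 1]          # observed times min(T_event, C)
Delta = [1, 0, 1]      # subject 1 is censored at t = 3
ids, rows, labels = person_period(X, T, Delta, time_grid=range(1, 4))
print(ids)     # [0, 0, 1, 1, 1, 2]
print(labels)  # [0, 1, 0, 0, 0, 1]
```

A censored subject contributes only zeros, which is how censoring enters the binomial fit.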

Model. The discrete hazard is the conditional event probability in interval t given survival up to t, modelled on the logit scale by a HAPC fit f of the augmented covariate (t, x):

lambda(t | x) = P(T_event = t | T_event >= t, X = x)
logit lambda(t | x) = f(t, x)

Person-period likelihood. Under independent right-censoring the observed-data likelihood factorises over the at-risk intervals,

prod_i prod_{t <= T_i}  lambda(t|x_i)^Y_it * (1 - lambda(t|x_i))^(1 - Y_it),
with  Y_it = 1(T_event_i = t),

which is exactly the Bernoulli (logistic) likelihood of the expanded person-period table — so a binomial HAPC fit of Y_it on (t, x_i) estimates the discrete hazard (Cox 1972; Brown 1975; Allison 1982).
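Because logit lambda = f, each person-period row contributes Y_it * f - log(1 + exp(f)) to the log-likelihood. The sketch below evaluates that sum for given labels and logits; it is a numerical illustration only (no overflow protection for large |f|), and person_period_loglik is a hypothetical helper.

```python
from math import exp, log


def hazard_from_logit(f):
    """Inverse logit: the discrete hazard implied by a logit-scale value f."""
    return 1.0 / (1.0 + exp(-f))


def person_period_loglik(labels, logits):
    """Bernoulli log-likelihood: sum of Y*f - log(1 + exp(f)) over person-period rows."""
    return sum(y * f - log(1.0 + exp(f)) for y, f in zip(labels, logits))


labels = [0, 1, 0, 0, 0, 1]
logits = [-1.0, 0.5, -2.0, -1.5, -1.0, 0.0]
ll = person_period_loglik(labels, logits)

# Same quantity written as sum of Y*log(lambda) + (1 - Y)*log(1 - lambda)
direct = sum(
    y * log(hazard_from_logit(f)) + (1 - y) * log(1.0 - hazard_from_logit(f))
    for y, f in zip(labels, logits)
)
print(ll, direct)
```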

Survival. The conditional survival function follows by the product-limit relation S(t | x) = prod_{s <= t} (1 - lambda(s | x)), returned for new subjects when predict= is supplied.
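The product-limit relation is a running product over the time grid. A minimal sketch for one subject, assuming the hazards are ordered along the grid (survival_curve is a hypothetical helper, not a hapc function):

```python
from itertools import accumulate
from operator import mul


def survival_curve(hazards):
    """S(t) = prod_{s <= t} (1 - hazard(s)) for hazards ordered along the time grid."""
    return list(accumulate((1.0 - h for h in hazards), mul))


curve = survival_curve([0.1, 0.2, 0.5])
print(curve)  # approximately [0.9, 0.72, 0.36]
```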

from hapc import hazard_hapc
import numpy as np

# X: baseline covariates (n, p); T: observed times; Delta: 0/1 event indicator
fit = hazard_hapc(X, T, Delta, norm="1", max_degree=2, time_grid=np.arange(1, 7))
fit.hazard        # estimated hazard per person-period row (CV predictions)
fit.best_lambda, fit.interior   # CV-selected lambda; is it interior to the grid?

# survival curves S(t|x) for new subjects
fit = hazard_hapc(X, T, Delta, norm="1", predict=X_new)
fit.predict_survival            # (m, K) survival probabilities over the grid

R:

library(hapc)
# equivalent to cv.hapc(X, T, family = "logit-hazard", Delta = Delta, norm = "1")
fit <- hazard.hapc(X, T, Delta, norm = "1", max_degree = 2, time_grid = 1:6)
fit$hazard; fit$best_lambda; fit$interior

norm must be "1" (logistic LASSO) or "2" (logistic ridge); norm = "sv" is not implemented for this family and is flagged.

Returns (Python HazardResult / R hapc_hazard):

  • hazard — cross-validated discrete hazard for each person-period row
  • lambdas, risk, best_lambda — CV grid, mean logistic deviance, selected λ
  • interior — whether best_lambda is strictly inside the grid (sanity check)
  • time_grid, ids/id, Y — the discrete grid and person-period bookkeeping
  • predict_hazard, predict_survival — hazard surface and survival curves for new subjects (only when predict= is given)
  • cv — the underlying cross-validation result
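The interior flag can be reproduced from lambdas and risk. The sketch below assumes "interior" means the risk-minimising λ is neither the smallest nor the largest grid value; is_interior is a hypothetical helper.

```python
def is_interior(lambdas, risk):
    """True when the risk-minimising lambda is strictly between the grid endpoints."""
    best = lambdas[min(range(len(risk)), key=risk.__getitem__)]
    return min(lambdas) < best < max(lambdas)


print(is_interior([0.01, 0.1, 1.0], [3.0, 1.0, 2.0]))  # True: minimum in the middle
print(is_interior([0.01, 0.1, 1.0], [1.0, 2.0, 3.0]))  # False: minimum at the edge
```

When the check fails, the optimum may lie outside the grid, so widening the grid in that direction is a reasonable next step.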

Worked end-to-end examples (five hazard data-generating processes, with true-vs-estimated hazard scatters and CV risk-vs-λ curves verifying an interior optimum) are in examples/hazard_logit_hazard_examples.R and examples/hazard_logit_hazard_examples.py.

References. Cox (1972, JRSS B); Brown (1975, Biometrics); Allison (1982, Sociological Methodology); Singer & Willett (2003, Applied Longitudinal Data Analysis); Benkeser & van der Laan (2016, IEEE DSAA).

API Reference

hapc.single.single_pcghal()

Fit PC-GHAL with a single lambda value.

Parameters:

  • X (ndarray, shape (n, p)): Input features
  • Y (ndarray, shape (n,)): Response variable
  • maxdeg (int): Maximum degree of interactions
  • npc (int): Number of principal components
  • single_lambda (float): Regularization parameter
  • max_iter (int, default=100): Maximum iterations
  • tol (float, default=1e-6): Convergence tolerance
  • verbose (bool, default=False): Print progress
  • predict (ndarray, optional): Test data for predictions
  • center (bool, default=True): Center the design matrix

Returns:

  • result.optimizer_output.alpha: Coefficients
  • result.optimizer_output.risk: Final risk
  • result.optimizer_output.iter: Iterations until convergence
  • result.predictions: Predictions on test data (if provided)

hapc.cv.pcghal_cv()

Cross-validation to select lambda.

Parameters:

  • lambdas (ndarray): Grid of lambda values to test
  • nfolds (int, default=5): Number of CV folds
  • X, Y, maxdeg, npc and the remaining options: same as for single_pcghal

Returns:

  • cv_result.best_lambda: Optimal lambda
  • cv_result.mses: CV errors for each lambda
  • cv_result.best_model: Fitted model with best lambda
  • cv_result.predictions: Predictions on test data (if provided)

Contributing

Contributions welcome! The C++ core is shared between R and Python packages.

git clone https://github.com/meixide/hapc.git
cd hapc
pip install -e .
pytest

License

MIT License - see LICENSE file
