
Highly Adaptive Principal Components


HAPC: Highly Adaptive Principal Components

A fast and flexible machine learning library for nonparametric high-dimensional regression and classification with guarantees.

Documentation

Installation

Prerequisites

  • Python 3.8+

Needed only when building from source (that is, when no prebuilt wheel matches your platform):

  • C++17 compiler (g++, clang, or MSVC)
  • CMake 3.15+
  • Eigen3

Quick Install

pip install hapc

Prebuilt wheels are published for Linux (manylinux2014, x86_64), macOS (Intel + Apple Silicon) and Windows, for CPython 3.8–3.12. No compiler, CMake or Eigen is needed when a wheel is available.

Linux / HPC clusters

The Linux wheels use the manylinux2014 baseline (glibc 2.17), so pip install hapc works out of the box on HPC login/compute nodes — no conda toolchain, devtoolset, or sysroot setup required:

pip install hapc

If you must build from the source distribution (niche architecture, very old Python, or an air-gapped node), provide a C++17 compiler and either let CMake fetch Eigen automatically (needs network) or install Eigen and let find_package(Eigen3) find it:

# with conda compilers (recommended on HPC)
conda install -c conda-forge cxx-compiler cmake eigen
pip install hapc --no-binary hapc

Install from GitHub (latest development version)

pip install git+https://github.com/meixide/hapc.git

Or with editable install for development:

git clone https://github.com/meixide/hapc.git
cd hapc
pip install -e .

Install build dependencies

If installation fails, you may need to install build dependencies:

macOS:

brew install cmake eigen

Ubuntu/Debian:

sudo apt-get install cmake libeigen3-dev build-essential

Windows:

pip install cmake
# Install Visual Studio Build Tools or use conda
conda install -c conda-forge eigen

Quick Start

import numpy as np
from hapc.single import single_pcghal
from hapc.cv import pcghal_cv

# Generate sample data
X = np.random.randn(100, 5)
Y = X[:, 0] + 0.5 * X[:, 1] + np.random.randn(100) * 0.1

# Single fit with fixed lambda
result = single_pcghal(X, Y, maxdeg=2, npc=5, single_lambda=0.01)
print(f"Risk: {result.optimizer_output.risk:.6f}")

# Cross-validation to select lambda
lambdas = np.logspace(-4, 0, 10)
cv_result = pcghal_cv(X, Y, maxdeg=2, npc=5, lambdas=lambdas, nfolds=5)
print(f"Best lambda: {cv_result.best_lambda:.6f}")

# Make predictions
X_test = np.random.randn(20, 5)
result = single_pcghal(X, Y, maxdeg=2, npc=5, single_lambda=0.01, predict=X_test)
print(f"Predictions: {result.predictions}")

Usage

Regression

from hapc.single import single_pcghal

result = single_pcghal(
    X, Y,
    maxdeg=2,        # Maximum degree of interactions
    npc=10,          # Number of principal components
    single_lambda=0.01,
    predict=X_test   # Optional: test data for predictions
)

Classification

from hapc.single import single_pcghal

result = single_pcghal(
    X, Y_binary,
    maxdeg=2,
    npc=10,
    single_lambda=0.01,
    predict=X_test
)

Cross-Validation

import numpy as np
from hapc.cv import pcghal_cv

cv_result = pcghal_cv(
    X, Y,
    maxdeg=2,
    npc=10,
    lambdas=np.logspace(-4, 0, 20),
    nfolds=5
)
print(cv_result.best_lambda)
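
To make the selection rule concrete, here is a minimal NumPy sketch of K-fold cross-validation over a λ grid. It is an illustration under stated assumptions, not hapc's implementation: plain ridge regression stands in for the PC-GHAL fit, and ridge_fit_predict and kfold_select_lambda are hypothetical helpers that do not exist in hapc.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Closed-form ridge fit (stand-in learner, NOT PC-GHAL), then predict."""
    p = X_train.shape[1]
    beta = np.linalg.solve(X_train.T @ X_train + lam * np.eye(p),
                           X_train.T @ y_train)
    return X_test @ beta

def kfold_select_lambda(X, y, lambdas, nfolds=5, seed=0):
    """Return (best_lambda, mean CV squared error for each lambda)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), nfolds)
    mses = np.zeros(len(lambdas))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # hold this fold out
        for j, lam in enumerate(lambdas):
            pred = ridge_fit_predict(X[train_mask], y[train_mask],
                                     X[test_idx], lam)
            mses[j] += np.mean((y[test_idx] - pred) ** 2) / nfolds
    return lambdas[int(np.argmin(mses))], mses

# Synthetic data shaped like the Quick Start example
rng = np.random.default_rng(1)
X_demo = rng.standard_normal((100, 5))
y_demo = X_demo[:, 0] + 0.5 * X_demo[:, 1] + 0.1 * rng.standard_normal(100)
lambda_grid = np.logspace(-4, 0, 20)
best_lambda, cv_mses = kfold_select_lambda(X_demo, y_demo, lambda_grid)
print(best_lambda, cv_mses.min())
```

The selected value is simply the grid point with the smallest mean held-out error, which is the role best_lambda and mses play in the pcghal_cv result.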

Average Treatment Effect (ATE)

Estimate the ATE E[Y(1)] − E[Y(0)] with HAPC nuisance models and a doubly-robust (AIPW) efficient influence function. ate_hapc returns a point estimate and a (1 − alpha) Wald confidence interval.

from hapc import ate_hapc

# W: covariates (n, p); A: binary treatment in {0,1} or {-1,+1}; Y: outcome
res = ate_hapc(W, Y, A, alpha=0.05, method="undersmooth")
print(res.estimate, res.lower, res.upper)

Two bias-control strategies are available through method:

  • method="undersmooth" (default): single-sample estimator. The outcome model is undersmoothed (λ pushed below the CV-optimal value) until the empirical mean of the influence function is within σ / (√n · log n) in absolute value. This requires the full PC basis (npcs = n, the default) and a λ grid that reaches small λ (the default is log_lambda_out_min = -10); otherwise the gate never reaches the low-bias regime and ate_hapc emits a warning. Pass report_undersmoothing=True to print the |mean(EIF)|-vs-λ path.
  • method="crossfit": DML-style K-fold cross-fitting (cf_folds, default 5, stratified by treatment). Both nuisances are fit on the training folds and the influence function is evaluated out-of-fold, giving honest point estimates and coverage without undersmoothing. Recommended under good overlap.
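
For readers unfamiliar with the doubly-robust construction, the following is a minimal NumPy sketch of an AIPW estimate with a Wald interval, computed from nuisance predictions that are assumed to be already available. It is not ate_hapc's code: aipw_ate is a hypothetical helper, the demo plugs in the true nuisance functions rather than HAPC fits, and the treatment is assumed to be coded in {0, 1}.

```python
import numpy as np
from statistics import NormalDist

def aipw_ate(Y, A, mu1, mu0, pscore, alpha=0.05):
    """AIPW point estimate and (1 - alpha) Wald interval.

    mu1, mu0 : outcome-model predictions of E[Y | A=1, W] and E[Y | A=0, W]
    pscore   : predictions of P(A=1 | W), assumed bounded away from 0 and 1
    A        : treatment coded 0/1 (recode -1 to 0 before calling)
    """
    Y = np.asarray(Y, dtype=float)
    A = np.asarray(A, dtype=float)
    # Uncentered influence-function values; their mean is the estimate.
    phi = (mu1 - mu0
           + A * (Y - mu1) / pscore
           - (1.0 - A) * (Y - mu0) / (1.0 - pscore))
    estimate = phi.mean()
    se = phi.std(ddof=1) / np.sqrt(len(Y))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate, estimate - z * se, estimate + z * se

# Simulated data with a true ATE of 2 (illustrative only)
rng = np.random.default_rng(0)
n = 2000
W = rng.standard_normal(n)
pscore_true = 1.0 / (1.0 + np.exp(-0.5 * W))
A = rng.binomial(1, pscore_true)
Y = 2.0 * A + W + 0.5 * rng.standard_normal(n)
est, lower, upper = aipw_ate(Y, A, mu1=2.0 + W, mu0=W, pscore=pscore_true)
print(est, lower, upper)
```

In a cross-fitted version the same formula would be applied with mu1, mu0 and pscore predicted out-of-fold for each observation.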

Discrete-time survival (family = "logit-hazard")

Fit a discrete-time logistic hazard model with HAPC. You supply only the observed right-censored data — baseline covariates X, the observed time T = min(T_event, C), and the event indicator Delta = 1(T_event <= C) — and the wrapper performs the person-period expansion (one row per subject-per-interval-at-risk, hazard label = 1 at the event interval), prepends the visit time as the first HAL covariate, and cross-validates the binomial fit.

Model. The discrete hazard is the conditional event probability in interval t given survival up to t, modelled on the logit scale by a HAPC fit f of the augmented covariate (t, x):

lambda(t | x) = P(T_event = t | T_event >= t, X = x)
logit lambda(t | x) = f(t, x)

Person-period likelihood. Under independent right-censoring the observed-data likelihood factorises over the at-risk intervals,

prod_i prod_{t <= T_i}  lambda(t|x_i)^Y_it * (1 - lambda(t|x_i))^(1 - Y_it),
with  Y_it = 1(T_event_i = t),

which is exactly the Bernoulli (logistic) likelihood of the expanded person-period table — so a binomial HAPC fit of Y_it on (t, x_i) estimates the discrete hazard (Cox 1972; Brown 1975; Allison 1982).
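
The expansion and the likelihood above can be sketched in a few lines of NumPy. This is an illustration only: expand_person_period and person_period_loglik are hypothetical helpers rather than hapc functions, and the sketch assumes an ascending time_grid with every observed time lying on it.

```python
import numpy as np

def expand_person_period(T, Delta, time_grid):
    """Return (ids, times, y): one row per subject per at-risk interval."""
    ids, times, y = [], [], []
    for i, (t_obs, d) in enumerate(zip(T, Delta)):
        for t in time_grid:
            if t > t_obs:
                break  # subject i is no longer at risk after T_i
            ids.append(i)
            times.append(t)
            # Y_it = 1 only in the interval where the event is observed
            y.append(int(d == 1 and t == t_obs))
    return np.array(ids), np.array(times), np.array(y)

def person_period_loglik(y, hazard):
    """Bernoulli log-likelihood of the expanded table."""
    return np.sum(y * np.log(hazard) + (1 - y) * np.log1p(-hazard))

# Three subjects: events at t=3 and t=4, one censored at t=2
T_demo = np.array([3, 2, 4])
Delta_demo = np.array([1, 0, 1])
X_demo = np.array([[0.1], [0.2], [0.3]])

ids, times, y = expand_person_period(T_demo, Delta_demo, np.arange(1, 5))
# Augmented covariate (t, x): visit time first, then baseline covariates
design = np.column_stack([times, X_demo[ids]])
print(design.shape, y)
```

A binomial learner fit to y on design then estimates the discrete hazard, since its log-likelihood is the one computed by person_period_loglik.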

Survival. The conditional survival function follows by the product-limit relation S(t | x) = prod_{s <= t} (1 - lambda(s | x)), returned for new subjects when predict= is supplied.
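
As a small worked example of the product-limit relation, the sketch below maps logit-scale values f(t, x) to hazards and then to survival curves with a cumulative product. The helper names expit and survival_from_hazard are assumptions made for illustration; they are not part of hapc.

```python
import numpy as np

def expit(f):
    """Inverse logit: maps f(t, x) to the hazard lambda(t | x)."""
    return 1.0 / (1.0 + np.exp(-f))

def survival_from_hazard(hazard):
    """hazard: (m, K) array of lambda(t_k | x_j); returns S(t_k | x_j)."""
    return np.cumprod(1.0 - np.asarray(hazard, dtype=float), axis=1)

# Two subjects, three intervals, constant hazards 0.5 and 0.25
f_demo = np.array([[0.0, 0.0, 0.0],
                   [-np.log(3.0), -np.log(3.0), -np.log(3.0)]])
hazard_demo = expit(f_demo)
surv = survival_from_hazard(hazard_demo)
print(surv)
```

With a constant hazard h the curve is (1 - h)^t, so the first subject's survival halves at each interval.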

from hapc import hazard_hapc
import numpy as np

# X: baseline covariates (n, p); T: observed times; Delta: 0/1 event indicator
fit = hazard_hapc(X, T, Delta, norm="1", max_degree=2, time_grid=np.arange(1, 7))
fit.hazard        # estimated hazard per person-period row (CV predictions)
fit.best_lambda, fit.interior   # CV-selected lambda; is it interior to the grid?

# survival curves S(t|x) for new subjects
fit = hazard_hapc(X, T, Delta, norm="1", predict=X_new)
fit.predict_survival            # (m, K) survival probabilities over the grid

R:

library(hapc)
# equivalent to cv.hapc(X, T, family = "logit-hazard", Delta = Delta, norm = "1")
fit <- hazard.hapc(X, T, Delta, norm = "1", max_degree = 2, time_grid = 1:6)
fit$hazard; fit$best_lambda; fit$interior

norm must be "1" (logistic LASSO) or "2" (logistic ridge); norm = "sv" is not implemented for this family and is flagged.

Returns (Python HazardResult / R hapc_hazard):

  • hazard — cross-validated discrete hazard for each person-period row
  • lambdas, risk, best_lambda — CV grid, mean logistic deviance, selected λ
  • interior — whether best_lambda is strictly inside the grid (sanity check)
  • time_grid, ids/id, Y — the discrete grid and person-period bookkeeping
  • predict_hazard, predict_survival — hazard surface and survival curves for new subjects (only when predict= is given)
  • cv — the underlying cross-validation result
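
The interior sanity check can be pictured with the short sketch below. It assumes the rule is "the risk minimiser is neither the first nor the last grid point"; hapc's exact definition may differ, and select_lambda is a hypothetical helper.

```python
import numpy as np

def select_lambda(lambdas, risk):
    """Return (best_lambda, interior) for a CV risk curve over a lambda grid."""
    lambdas = np.asarray(lambdas, dtype=float)
    risk = np.asarray(risk, dtype=float)
    k = int(np.argmin(risk))
    # Interior: the minimiser is not at either end of the grid
    interior = 0 < k < len(lambdas) - 1
    return lambdas[k], interior

grid = np.logspace(-4, 0, 5)

# U-shaped risk curve: minimum strictly inside the grid
best_u, interior_u = select_lambda(grid, [0.70, 0.62, 0.60, 0.65, 0.80])

# Monotone risk curve: minimum at the edge, so the grid should be widened
best_m, interior_m = select_lambda(grid, [0.80, 0.70, 0.65, 0.62, 0.60])
print(best_u, interior_u, best_m, interior_m)
```

A non-interior optimum usually means the grid does not bracket the best λ, which is why it is reported alongside best_lambda.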

Worked end-to-end examples (five hazard data-generating processes, with true-vs-estimated hazard scatters and CV risk-vs-λ curves verifying an interior optimum) are in examples/hazard_logit_hazard_examples.R and examples/hazard_logit_hazard_examples.py.

References. Cox (1972, JRSS B); Brown (1975, Biometrics); Allison (1982, Sociological Methodology); Singer & Willett (2003, Applied Longitudinal Data Analysis); Benkeser & van der Laan (2016, IEEE DSAA).

API Reference

hapc.single.single_pcghal()

Fit PC-GHAL with a single lambda value.

Parameters:

  • X (ndarray, shape (n, p)): Input features
  • Y (ndarray, shape (n,)): Response variable
  • maxdeg (int): Maximum degree of interactions
  • npc (int): Number of principal components
  • single_lambda (float): Regularization parameter
  • max_iter (int, default=100): Maximum iterations
  • tol (float, default=1e-6): Convergence tolerance
  • verbose (bool, default=False): Print progress
  • predict (ndarray, optional): Test data for predictions
  • center (bool, default=True): Center the design matrix

Returns:

  • result.optimizer_output.alpha: Coefficients
  • result.optimizer_output.risk: Final risk
  • result.optimizer_output.iter: Iterations until convergence
  • result.predictions: Predictions on test data (if provided)

hapc.cv.pcghal_cv()

Cross-validation to select lambda.

Parameters:

  • lambdas (ndarray): Grid of lambda values to test
  • nfolds (int, default=5): Number of CV folds
  • ...other parameters same as single_pcghal

Returns:

  • cv_result.best_lambda: Optimal lambda
  • cv_result.mses: CV errors for each lambda
  • cv_result.best_model: Fitted model with best lambda
  • cv_result.predictions: Predictions on test data (if provided)

Contributing

Contributions are welcome! The C++ core is shared between the R and Python packages.

git clone https://github.com/meixide/hapc.git
cd hapc
pip install -e .
pytest

License

MIT License - see LICENSE file
