Weighted structured nonconvex sparse models (Python + Rust)

skein

Weighted structured nonconvex sparse models. Rust core + Python API.

Documentation: the docs site has the full conceptual reference (penalties, datafits, weights, backends), porting guides for glmnet / ncvreg / grpreg, worked examples, and an auto-generated API reference. It will be hosted on Read the Docs once the project is connected (config in .readthedocs.yaml); until then, preview it locally with mkdocs serve. CI builds the site with --strict on every PR.

skein targets a niche that's well-served in R (grpreg, ncvreg) but missing in Python at production quality: nonconvex group-structured penalties (group MCP, group SCAD, sparse-group nonconvex) with first-class support for weights along three axes — per-sample, per-feature, and per-group.
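To make the three weight axes concrete, here is a minimal, dependency-free sketch of one common way such weights can enter a sparse-group-lasso-style objective. The function name, the `alpha` mixing parameter, and the normalization by the sum of sample weights are assumptions made for illustration; this README does not state skein's exact scaling or parameter names.

```python
import math


def weighted_objective(X, y, beta, groups, sample_w, feature_w, group_w, lam, alpha=0.5):
    """Illustrative weighted least-squares + sparse-group-lasso objective.

    Hypothetical helper, not part of skein. `groups[j]` is the group index of
    feature j and `group_w[g]` is the weight of group g.
    """
    p = len(beta)

    # Per-sample weights scale each squared residual in the datafit.
    loss = 0.0
    for i in range(len(y)):
        pred = sum(X[i][j] * beta[j] for j in range(p))
        loss += sample_w[i] * (y[i] - pred) ** 2
    loss /= 2.0 * sum(sample_w)  # assumed normalization

    # Per-feature weights scale each coefficient's L1 term.
    l1 = sum(feature_w[j] * abs(beta[j]) for j in range(p))

    # Per-group weights scale each group's Euclidean norm.
    sq_norms = {}
    for j, g in enumerate(groups):
        sq_norms[g] = sq_norms.get(g, 0.0) + beta[j] ** 2
    l2 = sum(group_w[g] * math.sqrt(s) for g, s in sq_norms.items())

    return loss + lam * (alpha * l1 + (1.0 - alpha) * l2)
```

Setting a feature or group weight to zero leaves that term unpenalized in this form, which is the usual reason for exposing the weights separately.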

Status

v0.1 development. Core algorithms and the headline GLM family are in place; design-matrix backends (sparse, mmap, chunked) are next. See ROADMAP.md for the full plan.

Done so far:

  • Solvers — production CD core (path solver, strong rule + KKT verification, gap-safe screening, Anderson acceleration); group block-CD with LLA outer loop for nonconvex group penalties; Rayon-parallel group sweeps; operator-norm Lipschitz via power iteration.
  • Datafits — least squares, binomial logistic, Poisson (log link), Cox PH (Breslow ties). All glued together by a GlmDatafit trait that exposes a weighted-LS surrogate; the M1/M2 inner solvers absorb every GLM unchanged.
  • Penalties — MCP, SCAD, group lasso, group MCP, sparse-group lasso, sparse-group MCP. Per-feature and per-group weights honored throughout.
  • Python — sklearn-compatible estimators for every (datafit × penalty) combination; type stubs; warm-started λ-paths; standardization with original-scale coef_ / intercept_ recovery (dense backend).
  • Graphical models — sparse precision matrix estimation (GraphicalLasso / GraphicalMCP / GraphicalSCAD) and joint estimation across K related populations (JointGraphicalLasso / JointGraphicalMCP, Danaher–Wang–Witten 2014 group form via ADMM), with EBIC tuning. Nonconvex penalties on edges close the shrinkage-bias gap that sklearn.covariance.GraphicalLasso and R's glasso / qgraph / bootnet leave open.
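For readers new to MCP, the following sketch shows the textbook scalar forms behind several of the items above: the penalty value, its derivative (the quantity a local linear approximation step typically uses as a reweighting), and the closed-form univariate minimizer for a unit-scaled feature. These are the standard formulas from the MCP literature, written as hypothetical helper functions; they are not skein's internal code, and skein's exact scaling conventions are not stated in this README.

```python
def mcp_penalty(t, lam, gamma):
    """MCP value at t; constant at gamma * lam**2 / 2 once |t| >= gamma * lam."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_derivative(t, lam, gamma):
    """Slope of MCP at |t|: starts at lam and decays linearly to 0 at gamma * lam.

    In an LLA-style outer loop this slope, evaluated at the current coefficient
    (or group norm), acts as the weight of a convex lasso-type subproblem.
    """
    return max(0.0, lam - abs(t) / gamma)


def soft_threshold(z, lam):
    """Lasso (L1) proximal operator for a scalar."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma):
    """argmin_b 0.5 * (z - b)**2 + MCP(b); requires gamma > 1."""
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a unique minimizer")
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large inputs pass through unshrunk
```

The last branch is what removes the shrinkage bias of the lasso on large coefficients: beyond `gamma * lam` the estimate equals the unpenalized value.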

M8 (Distribution & DX) is done: CI + cibuildwheel + Read the Docs + 25-page mkdocs site (concepts + R-porting + extending + examples + API ref) + R numerical regression suite vs glmnet/ncvreg/grpreg + stable Rust API contract. The library is pip install-able once published, documented end-to-end, and pinned against R reference fits so we don't silently drift.

Coming next: algorithmic features — M5.x adaptive weights and stability selection are the next high-value milestones; both leverage the existing per-feature/per-group weight axes that are already wired through every solver.
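As a rough illustration of why adaptive weights map naturally onto per-feature and per-group weight axes, the sketch below derives both kinds of weight from an initial coefficient estimate using the usual inverse-magnitude rule. The function names, the `power` exponent, and the `eps` guard are assumptions for illustration; how such weights would be passed to a skein estimator is not shown because this README does not give the parameter names.

```python
import math


def adaptive_feature_weights(beta_init, power=1.0, eps=1e-8):
    """Per-feature weights 1 / (|beta_j| + eps)**power (hypothetical helper).

    Large initial coefficients get small weights (light penalization);
    near-zero ones get very large weights.
    """
    return [1.0 / (abs(b) + eps) ** power for b in beta_init]


def adaptive_group_weights(beta_init, groups, power=1.0, eps=1e-8):
    """Per-group weights from the Euclidean norm of each group's initial block."""
    sq_norms = {}
    for b, g in zip(beta_init, groups):
        sq_norms[g] = sq_norms.get(g, 0.0) + b * b
    return {g: 1.0 / (math.sqrt(s) + eps) ** power for g, s in sq_norms.items()}


# Example: two groups, the second one entirely zero in the initial fit.
beta0 = [3.0, 4.0, 0.0, 0.0]
groups = [0, 0, 1, 1]
feature_w = adaptive_feature_weights(beta0)
group_w = adaptive_group_weights(beta0, groups)
```

The `eps` guard keeps the weights finite when the initial estimate is exactly zero; in practice its size controls how strongly such features are pushed out of the model.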

Layout

crates/skein-core/   pure Rust: traits + algorithms (no Python)
crates/skein-py/     PyO3 bindings (cdylib → skein_glm._core)
python/skein/        sklearn-compatible estimators + ABCs for extensions
tests/               pytest smoke tests
benches/             criterion (Rust) + asv (Python)

The Rust traits (DesignMatrix, Datafit, GlmDatafit, Penalty, GroupPenalty) and their Python ABC mirrors (skein.penalties.Penalty, etc.) are the extension surface for downstream per-paper projects.
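To give a feel for what an ABC-based extension surface can look like, here is a purely hypothetical sketch: an abstract penalty with a value and a proximal operator, plus a weighted L1 implementation. The class names and method names (`value`, `prox`) are assumptions made for this example and do not reproduce skein.penalties.Penalty, whose actual interface is documented in the API reference rather than in this README.

```python
from abc import ABC, abstractmethod


class SketchPenalty(ABC):
    """Hypothetical penalty interface for illustration only (not skein's ABC)."""

    @abstractmethod
    def value(self, beta):
        """Return the penalty evaluated at the coefficient list `beta`."""

    @abstractmethod
    def prox(self, beta, step):
        """Return the proximal operator of step * penalty applied to `beta`."""


class WeightedL1(SketchPenalty):
    """lam * sum_j w_j * |beta_j|, with per-feature weights w_j."""

    def __init__(self, lam, weights):
        self.lam = lam
        self.weights = list(weights)

    def value(self, beta):
        return self.lam * sum(w * abs(b) for w, b in zip(self.weights, beta))

    def prox(self, beta, step):
        out = []
        for w, b in zip(self.weights, beta):
            thresh = step * self.lam * w  # per-coordinate threshold
            if b > thresh:
                out.append(b - thresh)
            elif b < -thresh:
                out.append(b + thresh)
            else:
                out.append(0.0)
        return out
```

A solver written against the abstract class only needs these two methods, which is the general idea behind mirroring a trait with an ABC.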

Quick start

import numpy as np
from skein import MCPPathRegressor, LogisticGroupMCPPathRegressor, CoxMCPRegressor

# Nonconvex sparse least squares with a λ-path.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.5, -2.0, 0.8]) + 0.1 * rng.standard_normal(n)
model = MCPPathRegressor(gamma=3.0, n_lambdas=50, standardize=True).fit(X, y)
print(model.coefs_[-1, :5], model.intercepts_[-1])

# Logistic + group MCP via LLA, with sklearn-style predict/predict_proba.
groups = np.repeat(np.arange(p // 5), 5)  # 5 features per group
y_bin = (X[:, :3].sum(axis=1) > 0).astype(float)
clf = LogisticGroupMCPPathRegressor(groups=groups, gamma=3.0, n_lambdas=20).fit(X, y_bin)
proba = clf.predict_proba(X)  # shape (n, n_lambdas)

# Cox PH with right-censored survival data.
time = rng.exponential(1.0 / np.exp(X[:, :3].sum(axis=1)))
event = rng.uniform(size=n) < 0.7
cox = CoxMCPRegressor(lambda_=0.01, gamma=3.0).fit(X, time, event.astype(float))
risk = cox.predict(X)  # prognostic index η

Every regressor follows the same (datafit) × (penalty) × ({,Path}Regressor) naming scheme. The path variants warm-start across λ; their coefs_ / intercepts_ (where applicable) are 2D arrays indexed by λ.
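Because the path arrays are indexed by λ along the first axis, picking a single model from a fitted path reduces to scoring each row. The sketch below does this for a least-squares path using validation mean squared error. It is written with plain lists so it runs without dependencies, and the same loop also iterates over NumPy arrays row by row; the helper name and the choice of MSE as the criterion are assumptions for illustration, not a skein API.

```python
def best_path_index(coefs, intercepts, X_val, y_val):
    """Return (k, mse) for the path row with the lowest validation MSE.

    `coefs` has one row of coefficients per lambda and `intercepts` one value
    per lambda, matching the layout described above. Hypothetical helper.
    """
    best_k, best_mse = None, float("inf")
    for k, (beta, b0) in enumerate(zip(coefs, intercepts)):
        sq_err = 0.0
        for x, target in zip(X_val, y_val):
            pred = b0 + sum(xj * bj for xj, bj in zip(x, beta))
            sq_err += (target - pred) ** 2
        mse = sq_err / len(y_val)
        if mse < best_mse:  # strict inequality keeps the earliest row on ties
            best_k, best_mse = k, mse
    return best_k, best_mse


# Toy path with three lambdas and two features.
coefs = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
intercepts = [0.0, 0.0, 0.0]
X_val = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_val = [1.0, 0.0, 1.0]
k, mse = best_path_index(coefs, intercepts, X_val, y_val)
```

With a fitted path model, the same call would take `model.coefs_` and `model.intercepts_` together with held-out data.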

Performance

skein is benchmarked against sklearn / skglm / celer / glmnet / ncvreg on shared λ-grids via the harness under benches/. Headline numbers (Apple M1, 16 GB; median of N timed trials after a warm-up):

| scenario | size | skein | fastest comparator |
|---|---|---|---|
| Lasso LS — deep | medium (n=10k, p=1k) | 1.17 s | sklearn 0.125 s |
| Lasso LS — sparse | medium | 0.78 s | sklearn 0.099 s |
| MCP LS — deep | medium | 1.37 s | skglm 3.35 s |
| MCP LS — sparse | medium | 0.75 s | ncvreg 1.17 s |
| MCP LS — deep | large (n=100k, p=10k) | 510 s | skglm 666 s |
| MCP LS — sparse | large | 497 s | skglm 702 s |
| SCAD LS — deep | medium | 1.78 s | ncvreg 7.99 s |
| SCAD LS — sparse | medium | 0.90 s | ncvreg 1.86 s |

skein is the fastest on every nonconvex row across every size; on convex lasso/LS the sklearn Cython lasso_path remains the floor at ~8–9× faster on the medium bench. See docs/benchmarks/mcp_ls.md and docs/benchmarks/scad_ls.md for the full nonconvex write-ups (correctness matrices + methodology + per-size tables) and docs/perf/lasso_ls_profile.md for the lasso/LS profiling work that drove M10.
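The ratios behind these statements can be recomputed directly from the headline timings. The snippet below is only arithmetic on the numbers quoted in this section (it does not run any benchmark); the row labels and the `speedup` helper are naming choices made for this example.

```python
# (label, skein seconds, comparator, comparator seconds), copied from the table.
ROWS = [
    ("Lasso LS deep, medium", 1.17, "sklearn", 0.125),
    ("Lasso LS sparse, medium", 0.78, "sklearn", 0.099),
    ("MCP LS deep, medium", 1.37, "skglm", 3.35),
    ("MCP LS sparse, medium", 0.75, "ncvreg", 1.17),
    ("MCP LS deep, large", 510.0, "skglm", 666.0),
    ("MCP LS sparse, large", 497.0, "skglm", 702.0),
    ("SCAD LS deep, medium", 1.78, "ncvreg", 7.99),
    ("SCAD LS sparse, medium", 0.90, "ncvreg", 1.86),
]


def speedup(skein_s, other_s):
    """Comparator time divided by skein time; above 1 means skein is faster."""
    return other_s / skein_s


RATIOS = {label: speedup(skein_s, other_s) for label, skein_s, _, other_s in ROWS}

for label, ratio in RATIOS.items():
    print(f"{label:26s} {ratio:6.2f}x")
```

On these numbers the nonconvex rows range from roughly 1.3x (MCP, deep, large) to roughly 4.5x (SCAD, deep, medium) in skein's favor, while the two lasso rows come out at about 9.4x and 7.9x in sklearn's favor.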

Reproduce with python benches/run.py --scenarios mcp_ls mcp_ls_sparse --sizes small,medium.

Build

# Rust core only (fast iteration on algorithms)
cargo test -p skein-core

# Full Python package (requires maturin in your env)
maturin develop --release
pytest

License

MIT.

Download files

Source distribution

  • skein_glm-0.7.0.tar.gz (248.1 kB)

Built distributions

  • skein_glm-0.7.0-cp310-abi3-win_amd64.whl (1.7 MB): CPython 3.10+, Windows x86-64
  • skein_glm-0.7.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • skein_glm-0.7.0-cp310-abi3-macosx_11_0_arm64.whl (1.3 MB): CPython 3.10+, macOS 11.0+, ARM64
