Weighted structured nonconvex sparse models (Python + Rust)
skein
Weighted structured nonconvex sparse models. Rust core + Python API.
Documentation: the docs site has the full conceptual reference (penalties, datafits, weights, backends), porting guides for
glmnet/ncvreg/grpreg, worked examples, and an auto-generated API reference. Hosted on Read the Docs once the project is connected (config in .readthedocs.yaml); preview locally with mkdocs serve. CI builds it with --strict on every PR.
skein targets a niche that's well-served in R (grpreg, ncvreg) but
missing in Python at production quality: nonconvex group-structured
penalties (group MCP, group SCAD, sparse-group nonconvex) with first-class
support for weights along three axes — per-sample, per-feature, and
per-group.
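To make the three weight axes concrete, here is a minimal NumPy sketch of an objective that uses all of them. It is not skein's implementation: the function name, the normalization by the sum of sample weights, the `alpha` mixing parameter, and the use of the convex sparse-group lasso penalty (rather than a nonconvex one) are all assumptions chosen for illustration.

```python
import numpy as np

def weighted_objective(X, y, beta, groups, lam,
                       sample_weight, feature_weight, group_weight, alpha=0.5):
    """Weighted least squares plus a sparse-group lasso penalty (illustrative only)."""
    resid = y - X @ beta
    # Per-sample weights rescale each observation's squared residual.
    datafit = 0.5 * np.sum(sample_weight * resid**2) / np.sum(sample_weight)
    # Per-feature weights rescale each coefficient's L1 term (0 = unpenalized).
    l1 = np.sum(feature_weight * np.abs(beta))
    # Per-group weights rescale each group's Euclidean norm; group_weight is
    # ordered like the sorted unique group labels.
    group_ids = np.unique(groups)
    l2 = sum(group_weight[k] * np.linalg.norm(beta[groups == g])
             for k, g in enumerate(group_ids))
    return datafit + lam * (alpha * l1 + (1.0 - alpha) * l2)

X = np.eye(4)
y = np.array([1.0, 2.0, 3.0, 4.0])
beta = np.array([1.0, 0.0, 3.0, 4.0])
groups = np.array([0, 0, 1, 1])
obj = weighted_objective(X, y, beta, groups, lam=1.0,
                         sample_weight=np.ones(4),
                         feature_weight=np.ones(4),
                         group_weight=np.ones(2))
print(obj)
```

Setting a feature weight to zero leaves that coefficient's L1 term unpenalized, which is the usual way to protect a covariate from shrinkage.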
Status
v0.1 development. Core algorithms and the headline GLM family are in place; design-matrix backends (sparse, mmap, chunked) are next. See ROADMAP.md for the full plan.
Done so far:
- Solvers: production CD core (path solver, strong rule + KKT verification, gap-safe screening, Anderson acceleration); group block-CD with LLA outer loop for nonconvex group penalties; Rayon-parallel group sweeps; operator-norm Lipschitz via power iteration.
- Datafits: least squares, binomial logistic, Poisson (log link), Cox PH (Breslow ties). All glued together by a GlmDatafit trait that exposes a weighted-LS surrogate; the M1/M2 inner solvers absorb every GLM unchanged.
- Penalties: MCP, SCAD, group lasso, group MCP, sparse-group lasso, sparse-group MCP. Per-feature and per-group weights honored throughout.
- Python: sklearn-compatible estimators for every (datafit × penalty) combination; type stubs; warm-started λ-paths; standardization with original-scale coef_/intercept_ recovery (dense backend).
- Graphical models: sparse precision matrix estimation (GraphicalLasso/GraphicalMCP/GraphicalSCAD) and joint estimation across K related populations (JointGraphicalLasso/JointGraphicalMCP, Danaher–Wang–Witten 2014 group form via ADMM), with EBIC tuning. Nonconvex penalties on edges close the shrinkage-bias gap that sklearn.covariance.GraphicalLasso and R's glasso/qgraph/bootnet leave open.
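As background for the MCP and LLA items in the list above, the sketch below writes out the standard MCP penalty, its derivative, and the group weights that one LLA (local linear approximation) step would produce for group MCP. The formulas are the textbook ones; the function names and the way `group_weight` scales λ are assumptions and may not match skein's internals.

```python
import numpy as np

def mcp_value(t, lam, gamma):
    """MCP penalty: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, then constant."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def mcp_derivative(t, lam, gamma):
    """Derivative of MCP in |t|: decays linearly from lam at 0 to 0 at gamma*lam."""
    return np.maximum(lam - np.abs(t) / gamma, 0.0)

def lla_group_weights(beta, groups, lam, gamma, group_weight=None):
    """One LLA step: each group's lasso weight is the MCP derivative at its norm."""
    group_ids = np.unique(groups)
    norms = np.array([np.linalg.norm(beta[groups == g]) for g in group_ids])
    if group_weight is None:
        group_weight = np.ones(len(group_ids))
    # Assumption: a per-group weight scales lambda for that group.
    return mcp_derivative(norms, group_weight * lam, gamma)

beta = np.array([0.0, 0.0, 3.0, 4.0])
groups = np.array([0, 0, 1, 1])
weights = lla_group_weights(beta, groups, lam=1.0, gamma=3.0)
print(weights)
```

A group whose norm already exceeds γλ receives weight zero in the next weighted group-lasso subproblem, so it is no longer shrunk; that is the mechanism by which nonconvex penalties reduce the bias of large coefficients.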
M8 (Distribution & DX) is done: CI + cibuildwheel + Read the Docs +
25-page mkdocs site (concepts + R-porting + extending + examples + API
ref) + R numerical regression suite vs glmnet/ncvreg/grpreg + stable
Rust API contract. The library is pip install-able once published,
documented end-to-end, and pinned against R reference fits so we don't
silently drift.
Coming next: algorithmic features — M5.x adaptive weights and stability selection are the next high-value milestones; both leverage the existing per-feature/per-group weight axes that are already wired through every solver.
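For readers unfamiliar with adaptive weights, the following sketch shows one common recipe: derive per-feature weights from a pilot estimate so that features with large pilot coefficients are penalized less. The ridge pilot, the function name, and the `power`/`eps` parameters are illustrative assumptions, not the planned M5.x design.

```python
import numpy as np

def adaptive_feature_weights(X, y, ridge=1.0, power=1.0, eps=1e-8):
    """Weights w_j = 1 / (|b_j| + eps)^power from a ridge pilot estimate b."""
    p = X.shape[1]
    # Ridge pilot fit: solve (X'X + ridge*I) b = X'y.
    pilot = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)
    return 1.0 / (np.abs(pilot) + eps) ** power

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.5, -2.0, 0.8]) + 0.1 * rng.standard_normal(n)
w = adaptive_feature_weights(X, y)
print(np.round(w, 2))
```

The resulting vector could then be passed along any per-feature weight axis: the three true signals get small weights, while the noise features get large ones.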
Layout
crates/skein-core/   pure Rust: traits + algorithms (no Python)
crates/skein-py/     PyO3 bindings (cdylib → skein_glm._core)
python/skein/        sklearn-compatible estimators + ABCs for extensions
tests/               pytest smoke tests
benches/             criterion (Rust) + asv (Python)
The Rust traits (DesignMatrix, Datafit, GlmDatafit, Penalty,
GroupPenalty) and their Python ABC mirrors (skein.penalties.Penalty,
etc.) are the extension surface for downstream per-paper projects.
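To give a feel for what a penalty extension might look like, here is a self-contained sketch of an abstract penalty interface with a weighted L1 implementation. `PenaltySketch`, its `value`/`prox` methods, and `WeightedL1` are hypothetical names invented for this example; the real skein.penalties.Penalty ABC may expose a different set of methods, so consult the API reference before subclassing.

```python
from abc import ABC, abstractmethod

import numpy as np

class PenaltySketch(ABC):
    """Hypothetical interface, NOT skein's actual Penalty ABC."""

    @abstractmethod
    def value(self, beta):
        """Return the penalty evaluated at the coefficient vector."""

    @abstractmethod
    def prox(self, z, step):
        """Return the proximal operator of step * penalty evaluated at z."""

class WeightedL1(PenaltySketch):
    """Weighted lasso penalty: lam * sum_j w_j * |beta_j|."""

    def __init__(self, lam, weights):
        self.lam = lam
        self.weights = np.asarray(weights, dtype=float)

    def value(self, beta):
        return self.lam * np.sum(self.weights * np.abs(beta))

    def prox(self, z, step):
        # Soft-thresholding with a per-feature threshold.
        thresh = step * self.lam * self.weights
        return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

pen = WeightedL1(lam=1.0, weights=[1.0, 0.0, 2.0])
shrunk = pen.prox(np.array([3.0, 3.0, 3.0]), step=1.0)
print(shrunk, pen.value(np.array([1.0, -1.0, 1.0])))
```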
Quick start
import numpy as np
from skein import MCPPathRegressor, LogisticGroupMCPPathRegressor, CoxMCPRegressor
# Nonconvex sparse least squares with a λ-path.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.5, -2.0, 0.8]) + 0.1 * rng.standard_normal(n)
model = MCPPathRegressor(gamma=3.0, n_lambdas=50, standardize=True).fit(X, y)
print(model.coefs_[-1, :5], model.intercepts_[-1])
# Logistic + group MCP via LLA, with sklearn-style predict/predict_proba.
groups = np.repeat(np.arange(p // 5), 5) # 5 features per group
y_bin = (X[:, :3].sum(axis=1) > 0).astype(float)
clf = LogisticGroupMCPPathRegressor(groups=groups, gamma=3.0, n_lambdas=20).fit(X, y_bin)
proba = clf.predict_proba(X) # shape (n, n_lambdas)
# Cox PH with right-censored survival data.
time = rng.exponential(1.0 / np.exp(X[:, :3].sum(axis=1)))
event = rng.uniform(size=n) < 0.7
cox = CoxMCPRegressor(lambda_=0.01, gamma=3.0).fit(X, time, event.astype(float))
risk = cox.predict(X) # prognostic index η
Every regressor follows the same (datafit) × (penalty) × ({,Path}Regressor)
naming scheme. The path variants warm-start across λ; their coefs_ /
intercepts_ (where applicable) are 2D arrays indexed by λ.
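Because a path fit returns one coefficient vector per λ, a typical next step is to choose a λ on held-out data. The helper below is a NumPy-only sketch that assumes `coefs` has shape (n_lambdas, p) and `intercepts` has shape (n_lambdas,), matching the indexing used in the quick start; `best_lambda_index` is a hypothetical helper, not part of skein.

```python
import numpy as np

def best_lambda_index(coefs, intercepts, X_val, y_val):
    """Return the path index with the lowest validation mean squared error."""
    # One column of predictions per lambda: shape (n_val, n_lambdas).
    preds = X_val @ coefs.T + intercepts
    mse = np.mean((preds - y_val[:, None]) ** 2, axis=0)
    return int(np.argmin(mse)), mse

# Toy path with two lambdas: an all-zero model and the exact model.
coefs = np.array([[0.0, 0.0], [1.0, 2.0]])
intercepts = np.array([0.0, 0.5])
X_val = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_val = X_val @ np.array([1.0, 2.0]) + 0.5
idx, mse = best_lambda_index(coefs, intercepts, X_val, y_val)
print(idx, mse)
```

With a fitted path model, the same call would take its coefs_ and intercepts_ attributes, provided they have the shapes assumed above.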
Performance
skein is benchmarked against sklearn / skglm / celer / glmnet / ncvreg
on shared λ-grids via the harness under benches/. Headline numbers
(Apple M1, 16 GB; median of N timed trials after a warm-up):
| scenario | size | skein | next-fastest comparator |
|---|---|---|---|
| Lasso LS — deep | medium (n=10k, p=1k) | 1.17 s | sklearn 0.125 s |
| Lasso LS — sparse | medium | 0.78 s | sklearn 0.099 s |
| MCP LS — deep | medium | 1.37 s | skglm 3.35 s |
| MCP LS — sparse | medium | 0.75 s | ncvreg 1.17 s |
| MCP LS — deep | large (n=100k, p=10k) | 510 s | skglm 666 s |
| MCP LS — sparse | large | 497 s | skglm 702 s |
| SCAD LS — deep | medium | 1.78 s | ncvreg 7.99 s |
| SCAD LS — sparse | medium | 0.90 s | ncvreg 1.86 s |
skein is the fastest on every nonconvex row across every size. On
convex lasso/LS, the sklearn Cython lasso_path remains the floor,
about 8–9× faster than skein on the medium bench. See
docs/benchmarks/mcp_ls.md and docs/benchmarks/scad_ls.md for the
full nonconvex write-ups (correctness matrices + methodology +
per-size tables) and docs/perf/lasso_ls_profile.md for
the lasso/LS profiling work that drove M10.
Reproduce with python benches/run.py --scenarios mcp_ls mcp_ls_sparse --sizes small,medium.
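The ratios behind the headline claims can be recomputed directly from the table above. This snippet only does arithmetic on the published timings; the dictionary keys are shorthand labels introduced here.

```python
# Wall-clock seconds copied from the table: (skein, next-fastest comparator).
timings = {
    "Lasso LS deep, medium": (1.17, 0.125),
    "Lasso LS sparse, medium": (0.78, 0.099),
    "MCP LS deep, medium": (1.37, 3.35),
    "MCP LS sparse, medium": (0.75, 1.17),
    "MCP LS deep, large": (510.0, 666.0),
    "MCP LS sparse, large": (497.0, 702.0),
    "SCAD LS deep, medium": (1.78, 7.99),
    "SCAD LS sparse, medium": (0.90, 1.86),
}

# Ratio > 1 means skein is faster than the comparator; < 1 means slower.
speedup = {name: other / skein for name, (skein, other) in timings.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the nonconvex speedups range from about 1.3× (MCP deep, large) to about 4.5× (SCAD deep, medium), while the lasso rows show skein roughly 7.9× to 9.4× slower than sklearn.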
Build
# Rust core only (fast iteration on algorithms)
cargo test -p skein-core
# Full Python package (requires maturin in your env)
maturin develop --release
pytest
License
MIT.