SciTeX ML — machine learning, classification, training utilities
scitex-ml
Reproducible classical and deep machine-learning utilities for scientific research.
Full Documentation · pip install scitex-ml
Problem and Solution
| # | Problem | Solution |
|---|---|---|
| 1 | Boilerplate around scikit-learn: every paper re-implements Classifier factories, train/eval loops, classification reports, ROC/PR plots. | Classifier + ClassificationReporter: thin factory over scikit-learn estimators that snaps directly into a reporter for cross-validation aware metrics, confusion matrices, and figure export. |
| 2 | Time-series CV done by hand: researchers re-derive blocking / sliding-window / calendar splitters per project, often with off-by-one bugs. | Time-series CV splitters: TimeSeriesStratifiedSplit, TimeSeriesBlockingSplit, TimeSeriesSlidingWindowSplit, TimeSeriesCalendarSplit ship with consistent APIs and tested edge cases. |
| 3 | Training-loop ergonomics: EarlyStopping, learning-curve logging, optimiser shortcuts, multi-task losses are all glue code that drifts between repos. | First-class training utilities: EarlyStopping, LearningCurveLogger, MultiTaskLoss, get_optimizer / set_optimizer, vendored Ranger. |
| 4 | Heavy ML deps mixed with LLM SDKs: installing one pulls all of scikit-learn, torch, openai, anthropic. | Split package: generative-AI lives in scitex-genai; scitex-ml keeps the classical / deep-ML stack and nothing else. |
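To make the off-by-one risk in row 2 concrete, here is a minimal sketch of a hand-rolled sliding-window splitter. It is not the scitex-ml implementation: the function name `sliding_window_splits` and its parameters are hypothetical, chosen only to illustrate the invariant that every test index comes strictly after every train index.

```python
import numpy as np


def sliding_window_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_idx, test_idx) pairs; the test window always follows the train window.

    Hypothetical helper for illustration only, not part of scitex-ml.
    """
    step = test_size if step is None else step
    start = 0
    # "<=" is the easy place for an off-by-one: the last window may end exactly at n_samples.
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        yield np.arange(start, train_end), np.arange(train_end, train_end + test_size)
        start += step


splits = list(sliding_window_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

A packaged splitter is mainly valuable because boundary conditions like the loop guard above are tested once instead of being rewritten per project.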
Installation
pip install scitex-ml # core
pip install scitex-ml[heavy] # + torch / catboost / optuna / pytorch_pretrained_vit
pip install scitex-ml[mcp] # + fastmcp
pip install scitex-ml[all] # everything
Through the umbrella: pip install scitex[ml]. Requires Python ≥ 3.10.
Quick Start
import scitex_ml
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
# Classifier — factory over scikit-learn estimators.
clf = scitex_ml.Classifier("LogisticRegression")
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
# ClassificationReporter — metric tracking + figure export.
reporter = scitex_ml.ClassificationReporter(save_dir="./results")
reporter.calc_metrics(y_te, clf.predict(X_te), clf.predict_proba(X_te))
reporter.summarize()
reporter.save()
For a runnable walk-through see examples/01_classification.ipynb.
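For comparison, the same flow can be written with scikit-learn alone. This sketch is an assumption about which metrics are relevant (balanced accuracy, confusion matrix, ROC-AUC, as named elsewhere in this README); it does not reproduce the internals or the output files of ClassificationReporter.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Plain scikit-learn estimator in place of the scitex_ml.Classifier factory.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_proba = clf.predict_proba(X_te)

bacc = balanced_accuracy_score(y_te, y_pred)           # mean per-class recall
cm = confusion_matrix(y_te, y_pred)                    # rows: true class, columns: predicted class
auc = roc_auc_score(y_te, y_proba, multi_class="ovr")  # one-vs-rest ROC-AUC for 3 classes

print(f"balanced accuracy: {bacc:.3f}")
print(f"ROC-AUC (OvR):     {auc:.3f}")
print(cm)
```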
Demo
A complete classification + reporting walk-through (Iris, train/test
split, Classifier("LogisticRegression"), ClassificationReporter
metric persistence) lives in
examples/01_classification.ipynb.
flowchart LR
Data[load_iris<br/>train_test_split] --> Clf["scitex_ml.Classifier('LogisticRegression')"]
Clf -->|fit / score| Pred[y_pred / y_proba]
Pred --> Reporter[scitex_ml.ClassificationReporter]
Reporter -->|calc_metrics| Metrics[bacc · ROC-AUC · Conf-Mat]
Reporter -->|save| Artefacts[results/<br/>metrics.csv · roc.png · pr.png]
A second example, examples/example_classifier.py, runs the same flow as a script
so that it can be wired into tests/examples/test_example_classifier.py for
CI smoke coverage.
Architecture
scitex-ml sits in the middle layer of the SciTeX ecosystem:
scitex-python (umbrella)
└── scitex.ml ── thin sys.modules-aliasing shim
    └── scitex_ml (this package)
        ├── classification/      Classifier, ClassificationReporter,
        │                        time-series CV splitters
        ├── training/            EarlyStopping, LearningCurveLogger
        ├── loss/                MultiTaskLoss + regularisers
        ├── optim/               get_optimizer / set_optimizer + Ranger
        ├── metrics/             calc_bacc, calc_conf_mat, calc_roc_auc
        ├── clustering/          PCA + UMAP wrappers
        ├── feature_extraction/  ViT embeddings
        ├── feature_selection/   univariate / multivariate
        ├── plt/                 ROC / PR / learning-curve / conf-mat plots
        ├── sampling/            undersampling helpers
        ├── sklearn/             scikit-learn integration helpers
        └── sk/                  sktime compatibility
Cross-package dependencies are minimal: scitex-logging, scitex-io,
scitex-plt, scitex-repro, scitex-types. Heavy deps (torch,
catboost, optuna, pytorch_pretrained_vit) live behind the [heavy]
extra and are gated by _AVAILABLE flags in the source, so a plain pip install scitex-ml (without [heavy]) still imports cleanly.
4 Interfaces
Python API ⭐⭐⭐ (primary)
import scitex_ml
# Time-series cross-validation
from scitex_ml.classification import TimeSeriesStratifiedSplit
splitter = TimeSeriesStratifiedSplit(n_splits=5)
# Training utilities
stopper = scitex_ml.EarlyStopping(patience=10, direction="minimize")
logger = scitex_ml.LearningCurveLogger()
# Multi-task loss + optimiser
mtl = scitex_ml.MultiTaskLoss(are_regression=[False, False])
optimizer = scitex_ml.set_optimizer(model, "adam", lr=1e-3)
# Metrics
result = scitex_ml.metrics.calc_bacc(y_true, y_pred)
cm = scitex_ml.metrics.calc_conf_mat(y_true, y_pred)
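The patience and direction arguments shown for EarlyStopping follow a common pattern, sketched below in plain Python. The class `SimpleEarlyStopping` and its `step` method are hypothetical and only illustrate the idea; the real scitex_ml.EarlyStopping interface may differ.

```python
class SimpleEarlyStopping:
    """Stop when the monitored score has not improved for `patience` consecutive steps.

    Illustrative stand-in, not the scitex_ml.EarlyStopping implementation.
    """

    def __init__(self, patience=10, direction="minimize"):
        if direction not in ("minimize", "maximize"):
            raise ValueError("direction must be 'minimize' or 'maximize'")
        self.patience = patience
        # Flip the sign so that "lower is better" holds internally for both directions.
        self.sign = 1.0 if direction == "minimize" else -1.0
        self.best = float("inf")
        self.counter = 0

    def step(self, score):
        """Record one score; return True when training should stop."""
        value = self.sign * score
        if value < self.best:
            self.best = value
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience


stopper = SimpleEarlyStopping(patience=2, direction="minimize")
val_losses = [0.9, 0.7, 0.71, 0.72, 0.5]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(f"stopped at epoch {stopped_at}")
```

With patience=2 the loop halts after two non-improving epochs in a row, so the later improvement to 0.5 is never seen; that trade-off is what the patience value controls.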
CLI ⭐ — none
scitex-ml ships no dedicated CLI. ML workflows are composed in Python and run via the umbrella scitex CLI / @scitex.session decorator.
MCP ⭐ — none
No MCP server in this package. The umbrella scitex CLI surfaces ML-adjacent MCP tools (e.g. scitex stats, scitex plt).
Skills ⭐⭐
Skill index for AI agents lives at src/scitex_ml/_skills/scitex-ml/SKILL.md. Sub-skills cover classification, training, loss, optim, clustering, metrics, sampling, feature-selection.
Part of SciTeX
scitex-ml is part of SciTeX. Install it through the umbrella with pip install scitex[ml] to use it from Python as scitex.ml.
import scitex
scitex.ml.Classifier # same object as scitex_ml.Classifier
scitex.ml.classification.TimeSeriesStratifiedSplit # deep paths resolve via the umbrella shim
scitex.ml delegates to scitex_ml — they share the same API.
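One way such a delegating shim can work is by registering the implementation module under a second name in sys.modules. The sketch below uses throwaway stand-in modules (`demo_impl`, `demo_umbrella`, both hypothetical) so that it runs without SciTeX installed; it is an assumption about the mechanism, not the umbrella's actual code.

```python
import importlib
import sys
import types

# Stand-in for the implementation package (plays the role of scitex_ml).
impl = types.ModuleType("demo_impl")
impl.Classifier = type("Classifier", (), {})
sys.modules["demo_impl"] = impl

# Stand-in for the umbrella package (plays the role of scitex).
umbrella = types.ModuleType("demo_umbrella")
sys.modules["demo_umbrella"] = umbrella

# The shim: expose the implementation under the umbrella's dotted name.
sys.modules["demo_umbrella.ml"] = impl
umbrella.ml = impl

# Both import paths now resolve to the very same module object.
aliased = importlib.import_module("demo_umbrella.ml")
print(aliased is impl)
print(umbrella.ml.Classifier is impl.Classifier)
```

Because the alias points at the same module object rather than a copy, classes keep a single identity, so isinstance checks behave the same under either import path.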
The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.