SciTeX ML — machine learning, classification, training utilities

scitex-ml

Reproducible classical and deep machine-learning utilities for scientific research.

Full Documentation · pip install scitex-ml


Problem and Solution

| # | Problem | Solution |
|---|---------|----------|
| 1 | Boilerplate around scikit-learn: every paper re-implements classifier factories, train/eval loops, classification reports, and ROC/PR plots. | Classifier + ClassificationReporter: a thin factory over scikit-learn estimators that plugs directly into a reporter for cross-validation-aware metrics, confusion matrices, and figure export. |
| 2 | Time-series CV done by hand: researchers re-derive blocking / sliding-window / calendar splitters per project, often with off-by-one bugs. | Time-series CV splitters: TimeSeriesStratifiedSplit, TimeSeriesBlockingSplit, TimeSeriesSlidingWindowSplit, and TimeSeriesCalendarSplit ship with consistent APIs and tested edge cases. |
| 3 | Training-loop ergonomics: EarlyStopping, learning-curve logging, optimiser shortcuts, and multi-task losses are all glue code that drifts between repos. | First-class training utilities: EarlyStopping, LearningCurveLogger, MultiTaskLoss, get_optimizer / set_optimizer, and a vendored Ranger. |
| 4 | Heavy ML deps mixed with LLM SDKs: installing one pulls in all of scikit-learn, torch, openai, and anthropic. | Split package: generative AI lives in scitex-genai; scitex-ml keeps the classical / deep-ML stack and nothing else. |
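
The off-by-one risk named in row 2 usually sits in the window-boundary arithmetic. As a rough illustration only, and not the implementation of TimeSeriesSlidingWindowSplit, a sliding-window splitter can be sketched in plain Python. The function name and its parameters (train_size, test_size, step) are assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def sliding_window_splits(
    n_samples: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs; each test window directly follows its train window."""
    start = 0
    # A window is valid only if its exclusive end index does not exceed n_samples.
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += step


splits = list(sliding_window_splits(n_samples=10, train_size=4, test_size=2, step=2))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Every training index precedes every test index within a fold, which is the property that keeps future samples from leaking into training.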

Installation

pip install scitex-ml          # core
pip install scitex-ml[heavy]   # + torch / catboost / optuna / pytorch_pretrained_vit
pip install scitex-ml[mcp]     # + fastmcp
pip install scitex-ml[all]     # everything

Through the umbrella: pip install scitex[ml]. Requires Python ≥ 3.10.

Quick Start

import scitex_ml
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Classifier — factory over scikit-learn estimators.
clf = scitex_ml.Classifier("LogisticRegression")
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")

# ClassificationReporter — metric tracking + figure export.
reporter = scitex_ml.ClassificationReporter(save_dir="./results")
reporter.calc_metrics(y_te, clf.predict(X_te), clf.predict_proba(X_te))
reporter.summarize()
reporter.save()

For a runnable walk-through see examples/01_classification.ipynb.

Demo

A complete classification + reporting walk-through (Iris, train/test split, Classifier("LogisticRegression"), ClassificationReporter metric persistence) lives in examples/01_classification.ipynb.

flowchart LR
    Data[load_iris<br/>train_test_split] --> Clf["scitex_ml.Classifier('LogisticRegression')"]
    Clf -->|fit / score| Pred[y_pred / y_proba]
    Pred --> Reporter[scitex_ml.ClassificationReporter]
    Reporter -->|calc_metrics| Metrics[bacc · ROC-AUC · Conf-Mat]
    Reporter -->|save| Artefacts[results/<br/>metrics.csv · roc.png · pr.png]

A second example, examples/example_classifier.py, runs the same flow as a script so that it can be wired into tests/examples/test_example_classifier.py for CI smoke coverage.
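
To make two of the metrics named in the flowchart concrete, here is a minimal sketch of a confusion matrix and balanced accuracy (bacc) in plain Python. It illustrates the definitions only; it is not how ClassificationReporter or scitex_ml.metrics compute them, and the helper names and toy labels are assumptions for this example.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def balanced_accuracy(cm: List[List[int]]) -> float:
    """Mean per-class recall; classes with no true samples are skipped."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(recalls) / len(recalls)


# Imbalanced toy labels: six samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred, n_classes=2)
bacc = balanced_accuracy(cm)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(cm, f"accuracy={accuracy:.3f}", f"bacc={bacc:.3f}")
```

On these labels plain accuracy is 0.875 while balanced accuracy is 0.75, because the minority class is recalled only half the time.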

Architecture

scitex-ml sits in the middle layer of the SciTeX ecosystem:

scitex-python (umbrella)
    └── scitex.ml ── thin sys.modules-aliasing shim
                     └── scitex_ml (this package)
                           ├── classification/   Classifier, ClassificationReporter,
                           │                     time-series CV splitters
                           ├── training/         EarlyStopping, LearningCurveLogger
                           ├── loss/             MultiTaskLoss + regularisers
                           ├── optim/            get_optimizer / set_optimizer + Ranger
                           ├── metrics/          calc_bacc, calc_conf_mat, calc_roc_auc
                           ├── clustering/       PCA + UMAP wrappers
                           ├── feature_extraction/  ViT embeddings
                           ├── feature_selection/   univariate / multivariate
                           ├── plt/              ROC / PR / learning-curve / conf-mat plots
                           ├── sampling/         undersampling helpers
                           ├── sklearn/          scikit-learn integration helpers
                           └── sk/               sktime compatibility

Cross-package dependencies are minimal: scitex-logging, scitex-io, scitex-plt, scitex-repro, scitex-types. Heavy deps (torch, catboost, optuna, pytorch_pretrained_vit) live behind the [heavy] extra and are gated by _AVAILABLE flags in the source, so a plain pip install scitex-ml (without [heavy]) still imports cleanly.
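
The gating pattern described above can be sketched as follows. This is a generic illustration of optional-dependency flags, not the package's actual source: the helper names (is_available, require, make_study) and the flag names are hypothetical.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """True when the top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical flags in the spirit of the "_AVAILABLE" convention.
TORCH_AVAILABLE = is_available("torch")
OPTUNA_AVAILABLE = is_available("optuna")


def require(flag: bool, feature: str, extra: str = "heavy") -> None:
    """Fail late, with an install hint, only when a gated feature is actually used."""
    if not flag:
        raise ImportError(
            f"{feature} needs the optional extra: pip install scitex-ml[{extra}]"
        )


def make_study():
    """Example of a gated feature: optuna is imported only inside the function."""
    require(OPTUNA_AVAILABLE, "hyperparameter search")
    import optuna  # lazy import, so a core-only install never reaches this line

    return optuna.create_study()
```

Because the heavy import happens inside the function, importing the module itself never fails on a core-only install; the error appears only when the gated feature is called.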

4 Interfaces

Python API ⭐⭐⭐ (primary)
import scitex_ml

# Time-series cross-validation
from scitex_ml.classification import TimeSeriesStratifiedSplit
splitter = TimeSeriesStratifiedSplit(n_splits=5)

# Training utilities
stopper = scitex_ml.EarlyStopping(patience=10, direction="minimize")
logger = scitex_ml.LearningCurveLogger()

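# Multi-task loss + optimiser
# `model` is assumed to be an existing torch model defined elsewhere.
mtl = scitex_ml.MultiTaskLoss(are_regression=[False, False])
optimizer = scitex_ml.set_optimizer(model, "adam", lr=1e-3)

# Metrics
# `y_true` and `y_pred` are assumed to be label arrays of equal length.
result = scitex_ml.metrics.calc_bacc(y_true, y_pred)
cm = scitex_ml.metrics.calc_conf_mat(y_true, y_pred)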

Full API reference
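
For readers unfamiliar with the patience and direction arguments used above, the usual early-stopping logic can be sketched in a few lines. This is an illustrative stand-in, not the code of scitex_ml.EarlyStopping; the class name SimpleEarlyStopping and its step method are assumptions for this example.

```python
class SimpleEarlyStopping:
    """Stop when the monitored score has not improved for `patience` checks in a row."""

    def __init__(self, patience: int = 10, direction: str = "minimize") -> None:
        if direction not in ("minimize", "maximize"):
            raise ValueError("direction must be 'minimize' or 'maximize'")
        self.patience = patience
        self.direction = direction
        self.best = None
        self.counter = 0

    def step(self, score: float) -> bool:
        """Record one validation score; return True when training should stop."""
        improved = (
            self.best is None
            or (self.direction == "minimize" and score < self.best)
            or (self.direction == "maximize" and score > self.best)
        )
        if improved:
            self.best = score
            self.counter = 0  # any improvement resets the patience counter
        else:
            self.counter += 1
        return self.counter >= self.patience


stopper = SimpleEarlyStopping(patience=2, direction="minimize")
val_losses = [0.90, 0.70, 0.72, 0.71, 0.69]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

With patience=2 the loop stops at epoch 3, after two consecutive checks without beating the best loss of 0.70, so the later 0.69 is never seen. That is the trade-off a small patience value makes.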

CLI ⭐ — none

scitex-ml ships no dedicated CLI. ML workflows are composed in Python and run via the umbrella scitex CLI / @scitex.session decorator.

MCP ⭐ — none

No MCP server in this package. The umbrella scitex CLI surfaces ML-adjacent MCP tools (e.g. scitex stats, scitex plt).

Skills ⭐⭐

Skill index for AI agents lives at src/scitex_ml/_skills/scitex-ml/SKILL.md. Sub-skills cover classification, training, loss, optim, clustering, metrics, sampling, feature-selection.

Full skills directory

Part of SciTeX

scitex-ml is part of SciTeX. Install it via the umbrella with pip install scitex[ml] to use it as scitex.ml in Python.

import scitex

scitex.ml.Classifier  # same object as scitex_ml.Classifier
scitex.ml.classification.TimeSeriesStratifiedSplit  # deep paths resolve via the umbrella shim

scitex.ml delegates to scitex_ml — they share the same API.
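
One way such a sys.modules-aliasing shim can work is sketched below with stand-in modules. This is an assumption about the general technique, not the umbrella's actual code; the package names demo_ml and demo_umbrella are hypothetical and exist only in this example.

```python
import importlib
import sys
import types

# Stand-in for the implementation package (hypothetical name "demo_ml").
impl = types.ModuleType("demo_ml")
impl.Classifier = type("Classifier", (), {})
impl_sub = types.ModuleType("demo_ml.classification")
impl_sub.TimeSeriesStratifiedSplit = type("TimeSeriesStratifiedSplit", (), {})
impl.classification = impl_sub
sys.modules["demo_ml"] = impl
sys.modules["demo_ml.classification"] = impl_sub

# Stand-in for the umbrella package (hypothetical name "demo_umbrella").
umbrella = types.ModuleType("demo_umbrella")
sys.modules["demo_umbrella"] = umbrella

# The shim: register every implementation module under the umbrella's dotted name.
for name, module in list(sys.modules.items()):
    if name == "demo_ml" or name.startswith("demo_ml."):
        sys.modules["demo_umbrella.ml" + name[len("demo_ml"):]] = module
umbrella.ml = impl

# Both dotted paths now resolve to the very same module object.
via_umbrella = importlib.import_module("demo_umbrella.ml.classification")
direct = importlib.import_module("demo_ml.classification")
print(via_umbrella is direct)
```

Because the alias points at the same module object rather than a copy, classes reached through either path are identical, so isinstance checks and pickling behave the same whichever import path is used.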

The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.


