Task-first ML baselines. Run the simplest thing that could work.

stepzero

Before reaching for XGBoost or a neural net, run stepzero. It fits the simplest sensible model for your task, compares a few alternatives, and tells you whether your baseline is good enough or what to try next.

import stepzero as sz

result = sz.classification(X, y)
print(result)
# ClassificationResult(best='logistic', accuracy=0.960, headroom='low')

print(result.headroom)
# [low] Score of 0.96 with low variance (±0.012). The simple baseline is already
# performing well. Trying a gradient boosted tree is unlikely to offer a meaningful improvement.

Install

pip install stepzero

Requirements: Python 3.10+, numpy, pandas, scikit-learn, scipy.

Tasks

Classification

result = sz.classification(X, y)

result.best_model          # fitted sklearn Pipeline — call .predict(X_new) directly
result.best_model_name     # "logistic" | "tree" | "naive_bayes"
result.scores              # [ModelScore(name, score, metric), ...]
result.feature_importance  # pd.Series sorted by importance
result.headroom            # HeadroomSignal(level, reason)

Methods: logistic regression, decision tree, naive Bayes
Metric: accuracy (5-fold stratified CV)
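
Stratified folds keep each class's share roughly equal in every fold, which matters when classes are imbalanced. The sketch below illustrates the idea with the standard library only. `stratified_folds` is a hypothetical helper written for this example; it is not part of stepzero and does not describe its internals.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, dealing the indices of
    every class round-robin so class proportions stay similar across folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    position = 0  # keeps rotating across classes so fold sizes stay balanced
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[position % k].append(idx)
            position += 1
    return folds


labels = ["a"] * 10 + ["b"] * 5
folds = stratified_folds(labels, k=5)
for fold in folds:
    # Every fold should hold two "a" samples and one "b" sample.
    print({c: sum(labels[i] == c for i in fold) for c in ("a", "b")})
```

Each fold then serves once as the held-out set while the model is fitted on the other four, and the five accuracies are averaged.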

Regression

result = sz.regression(X, y)

result.best_model_name     # "ridge" | "tree"
result.feature_importance  # normalized importances as pd.Series
result.headroom

Methods: ridge, decision tree
Metric: RMSE (5-fold CV)
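
RMSE is in the units of the target, and lower is better. As a reference point, it can help to know the cross-validated RMSE of a predictor that always outputs the training mean. The following is a minimal standard-library sketch; `rmse` and `cv_rmse_mean_baseline` are hypothetical helpers for illustration, not stepzero functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def cv_rmse_mean_baseline(y, k=5):
    """k-fold CV RMSE of a predictor that always outputs the training mean."""
    fold_scores = []
    for fold in range(k):
        # Sample i belongs to the held-out fold when i % k == fold.
        held_out = [y[i] for i in range(len(y)) if i % k == fold]
        train = [y[i] for i in range(len(y)) if i % k != fold]
        prediction = sum(train) / len(train)
        fold_scores.append(rmse(held_out, [prediction] * len(held_out)))
    return sum(fold_scores) / k


print(rmse([3.0, 5.0], [1.0, 5.0]))  # sqrt((4 + 0) / 2)
print(cv_rmse_mean_baseline([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]))
```

A model whose RMSE is not clearly below this mean-only figure has learned little from the features.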

Forecasting

result = sz.forecasting(series, horizon=12)

result.forecast        # pd.Series with future timestamps as index
result.best_model_name # "seasonal_naive" | "linear_trend"
result.scores          # MAE per model
result.headroom

Methods: seasonal naive, linear trend
Parameters: horizon, freq (optional — inferred from DatetimeIndex), cv_splits
Metric: MAE (time-series CV)
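
To make the seasonal naive method and time-series CV concrete, here is a small standard-library sketch. It assumes a plain list as input and a known `season_length`, which is a parameter invented for this example. The function names are hypothetical, and the rolling-origin scheme shown is one common form of time-series CV, not necessarily the one stepzero uses.

```python
def seasonal_naive(history, horizon, season_length):
    """Forecast by repeating the last observed season."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rolling_origin_mae(series, horizon, season_length, cv_splits=3):
    """Average MAE over cv_splits forecasts, each fitted only on data
    that precedes the window it is scored on."""
    scores = []
    for split in range(cv_splits, 0, -1):
        cutoff = len(series) - split * horizon
        train, test = series[:cutoff], series[cutoff:cutoff + horizon]
        scores.append(mae(test, seasonal_naive(train, horizon, season_length)))
    return sum(scores) / len(scores)


series = [10, 20, 30, 40] * 6  # perfectly periodic, season length 4
print(seasonal_naive(series, horizon=6, season_length=4))
print(rolling_origin_mae(series, horizon=4, season_length=4))
```

Unlike ordinary k-fold CV, no split ever trains on observations that come after the ones it is evaluated on. The sketch assumes the series is long enough that every training window contains at least one full season.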

Anomaly Detection

result = sz.anomaly_detection(series)

result.anomalies   # pd.Series[bool], same index as input
result.scores      # raw anomaly scores
result.method      # "zscore" | "iqr"
result.threshold   # auto-determined threshold
result.headroom

Methods: z-score, IQR
Parameters: threshold (optional — auto-set to flag ~5% of points), method
Metric: inter-method agreement
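
The two methods and the agreement metric can be sketched with the standard library as follows. The fixed cut-offs used here (|z| > 3 and Tukey's 1.5 × IQR fences) are conventional textbook values assumed for illustration; stepzero sets its threshold automatically, and the function names below are hypothetical.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs(v - mean) / std > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v < q1 - k * spread or v > q3 + k * spread for v in values]


def agreement(flags_a, flags_b):
    """Fraction of points on which the two methods give the same verdict."""
    return sum(a == b for a, b in zip(flags_a, flags_b)) / len(flags_a)


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 11, 10, 50]
z = zscore_flags(values)
q = iqr_flags(values)
print("z-score flags index:", z.index(True))
print("IQR flags index:", q.index(True))
print("agreement:", agreement(z, q))
```

When two unrelated rules flag the same points, the anomalies are probably not an artifact of one method's assumptions. Low agreement suggests the result depends heavily on the method chosen.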

Text Classification

result = sz.text_classification(texts, labels)

result.best_model_name        # "tfidf_logistic" | "tfidf_naive_bayes"
result.top_features_per_class # {"class_0": ["word1", ...], ...}
result.headroom

Methods: TF-IDF + logistic regression, TF-IDF + naive Bayes
Metric: accuracy (5-fold stratified CV)
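
TF-IDF turns each document into term weights that are high for words frequent in that document but rare across the corpus. The sketch below uses the textbook formula tf × log(N / df) with whitespace tokenization. It is an illustration only: `tfidf` is a hypothetical helper, and scikit-learn's `TfidfVectorizer` defaults to a smoothed idf with L2 normalization, so its numbers differ.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["great movie great cast", "terrible movie", "great fun"]
for doc, w in zip(docs, tfidf(docs)):
    top = max(w, key=w.get)
    print(f"{doc!r}: top term = {top}")
```

In the first document "great" appears twice but is outweighed by "cast", which occurs in no other document. A term that appears in every document gets a weight of zero under this formula.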

Clustering

result = sz.clustering(X, k_range=(2, 10))

result.best_k    # selected number of clusters
result.labels    # cluster assignment per sample (np.ndarray)
result.centers   # cluster centroids in original feature space
result.scores    # silhouette score per k tried
result.headroom

Methods: k-means
Parameters: k_range
Metric: silhouette score
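
The silhouette score compares how close each point is to its own cluster (a) with how close it is to the nearest other cluster (b), as (b - a) / max(a, b). Values near 1 indicate tight, well-separated clusters. Here is a minimal standard-library sketch for 1-D points; `silhouette_score` below is a hypothetical stand-in written for this example and needs at least two clusters.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for 1-D points with given cluster labels."""
    def mean_distance(p, members):
        return sum(abs(p - m) for m in members) / len(members)

    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(point - m) for m in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(mean_distance(point, members)
                for other, members in clusters.items() if other != label)
        total += (b - a) / max(a, b)
    return total / len(points)


points = [1.0, 2.0, 10.0, 11.0]
print(silhouette_score(points, [0, 0, 1, 1]))  # natural grouping, close to 1
print(silhouette_score(points, [0, 1, 0, 1]))  # interleaved grouping, negative
```

Choosing k then amounts to clustering once per candidate k and keeping the k with the highest score.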

The headroom signal

Every result has a .headroom attribute:

result.headroom.level   # "low" | "medium" | "high"
result.headroom.reason  # actionable explanation + what to try next
print(result.headroom)
# [medium] CV accuracy of 0.81 ± 0.04. A 19% gap to ceiling remains.
# A gradient boosted tree (e.g., XGBoost or LightGBM) is a natural next step.

low means that the simple model is already doing well; complexity buys little
medium means that meaningful headroom remains; a tuned model may help
high means that the baseline is underperforming; a more complex model is likely worth it
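
As a rough mental model of how a score might map to a level, consider the toy rule below. Everything in it is an assumption made for illustration: the `HeadroomSketch` class, the `estimate_headroom` function, and the 0.10 and 0.30 cut-offs are invented here and are not stepzero's actual logic, which may also weigh variance and task type.

```python
from dataclasses import dataclass


@dataclass
class HeadroomSketch:
    """Hypothetical stand-in for a headroom signal (level plus reason)."""
    level: str
    reason: str

    def __str__(self):
        return f"[{self.level}] {self.reason}"


def estimate_headroom(score, std, low_gap=0.10, high_gap=0.30):
    """Toy rule: bucket the gap between a CV score and a perfect 1.0.
    The 0.10 / 0.30 cut-offs are invented for this example."""
    gap = 1.0 - score
    if gap <= low_gap:
        level = "low"
    elif gap <= high_gap:
        level = "medium"
    else:
        level = "high"
    reason = f"CV score of {score:.2f} ± {std:.3f}; gap to ceiling {gap:.0%}."
    return HeadroomSketch(level, reason)


print(estimate_headroom(0.96, 0.012))
print(estimate_headroom(0.81, 0.04))
print(estimate_headroom(0.55, 0.05))
```

A rule like this only makes sense for metrics bounded at 1.0, such as accuracy. Error metrics like RMSE or MAE would need a different reference point.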

Design philosophy

Task-first, not model-first. You describe the problem; stepzero picks the approach.
Opinionated defaults. Auto-scaling for linear models, missing value imputation, sensible eval.
No false modesty. The models are genuinely simple — logistic regression, decision trees, seasonal naive. No AutoML hidden underneath.
Ready to deploy. result.best_model is a fitted sklearn Pipeline. Call .predict() on new data immediately.
Minimal footprint. Only numpy, pandas, scikit-learn, and scipy. No optional heavy dependencies required for core functionality.
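
To show what imputation and scaling do to a single numeric feature, here is a standard-library sketch. It assumes mean imputation followed by standardization, which is one common choice. The source does not state which strategies stepzero applies, and `impute_and_scale` is a hypothetical helper.

```python
import math
import statistics


def impute_and_scale(column):
    """Replace None/NaN with the mean of the observed values, then
    standardize the column to zero mean and unit variance."""
    def is_missing(v):
        return v is None or math.isnan(v)

    observed = [v for v in column if not is_missing(v)]
    mean = statistics.fmean(observed)
    filled = [mean if is_missing(v) else v for v in column]
    std = statistics.pstdev(filled)  # a constant column would give std == 0
    return [(v - mean) / std for v in filled]


print(impute_and_scale([1.0, None, 3.0]))
print(impute_and_scale([10.0, float("nan"), 20.0, 30.0]))
```

In a real workflow the mean and standard deviation should be learned on the training data only and then reused on new data. Bundling preprocessing and model in one fitted pipeline object guarantees this.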

When to use stepzero

✅ Starting a new ML project and want a defensible baseline in 5 minutes
✅ Proving (or disproving) that a simple model is good enough
✅ Teaching or demonstrating ML without the XGBoost-first bias
✅ Kaggle competitions — establish your baseline before tuning

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for the workflow.

In short: branch from develop and open a PR targeting develop. Every PR runs the test suite automatically on Python 3.10–3.12.

Reporting issues

Open an issue on GitHub. Include your Python version, stepzero version, and a minimal reproducible example.

License

MIT

Project details

This version

0.1.0

Mar 28, 2026

Download files

Source Distribution

stepzero-0.1.0.tar.gz (85.0 kB)

Uploaded Mar 28, 2026 Source

Built Distribution

stepzero-0.1.0-py3-none-any.whl (20.4 kB)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file stepzero-0.1.0.tar.gz.

File metadata

Download URL: stepzero-0.1.0.tar.gz
Upload date: Mar 28, 2026
Size: 85.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stepzero-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2dc061c27f071c7ea9b80146db52f3509fe567cb0d1c2d3035c6d205c02fd8da`
MD5	`91ac60328662cf5dea72f0a8d0d29c1e`
BLAKE2b-256	`5e92eeebb0f8d4733d0580321fed0d9c1cbfb131b84c62845a967f73841160be`

Provenance

The following attestation bundles were made for stepzero-0.1.0.tar.gz:

Publisher: publish.yml on arnedb/stepzero

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stepzero-0.1.0.tar.gz
- Subject digest: 2dc061c27f071c7ea9b80146db52f3509fe567cb0d1c2d3035c6d205c02fd8da
- Sigstore transparency entry: 1191038527
- Sigstore integration time: Mar 28, 2026
Source repository:
- Permalink: arnedb/stepzero@4a73912faf5a1964e170a000c51d9041920e448b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/arnedb
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4a73912faf5a1964e170a000c51d9041920e448b
- Trigger Event: push

File details

Details for the file stepzero-0.1.0-py3-none-any.whl.

File metadata

Download URL: stepzero-0.1.0-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 20.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for stepzero-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8644238d7978bfb930aa285df9b03967aec514871e2c29de9a28ab9095c44e02`
MD5	`fe95988de366e222bbf3045d6b1099ad`
BLAKE2b-256	`050bd45956fc7d1956ddd8bd4818b94fdac053911653c523d627c4512fad2518`

Provenance

The following attestation bundles were made for stepzero-0.1.0-py3-none-any.whl:

Publisher: publish.yml on arnedb/stepzero

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: stepzero-0.1.0-py3-none-any.whl
- Subject digest: 8644238d7978bfb930aa285df9b03967aec514871e2c29de9a28ab9095c44e02
- Sigstore transparency entry: 1191038538
- Sigstore integration time: Mar 28, 2026
Source repository:
- Permalink: arnedb/stepzero@4a73912faf5a1964e170a000c51d9041920e448b
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/arnedb
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4a73912faf5a1964e170a000c51d9041920e448b
- Trigger Event: push
