Task-first ML baselines. Run the simplest thing that could work.
stepzero
Before reaching for XGBoost or a neural net, run stepzero. It fits the simplest sensible model for your task, compares a few alternatives, and tells you whether your baseline is good enough or what to try next.
import stepzero as sz
result = sz.classification(X, y)
print(result)
# ClassificationResult(best='logistic', accuracy=0.960, headroom='low')
print(result.headroom)
# [low] Score of 0.96 with low variance (±0.012). The simple baseline is already
# performing well. Trying a gradient boosted tree is unlikely to offer a meaningful improvement.
Install
pip install stepzero
Requirements: Python 3.10+, numpy, pandas, scikit-learn, scipy.
Tasks
Classification
result = sz.classification(X, y)
result.best_model # fitted sklearn Pipeline — call .predict(X_new) directly
result.best_model_name # "logistic" | "tree" | "naive_bayes"
result.scores # [ModelScore(name, score, metric), ...]
result.feature_importance # pd.Series sorted by importance
result.headroom # HeadroomSignal(level, reason)
- Methods: logistic regression, decision tree, naive bayes
- Metric: accuracy (5-fold stratified CV)
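For intuition, the comparison listed above can be approximated in plain scikit-learn. This is a minimal sketch, not stepzero's actual code: the iris dataset, the scaler step, the shuffled folds, and all variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate baselines, keyed by the model names used in this README.
candidates = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}

# 5-fold stratified cross-validation, scored by accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)  # refit the winner on all data
print(best_name, round(scores[best_name], 3))
```

Running the same loop by hand is a useful sanity check when a reported score looks surprising.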
Regression
result = sz.regression(X, y)
result.best_model_name # "ridge" | "tree"
result.feature_importance # normalized importances as pd.Series
result.headroom
- Methods: ridge, decision tree
- Metric: RMSE (5-fold CV)
Forecasting
result = sz.forecasting(series, horizon=12)
result.forecast # pd.Series with future timestamps as index
result.best_model_name # "seasonal_naive" | "linear_trend"
result.scores # MAE per model
result.headroom
- Methods: seasonal naive, linear trend
- Parameters: horizon, freq (optional; inferred from DatetimeIndex), cv_splits
- Metric: MAE (time-series CV)
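The two forecasting baselines are simple enough to write out with the standard library. The sketch below is illustrative only: the toy series, the function names, and the single hold-out split (instead of full time-series CV) are assumptions, not stepzero internals.

```python
from statistics import mean


def seasonal_naive(history, horizon, season):
    """Repeat the last full season forward for `horizon` steps."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def linear_trend(history, horizon):
    """Fit y = intercept + slope * t by least squares and extrapolate."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Toy series: upward trend plus a repeating 4-step pattern.
pattern = [0.0, 5.0, 0.0, -5.0]
series = [0.5 * t + pattern[t % 4] for t in range(48)]

horizon, season = 8, 4
train, test = series[:-horizon], series[-horizon:]

scores = {
    "seasonal_naive": mae(test, seasonal_naive(train, horizon, season)),
    "linear_trend": mae(test, linear_trend(train, horizon)),
}
best = min(scores, key=scores.get)  # lower MAE is better
print(best, scores)
```

On this toy series neither baseline captures both trend and seasonality, which is the kind of gap a headroom message would point at.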
Anomaly Detection
result = sz.anomaly_detection(series)
result.anomalies # pd.Series[bool], same index as input
result.scores # raw anomaly scores
result.method # "zscore" | "iqr"
result.threshold # auto-determined threshold
result.headroom
- Methods: z-score, IQR
- Parameters: threshold (optional; auto-set to flag ~5% of points), method
- Metric: inter-method agreement
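To make "inter-method agreement" concrete, here is a small standard-library sketch of both detectors. The fixed cut-offs (3 standard deviations, 1.5 × IQR), the agreement formula, and the function names are assumptions for illustration; stepzero's auto-determined threshold may behave differently.

```python
from statistics import mean, pstdev, quantiles


def zscore_flags(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma > threshold for v in values]


def iqr_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v < lo or v > hi for v in values]


def agreement(a, b):
    """Share of points on which the two methods give the same verdict."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


values = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 9.9,
          10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 25.0]

z = zscore_flags(values)
q = iqr_flags(values)
print(sum(z), sum(q), agreement(z, q))
```

When the two methods disagree on many points, that is a hint the anomaly definition itself needs more thought before reaching for a heavier detector.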
Text Classification
result = sz.text_classification(texts, labels)
result.best_model_name # "tfidf_logistic" | "tfidf_naive_bayes"
result.top_features_per_class # {"class_0": ["word1", ...], ...}
result.headroom
- Methods: TF-IDF + logistic regression, TF-IDF + naive bayes
- Metric: accuracy (5-fold stratified CV)
Clustering
result = sz.clustering(X, k_range=(2, 10))
result.best_k # selected number of clusters
result.labels # cluster assignment per sample (np.ndarray)
result.centers # cluster centroids in original feature space
result.scores # silhouette score per k tried
result.headroom
- Methods: k-means
- Parameters: k_range
- Metric: silhouette score
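Selecting k by silhouette score looks roughly like the following in scikit-learn. This is a sketch under assumptions: the synthetic blobs, the narrower range of k, and the absence of any feature scaling are illustrative choices, not a description of stepzero's internals.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic clusters.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

models, scores = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    models[k] = model
    scores[k] = silhouette_score(X, model.labels_)  # higher is better

best_k = max(scores, key=scores.get)
labels = models[best_k].labels_
centers = models[best_k].cluster_centers_
print(best_k, round(scores[best_k], 3))
```

Keeping the score for every k, rather than only the winner, makes it easy to see whether the choice of k was clear-cut or a near tie.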
The headroom signal
Every result has a .headroom attribute:
result.headroom.level # "low" | "medium" | "high"
result.headroom.reason # actionable explanation + what to try next
print(result.headroom)
# [medium] CV accuracy of 0.81 ± 0.04. A 19% gap to ceiling remains.
# A gradient boosted tree (e.g., XGBoost or LightGBM) is a natural next step.
- low means that the simple model is already doing well; complexity buys little
- medium means that meaningful headroom remains; a tuned model may help
- high means that the baseline is underperforming; a more complex model is likely worth it
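One way to picture how a score could map to a level is the hypothetical heuristic below. The gap cut-offs (0.10 and 0.30), the function name, and the message wording are invented for illustration; the rules stepzero actually applies are not documented here and may also take variance into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadroomSignal:
    """Hypothetical stand-in for the object behind result.headroom."""
    level: str
    reason: str

    def __str__(self) -> str:
        return f"[{self.level}] {self.reason}"


def headroom_from_accuracy(mean_acc: float, std_acc: float,
                           low_gap: float = 0.10,
                           high_gap: float = 0.30) -> HeadroomSignal:
    """Map a CV accuracy to a level; the cut-offs are illustrative assumptions."""
    gap = 1.0 - mean_acc  # distance to a perfect score
    summary = f"CV accuracy of {mean_acc:.2f} ± {std_acc:.3f}."
    if gap < low_gap:
        return HeadroomSignal("low", summary + " The simple baseline is already doing well.")
    if gap < high_gap:
        return HeadroomSignal("medium", summary + " Meaningful headroom remains; a tuned model may help.")
    return HeadroomSignal("high", summary + " The baseline is underperforming; try a more complex model.")


signal = headroom_from_accuracy(0.81, 0.04)
print(signal)
```

In this sketch the standard deviation only appears in the message; a fuller rule might widen or narrow the bands when fold-to-fold variance is high.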
Design philosophy
- Task-first, not model-first. You describe the problem; stepzero picks the approach.
- Opinionated defaults. Auto-scaling for linear models, missing value imputation, sensible eval.
- No false modesty. The models are genuinely simple — logistic regression, decision trees, seasonal naive. No AutoML hidden underneath.
- Ready to deploy. result.best_model is a fitted sklearn Pipeline. Call .predict() on new data immediately.
- Minimal footprint. Only numpy, pandas, scikit-learn, and scipy. No optional heavy dependencies required for core functionality.
When to use stepzero
- ✅ Starting a new ML project and want a defensible baseline in 5 minutes
- ✅ Proving (or disproving) that a simple model is good enough
- ✅ Teaching or demonstrating ML without the XGBoost-first bias
- ✅ Kaggle competitions — establish your baseline before tuning
Contributing
Contributions are welcome. Please read CONTRIBUTING.md for the workflow.
In short: branch from develop, open a PR targeting develop. All PRs run the test suite automatically across Python 3.10–3.12.
Reporting issues
Open an issue on GitHub. Include your Python version, stepzero version, and a minimal reproducible example.
License
MIT