Spam/ham classifier with an MLOps-style training pipeline.
Spam classifier
This project demonstrates how to package and train a simple spam/ham classifier with MLOps practices. It is designed for students learning how to structure ML code into modules, build training pipelines, configure via YAML, and add tests and CI.
Project structure
- spam_classifier/: package code (pipeline, training, inference)
- data/: raw and processed datasets
- config.yaml: pipeline and training configuration
- tests/: pytest suite (unit + quality)
- .github/workflows/ci.yml: GitHub Actions CI
Setup (uv)
uv venv --seed --python 3.13
uv pip install -e ".[dev]"
Minimum supported Python version is 3.11. If you prefer venv, you can still use it, but the project CI and Makefile expect uv.
Data
Download and prepare the dataset:
make download_data
make process_data
make process_data builds data/processed/train.csv and data/processed/test.csv. The holdout split is controlled by:
- data.test_size in config.yaml (default 0.1)
- training.use_holdout (True/False)
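To make the two settings concrete, here is a minimal standard-library sketch of a holdout split. It is not the project's implementation: the function name `holdout_split`, the `seed` parameter, and the choice to return an empty test set when `use_holdout` is False are all assumptions made for illustration.

```python
import random


def holdout_split(rows, test_size=0.1, use_holdout=True, seed=42):
    """Split rows into (train, test).

    Hypothetical helper: when use_holdout is False, everything goes to train
    and the test list is empty (an assumption, not confirmed project behavior).
    """
    if not use_holdout:
        return list(rows), []
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")

    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    # Keep at least one row in the holdout set.
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]
```

With the default `test_size` of 0.1, a dataset of 100 rows would yield 90 training rows and 10 holdout rows.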
Training
Train with cross-validation and optional holdout evaluation:
make train
Training behavior is controlled in config.yaml:
- training.cv_folds: number of CV folds
- training.metrics: metrics to log (accuracy/precision/recall/f1/roc_auc)
- training.use_holdout: evaluate on test.csv if True
- training.run_validation: run CV if True
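The keys above can be checked before a training run starts. The sketch below assumes config.yaml has already been parsed into a nested dict with `data` and `training` sections; `EXAMPLE_CONFIG`, `ALLOWED_METRICS`, and `validate_config` are hypothetical names, and every example value except the 0.1 default for `test_size` is illustrative.

```python
# Metric names listed in the README.
ALLOWED_METRICS = {"accuracy", "precision", "recall", "f1", "roc_auc"}

# Hypothetical parsed form of config.yaml (values are illustrative).
EXAMPLE_CONFIG = {
    "data": {"test_size": 0.1},
    "training": {
        "cv_folds": 5,
        "metrics": ["accuracy", "f1"],
        "use_holdout": True,
        "run_validation": True,
    },
}


def validate_config(config):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    data = config.get("data", {})
    training = config.get("training", {})

    test_size = data.get("test_size")
    is_number = isinstance(test_size, (int, float)) and not isinstance(test_size, bool)
    if not is_number or not 0 < test_size < 1:
        problems.append("data.test_size must be a number between 0 and 1")

    cv_folds = training.get("cv_folds")
    if isinstance(cv_folds, bool) or not isinstance(cv_folds, int) or cv_folds < 2:
        problems.append("training.cv_folds must be an integer >= 2")

    metrics = training.get("metrics")
    if not isinstance(metrics, list):
        problems.append("training.metrics must be a list")
    else:
        unknown = sorted(set(metrics) - ALLOWED_METRICS)
        if unknown:
            problems.append(f"training.metrics has unknown entries: {unknown}")

    for flag in ("use_holdout", "run_validation"):
        if not isinstance(training.get(flag), bool):
            problems.append(f"training.{flag} must be True or False")

    return problems
```

Returning a list of problems rather than raising on the first one lets a student see every misconfigured key in a single pass.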
Versioned artifacts
Package version is stored in spam_classifier/_VERSION. Model and log filenames include this version:
- Model: spam_classifier/models/spam_classifier_vX.Y.Z.pkl
- Logs: spam_classifier/logs/logs_X.Y.Z.log
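The naming scheme can be expressed as a small helper. This is a sketch, assuming `_VERSION` is a plain-text file holding only the version string; `artifact_paths` is a hypothetical function, not part of the package API.

```python
from pathlib import Path


def artifact_paths(version, package_dir="spam_classifier"):
    """Build the versioned model and log paths for a given version string."""
    # Strip whitespace so a trailing newline read from _VERSION does not leak in.
    version = version.strip()
    root = Path(package_dir)
    return {
        "model": root / "models" / f"spam_classifier_v{version}.pkl",
        "log": root / "logs" / f"logs_{version}.log",
    }
```

In practice the version string would come from something like `Path("spam_classifier/_VERSION").read_text()`, so a version bump automatically changes both filenames.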
Inference
Install from PyPI
uv pip install spam-classifier
CLI usage
Single message:
uv run python -m spam_classifier.predict "Free prize! Call now"
Batch inference from file (one message per line):
uv run python -m spam_classifier.predict data/processed/test.csv -o results/preds.csv
Options:
- -o/--output: output CSV path (default: project root)
- --no-message: exclude message text from output CSV
- --model-path: path to a trained .pkl model (overrides default)
If you installed the package from PyPI, you must train a model or pass --model-path
because no weights are bundled with the package by default. Pretrained weights are
attached to the GitHub Release assets for each version.
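To illustrate what the batch mode and the --no-message option amount to, here is a rough standard-library sketch. The column names `message` and `prediction`, the helper `write_predictions`, and the keyword-based `toy_predict` stand-in are all assumptions; the real CLI's output layout and model may differ.

```python
import csv
import io


def write_predictions(messages, predict_fn, out, include_message=True):
    """Write one prediction per message to an open text stream as CSV.

    include_message=False mimics the effect of --no-message.
    """
    writer = csv.writer(out)
    writer.writerow(["message", "prediction"] if include_message else ["prediction"])
    for message in messages:
        label = predict_fn(message)
        writer.writerow([message, label] if include_message else [label])


def toy_predict(message):
    """Stand-in for a trained model: flags messages containing 'free' or 'prize'."""
    lowered = message.lower()
    return "spam" if ("free" in lowered or "prize" in lowered) else "ham"


if __name__ == "__main__":
    buffer = io.StringIO()
    write_predictions(["Free prize! Call now", "See you at lunch"], toy_predict, buffer)
    print(buffer.getvalue())
```

Passing the predictor as a callable keeps the CSV logic independent of how the model is loaded, which makes it easy to test without a trained .pkl file.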
Python usage
from spam_classifier.predict import load_model, predict_message
model = load_model("/path/to/model.pkl")
print(predict_message("Free prize! Call now", model))
If you have activated the virtual environment, you can omit uv run and call python directly.
Tests
Run the full test suite:
uv run pytest tests
Quality tests (require trained model and holdout data):
uv run pytest -m quality
If you have activated the virtual environment, you can omit uv run for pytest as well.
CI
GitHub Actions runs on PRs to main and develop:
- black --check
- flake8
- mypy
- pytest tests
Pre-commit
Install and run pre-commit hooks:
pre-commit install
pre-commit run --all-files
Hooks included: black, flake8, mypy.
Publishing
TestPyPI (manual)
- Update spam_classifier/_VERSION
- Create a GitHub Actions run:
  - Go to Actions → Publish → Run workflow
  - Select testpypi
- The package is built and published to TestPyPI
PyPI (release)
- Update spam_classifier/_VERSION
- Create a GitHub Release (tag should match the version, e.g. v0.1.0)
- The Publish workflow will build and upload to PyPI
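Because the release tag should match the version, a quick pre-release check can catch a mismatch before the workflow runs. The helper below is a hypothetical sketch, not part of the Publish workflow; it assumes the tag may carry an optional leading `v` and an optional `refs/tags/` prefix.

```python
def tag_matches_version(tag, version):
    """Return True when a release tag corresponds to the package version."""
    normalized = tag.strip()
    # Git refs arrive as "refs/tags/<tag>" in some CI contexts.
    prefix = "refs/tags/"
    if normalized.startswith(prefix):
        normalized = normalized[len(prefix):]
    # Accept both "v0.1.0" and "0.1.0" style tags.
    if normalized.startswith("v"):
        normalized = normalized[1:]
    return normalized == version.strip()
```

For example, `tag_matches_version("v0.1.0", "0.1.0")` is True, while a tag of `v0.1.1` against the same version is rejected.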
Trusted publishing
This project uses GitHub Actions OIDC (trusted publishing). You must configure
the trusted publisher on PyPI and TestPyPI to allow the Publish workflow
from this repository to upload packages.