Spam/ham classifier with an MLOps-style training pipeline.

Project description

Spam classifier

This project demonstrates how to package and train a simple spam/ham classifier with MLOps practices. It is designed for students learning how to structure ML code into modules, build training pipelines, configure via YAML, and add tests and CI.

Project structure

  • spam_classifier/ — package code (pipeline, training, inference)
  • data/ — raw and processed datasets
  • config.yaml — pipeline and training configuration
  • tests/ — pytest suite (unit + quality)
  • .github/workflows/ci.yml — GitHub Actions CI

Setup (uv)

uv venv --seed --python 3.13
uv pip install -e ".[dev]"

The minimum supported Python version is 3.11. You can use the standard venv module instead of uv, but the project's CI and Makefile expect uv.

Data

Download and prepare the dataset:

make download_data
make process_data

make process_data builds data/processed/train.csv and data/processed/test.csv. The holdout split is controlled by:

  • data.test_size in config.yaml (default 0.1)
  • training.use_holdout (True/False)
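
To make the effect of data.test_size concrete, here is a minimal sketch of a holdout split. It is not the project's implementation: the holdout_split helper, the fixed seed, and the plain (non-stratified) shuffle are assumptions for illustration, and the config is written as a Python dict rather than loaded from config.yaml.

```python
import random

# Hypothetical config mirroring the keys named above; the real file is YAML.
config = {"data": {"test_size": 0.1}, "training": {"use_holdout": True}}


def holdout_split(rows, test_size, seed=42):
    """Shuffle rows deterministically and split off a test_size fraction."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row in the holdout set.
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data: (message, label) pairs standing in for the processed dataset.
rows = [(f"message {i}", i % 2) for i in range(100)]
train_rows, test_rows = holdout_split(rows, config["data"]["test_size"])
print(len(train_rows), len(test_rows))
```

With the default test_size of 0.1, one row in ten ends up in the holdout set and the rest are left for training.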

Training

Train with cross-validation and optional holdout evaluation:

make train

Training behavior is controlled in config.yaml:

  • training.cv_folds — number of CV folds
  • training.metrics — metrics to log (accuracy/precision/recall/f1/roc_auc)
  • training.use_holdout — evaluate on test.csv if True
  • training.run_validation — run CV if True
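
The sketch below shows how these four keys could gate a training run. It is an illustration under assumptions, not the project's code: kfold_indices and plan_run are hypothetical helpers, the folds are contiguous and unshuffled, and the config is a plain dict instead of the parsed config.yaml.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs for contiguous, near-equal folds."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def plan_run(cfg, n_samples):
    """Describe which evaluation steps the flags enable."""
    steps = []
    if cfg["run_validation"]:
        folds = kfold_indices(n_samples, cfg["cv_folds"])
        for number, (train_idx, val_idx) in enumerate(folds, start=1):
            steps.append(
                f"cv fold {number}: train={len(train_idx)} val={len(val_idx)} "
                f"metrics={','.join(cfg['metrics'])}"
            )
    if cfg["use_holdout"]:
        steps.append("evaluate on test.csv")
    return steps


# Hypothetical values for the keys listed above.
training_cfg = {
    "cv_folds": 5,
    "metrics": ["accuracy", "f1"],
    "use_holdout": True,
    "run_validation": True,
}
for step in plan_run(training_cfg, n_samples=100):
    print(step)
```

Setting run_validation to False skips the fold loop entirely, and setting use_holdout to False drops the final evaluation step.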

Versioned artifacts

Package version is stored in spam_classifier/_VERSION. Model and log filenames include this version:

  • Model: spam_classifier/models/spam_classifier_vX.Y.Z.pkl
  • Logs: spam_classifier/logs/logs_X.Y.Z.log
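
As a sketch of the naming scheme, the snippet below builds both paths from a version string. The artifact_paths helper is hypothetical, and the assumption that _VERSION contains only the version string (possibly with a trailing newline) is not confirmed here.

```python
from pathlib import Path


def artifact_paths(version, package_dir=Path("spam_classifier")):
    """Return (model_path, log_path) for a given package version."""
    version = version.strip()  # tolerate a trailing newline from the file
    model_path = package_dir / "models" / f"spam_classifier_v{version}.pkl"
    log_path = package_dir / "logs" / f"logs_{version}.log"
    return model_path, log_path


# In the repository the version would come from the file, for example:
#   version = Path("spam_classifier/_VERSION").read_text()
model_path, log_path = artifact_paths("0.2.1\n")
print(model_path.as_posix())
print(log_path.as_posix())
```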

Inference

Install from PyPI

uv pip install spam-classifier

CLI usage

Single message:

uv run python -m spam_classifier.predict "Free prize! Call now"

Batch inference from a file (one message per line):

uv run python -m spam_classifier.predict data/processed/test.csv -o results/preds.csv

Options:

  • -o/--output — output CSV path (default: project root)
  • --no-message — exclude message text from output CSV
  • --model-path — path to a trained .pkl model (overrides default)
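
For readers new to command-line parsing, the options above could be declared with argparse roughly as follows. This is a sketch, not the package's actual parser: the positional name input, the help strings, and the None defaults are assumptions, and how the real CLI tells a single message from a file path is not shown.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="spam_classifier.predict")
    parser.add_argument("input", help="a message, or a path to a file of messages")
    parser.add_argument("-o", "--output", default=None, help="output CSV path")
    parser.add_argument(
        "--no-message",
        action="store_true",
        help="exclude message text from the output CSV",
    )
    parser.add_argument(
        "--model-path",
        default=None,
        help="path to a trained .pkl model (overrides the default)",
    )
    return parser


args = build_parser().parse_args(
    ["data/processed/test.csv", "-o", "results/preds.csv", "--no-message"]
)
print(args.input, args.output, args.no_message, args.model_path)
```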

If you installed the package from PyPI, you must either train a model or pass --model-path, because no weights are bundled with the package by default. Pretrained weights are attached as assets to the GitHub Release for each version.

Python usage

from spam_classifier.predict import load_model, predict_message

model = load_model("/path/to/model.pkl")
print(predict_message("Free prize! Call now", model))

If you have activated the virtual environment, you can omit uv run and call python directly.

Tests

Run the full test suite:

uv run pytest tests

Quality tests (require a trained model and holdout data):

uv run pytest -m quality

If you have activated the virtual environment, you can omit uv run for pytest as well.

CI

GitHub Actions runs the following checks on PRs to main and develop:

  • black --check
  • flake8
  • mypy
  • pytest tests

Pre-commit

Install and run pre-commit hooks:

pre-commit install
pre-commit run --all-files

Hooks included: black, flake8, mypy.

Publishing

TestPyPI (manual)

  1. Update spam_classifier/_VERSION
  2. Trigger the Publish workflow manually:
    • Go to Actions → Publish → Run workflow
    • Select testpypi
  3. The package is built and published to TestPyPI

PyPI (release)

  1. Update spam_classifier/_VERSION
  2. Create a GitHub Release (tag should match the version, e.g. v0.1.0)
  3. The Publish workflow will build and upload to PyPI
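
Because the release tag should match the package version, a small pre-release check can catch a mismatch before the workflow runs. The helper below is a hypothetical sketch; the project's Publish workflow may or may not perform such a check.

```python
def tag_matches_version(tag, version):
    """Return True when the tag is the version prefixed with 'v'."""
    return tag.strip() == f"v{version.strip()}"


# The version string would normally be read from spam_classifier/_VERSION.
print(tag_matches_version("v0.1.0", "0.1.0\n"))
print(tag_matches_version("v0.1.1", "0.1.0\n"))
```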

Trusted publishing

This project uses GitHub Actions OIDC (trusted publishing). You must configure the trusted publisher on PyPI and TestPyPI to allow the Publish workflow from this repository to upload packages.


Download files

Source Distribution

spam_classifier-0.2.1.tar.gz (14.5 kB)

Uploaded Source

Built Distribution

spam_classifier-0.2.1-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file spam_classifier-0.2.1.tar.gz.

File metadata

  • Download URL: spam_classifier-0.2.1.tar.gz
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spam_classifier-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       778570d40c79dbd1e2f4870bd8c7818fdf52c39e8738fc9d8a2976fa86d25ab3
MD5          4dcf91b997ff69ee08b6157d45a3387a
BLAKE2b-256  a298e4c9aa85786bbed3ab1f941cb151db52a026aeb3d0e43a8e053e5ba54b31
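
A published digest is only useful if you compare it against the file you downloaded. The sketch below shows one way to do that with the standard library; sha256_of and matches_published are hypothetical helper names, and the demo runs on a throwaway file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == published_hex.strip().lower()


# Demo on a temporary file; for a real check, pass the downloaded archive
# and the SHA256 value listed for it.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    demo.write_bytes(b"example payload")
    expected = hashlib.sha256(b"example payload").hexdigest()
    demo_ok = matches_published(demo, expected)
    demo_bad = matches_published(demo, "0" * 64)
print(demo_ok, demo_bad)
```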

Provenance

The following attestation bundles were made for spam_classifier-0.2.1.tar.gz:

Publisher: publish.yml on Emilien-mipt/spam_classifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file spam_classifier-0.2.1-py3-none-any.whl.

File hashes

Hashes for spam_classifier-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       19e114f9cbd81659626b46b24d94bc98cb60c5b039814169e3dc6f4d2352a5ce
MD5          d1de4b890c0f0826fc5e9ee59f3317d1
BLAKE2b-256  1cead262165605d880389a1e8ab816942dc7849e09f20b0f2b9d828aa8665d84

Provenance

The following attestation bundles were made for spam_classifier-0.2.1-py3-none-any.whl:

Publisher: publish.yml on Emilien-mipt/spam_classifier
