Spam/ham classifier with an MLOps-style training pipeline.

Project description

Spam classifier

This project demonstrates how to package and train a simple spam/ham classifier with MLOps practices. It is designed for students learning how to structure ML code into modules, build training pipelines, configure via YAML, and add tests and CI.

Project structure

spam_classifier/ — package code (pipeline, training, inference)
data/ — raw and processed datasets
config.yaml — pipeline and training configuration
tests/ — pytest suite (unit + quality)
.github/workflows/ci.yml — GitHub Actions CI

Setup (uv)

uv venv --seed --python 3.13
uv pip install -e ".[dev]"

The minimum supported Python version is 3.11. You can use the standard venv module instead of uv, but the project's CI and Makefile expect uv.

Data

Download and prepare the dataset:

make download_data
make process_data

make process_data builds data/processed/train.csv and data/processed/test.csv. The holdout split is controlled by:

data.test_size in config.yaml (default 0.1)
training.use_holdout (True/False)
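
To make the two settings concrete, here is a minimal sketch of a config-driven holdout split. It assumes the config has already been parsed from config.yaml into a dict; the function name holdout_split, the seed, and the behavior when use_holdout is False are illustrative assumptions, not the package's actual implementation.

```python
import random


def holdout_split(rows, config, seed=42):
    """Split rows into (train, test) according to the config.

    Hypothetical helper: only the keys data.test_size and
    training.use_holdout come from the README.
    """
    if not config["training"]["use_holdout"]:
        # Assumption: without a holdout, every row is training data.
        return list(rows), []

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible

    n_test = round(len(shuffled) * config["data"]["test_size"])
    return shuffled[n_test:], shuffled[:n_test]


config = {"data": {"test_size": 0.1}, "training": {"use_holdout": True}}
train, test = holdout_split(range(100), config)
print(len(train), len(test))
```

With the default test_size of 0.1, one row in ten ends up in the test set.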

Training

Train with cross-validation and optional holdout evaluation:

make train

Training behavior is controlled in config.yaml:

training.cv_folds — number of CV folds
training.metrics — metrics to log (accuracy/precision/recall/f1/roc_auc)
training.use_holdout — evaluate on test.csv if True
training.run_validation — run CV if True
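
As a rough illustration of how these flags could gate the training steps, the sketch below generates k-fold index splits and lists the steps a run would perform. The function names and the step descriptions are hypothetical; only the config keys are taken from the list above.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def plan_training(config):
    """Return the steps a training run would perform for this config (illustrative only)."""
    training = config["training"]
    steps = []
    if training.get("run_validation", True):
        steps.append(f"cross-validate with {training['cv_folds']} folds")
    steps.append("fit model on train.csv")  # assumption: the model is always fitted
    if training.get("use_holdout", False):
        steps.append("evaluate on test.csv")
    return steps


config = {"training": {"cv_folds": 5, "run_validation": True, "use_holdout": False}}
print(plan_training(config))
```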

Versioned artifacts

Package version is stored in spam_classifier/_VERSION. Model and log filenames include this version:

Model: spam_classifier/models/spam_classifier_vX.Y.Z.pkl
Logs: spam_classifier/logs/logs_X.Y.Z.log
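
The naming scheme above can be sketched as follows. The helper names read_version and artifact_paths are hypothetical; the file locations and name patterns are the ones listed above.

```python
from pathlib import Path


def read_version(package_dir):
    """Read the version string from the _VERSION file in the package directory."""
    return (Path(package_dir) / "_VERSION").read_text(encoding="utf-8").strip()


def artifact_paths(package_dir, version):
    """Build the versioned model and log paths for a given version."""
    package_dir = Path(package_dir)
    return {
        "model": package_dir / "models" / f"spam_classifier_v{version}.pkl",
        "log": package_dir / "logs" / f"logs_{version}.log",
    }


paths = artifact_paths("spam_classifier", "0.2.1")
print(paths["model"].as_posix())  # spam_classifier/models/spam_classifier_v0.2.1.pkl
print(paths["log"].as_posix())    # spam_classifier/logs/logs_0.2.1.log
```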

Inference

Install from PyPI

uv pip install spam-classifier

CLI usage

Single message:

uv run python -m spam_classifier.predict "Free prize! Call now"

Batch inference from file (one message per line):

uv run python -m spam_classifier.predict data/processed/test.csv -o results/preds.csv

Options:

-o/--output — output CSV path (default: project root)
--no-message — exclude message text from output CSV
--model-path — path to a trained .pkl model (overrides default)
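
For readers who want to see how such a command line could be declared, here is a minimal argparse sketch that mirrors the options above. It is not the package's actual parser: the positional argument name input and the None defaults are assumptions.

```python
import argparse


def build_parser():
    """Declare a CLI with the same flags as the options listed above (illustrative)."""
    parser = argparse.ArgumentParser(prog="spam_classifier.predict")
    # Hypothetical positional name: a single message, or a file with one message per line.
    parser.add_argument("input", help="message text or path to an input file")
    parser.add_argument("-o", "--output", default=None, help="output CSV path")
    parser.add_argument("--no-message", action="store_true",
                        help="exclude message text from the output CSV")
    parser.add_argument("--model-path", default=None,
                        help="path to a trained .pkl model")
    return parser


args = build_parser().parse_args(
    ["Free prize! Call now", "--no-message", "-o", "results/preds.csv"]
)
print(args.input, args.output, args.no_message, args.model_path)
```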

Packages installed from PyPI do not bundle model weights by default, so you must either train a model or pass --model-path. Pretrained weights are attached as assets to the GitHub Release for each version.

Python usage

from spam_classifier.predict import load_model, predict_message

model = load_model("/path/to/model.pkl")
print(predict_message("Free prize! Call now", model))

If you have activated the virtual environment, you can omit uv run and call python directly.
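
Batch prediction from Python can be sketched along the same lines. The helper below is hypothetical and takes any callable that maps a message to a label, so it runs without a trained model; the column names and the stand-in predictor are assumptions, and the return type of predict_message is not specified in this README.

```python
import csv
import io


def write_predictions(messages, predict, out, include_message=True):
    """Write one CSV row per message; `predict` maps message text to a label."""
    writer = csv.writer(out)
    writer.writerow(["message", "prediction"] if include_message else ["prediction"])
    for message in messages:
        label = predict(message)
        writer.writerow([message, label] if include_message else [label])


def fake_predict(text):
    """Stand-in predictor so the sketch runs without a model file."""
    return "spam" if "prize" in text.lower() else "ham"


buffer = io.StringIO()
write_predictions(["Free prize! Call now", "See you at lunch"], fake_predict, buffer)
print(buffer.getvalue())
```

With a real model, the callable could be something like `lambda text: predict_message(text, model)`, using the functions imported in the example above.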

Tests

Run the full test suite:

uv run pytest tests

Quality tests (require trained model and holdout data):

uv run pytest -m quality
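
A quality test of this kind usually compares a metric computed on the holdout data with a minimum threshold. The sketch below shows the idea in plain Python; the "spam"/"ham" label encoding, the helper names, and the 0.9 threshold are all assumptions, and in the real suite such a check would live in a test carrying the quality marker.

```python
def f1_score(y_true, y_pred, positive="spam"):
    """Compute F1 for the positive class from two equal-length label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def check_quality(y_true, y_pred, min_f1=0.9):
    """Fail loudly when the holdout F1 drops below the (hypothetical) threshold."""
    score = f1_score(y_true, y_pred)
    assert score >= min_f1, f"holdout F1 {score:.3f} is below {min_f1}"
    return score


y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "ham", "spam", "spam"]
print(check_quality(y_true, y_pred, min_f1=0.75))
```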

If you have activated the virtual environment, you can omit uv run for pytest as well.

CI

GitHub Actions runs the following checks on pull requests to main and develop:

black --check
flake8
mypy
pytest tests

Pre-commit

Install and run pre-commit hooks:

pre-commit install
pre-commit run --all-files

Hooks included: black, flake8, mypy.

Publishing

TestPyPI (manual)

1. Update spam_classifier/_VERSION.
2. Trigger the Publish workflow manually:
- Go to Actions → Publish → Run workflow
- Select testpypi
3. The workflow builds the package and publishes it to TestPyPI.

PyPI (release)

1. Update spam_classifier/_VERSION.
2. Create a GitHub Release whose tag matches the version (e.g. v0.1.0).
3. The Publish workflow builds the package and uploads it to PyPI.
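
Because the release tag should match the package version, a small pre-release check can catch mismatches early. The helper below is a hypothetical sketch, not part of the project's workflow; accepting tags both with and without a leading "v" is an assumption.

```python
def tag_matches_version(tag, version):
    """Return True if a release tag names the given version, with or without a leading 'v'."""
    return tag.strip().removeprefix("v") == version.strip()


print(tag_matches_version("v0.1.0", "0.1.0"))  # True
print(tag_matches_version("0.2.1", "0.2.1"))   # True
print(tag_matches_version("v0.2.0", "0.2.1"))  # False
```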

Trusted publishing

This project uses GitHub Actions OIDC (trusted publishing). You must configure the trusted publisher on PyPI and TestPyPI to allow the Publish workflow from this repository to upload packages.

Project details

Release history

0.2.1 (this version): Jan 30, 2026
0.2.0: Jan 30, 2026
0.1.0: Jan 30, 2026

Download files

Source Distribution

spam_classifier-0.2.1.tar.gz (14.5 kB)

Uploaded Jan 30, 2026 Source

Built Distribution

spam_classifier-0.2.1-py3-none-any.whl (12.5 kB)

Uploaded Jan 30, 2026 Python 3

File details

Details for the file spam_classifier-0.2.1.tar.gz.

File metadata

Download URL: spam_classifier-0.2.1.tar.gz
Upload date: Jan 30, 2026
Size: 14.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spam_classifier-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`778570d40c79dbd1e2f4870bd8c7818fdf52c39e8738fc9d8a2976fa86d25ab3`
MD5	`4dcf91b997ff69ee08b6157d45a3387a`
BLAKE2b-256	`a298e4c9aa85786bbed3ab1f941cb151db52a026aeb3d0e43a8e053e5ba54b31`
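
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch: the function names are illustrative, and the local file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest published above for spam_classifier-0.2.1.tar.gz.
EXPECTED_SHA256 = "778570d40c79dbd1e2f4870bd8c7818fdf52c39e8738fc9d8a2976fa86d25ab3"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("spam_classifier-0.2.1.tar.gz")` should return True for an intact copy of the source distribution saved under that name.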

Provenance

The following attestation bundles were made for spam_classifier-0.2.1.tar.gz:

Publisher: publish.yml on Emilien-mipt/spam_classifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spam_classifier-0.2.1.tar.gz
- Subject digest: 778570d40c79dbd1e2f4870bd8c7818fdf52c39e8738fc9d8a2976fa86d25ab3
- Sigstore transparency entry: 872338108
- Sigstore integration time: Jan 30, 2026
Source repository:
- Permalink: Emilien-mipt/spam_classifier@a409dcc50b7e25b94ccb737c5aee87b15523fb09
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/Emilien-mipt
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a409dcc50b7e25b94ccb737c5aee87b15523fb09
- Trigger Event: release

File details

Details for the file spam_classifier-0.2.1-py3-none-any.whl.

File metadata

Download URL: spam_classifier-0.2.1-py3-none-any.whl
Upload date: Jan 30, 2026
Size: 12.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spam_classifier-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`19e114f9cbd81659626b46b24d94bc98cb60c5b039814169e3dc6f4d2352a5ce`
MD5	`d1de4b890c0f0826fc5e9ee59f3317d1`
BLAKE2b-256	`1cead262165605d880389a1e8ab816942dc7849e09f20b0f2b9d828aa8665d84`

Provenance

The following attestation bundles were made for spam_classifier-0.2.1-py3-none-any.whl:

Publisher: publish.yml on Emilien-mipt/spam_classifier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spam_classifier-0.2.1-py3-none-any.whl
- Subject digest: 19e114f9cbd81659626b46b24d94bc98cb60c5b039814169e3dc6f4d2352a5ce
- Sigstore transparency entry: 872338119
- Sigstore integration time: Jan 30, 2026
Source repository:
- Permalink: Emilien-mipt/spam_classifier@a409dcc50b7e25b94ccb737c5aee87b15523fb09
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/Emilien-mipt
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a409dcc50b7e25b94ccb737c5aee87b15523fb09
- Trigger Event: release
