A package to detect code smells in machine learning code

ML Code Smell Detector

A static analysis CLI tool that detects code smells in Python ML projects — without requiring any ML frameworks to be installed. It uses AST-based analysis (via astroid) to identify bad practices across Pandas, NumPy, Scikit-learn, PyTorch, TensorFlow, and Hugging Face Transformers.

Installation
Quick Start
Usage
Use Cases
Output
Detection Scope
Detected Smells
Development & Maintenance
Running Tests
Linting
Documentation
Continuous Integration
Releasing a New Version
Publishing to PyPI
Citation
License

Installation

From PyPI

Install uv if you don't have it:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Then install the package:

uv pip install ml-code-smell-detector

# or with pip
pip install ml-code-smell-detector

Development Install

git clone https://github.com/KarthikShivasankar/ml_smells_detector
cd ml_smells_detector
uv pip install -e ".[dev]"

Quick Start

# Analyze a single file
ml_smell_detector analyze my_model.py

# Analyze an entire project directory
ml_smell_detector analyze ./my_ml_project/

# Save results to a custom folder
ml_smell_detector analyze ./my_ml_project/ --output-dir reports/

Reports are written to analysis_report.txt and analysis_report.csv in the output directory (output/ unless --output-dir is given).

Usage

ml_smell_detector analyze <path> [options]

| Argument | Description |
| --- | --- |
| `path` | Path to a `.py` file or a directory |
| `--output-dir DIR` | Directory to write reports to (default: `output/`) |
| `--ignore DIR [DIR ...]` | Directory names to skip during analysis |

Examples

# Analyze a single training script
ml_smell_detector analyze train.py

# Analyze a full project, save to a custom output dir
ml_smell_detector analyze ./src/ --output-dir ./analysis_results/

# Analyze a project but skip test and notebook folders
ml_smell_detector analyze ./project/ --ignore tests notebooks __pycache__

# Analyze a Jupyter notebook export
ml_smell_detector analyze ./exported_notebook.py --output-dir ./nb_report/
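The effect of `--ignore` can be pictured as pruning directories by name while walking the tree. The following is a minimal sketch under that assumption, written with the standard library only; `collect_python_files` is a hypothetical helper for illustration and is not the package's own function, whose behavior may differ in detail.

```python
import os
import tempfile
from pathlib import Path


def collect_python_files(root, ignore=()):
    """Return sorted .py files under root, skipping directories named in ignore.

    Hypothetical helper for illustration; not part of ml_code_smell_detector.
    """
    root = Path(root)
    if root.is_file():
        return [root] if root.suffix == ".py" else []

    ignored = set(ignore)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        found.extend(Path(dirpath) / f for f in filenames if f.endswith(".py"))
    return sorted(found)


def demo():
    """Build a throwaway project and list what would be analyzed."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "src").mkdir()
        (base / "tests").mkdir()
        (base / "src" / "train.py").write_text("x = 1\n")
        (base / "tests" / "test_train.py").write_text("x = 2\n")
        files = collect_python_files(base, ignore=["tests"])
        return [p.relative_to(base).as_posix() for p in files]


print(demo())
```

Because matching is by directory name rather than by path, a name such as `tests` is skipped at any depth in this sketch.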

Use Cases

1. Pre-commit / PR review check

Catch smells before merging ML code changes:

ml_smell_detector analyze ./ml_code_smell_detector/ --output-dir ./lint_output/ --ignore __pycache__
cat lint_output/analysis_report.txt

2. Audit an existing ML project

Get a full picture of technical debt in a research or production codebase:

ml_smell_detector analyze ./research_project/ --output-dir ./audit/ --ignore .git __pycache__ data

Then open audit/analysis_report.csv in Excel or any spreadsheet tool — each row is a smell with its location, fix, and benefits.

3. Compare model training scripts

Analyze multiple scripts and diff the CSV outputs to track quality improvements over iterations:

ml_smell_detector analyze ./v1/train.py --output-dir ./reports/v1/
ml_smell_detector analyze ./v2/train.py --output-dir ./reports/v2/
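One way to diff the two CSV reports is to count rows per smell name and compare the counts. The sketch below assumes the CSV is comma-delimited and that its header row matches the column names listed in the Output section; the sample rows are made up for illustration, and `smell_counts` and `diff_counts` are hypothetical helper names.

```python
import csv
import io
from collections import Counter

# Assumed header, taken from the column list in the Output section.
HEADER = "Framework,Smell/Checker Name,How to Fix,Benefits,File Path,Location,Count\n"

# Made-up report contents standing in for reports/v1 and reports/v2.
V1_REPORT = HEADER + (
    "NumPy,Missing Random Seed,example fix,example benefit,train.py,Line 12,1\n"
    "Pandas,Chain indexing,example fix,example benefit,train.py,Line 30,1\n"
)
V2_REPORT = HEADER + (
    "Pandas,Chain indexing,example fix,example benefit,train.py,Line 28,1\n"
    "Pandas,Inplace operations,example fix,example benefit,train.py,Line 35,1\n"
)


def smell_counts(csv_text):
    """Count report rows per smell name (one row is treated as one finding)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Smell/Checker Name"] for row in reader)


def diff_counts(old, new):
    """Return {smell: new - old} for every smell whose count changed."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        delta = new[name] - old[name]  # Counter returns 0 for missing names
        if delta != 0:
            changes[name] = delta
    return changes


for name, delta in diff_counts(smell_counts(V1_REPORT), smell_counts(V2_REPORT)).items():
    print(f"{delta:+d}  {name}")
```

For real reports, the text could be read with `Path("reports/v1/analysis_report.csv").read_text()` and the same for v2. A negative number means the smell was removed between iterations.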

4. Integrate into CI/CD

Add to a GitHub Actions workflow (no ML dependencies needed on the runner):

- name: Run ML smell detector
  run: |
    pip install ml-code-smell-detector
    ml_smell_detector analyze ./src/ --output-dir ./smell_report/ --ignore tests
- name: Upload smell report
  uses: actions/upload-artifact@v4
  with:
    name: smell-report
    path: smell_report/

5. Use as a Python library

from ml_code_smell_detector import (
    FrameworkSpecificSmellDetector,
    HuggingFaceSmellDetector,
    ML_SmellDetector,
)

# Run all detectors on a file
for DetectorClass in [FrameworkSpecificSmellDetector, HuggingFaceSmellDetector, ML_SmellDetector]:
    detector = DetectorClass()
    detector.detect_smells("train.py")
    for smell in detector.get_results():
        print(f"[{smell['framework']}] {smell['name']} @ {smell['location']}")
        print(f"  Fix: {smell['fix']}")

Output

Each run produces two report files in the output directory:

`analysis_report.txt`

Human-readable summary grouped by file and detector category:

Analysis results for train.py:

Framework-Specific Smells:
- Missing Random Seed (NumPy)
  Framework: NumPy
  How to fix: Add np.random.seed() at the start of your script
  Benefits: Reproducible experiments
  Location: Line 12

Smell Counts:
  Missing Random Seed: 1
Total smells detected: 1

`analysis_report.csv`

Machine-readable table with columns:

| Framework | Smell/Checker Name | How to Fix | Benefits | File Path | Location | Count |
| --- | --- | --- | --- | --- | --- | --- |
| NumPy | Missing Random Seed | Add np.random.seed()... | Reproducible... | train.py | Line 12 | 1 |

Useful for filtering, sorting, or tracking smell trends over time in a spreadsheet or BI tool.
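Outside a spreadsheet, the same filtering can be done with the standard `csv` module. This is a small sketch that groups findings by framework; it assumes a comma-delimited file whose header matches the columns above, and the sample rows and the `group_by_framework` name are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in the assumed layout of analysis_report.csv.
SAMPLE_REPORT = """\
Framework,Smell/Checker Name,How to Fix,Benefits,File Path,Location,Count
NumPy,Missing Random Seed,example fix,example benefit,train.py,Line 12,1
Pandas,Chain indexing,example fix,example benefit,prep.py,Line 30,1
Pandas,Inplace operations,example fix,example benefit,prep.py,Line 41,1
"""


def group_by_framework(csv_text):
    """Map each framework to its (smell name, file path, location) tuples."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["Framework"]].append(
            (row["Smell/Checker Name"], row["File Path"], row["Location"])
        )
    return dict(grouped)


for framework, findings in group_by_framework(SAMPLE_REPORT).items():
    print(framework, len(findings))
    for name, path, location in findings:
        print(f"  {name} ({path}, {location})")
```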

Detection Scope

The tool analyzes all Python code in a file regardless of nesting depth — module-level code, class bodies, class methods, nested functions, and closures.

Import detection uses prefix matching, so all of the following are recognized:

import sklearn
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler

The same applies to pandas, numpy, torch, tensorflow, and transformers.
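As a rough illustration of prefix matching on imports, the sketch below uses the standard-library `ast` module rather than astroid, so it is not the tool's actual implementation. The `imported_frameworks` helper is hypothetical, and matching on a dotted boundary (so that `torchvision` does not count as `torch`) is a design choice of this sketch, not a documented behavior of the tool.

```python
import ast

FRAMEWORK_PREFIXES = ("pandas", "numpy", "sklearn", "torch", "tensorflow", "transformers")


def imported_frameworks(source):
    """Return the set of known frameworks imported anywhere in source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            modules = [node.module] if node.module else []
        else:
            continue
        for module in modules:
            for prefix in FRAMEWORK_PREFIXES:
                # Match the package itself or any dotted submodule of it.
                if module == prefix or module.startswith(prefix + "."):
                    found.add(prefix)
    return found


SAMPLE = """
import sklearn
from sklearn.model_selection import KFold
import numpy as np
import os
"""

print(sorted(imported_frameworks(SAMPLE)))
```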

Detected Smells

Framework-Specific Smells (`FrameworkSpecificSmellDetector`)

Pandas

Unnecessary iteration (iterrows)
Chain indexing
Inefficient merge operations
Inplace operations
Inefficient DataFrame conversion (.values vs .to_numpy())
Missing data type specifications
Column selection issues
DataFrame mutation during iteration

NumPy

NaN equality checks (use np.isnan())
Missing random seed
Inefficient array creation (missing dtype)
Suboptimal element-wise operations
Dtype inconsistency
Implicit broadcasting risks
Copy/view confusion
Missing axis specification
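To make the first item concrete: NaN never compares equal to anything, including itself, so `x == np.nan` is always False. A check for this pattern could look roughly like the sketch below. It uses the standard-library `ast` module instead of astroid, recognizes only the literal spellings `np.nan` and `numpy.nan`, and the `find_nan_comparisons` name is hypothetical; the tool's own checker may work differently.

```python
import ast


def find_nan_comparisons(source):
    """Return line numbers of == / != comparisons that involve np.nan."""

    def is_np_nan(node):
        return (
            isinstance(node, ast.Attribute)
            and node.attr == "nan"
            and isinstance(node.value, ast.Name)
            and node.value.id in {"np", "numpy"}
        )

    lines = []
    # ast.walk visits every node, so nested functions and closures are covered.
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        if not any(isinstance(op, (ast.Eq, ast.NotEq)) for op in node.ops):
            continue
        if any(is_np_nan(operand) for operand in [node.left, *node.comparators]):
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = '''
import numpy as np

def clean(values):
    def is_missing(v):
        return v == np.nan      # always False, even when v is NaN
    return [v for v in values if not is_missing(v)]

ok = np.isnan(np.nan)
'''

print(find_nan_comparisons(SAMPLE))
```

The flagged comparison sits inside a nested function, which also illustrates the nesting-independent scope described under Detection Scope.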

Scikit-learn

Missing feature scaling
Absence of Pipeline
Missing cross-validation
Inconsistent random_state
Missing verbose mode
Overreliance on accuracy metric
Missing unit tests
Data leakage
Missing exception handling

PyTorch

Missing torch.manual_seed()
Non-deterministic algorithms
DataLoader reproducibility
Missing mask in log operations
Direct model.forward() calls
Missing gradient zeroing
Missing batch normalization
Missing dropout
Missing data augmentation
Missing learning rate scheduler
Missing logging/monitoring
Missing eval mode

TensorFlow

Missing random seed
Missing early stopping
Missing checkpointing
Missing memory management
Missing logging

Hugging Face Smells (`HuggingFaceSmellDetector`)

Model versioning issues
Missing tokenizer and model caching
Inconsistent tokenization settings
Inefficient data loading
Missing distributed training configuration
Missing mixed precision training
Missing gradient accumulation
Missing learning rate scheduling
Missing early stopping

General ML Smells (`ML_SmellDetector`)

Data leakage detection
Magic number usage
Inconsistent feature scaling
Missing cross-validation
Imbalanced dataset handling
Feature selection issues
Overreliance on single metrics
Missing model persistence
Missing reproducibility measures
Inefficient data loading for large datasets
Unused feature detection
Overfitting-prone practices
Missing error handling
Hardcoded file paths
Missing or incomplete documentation

Development & Maintenance

This section is for contributors and maintainers of the package itself.

Prerequisites

uv (package/environment manager)
Python 3.10+ (CI tests 3.10–3.13)

Setup

git clone https://github.com/KarthikShivasankar/ml_smells_detector
cd ml_smells_detector
uv sync --extra dev          # creates .venv and installs the package + dev tools

uv sync --extra dev installs everything in [project.optional-dependencies].dev (pytest, pytest-cov, ruff, flake8, Sphinx, …). Run any tool with uv run <cmd>.

Project layout

ml_code_smell_detector/
  cli.py                 # CLI entry point: arg parsing, file walking, report writing
  utils.py               # astroid-based AST helpers (import node types from astroid.nodes)
  detectors/
    framework_detector.py    # Pandas / NumPy / sklearn / PyTorch / TensorFlow
    huggingface_detector.py  # Hugging Face Transformers
    ml_detector.py           # general ML practices
tests/                   # pytest suite (mirrors the detectors)
docs/source/             # Sphinx documentation sources
.github/workflows/       # CI and release automation
AGENTS.md                # quick command/convention reference for AI agents

The development loop

uv run ruff check . --fix                       # lint + auto-fix (incl. import order)
uv run python -m flake8 ml_code_smell_detector tests
uv run python -m pytest tests/                  # 212 tests
uv build && uvx twine check dist/*              # sanity-check the package

Keep both ruff check and flake8 green, and all tests passing, before committing. CI enforces all of this on every push and PR.

Adding a new smell / detector

Add detection logic to the relevant class in ml_code_smell_detector/detectors/.
Each smell dict must include the keys: name, framework, fix, benefits, location.
Add tests under tests/ for both detection and non-detection cases.
Document the new smell in docs/source/features.rst and the "Detected Smells" list above.
Run the development loop and open a PR.
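The required-keys rule from the steps above lends itself to a small test helper. This is a sketch only: `missing_smell_keys` is a hypothetical name, not something the package is known to provide, and the example values are copied from the sample report in the Output section.

```python
REQUIRED_KEYS = ("name", "framework", "fix", "benefits", "location")


def missing_smell_keys(smell):
    """Return the required keys that are absent from a smell dict, in order."""
    return [key for key in REQUIRED_KEYS if key not in smell]


example = {
    "name": "Missing Random Seed",
    "framework": "NumPy",
    "fix": "Add np.random.seed() at the start of your script",
    "benefits": "Reproducible experiments",
    "location": "Line 12",
}

print(missing_smell_keys(example))           # complete dict
print(missing_smell_keys({"name": "Demo"}))  # incomplete dict
```

A test for a new smell could assert that `missing_smell_keys(smell)` is empty for every result the detector returns.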

Coding conventions

Line length: 150 (configured in .flake8 and [tool.ruff]).
Target Python 3.10: do not rely on PEP 701 f-string features such as line breaks inside { ... } in a single-quoted f-string; they are a SyntaxError before Python 3.12.
Import astroid node types from astroid.nodes (e.g. nodes.Call), not the deprecated top-level astroid aliases.
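For the f-string convention, a portable pattern is to compute the value first and keep the replacement field on one line. The snippet below is a minimal illustration with made-up values; the non-portable form is shown only as a comment so the file still parses on Python 3.10.

```python
scores = [0.91, 0.87, 0.94]

# Not portable to Python 3.10: a line break inside the braces of a
# single-quoted f-string is only accepted from Python 3.12 (PEP 701).
#
#     summary = f"best={max(
#         scores
#     ):.2f}"

# Portable: compute the value first, then format it on a single line.
best = max(scores)
summary = f"best={best:.2f}"
print(summary)
```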

Running Tests

The test suite has 212 tests covering all three detector classes, utilities, and the CLI.

# Run the full test suite
python -m pytest tests/

# Run with verbose output
python -m pytest tests/ -v

# Run a specific test module
python -m pytest tests/test_pandas_smells.py
python -m pytest tests/test_pytorch_smells.py
python -m pytest tests/test_tensorflow_smells.py
python -m pytest tests/test_sklearn_smells.py
python -m pytest tests/test_numpy_smells.py
python -m pytest tests/test_huggingface_smells.py
python -m pytest tests/test_ml_detector.py
python -m pytest tests/test_utils.py
python -m pytest tests/test_cli.py

# Run a single test class or function
python -m pytest tests/test_sklearn_smells.py::TestCrossValidationChecker
python -m pytest tests/test_pytorch_smells.py::TestGradientClearChecker::test_detects_missing_zero_grad

# With coverage report
python -m pytest tests/ --cov=ml_code_smell_detector --cov-report=term-missing

Test Structure

| File | Covers | Tests |
| --- | --- | --- |
| `test_pandas_smells.py` | Pandas smells (Unnecessary Iteration, Chain Indexing, Merge Params, InPlace, etc.) | ~20 |
| `test_numpy_smells.py` | NumPy smells (NaN equality, randomness, axis, dtype, etc.) | ~16 |
| `test_sklearn_smells.py` | Sklearn smells (Scaler, Pipeline, CV, Randomness, Verbose, Threshold, etc.) | ~20 |
| `test_pytorch_smells.py` | PyTorch smells (Randomness, Determinism, Gradients, BatchNorm, Dropout, etc.) | ~20 |
| `test_tensorflow_smells.py` | TensorFlow smells (Randomness, EarlyStopping, Checkpointing, Memory, etc.) | ~20 |
| `test_huggingface_smells.py` | HuggingFace smells (versioning, caching, mixed precision, etc.) | ~18 |
| `test_ml_detector.py` | General ML smells (leakage, magic numbers, CV, reproducibility, etc.) | ~22 |
| `test_utils.py` | AST utility functions | ~30 |
| `test_cli.py` | CLI argument parsing, file collection, report writing | ~10 |

Linting

The project uses Ruff as the primary linter (and import sorter) and keeps flake8 available as a secondary check. Both are configured for a 150-character line length (pyproject.toml [tool.ruff] and .flake8).

# Lint with Ruff
uv run ruff check .

# Auto-fix what Ruff can (import order, simple issues)
uv run ruff check . --fix

# Run flake8 as well
uv run python -m flake8 ml_code_smell_detector tests

Both linters must pass cleanly before committing. CI runs them on every push and pull request.

Documentation

The docs are built with Sphinx from reStructuredText sources in docs/source/ and are hosted on Read the Docs.

Where to edit

| File | Contents |
| --- | --- |
| `docs/source/index.rst` | Landing page / table of contents |
| `docs/source/installation.rst` | Install instructions (keep the Python version in sync with `pyproject.toml`) |
| `docs/source/usage.rst` | CLI usage and options |
| `docs/source/features.rst` | Full list of detected smells |
| `docs/source/detectors/*.rst` | Auto-generated API docs for each detector |
| `docs/source/contributing.rst` | Contributor guide |
| `docs/source/changelog.rst` | Per-version changelog (update on every release) |
| `docs/source/conf.py` | Sphinx config; bump `release` on every version bump |

Build locally

# Windows
rebuild_docs.bat

# Any platform
uv run sphinx-build -b html docs/source docs/build/html
# then open docs/build/html/index.html

docs/build/ is generated output and is git-ignored — never commit it.

Published docs

Read the Docs rebuilds automatically on every push to main using .readthedocs.yaml (Python 3.10) and docs/requirements.txt. When you change the public API or add a detector, update features.rst and the relevant detectors/*.rst so the published docs stay accurate.

Continuous Integration

GitHub Actions workflows live in .github/workflows/:

ci.yml — runs on every push and pull request to main:
- Lint: ruff check + flake8
- Test: full suite across Python 3.10, 3.11, 3.12, and 3.13
- Build: uv build + twine check to validate the distribution
publish.yml — publishes to PyPI when a GitHub Release is published (see below).

Workflow runs are visible under the repo's Actions tab. A red CI run blocks a release — fix it before tagging.

Releasing a New Version

Maintainer checklist for cutting a release (uses semantic versioning):

Make sure main is green — CI passing, uv run ruff check ., uv run python -m flake8 ml_code_smell_detector tests, and uv run python -m pytest tests/ all clean locally.
Bump the version in pyproject.toml (version = "X.Y.Z") and docs/source/conf.py (release = "X.Y.Z"). Keep them in sync.
Update the changelog — add an X.Y.Z entry at the top of docs/source/changelog.rst describing user-facing changes.

Commit and push to main:

git commit -am "release: X.Y.Z"
git push origin main

Tag and create a GitHub Release for vX.Y.Z. This triggers publish.yml, which builds, runs twine check, and publishes to PyPI via Trusted Publishing.
```
git tag vX.Y.Z
git push origin vX.Y.Z
# then publish the Release from the tag in the GitHub UI
```
Verify the new version appears at https://pypi.org/project/ml-code-smell-detector/ and installs cleanly:
```
uv pip install --no-cache ml-code-smell-detector==X.Y.Z
```

If a release is broken (e.g. fails to import on a supported Python version), yank it on PyPI (Manage → Release → Yank) and ship a fixed patch release. Yanking hides it from new installs without deleting it.

See Publishing to PyPI below for the underlying publish mechanics (Trusted Publishing and the manual token fallback).
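The "keep them in sync" step of the checklist can be checked mechanically before tagging. The sketch below compares the two version strings with regular expressions, which avoids `tomllib` (standard library only from Python 3.11, while this project targets 3.10). The `versions_in_sync` helper and the sample file contents are hypothetical, and the patterns assume simple `version = "X.Y.Z"` and `release = "X.Y.Z"` assignments at the start of a line.

```python
import re


def extract(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


def versions_in_sync(pyproject_text, conf_text):
    """True when pyproject.toml's version equals conf.py's release."""
    version = extract(r'^version\s*=\s*"([^"]+)"', pyproject_text)
    release = extract(r'^release\s*=\s*["\']([^"\']+)["\']', conf_text)
    return version is not None and version == release


# Made-up file contents for illustration.
PYPROJECT = '[project]\nname = "ml-code-smell-detector"\nversion = "0.1.2"\n'
CONF_OK = 'project = "ML Code Smell Detector"\nrelease = "0.1.2"\n'
CONF_STALE = 'project = "ML Code Smell Detector"\nrelease = "0.1.1"\n'

print(versions_in_sync(PYPROJECT, CONF_OK))
print(versions_in_sync(PYPROJECT, CONF_STALE))
```

In the repository, the inputs would come from `Path("pyproject.toml").read_text()` and `Path("docs/source/conf.py").read_text()`.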

Publishing to PyPI

Recommended: Trusted Publishing (OIDC, no token)

This repo ships a publish.yml workflow that uploads to PyPI using Trusted Publishing — no API token is stored or pasted anywhere.

One-time setup on the PyPI project page (Settings → Publishing → Add a trusted publisher):

| Field | Value |
| --- | --- |
| Owner | `KarthikShivasankar` |
| Repository | `ml_smells_detector` |
| Workflow name | `publish.yml` |
| Environment | `pypi` |

To release a new version:

Bump version in pyproject.toml and update docs/source/changelog.rst.
Commit and push to main.
Create a GitHub Release (e.g. tag v0.1.2). The publish.yml workflow builds, runs twine check, and publishes automatically.

Manual publish (fallback)

# Build sdist and wheel into dist/
uv build

# Validate, then publish (prompts for credentials)
uvx twine check dist/*
uv publish

# Or pass a project-scoped token directly
uv publish --token pypi-<your-token-here>

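Prefer a project-scoped API token over an account-wide one, and never commit tokens to the repo. To keep a token out of shell history, set it in the UV_PUBLISH_TOKEN environment variable instead of passing --token on the command line. Trusted Publishing avoids tokens entirely.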

Publish to TestPyPI first (optional)

uv publish --publish-url https://test.pypi.org/legacy/ --token pypi-<your-test-token>

# Verify the test install
uv pip install --index-url https://test.pypi.org/simple/ ml-code-smell-detector

Citation

If you use this tool in your research, please cite:

@inproceedings{shivashankar2025mlscent,
  title     = {MLScent: A tool for Anti-pattern detection in ML projects},
  author    = {Shivashankar, Karthik and Martini, Antonio},
  booktitle = {2025 IEEE/ACM 4th International Conference on AI Engineering--Software Engineering for AI (CAIN)},
  pages     = {150--160},
  year      = {2025},
  month     = {April},
  publisher = {IEEE}
}

Shivashankar, K., & Martini, A. (2025, April). MLScent: A tool for Anti-pattern detection in ML projects. In 2025 IEEE/ACM 4th International Conference on AI Engineering–Software Engineering for AI (CAIN) (pp. 150–160). IEEE.

License

MIT

Release history

0.1.2 (this version): May 31, 2026
0.1.1: May 31, 2026
