
ML Systems Reproducibility Auditor

Project description

ML Reproducibility Auditor

A systems-oriented CLI tool to evaluate machine learning repositories for reproducibility, engineering quality, and ML infrastructure signals.

Quick Demo

ml-audit https://github.com/pytorch/pytorch

Motivation

Many machine learning repositories:

  • Cannot be reliably reproduced
  • Lack dependency and environment clarity
  • Provide no benchmark guarantees
  • Hide system-level bottlenecks

This tool evaluates repositories through a systems lens, focusing on:

  • Reproducibility signals
  • Engineering maturity
  • ML systems design patterns

Installation

From a local clone of the repository:

pip install -e .
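
The published package can also be installed directly from PyPI with pip install ml-repro-audit (assuming the project name matches the distributions listed under Download files below).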

Usage

ml-audit https://github.com/user/repo

GitHub Action Usage

name: ML Reproducibility Audit
on:
  pull_request:
  workflow_dispatch:
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: OmprakashSahani/ml-repro-audit/.github/actions/ml-audit@v1
        with:
          repo-url: https://github.com/user/repo

Example Script

Run a quick audit using:

./examples/run_audit.sh

JSON Output

ml-audit https://github.com/user/repo --json

Example Output

Repository: user/repo

Structure Analysis:
- has_readme: YES
- has_license: YES
- has_ci: NO
- has_benchmarks: YES

Reproducibility Score: 7.5/10
Risk Level: MEDIUM

Code Quality Signals:
- has_pinned_dependencies: YES
- has_seed_control: NO
- has_training_loop: YES

ML Systems Detection:
- uses_pytorch: YES
- uses_distributed: YES
- uses_all_reduce: YES

Insights:
- No CI/CD detected → changes are not automatically validated
- Missing seed control → results may not be reproducible
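
The insight lines are derived directly from the detected signals. A minimal sketch of that mapping, reusing the two messages shown above (the real tool's rule set and wording may differ):

INSIGHT_RULES = {
    "has_ci": "No CI/CD detected → changes are not automatically validated",
    "has_seed_control": "Missing seed control → results may not be reproducible",
}

def generate_insights(signals):
    # Emit a message for every expected signal that is missing or false.
    return [message for key, message in INSIGHT_RULES.items() if not signals.get(key)]

print(generate_insights({"has_ci": False, "has_seed_control": False, "has_readme": True}))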

JSON Output Example

{
  "repository": "user/repo",
  "score": 7.5,
  "risk": "MEDIUM",
  "analysis": {},
  "quality": {},
  "patterns": {},
  "insights": []
}
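
Because the report is plain JSON, it can drive automation. A minimal sketch that gates a pipeline step on the audit result, assuming --json prints only the JSON document to stdout (the 6.0 threshold is an arbitrary choice for illustration):

import json
import subprocess
import sys

# Run the audit and parse its JSON report (field names match the example above).
result = subprocess.run(
    ["ml-audit", "https://github.com/user/repo", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)

print(f"{report['repository']}: score={report['score']} risk={report['risk']}")
for insight in report["insights"]:
    print(f"- {insight}")

# Fail the step when the score falls below the chosen threshold.
if report["score"] < 6.0:
    sys.exit(1)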

Features

  • GitHub API integration (with authentication support)
  • Repository structure analysis (CI/CD, benchmarks, datasets)
  • Code quality analysis (dependencies, determinism, training loops)
  • Reproducibility scoring with weighted signals
  • Risk classification (LOW / MEDIUM / HIGH)
  • ML systems pattern detection (PyTorch, distributed training, all-reduce)
  • Code-level inspection via GitHub API
  • Insight generation based on system signals
  • JSON output for automation and pipelines
  • Rich CLI interface (tables, colors)

GitHub Integration

This tool integrates directly with the GitHub API to:

  • Fetch repository metadata and file structure
  • Inspect source code for ML system patterns
  • Analyze engineering signals across repositories

It is designed as a developer tool to audit and improve repository quality within the GitHub ecosystem.
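
As an illustration of the kind of call this involves, the sketch below lists a repository's top-level contents through the GitHub REST API and derives two of the structure signals shown earlier; it mirrors the idea, not the tool's actual implementation.

import json
import os
import urllib.request

def list_repo_contents(owner, repo):
    # GET /repos/{owner}/{repo}/contents/ returns the top-level file listing.
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/"
    request = urllib.request.Request(url)
    token = os.environ.get("GITHUB_TOKEN")  # optional; authenticated calls get a higher rate limit
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

entries = list_repo_contents("user", "repo")
names = {entry["name"].lower() for entry in entries}
print("has_readme:", any(name.startswith("readme") for name in names))
print("has_license:", any(name.startswith("license") for name in names))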


Use Cases

  • Evaluate reproducibility of ML repositories before use
  • Audit open-source projects for engineering quality
  • Compare ML infrastructure practices across repositories
  • Integrate into CI pipelines for repository validation

Architecture

flowchart TD
    A[CLI Input] --> B[GitHub API]
    B --> C[File Fetcher]
    C --> D[Structure Analyzer]
    C --> E[Code Quality Analyzer]
    C --> F[ML Pattern Detector]
    D --> G[Scoring Engine]
    E --> G
    G --> H[Risk Classifier]
    D --> I[Insights Generator]
    E --> I
    F --> I
    H --> J[Report Output]
    I --> J

Design Principles

  • Reproducibility-first — treat environment and determinism as first-class concerns
  • Signal over noise — focus on high-impact engineering indicators
  • System-aware analysis — go beyond files into behavior and patterns
  • Composable design — CLI + JSON for integration into workflows

Evaluation Dimensions

The scoring system considers the following dimensions (an illustrative scoring sketch follows the list):

  • Environment setup (dependencies, packaging)
  • Determinism (seed control)
  • Documentation
  • Testing and validation
  • CI/CD pipelines
  • Benchmarking practices
  • Dataset reproducibility
  • Configuration-driven experimentation
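
An illustrative weighted-scoring sketch over these dimensions is shown below. The signal names partly mirror the example output earlier; the weights, the extra keys (has_tests, has_dataset_docs, has_config_files), and the risk thresholds are invented for the example and are not the tool's actual values.

# Hypothetical weights per dimension; they sum to 10 so the score is out of 10.
WEIGHTS = {
    "has_pinned_dependencies": 2.0,  # environment setup
    "has_seed_control": 2.0,         # determinism
    "has_readme": 1.0,               # documentation
    "has_tests": 1.5,                # testing and validation
    "has_ci": 1.5,                   # CI/CD pipelines
    "has_benchmarks": 1.0,           # benchmarking practices
    "has_dataset_docs": 0.5,         # dataset reproducibility
    "has_config_files": 0.5,         # configuration-driven experimentation
}

def reproducibility_score(signals):
    earned = sum(weight for key, weight in WEIGHTS.items() if signals.get(key))
    return round(10 * earned / sum(WEIGHTS.values()), 1)

def risk_level(score):
    if score >= 8.0:
        return "LOW"
    if score >= 5.0:
        return "MEDIUM"
    return "HIGH"

score = reproducibility_score({"has_readme": True, "has_pinned_dependencies": True, "has_ci": False})
print(score, risk_level(score))  # 3.0 HIGH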

Roadmap

  • AST-based static analysis (deeper code understanding)
  • Dataset pipeline validation
  • Training loop structure detection
  • Performance bottleneck hints
  • Multi-repo comparison
  • Web dashboard (FastAPI)

Limitations

  • Heuristic-based detection (not full static analysis yet); a minimal example of such a heuristic is sketched after this list
  • Partial file sampling for performance
  • GitHub API rate limits without authentication
  • Static analysis only (does not execute code)
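
A minimal sketch of the kind of keyword heuristic this refers to, applied to a stand-in source string (the patterns are simplified illustrations, not the tool's exact rules):

import re

SEED_PATTERNS = [r"torch\.manual_seed\(", r"np\.random\.seed\(", r"random\.seed\("]
DISTRIBUTED_PATTERNS = [r"torch\.distributed", r"all_reduce\(", r"DistributedDataParallel"]

def has_any(source, patterns):
    return any(re.search(pattern, source) for pattern in patterns)

# Stand-in for a file sampled from the audited repository.
source = """
import torch
import torch.distributed as dist
torch.manual_seed(42)
dist.all_reduce(tensor)
"""

print("has_seed_control:", has_any(source, SEED_PATTERNS))
print("uses_distributed:", has_any(source, DISTRIBUTED_PATTERNS))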

Why This Matters

Reproducibility is a major gap in real-world ML systems.

This project explores how:

  • System design decisions affect reproducibility
  • Engineering practices impact reliability
  • Scalability constraints influence outcomes

Omprakash Sahani — ML Systems Engineer (Distributed Training · Optimization · Systems)

Download files

Download the file for your platform.

Source Distribution

ml_repro_audit-0.1.0.tar.gz (10.9 kB)

Built Distribution

ml_repro_audit-0.1.0-py3-none-any.whl (10.4 kB)

File details

Details for the file ml_repro_audit-0.1.0.tar.gz.

File metadata

  • Download URL: ml_repro_audit-0.1.0.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for ml_repro_audit-0.1.0.tar.gz:

  • SHA256: 778f5471391c5643c3d5d7432cdf1757e68e86e2a4679452917f43906d9d4c28
  • MD5: 519575a309403eed51a2d839acb66e69
  • BLAKE2b-256: bd0ec3845729aeee51e14592b68b64c1a5ac7ab1db38a00a467d13d64998465a


File details

Details for the file ml_repro_audit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ml_repro_audit-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.1

File hashes

Hashes for ml_repro_audit-0.1.0-py3-none-any.whl:

  • SHA256: 0517df0fc707c6ae513523156ada093c1cb2b6e6631323bcc542e797ee71d462
  • MD5: 92dd75294b48c3f4efddb2795ddaa2d6
  • BLAKE2b-256: 6579b4417c3e138fb15008b4255b232817a062bb37536be545623f8f6b94e5af

