ML Reproducibility Auditor
A systems-oriented CLI tool to evaluate machine learning repositories for reproducibility, engineering quality, and ML infrastructure signals.
Quick Demo
```bash
ml-audit https://github.com/pytorch/pytorch
```
Motivation
Many machine learning repositories:
- Cannot be reliably reproduced
- Lack clarity about dependencies and environments
- Provide no benchmark guarantees
- Obscure system-level bottlenecks
This tool evaluates repositories through a systems lens, focusing on:
- Reproducibility signals
- Engineering maturity
- ML systems design patterns
Installation
```bash
pip install -e .
```
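The package is also published on PyPI as ml-repro-audit (see the distribution files below), so a standard install should work as well:

```bash
pip install ml-repro-audit
```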
Usage
```bash
ml-audit https://github.com/user/repo
```
GitHub Action Usage
```yaml
name: ML Reproducibility Audit

on:
  pull_request:
  workflow_dispatch:

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: OmprakashSahani/ml-repro-audit/.github/actions/ml-audit@v1
        with:
          repo-url: https://github.com/user/repo
```
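As configured above, the audit runs on every pull request and can also be launched manually from the Actions tab via workflow_dispatch.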
Example Script
Run a quick audit using:
```bash
./examples/run_audit.sh
```
JSON Output
```bash
ml-audit https://github.com/user/repo --json
```
Example Output
```text
Repository: user/repo

Structure Analysis:
- has_readme: YES
- has_license: YES
- has_ci: NO
- has_benchmarks: YES

Reproducibility Score: 7.5/10
Risk Level: MEDIUM

Code Quality Signals:
- has_pinned_dependencies: YES
- has_seed_control: NO
- has_training_loop: YES

ML Systems Detection:
- uses_pytorch: YES
- uses_distributed: YES
- uses_all_reduce: YES

Insights:
- No CI/CD detected → changes are not automatically validated
- Missing seed control → results may not be reproducible
```
JSON Output Example
```json
{
  "repository": "user/repo",
  "score": 7.5,
  "risk": "MEDIUM",
  "analysis": {},
  "quality": {},
  "patterns": {},
  "insights": []
}
```
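Because the JSON report is machine-readable, it can gate a CI run. Below is a minimal sketch, assuming `ml-audit <repo> --json` writes the report shown above to stdout; the 6.0 threshold is an arbitrary example, not a built-in default:

```python
import json
import subprocess
import sys

# Run the audit and capture its JSON report from stdout.
result = subprocess.run(
    ["ml-audit", "https://github.com/user/repo", "--json"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)

# Fail the pipeline on a low score or a HIGH risk classification.
if report["score"] < 6.0 or report["risk"] == "HIGH":
    print(f"Audit gate failed: score={report['score']}, risk={report['risk']}")
    sys.exit(1)
```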
Features
- GitHub API integration (with authentication support)
- Repository structure analysis (CI/CD, benchmarks, datasets)
- Code quality analysis (dependencies, determinism, training loops)
- Reproducibility scoring with weighted signals
- Risk classification (LOW / MEDIUM / HIGH)
- ML systems pattern detection (PyTorch, distributed training, all-reduce)
- Code-level inspection via GitHub API
- Insight generation based on system signals
- JSON output for automation and pipelines
- Rich CLI interface (tables, colors)
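The report shown earlier maps naturally onto rich tables. A minimal sketch of how such a signal table could be rendered (illustrative only; `print_signals` is a hypothetical helper, not the tool's actual rendering code):

```python
from rich.console import Console
from rich.table import Table

def print_signals(title: str, signals: dict[str, bool]) -> None:
    """Render a YES/NO signal table with colored values."""
    table = Table(title=title)
    table.add_column("Signal")
    table.add_column("Present", justify="center")
    for name, present in signals.items():
        table.add_row(name, "[green]YES[/green]" if present else "[red]NO[/red]")
    Console().print(table)

print_signals("Structure Analysis", {"has_readme": True, "has_license": True, "has_ci": False})
```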
GitHub Integration
This tool integrates directly with the GitHub API to:
- Fetch repository metadata and file structure
- Inspect source code for ML system patterns
- Analyze engineering signals across repositories
It is designed as a developer tool to audit and improve repository quality within the GitHub ecosystem.
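As an illustration of this integration, the sketch below lists a repository's tree via the GitHub REST API and scans a sample of Python files for the ML systems patterns named above. It is a simplified stand-in for the tool's internals: the endpoints are real, but `detect_patterns`, the sampling limit, and the match patterns are assumptions made for this example.

```python
import re
import requests

API = "https://api.github.com"

def detect_patterns(owner: str, repo: str, token: str | None = None, sample: int = 25) -> dict:
    # Authenticated requests get much higher rate limits (see Limitations).
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    branch = requests.get(f"{API}/repos/{owner}/{repo}", headers=headers, timeout=30).json()["default_branch"]
    tree = requests.get(
        f"{API}/repos/{owner}/{repo}/git/trees/{branch}",
        params={"recursive": "1"}, headers=headers, timeout=30,
    ).json()
    # Sample only the first few Python files, for performance.
    py_files = [e["path"] for e in tree.get("tree", []) if e["path"].endswith(".py")][:sample]
    signals = {"uses_pytorch": False, "uses_distributed": False, "uses_all_reduce": False}
    for path in py_files:
        source = requests.get(
            f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}",
            headers=headers, timeout=30,
        ).text
        # Heuristic string/regex matches; the match patterns are illustrative.
        signals["uses_pytorch"] |= bool(re.search(r"^\s*import torch", source, re.M))
        signals["uses_distributed"] |= "torch.distributed" in source
        signals["uses_all_reduce"] |= "all_reduce(" in source
    return signals
```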
Use Cases
- Evaluate reproducibility of ML repositories before use
- Audit open-source projects for engineering quality
- Compare ML infrastructure practices across repositories
- Integrate into CI pipelines for repository validation
Architecture
```mermaid
flowchart TD
    A[CLI Input] --> B[GitHub API]
    B --> C[File Fetcher]
    C --> D[Structure Analyzer]
    C --> E[Code Quality Analyzer]
    C --> F[ML Pattern Detector]
    D --> G[Scoring Engine]
    E --> G
    G --> H[Risk Classifier]
    D --> I[Insights Generator]
    E --> I
    F --> I
    H --> J[Report Output]
    I --> J
```
Design Principles
- Reproducibility-first — treat environment and determinism as first-class concerns
- Signal over noise — focus on high-impact engineering indicators
- System-aware analysis — go beyond files into behavior and patterns
- Composable design — CLI + JSON for integration into workflows
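Concretely, treating determinism as a first-class concern is what the has_seed_control signal checks for. One common pattern it corresponds to (illustrative; repositories may structure this differently):

```python
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Pin every RNG a typical PyTorch training run touches."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Trade kernel-selection speed for run-to-run determinism.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```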
Evaluation Dimensions
The scoring system considers:
- Environment setup (dependencies, packaging)
- Determinism (seed control)
- Documentation
- Testing and validation
- CI/CD pipelines
- Benchmarking practices
- Dataset reproducibility
- Configuration-driven experimentation
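A minimal sketch of how such weighted signals could roll up into the 0-10 score and LOW/MEDIUM/HIGH risk levels shown earlier. The weights, thresholds, and some signal names here are hypothetical, chosen for illustration rather than taken from the shipped scoring engine:

```python
# Hypothetical weights, one per evaluation dimension above.
WEIGHTS = {
    "has_pinned_dependencies": 2.0,  # environment setup
    "has_seed_control": 2.0,         # determinism
    "has_readme": 1.0,               # documentation
    "has_tests": 1.5,                # testing and validation
    "has_ci": 1.5,                   # CI/CD pipelines
    "has_benchmarks": 1.0,           # benchmarking practices
    "has_dataset_manifest": 0.5,     # dataset reproducibility
    "has_config_files": 0.5,         # configuration-driven experimentation
}

def reproducibility_score(signals: dict[str, bool]) -> float:
    """Weighted fraction of satisfied signals, scaled to a 0-10 score."""
    earned = sum(w for name, w in WEIGHTS.items() if signals.get(name))
    return round(10 * earned / sum(WEIGHTS.values()), 1)

def risk_level(score: float) -> str:
    if score >= 8.0:
        return "LOW"
    return "MEDIUM" if score >= 5.0 else "HIGH"
```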
Roadmap
- AST-based static analysis (deeper code understanding)
- Dataset pipeline validation
- Training loop structure detection
- Performance bottleneck hints
- Multi-repo comparison
- Web dashboard (FastAPI)
Limitations
- Heuristic-based detection (not full static analysis yet)
- Partial file sampling for performance
- GitHub API rate limits without authentication
- No dynamic analysis (the tool never executes repository code)
Why This Matters
Reproducibility is a major gap in real-world ML systems.
This project explores how:
- System design decisions affect reproducibility
- Engineering practices impact reliability
- Scalability constraints influence outcomes
Omprakash Sahani — ML Systems Engineer (Distributed Training · Optimization · Systems)
Download files
File details
Details for the file ml_repro_audit-0.1.0.tar.gz.
File metadata
- Download URL: ml_repro_audit-0.1.0.tar.gz
- Upload date:
- Size: 10.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 778f5471391c5643c3d5d7432cdf1757e68e86e2a4679452917f43906d9d4c28 |
| MD5 | 519575a309403eed51a2d839acb66e69 |
| BLAKE2b-256 | bd0ec3845729aeee51e14592b68b64c1a5ac7ab1db38a00a467d13d64998465a |
File details
Details for the file ml_repro_audit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ml_repro_audit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0517df0fc707c6ae513523156ada093c1cb2b6e6631323bcc542e797ee71d462 |
| MD5 | 92dd75294b48c3f4efddb2795ddaa2d6 |
| BLAKE2b-256 | 6579b4417c3e138fb15008b4255b232817a062bb37536be545623f8f6b94e5af |