
AI-powered maintenance risk predictor for git repositories using XGBoost


🔍 MaintSight


AI-powered maintenance degradation predictor for git repositories using XGBoost machine learning

MaintSight analyzes your git repository's commit history and code patterns to predict maintenance degradation at the file level. Using a trained XGBoost model, it identifies code quality trends and helps prioritize refactoring efforts by detecting files that are degrading over time.


✨ Features

  • 🤖 XGBoost ML Predictions: Pre-trained model for maintenance degradation scoring
  • 📊 Git History Analysis: Analyzes commits, changes, and collaboration patterns
  • 📈 Multiple Output Formats: JSON, CSV, Markdown, or interactive HTML reports
  • 🎯 Degradation Categorization: Four-level classification (Improved/Stable/Degraded/Severely Degraded)
  • 🔍 Threshold Filtering: Focus on degraded files only
  • 🌐 Interactive HTML Reports: Rich, interactive analysis with visualizations
  • ⚡ Fast & Efficient: Analyzes hundreds of files in seconds
  • 🛠️ Easy Integration: Simple CLI and Python package

🚀 Quick Start

# Install from PyPI
pip install maintsight-pip

# From a source checkout (see Installation), analyze the current directory
# and generate an interactive HTML report
python3 maintsight_complete.py

# Generate summary output
python3 maintsight_complete.py -f summary

# Generate JSON output
python3 maintsight_complete.py -f json

# Analyze specific repository
python3 maintsight_complete.py /path/to/repo

📦 Installation

From PyPI

pip install maintsight-pip

From Source (Current)

git clone https://github.com/techdebtgpt/maintsight.git maintsight-pip
cd maintsight-pip
pip install -r requirements.txt

# Run the complete version
python3 maintsight_complete.py

Development Installation

pip install -e ".[dev]"

📖 Usage

Basic Prediction

# Analyze current directory (generates HTML report)
python3 maintsight_complete.py

# Analyze specific repository
python3 maintsight_complete.py /path/to/repo

# Generate summary output
python3 maintsight_complete.py -f summary

Advanced Options

# Analyze specific branch
python3 maintsight_complete.py -b develop

# Limit commit analysis window
python3 maintsight_complete.py -w 90  # Analyze last 90 days

# Limit number of commits
python3 maintsight_complete.py -n 5000

# Generate JSON output
python3 maintsight_complete.py -f json

# All options together
python3 maintsight_complete.py /path/to/repo -b main -w 150 -n 1000 -f html
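The branch, window, and commit-limit options correspond naturally to standard `git log` flags. As a rough sketch, assuming the collector reads history through `git log --numstat` (the README does not say how it does), the hypothetical helper `build_git_log_args` turns `-b`, `-w`, and `-n` values into an argument list. `now` is injectable so the computed `--since` date is reproducible.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def build_git_log_args(
    branch: str = "main",
    window_days: int = 150,
    max_commits: int = 1000,
    now: Optional[datetime] = None,
) -> List[str]:
    """Hypothetical helper: translate -b/-w/-n into a `git log` invocation."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(days=window_days)).strftime("%Y-%m-%d")
    return [
        "git",
        "log",
        f"--since={since}",            # -w / --window-days
        f"--max-count={max_commits}",  # -n / --max-commits
        "--numstat",                   # per-file added/removed line counts
        "--format=%H|%an|%at",         # hash | author name | author Unix timestamp
        branch,                        # -b / --branch
    ]


# Mirrors: python3 maintsight_complete.py -b develop -w 90 -n 5000
args = build_git_log_args("develop", 90, 5000, now=datetime(2024, 6, 1, tzinfo=timezone.utc))
print(" ".join(args))
```

Passing the list to `subprocess.run(args, cwd=repo_path, capture_output=True, text=True)` would return the raw log text for parsing; `repo_path` is a placeholder.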

Python API Usage

from maintsight import GitCommitCollector, MockPredictor
from maintsight.utils.html_generator import generate_html_report

# Collect git data
collector = GitCommitCollector(repo_path="./", branch="main")
commit_data = collector.fetch_commit_data()

# Generate predictions
predictor = MockPredictor()
predictions = predictor.predict(commit_data)

# Generate HTML report
html_path = generate_html_report(predictions, commit_data, "./")

📊 Output Formats

JSON

[
  {
    "module": "src/legacy/parser.ts",
    "degradation_score": 0.3456,
    "raw_prediction": 0.3456,
    "risk_category": "severely_degraded"
  },
  {
    "module": "src/utils/helpers.ts",
    "degradation_score": -0.1234,
    "raw_prediction": -0.1234,
    "risk_category": "improved"
  }
]
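A minimal sketch for consuming this output from Python, assuming the file contains only the JSON array shown above (no log lines mixed in). The helper name `degraded_modules` and its default threshold of 0.1 (the lower edge of the Degraded band in the categories table) are our choices, not part of MaintSight.

```python
import json
from typing import Any, Dict, List

# The two records from the JSON example above. With real output, you would
# read the file written by `python3 maintsight_complete.py -f json > results.json`.
SAMPLE = """
[
  {"module": "src/legacy/parser.ts", "degradation_score": 0.3456,
   "raw_prediction": 0.3456, "risk_category": "severely_degraded"},
  {"module": "src/utils/helpers.ts", "degradation_score": -0.1234,
   "raw_prediction": -0.1234, "risk_category": "improved"}
]
"""


def degraded_modules(records: List[Dict[str, Any]], threshold: float = 0.1) -> List[str]:
    """Return module paths scoring at or above `threshold`, worst first."""
    hits = [r for r in records if r["degradation_score"] >= threshold]
    hits.sort(key=lambda r: r["degradation_score"], reverse=True)
    return [r["module"] for r in hits]


records = json.loads(SAMPLE)
print(degraded_modules(records))
```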

CSV

module,degradation_score,raw_prediction,risk_category
"src/legacy/parser.ts","0.3456","0.3456","severely_degraded"
"src/utils/helpers.ts","-0.1234","-0.1234","improved"
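Python's `csv` module returns every field as a string, quoted or not, so the numeric columns need explicit conversion before they can be compared or sorted. A hedged sketch using the column names above; the `load_predictions` helper is hypothetical.

```python
import csv
import io
from typing import Dict, List, Union

SAMPLE_CSV = '''module,degradation_score,raw_prediction,risk_category
"src/legacy/parser.ts","0.3456","0.3456","severely_degraded"
"src/utils/helpers.ts","-0.1234","-0.1234","improved"
'''

Row = Dict[str, Union[str, float]]


def load_predictions(text: str) -> List[Row]:
    """Parse MaintSight-style CSV, converting the numeric columns to floats."""
    rows: List[Row] = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed: Row = dict(row)
        # csv yields strings for every field, so convert the scores explicitly.
        parsed["degradation_score"] = float(row["degradation_score"])
        parsed["raw_prediction"] = float(row["raw_prediction"])
        rows.append(parsed)
    return rows


for row in load_predictions(SAMPLE_CSV):
    print(row["module"], row["degradation_score"], row["risk_category"])
```

For a file on disk, open it with `open(path, newline="")` and pass the file object to `csv.DictReader` directly.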

Markdown Report

Generates a comprehensive report with:

  • Degradation distribution summary
  • Top 20 most degraded files
  • Category breakdown with percentages
  • Actionable recommendations
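How the report is assembled isn't documented here. As a hedged sketch, the category breakdown and a top-N list could be computed from the JSON records like this. The `markdown_summary` helper and its layout are hypothetical, and the "most degraded" list keeps only positive scores, since negative scores indicate improvement.

```python
from collections import Counter
from typing import Any, Dict, List


def markdown_summary(records: List[Dict[str, Any]], top_n: int = 20) -> str:
    """Build a small Markdown summary: category breakdown plus the worst files."""
    total = len(records)
    lines = ["## Category breakdown", ""]
    if total == 0:
        lines.append("No files analyzed.")
        return "\n".join(lines)
    for category, count in Counter(r["risk_category"] for r in records).most_common():
        lines.append(f"- {category}: {count} ({100 * count / total:.1f}%)")
    # Only positive scores indicate degradation; keep the highest `top_n`.
    degraded = [r for r in records if r["degradation_score"] > 0]
    worst = sorted(degraded, key=lambda r: r["degradation_score"], reverse=True)[:top_n]
    lines += ["", f"## Top {len(worst)} most degraded files", ""]
    lines += [f"- `{r['module']}`: {r['degradation_score']:.4f}" for r in worst]
    return "\n".join(lines)


sample = [
    {"module": "a.py", "degradation_score": 0.35, "risk_category": "severely_degraded"},
    {"module": "b.py", "degradation_score": -0.12, "risk_category": "improved"},
    {"module": "c.py", "degradation_score": 0.05, "risk_category": "stable"},
]
print(markdown_summary(sample))
```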

Interactive HTML Report

The HTML report is always generated in the .maintsight/ folder and includes:

  • Visual degradation trends
  • Interactive file explorer
  • Detailed metrics per file
  • Commit history analysis

🎯 Degradation Categories

Score range | Category | Description | Action
< 0.0 | 🟢 Improved | Code quality improving over time | Continue good practices
0.0 to 0.1 | 🔵 Stable | Code quality stable | Regular maintenance
0.1 to 0.2 | 🟡 Degraded | Code quality declining | Schedule for refactoring
> 0.2 | 🔴 Severely Degraded | Rapid quality decline | Immediate attention needed
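The score ranges can be read as a simple threshold function. This is a sketch, not MaintSight's `RiskCategory` logic: the table leaves the edges at 0.1 and 0.2 ambiguous, so the code assumes lower bounds are inclusive and keeps exactly 0.2 in Degraded. The `stable` and `degraded` labels are assumed by analogy with the `improved` and `severely_degraded` values in the JSON example.

```python
def categorize(score: float) -> str:
    """Map a degradation score to a category label.

    Assumed boundary handling: 0.0 and 0.1 start the next band up, and 0.2
    stays in "degraded" because the table reserves "severely" for > 0.2.
    """
    if score < 0.0:
        return "improved"
    if score < 0.1:
        return "stable"
    if score <= 0.2:
        return "degraded"
    return "severely_degraded"


for s in (-0.1234, 0.05, 0.15, 0.3456):
    print(f"{s:+.4f} -> {categorize(s)}")
```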

📚 Command Reference

maintsight_complete.py

Analyze repository and predict maintenance degradation.

python3 maintsight_complete.py [path] [options]

Arguments:

  • path - Repository path (default: current directory)

Options:

  • -b, --branch BRANCH - Git branch to analyze (default: "main")
  • -n, --max-commits N - Maximum commits to analyze (default: 1000)
  • -w, --window-days N - Time window in days for analysis (default: 150)
  • -f, --format FORMAT - Output format: json|summary|html (default: "html")
  • -h, --help - Show help information
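For readers wiring up similar tooling, the documented flags and defaults map onto Python's standard `argparse` as follows. This is a hypothetical reconstruction from the option list only; the script's actual parser isn't shown here, and the project tree notes that the packaged CLI (`cli.py`) is Click-based.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        prog="maintsight_complete.py",
        description="Analyze repository and predict maintenance degradation.",
    )  # argparse adds -h/--help automatically
    parser.add_argument("path", nargs="?", default=".",
                        help="repository path (default: current directory)")
    parser.add_argument("-b", "--branch", default="main", help="Git branch to analyze")
    parser.add_argument("-n", "--max-commits", type=int, default=1000,
                        help="maximum commits to analyze")
    parser.add_argument("-w", "--window-days", type=int, default=150,
                        help="time window in days for analysis")
    parser.add_argument("-f", "--format", choices=["json", "summary", "html"],
                        default="html", help="output format")
    return parser


args = build_parser().parse_args(
    ["/path/to/repo", "-b", "main", "-w", "150", "-n", "1000", "-f", "html"]
)
print(args)
```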

Examples

# Generate HTML report with default settings
python3 maintsight_complete.py

# Analyze last 90 days on develop branch
python3 maintsight_complete.py -b develop -w 90

# Get JSON output for processing
python3 maintsight_complete.py -f json > results.json

# Show summary for quick overview
python3 maintsight_complete.py -f summary
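One common use of the JSON output is failing a CI job when any file lands in the worst band. A minimal sketch, assuming `results.json` contains only the JSON array (no log lines mixed into stdout); `failing_modules` and `ci_exit_code` are hypothetical names.

```python
import json
from typing import Any, Dict, List, Sequence


def failing_modules(
    records: List[Dict[str, Any]],
    fail_on: Sequence[str] = ("severely_degraded",),
) -> List[str]:
    """Modules whose risk_category is one of the `fail_on` labels."""
    return [r["module"] for r in records if r["risk_category"] in fail_on]


def ci_exit_code(json_text: str) -> int:
    """Return 1 (fail the build) if any module is in a failing category, else 0."""
    bad = failing_modules(json.loads(json_text))
    for module in bad:
        print(f"failing: {module}")
    return 1 if bad else 0
```

In a CI step you might call `sys.exit(ci_exit_code(Path("results.json").read_text()))` after importing `sys` and `pathlib.Path`.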

🧠 Model Information

MaintSight uses an XGBoost model trained on software maintenance degradation patterns. The model predicts how code quality changes over time by analyzing git commit patterns and code evolution metrics.

Key Features Analyzed

The model considers multiple dimensions of code evolution:

  • Commit patterns: Frequency, size, and timing of changes
  • Author collaboration: Number of contributors and collaboration patterns
  • Code churn: Lines added, removed, and modified over time
  • Change consistency: Regularity and predictability of modifications
  • Bug indicators: Patterns suggesting defects or fixes
  • Temporal factors: File age and time since last modification
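The exact features the model consumes aren't listed here, so this is only a hypothetical illustration of how a few of these dimensions could be aggregated per file. The `FileChange` record, the `file_features` helper, and the keyword-based bug heuristic (any message containing "fix") are our assumptions, not MaintSight's feature engineer.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List


@dataclass
class FileChange:
    """One file touched by one commit (hypothetical shape, e.g. parsed from `git log --numstat`)."""
    path: str
    author: str
    timestamp: datetime
    added: int
    removed: int
    message: str


def file_features(changes: Iterable[FileChange], now: datetime) -> Dict[str, Dict[str, float]]:
    """Aggregate a few illustrative per-file features; not MaintSight's real feature set."""
    grouped: Dict[str, List[FileChange]] = defaultdict(list)
    for change in changes:
        grouped[change.path].append(change)

    features: Dict[str, Dict[str, float]] = {}
    for path, items in grouped.items():
        times = [c.timestamp for c in items]
        features[path] = {
            "commits": len(items),                              # commit frequency
            "authors": len({c.author for c in items}),          # collaboration
            "churn": sum(c.added + c.removed for c in items),   # lines added + removed
            # crude bug indicator: share of commits whose message mentions "fix"
            "fix_ratio": sum("fix" in c.message.lower() for c in items) / len(items),
            "age_days": (now - min(times)).days,                # first change seen in the window
            "days_since_change": (now - max(times)).days,       # time since last modification
        }
    return features


utc = timezone.utc
changes = [
    FileChange("a.py", "alice", datetime(2024, 5, 1, tzinfo=utc), 10, 2, "Add parser"),
    FileChange("a.py", "bob", datetime(2024, 5, 21, tzinfo=utc), 3, 5, "Fix off-by-one"),
    FileChange("b.py", "alice", datetime(2024, 5, 31, tzinfo=utc), 1, 0, "Update docs"),
]
print(file_features(changes, now=datetime(2024, 6, 1, tzinfo=utc)))
```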

Prediction Output

  • degradation_score: Numerical score indicating code quality trend
    • Negative values: Quality improving
    • Positive values: Quality degrading
    • Higher magnitude = stronger trend
  • risk_category: Classification based on degradation severity
  • raw_prediction: Unprocessed model output

🔧 Development

Prerequisites

  • Python >= 3.8
  • Git

Setup

# Clone repository
git clone https://github.com/techdebtgpt/maintsight.git maintsight-pip
cd maintsight-pip

# Install in development mode
pip install -e ".[dev]"

# Or install requirements directly
pip install -r requirements.txt

# Run the main script
python3 maintsight_complete.py

Project Structure

maintsight-pip/
├── maintsight/                    # Python package
│   ├── __init__.py
│   ├── cli.py                     # Click-based CLI
│   ├── models/                    # Data models
│   │   ├── __init__.py
│   │   ├── commit_data.py         # CommitData dataclass
│   │   ├── risk_category.py       # RiskCategory enum
│   │   ├── risk_prediction.py     # RiskPrediction dataclass
│   │   ├── file_stats.py          # FileStats dataclass
│   │   ├── xgboost_model.py       # XGBoost model structures
│   │   ├── xgboost_degradation_model_multiwindow_v2.pkl      # Pre-trained model
│   │   └── xgboost_degradation_model_multiwindow_v2_metadata.json  # Model metadata
│   ├── services/                  # Core services
│   │   ├── __init__.py
│   │   ├── git_commit_collector.py
│   │   ├── feature_engineer.py
│   │   └── xgboost_predictor.py
│   └── utils/                     # Utilities
│       ├── __init__.py
│       ├── logger.py              # Rich-based logger
│       └── html_generator.py      # HTML report generator
├── tests/                         # pytest tests
│   ├── __init__.py
│   └── test_risk_category.py
├── maintsight_complete.py         # Standalone complete script
├── pyproject.toml                 # Modern Python packaging
├── setup.py                       # Legacy setuptools support
├── requirements.txt               # Runtime dependencies
└── requirements-dev.txt           # Development dependencies

🧪 Testing

# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage
pytest --cov=maintsight

# Run specific test file
pytest tests/test_risk_category.py

# Run with verbose output
pytest -v

Test Coverage Goals

  • Services: 80%+
  • Utils: 90%+
  • CLI: 70%+

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Quick Start

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for your changes
  4. Ensure all tests pass (pytest)
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Style

  • Use Python 3.8+ features
  • Follow PEP 8 style guide
  • Use black for code formatting
  • Use type hints where appropriate
  • Write meaningful commit messages
  • Add tests for new features
  • Update documentation as needed

# Format code
black maintsight/

# Sort imports
isort maintsight/

# Lint code
flake8 maintsight/

# Type checking
mypy maintsight/

๐Ÿ› Bug Reports

Found a bug? Please open an issue with:

  • MaintSight version (python3 maintsight_complete.py --help)
  • Python version
  • Operating system
  • Steps to reproduce
  • Expected vs actual behavior
  • Error messages/stack traces

📄 License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • XGBoost community for the excellent gradient boosting framework
  • Git community for robust version control
  • All contributors who help improve MaintSight

Made with ❤️ by the TechDebtGPT Team



Download files


Source Distribution

maintsight_pip-0.3.0.tar.gz (738.7 kB)

Built Distribution


maintsight_pip-0.3.0-py3-none-any.whl (754.3 kB)

File details

Details for the file maintsight_pip-0.3.0.tar.gz.

File metadata

  • Download URL: maintsight_pip-0.3.0.tar.gz
  • Size: 738.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for maintsight_pip-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fda6347f6978dcde71a11d89f657435106574b3e99c96966b544446b255020fa
MD5 b696bcba841a223106a39dba3dbcd3d5
BLAKE2b-256 15919dec8ad249942ea8b8ee4876d42de5d95b3c5170083d6ff6f137b68a7aed


Provenance

The following attestation bundles were made for maintsight_pip-0.3.0.tar.gz:

Publisher: release.yml on floristafa/maintsight-pip


File details

Details for the file maintsight_pip-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: maintsight_pip-0.3.0-py3-none-any.whl
  • Size: 754.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for maintsight_pip-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 88d9bec997a37a3a98defed6e4f35ff8e4199912d650b85d54c8551b4735a670
MD5 eed38e6bb6a167f82b7d1fc9577648d6
BLAKE2b-256 a56aed08fc632d4772c3d15161d083b45d82c7a02c7063a0e7321766fc921c49


Provenance

The following attestation bundles were made for maintsight_pip-0.3.0-py3-none-any.whl:

Publisher: release.yml on floristafa/maintsight-pip
