
Comprehensive codebase analysis library with coverage, metrics, and test duration analysis

Codebase Stats

Python 3.12+ · License: MIT

A production-ready code-quality metrics library for Python projects. It analyzes coverage, complexity, maintainability, and code structure, and adds automated reporting and quality gates.

Quick Start

Installation

# Using uv (recommended)
uv pip install codebase-stats

# Or with pip
pip install codebase-stats

Basic Usage

from codebase_stats import CodebaseStatsReporter

# Generate comprehensive metrics report
reporter = CodebaseStatsReporter(
    coverage_file='coverage.json',
    report_file='report.json',
    radon_root='src',
    fs_root='src',
    tree_root='src'
)

# Save report to file
reporter.save_report('metrics_report.txt', include_coverage=True, include_complexity=True)

Command Line Interface

# Full analysis with all metrics
python cli.py coverage.json --radon-root src --fs-root src

# Show specific sections
python cli.py coverage.json --show coverage complexity mi

# List low-coverage files
python cli.py coverage.json --show list --threshold 80 --top 20

# Help
python cli.py --help

Features

📊 Coverage Analysis

  • Statement & Branch Coverage: Full coverage metrics from pytest-cov
  • Coverage Distribution: Histogram visualization with percentiles (Q1, Q2, Q3, p90, p95, p99)
  • Low-Coverage Detection: Identify files below thresholds with automatic prioritization
  • Pragma Tracking: Track # pragma: no cover usage for documentation
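The percentile summary listed above can be sketched with the standard library alone. This is an illustration, not the library's implementation: `coverage_percentiles` is a hypothetical helper, and the choice of the "inclusive" interpolation method is an assumption.

```python
import statistics


def coverage_percentiles(coverages):
    """Return Q1, median, Q3, p90, p95 and p99 for per-file coverage percentages.

    Hypothetical helper for illustration; not part of codebase_stats.
    """
    if len(coverages) < 2:
        raise ValueError("need at least two coverage values")
    # quantiles(n=100) returns 99 cut points; index k - 1 holds the k-th percentile.
    cuts = statistics.quantiles(coverages, n=100, method="inclusive")
    wanted = [("q1", 25), ("median", 50), ("q3", 75), ("p90", 90), ("p95", 95), ("p99", 99)]
    return {name: cuts[k - 1] for name, k in wanted}


if __name__ == "__main__":
    sample = [42.0, 67.5, 80.0, 88.2, 91.0, 95.5, 100.0, 100.0]
    for name, value in coverage_percentiles(sample).items():
        print(f"{name:>6}: {value:.1f}%")
```

The inclusive method treats the data as the whole population, which fits a per-file list where every file in the project is present.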

🔍 Code Complexity

  • Cyclomatic Complexity (CC): Radon integration for function complexity analysis
    • Grade A: 1-5 (ideal)
    • Grade B: 6-10 (acceptable)
    • Grade C+: 11+ (refactor recommended)
  • Maintainability Index (MI): Code readability/maintainability scoring (0-100)
  • Halstead Metrics: Bug estimation and code volume analysis
  • Comment Ratios: Documentation density analysis
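As a rough illustration of the grade bands listed above, the mapping from a cyclomatic complexity score to a grade could look like the following. `cc_grade` is a hypothetical name; the boundaries are taken from the list above, and everything from 11 upward is collapsed into "C+" as that list does.

```python
def cc_grade(complexity):
    """Map a cyclomatic complexity score to the grade bands listed above.

    Hypothetical helper for illustration; not part of codebase_stats.
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity starts at 1")
    if complexity <= 5:
        return "A"   # ideal
    if complexity <= 10:
        return "B"   # acceptable
    return "C+"      # refactor recommended


if __name__ == "__main__":
    for score in (1, 5, 6, 10, 11, 27):
        print(score, cc_grade(score))
```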

📈 Code Metrics

  • File Size Distribution: Line count per file with outlier detection
  • Directory Structure Analysis: Module organization and hierarchy
  • Test Duration Distribution: Identify slow tests
  • Quality Gates: Automated threshold validation

📋 Reporting

  • Histogram Visualization: ASCII histograms with customizable bins and scaling
  • Blame Sections: Highlight problematic files (Q3 + 1.5×IQR threshold)
  • Percentile Analysis: Q1, median, Q3, p90, p95, p99
  • Structured Output: Markdown, text, or programmatic JSON
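The blame threshold mentioned above (Q3 + 1.5×IQR) is the usual upper outlier fence. A minimal sketch, assuming quartiles are computed with the standard library's inclusive method; `blame_threshold` and `blamed_files` are hypothetical names, not the library's API.

```python
import statistics


def blame_threshold(values):
    """Upper outlier fence: Q3 + 1.5 * IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def blamed_files(metric_by_path):
    """Return paths whose metric exceeds the fence, worst first."""
    limit = blame_threshold(list(metric_by_path.values()))
    offenders = [path for path, value in metric_by_path.items() if value > limit]
    return sorted(offenders, key=lambda path: metric_by_path[path], reverse=True)


if __name__ == "__main__":
    # Made-up line counts per file
    sizes = {"a.py": 100, "b.py": 120, "c.py": 140, "d.py": 160, "e.py": 900}
    print(blame_threshold(list(sizes.values())))  # 220.0
    print(blamed_files(sizes))                    # ['e.py']
```

With the sample sizes, Q1 is 120 and Q3 is 160, so the fence sits at 160 + 1.5 × 40 = 220 and only the 900-line file is flagged.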

Documentation

API Reference

Core Classes

CodebaseStatsReporter

Main interface for generating comprehensive metrics reports.

from codebase_stats import CodebaseStatsReporter

reporter = CodebaseStatsReporter(
    coverage_file='coverage.json',  # str: path to coverage.json (required)
    report_file=None,               # str, optional: path to pytest report.json
    radon_root=None,                # str, optional: root for CC/MI/comment analysis
    fs_root=None,                   # str, optional: root for file size analysis
    tree_root=None,                 # str, optional: root for structure analysis
)

# Methods
reporter.save_report('metrics_report.txt', include_coverage=True, include_complexity=True)
stats = reporter.get_stats()      # Returns the raw metrics dictionary (dict)
report_text = str(reporter)       # Generate text report

Data Structure

Coverage Stats Dictionary

stats = {
    "coverages_sorted": [float],                    # All file coverage %
    "proj_pct": float,                              # Project-wide coverage %
    "proj_total": int,                              # Total lines
    "proj_covered": int,                            # Covered lines
    "file_stats": [
        {
            "pct": float,                           # File coverage %
            "path": str,                            # File path
            "missing_count": int,                   # Missing line count
            "missing_lines": [int],                 # Missing line numbers
            "cc_avg": float,                        # Avg cyclomatic complexity
            "mi": float,                            # Maintainability index
            "comment_ratio": float,                 # Comment/SLOC ratio
            "hal_bugs": float,                      # Halstead bug estimate
            "size_lines": int,                      # File line count
            ...
        }
    ]
}
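Given a dictionary shaped like the one above, listing the files most in need of tests takes only a few lines. This sketch uses only the documented keys (`file_stats`, `pct`, `path`, `missing_count`); `low_coverage_files` is a hypothetical helper and the sample data is made up.

```python
def low_coverage_files(stats, threshold=80.0, top=20):
    """Return (path, pct, missing_count) for files below the threshold.

    Lowest coverage first; ties are broken by the larger number of missing lines.
    """
    below = [f for f in stats["file_stats"] if f["pct"] < threshold]
    below.sort(key=lambda f: (f["pct"], -f["missing_count"]))
    return [(f["path"], f["pct"], f["missing_count"]) for f in below[:top]]


if __name__ == "__main__":
    # Made-up sample following the documented structure
    stats = {
        "file_stats": [
            {"path": "src/a.py", "pct": 50.0, "missing_count": 10},
            {"path": "src/b.py", "pct": 95.0, "missing_count": 1},
            {"path": "src/c.py", "pct": 79.9, "missing_count": 2},
        ]
    }
    for path, pct, missing in low_coverage_files(stats):
        print(f"{pct:5.1f}%  {missing:3d} missing  {path}")
```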

Module APIs

coverage.py - Coverage Analysis

from codebase_stats.coverage import load_coverage, precompute_coverage_stats

stats = load_coverage('coverage.json')
stats = precompute_coverage_stats(stats, radon_root='src')

metrics.py - Complexity Metrics

from codebase_stats.metrics import get_cyclomatic_complexity, get_maintainability, get_comments_ratio

cc = get_cyclomatic_complexity('file.py')
mi = get_maintainability('file.py')
ratio = get_comments_ratio('file.py')

radon.py - Radon Integration

from codebase_stats.radon import get_cc_list, get_mi_list, get_metrics

cc_data = get_cc_list('src')
mi_data = get_mi_list('src')
hal_data = get_metrics('src')

reporter.py - Report Generation

from codebase_stats.reporter import CodebaseStatsReporter

reporter = CodebaseStatsReporter(...)
reporter.save_report('output.txt')  # Save formatted report

utils.py - Utilities

from codebase_stats.utils import ascii_histogram, percentile, format_value

hist_str = ascii_histogram(data, bins=10, width=80)
p95 = percentile(data, 0.95)
formatted = format_value(value, decimals=2)
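To show the idea behind an ASCII histogram with configurable bins and width, here is a standalone sketch. It is not the library's `ascii_histogram`; the function name `sketch_histogram`, the equal-width binning, and the output layout are all assumptions.

```python
def sketch_histogram(data, bins=10, width=40):
    """Render equal-width bins as rows of '#' scaled to the fullest bin."""
    if not data:
        return ""
    low, high = min(data), max(data)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    counts = [0] * bins
    for value in data:
        # The maximum value would land in bin index `bins`, so clamp it.
        index = min(int((value - low) / span * bins), bins - 1)
        counts[index] += 1
    peak = max(counts)
    rows = []
    for i, count in enumerate(counts):
        left_edge = low + span * i / bins
        bar = "#" * round(count / peak * width)
        rows.append(f"{left_edge:8.1f} | {bar} {count}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(sketch_histogram([12, 45, 47, 60, 61, 63, 88, 90, 95, 100], bins=5, width=20))
```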

Quality Gates

All code must meet these thresholds:

Metric | Threshold | Rationale
--- | --- | ---
Coverage | 100% | All source code must be tested
Cyclomatic Complexity | ≤10 average | Stays within grade B or better
Maintainability Index | ≥50 | Well inside grade A (MI ≥ 20)
File Size | ≤400 lines | Modules remain manageable
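The thresholds above could be checked against a stats dictionary along these lines. This is a sketch under stated assumptions: `check_quality_gates` is a hypothetical function, the complexity gate is applied to each file's `cc_avg` (the table does not say whether the average is per file or project-wide), and the keys follow the documented data structure.

```python
def check_quality_gates(stats, min_coverage=100.0, max_cc=10.0, min_mi=50.0, max_lines=400):
    """Return a list of human-readable gate failures (empty means all gates pass)."""
    failures = []
    if stats["proj_pct"] < min_coverage:
        failures.append(f"project: coverage {stats['proj_pct']:.1f}% < {min_coverage:.0f}%")
    for f in stats["file_stats"]:
        if f["cc_avg"] > max_cc:
            failures.append(f"{f['path']}: cc_avg {f['cc_avg']:.1f} > {max_cc:.0f}")
        if f["mi"] < min_mi:
            failures.append(f"{f['path']}: mi {f['mi']:.1f} < {min_mi:.0f}")
        if f["size_lines"] > max_lines:
            failures.append(f"{f['path']}: size_lines {f['size_lines']} > {max_lines}")
    return failures


if __name__ == "__main__":
    # Made-up sample following the documented structure
    stats = {
        "proj_pct": 97.5,
        "file_stats": [
            {"path": "src/ok.py", "cc_avg": 3.2, "mi": 78.0, "size_lines": 120},
            {"path": "src/big.py", "cc_avg": 12.4, "mi": 41.0, "size_lines": 512},
        ],
    }
    problems = check_quality_gates(stats)
    print("\n".join(problems) or "All quality gates passed")
    raise SystemExit(1 if problems else 0)
```

Exiting non-zero on failure lets a CI job treat the check as a blocking step.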

Development Workflow

Setup

# Clone repository
git clone https://github.com/brunolnetto/codebase-stats.git
cd codebase-stats

# Install with dev dependencies
uv venv
uv pip install -e ".[dev]"

# Activate environment
source .venv/bin/activate

Testing

# Run all tests
pytest

# With coverage
pytest --cov=codebase_stats --cov-report=term-missing

# Specific test file
pytest tests/test_coverage.py -v

Code Quality

# Linting
ruff check codebase_stats/ tests/

# Format check
ruff format --check codebase_stats/ tests/

# Type checking
mypy codebase_stats/

# All quality checks
make quality

Commits & PRs

This project uses the GitFlow workflow with conventional commits:

# Feature branches
git checkout -b feat/feature-name
# Fix branches  
git checkout -b fix/issue-name
# Chore/documentation
git checkout -b chore/update-name

# Commit format: <type>(<scope>): <description>
git commit -m "feat(coverage): add pragma tracking"
git commit -m "fix(radon): handle empty files gracefully"
git commit -m "docs(readme): add API reference"

See Development Policy for full workflow details.
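The `<type>(<scope>): <description>` commit format shown above is easy to validate in a hook. A minimal sketch: the set of accepted types (feat, fix, docs, chore) is inferred from the examples in this section and is an assumption, as are the names `COMMIT_RE` and `parse_commit`.

```python
import re

# Accepted types are an assumption based on the examples in this README.
COMMIT_RE = re.compile(
    r"^(?P<type>feat|fix|docs|chore)"
    r"\((?P<scope>[a-z0-9_-]+)\)"
    r": (?P<description>\S.*)$"
)


def parse_commit(message):
    """Return the parts of a conventional commit subject line, or None if invalid."""
    match = COMMIT_RE.match(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for subject in ("feat(coverage): add pragma tracking", "added some stuff"):
        print(subject, "->", parse_commit(subject))
```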

Architecture Highlights

Data Flow:

Raw Input (coverage.json, report.json)
    ↓
load_coverage() → Enrich with Radon metrics
    ↓
precompute_coverage_stats() → Compute distributions
    ↓
Display Functions → Histograms, tables, blame sections
    ↓
Reporter → Formatted text/markdown output
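The first stages of this flow can be sketched without the library. The example below assumes the input follows the layout of coverage.py's JSON report (a top-level `files` mapping whose entries carry a `summary` and a `missing_lines` list) and produces a subset of the stats dictionary documented earlier. `summarize_coverage` is a hypothetical function, not the library's `load_coverage`.

```python
def summarize_coverage(report):
    """Turn a parsed coverage.json into a subset of the documented stats dict."""
    file_stats = []
    total = covered = 0
    for path, entry in report["files"].items():
        summary = entry["summary"]
        total += summary["num_statements"]
        covered += summary["covered_lines"]
        file_stats.append({
            "path": path,
            "pct": summary["percent_covered"],
            "missing_count": summary["missing_lines"],
            "missing_lines": entry["missing_lines"],
        })
    file_stats.sort(key=lambda f: f["pct"])
    return {
        "coverages_sorted": [f["pct"] for f in file_stats],
        # An empty project counts as fully covered to avoid dividing by zero.
        "proj_pct": 100.0 * covered / total if total else 100.0,
        "proj_total": total,
        "proj_covered": covered,
        "file_stats": file_stats,
    }


if __name__ == "__main__":
    import json
    import sys

    # Usage: python this_script.py coverage.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        result = summarize_coverage(json.load(handle))
    print(f"{result['proj_pct']:.1f}% of {result['proj_total']} statements covered")
```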

Key Design Patterns:

  • Single Responsibility: Each module handles one analysis type
  • Composition: Reporter combines multiple analysis modules
  • Lazy Evaluation: Radon metrics computed on-demand
  • Immutable Data: Stats dicts treated as read-only
  • Histogram Abstraction: Consistent visualization across metrics

See Architecture Documentation for detailed system design.

Examples

Generate Full Metrics Report

# After running tests
pytest --cov=src --cov-report=json

# Generate and save report
python cli.py coverage.json \
  --radon-root src \
  --fs-root src \
  --tree-root src \
  --report metrics_report.txt

Analyze Coverage Gaps

python cli.py coverage.json --show coverage gaps

Monitor Complexity Trends

from codebase_stats import CodebaseStatsReporter

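def check_complexity_trend():
    reporter = CodebaseStatsReporter('coverage.json', radon_root='src')
    stats = reporter.get_stats()

    # Skip files without a complexity value so the average stays meaningful
    cc_values = [f['cc_avg'] for f in stats['file_stats'] if f.get('cc_avg') is not None]
    if not cc_values:
        print("No complexity data available")
        return
    avg_cc = sum(cc_values) / len(cc_values)

    # 10 is the quality-gate ceiling for average cyclomatic complexity
    if avg_cc > 10:
        print(f"⚠️  Complexity above threshold: {avg_cc:.2f}")
    else:
        print(f"✅ Complexity healthy: {avg_cc:.2f}")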

Contributing

  1. See Development Roadmap for planned work
  2. Submit PRs against develop branch (GitFlow)
  3. All PRs require passing quality gates and 100% test coverage
  4. Follow Governance for workflow details

License

MIT License - see LICENSE file for details

Metrics Definitions

Metric | Range | Interpretation
--- | --- | ---
Coverage | 0-100% | Percentage of code lines executed by tests
CC | 1-50+ | Function branching complexity; Radon ranks: A 1-5, B 6-10, C 11-20, D 21-30, E 31-40, F 41+
MI | 0-100 | Code readability/maintainability; Radon ranks: A 20-100, B 10-19, C 0-9
Comment Ratio | 0-100% | Percentage of code that is comments/docstrings
Halstead Bugs | 0-N | Estimated number of bugs; lower is better
File Size | Lines | Module size; target ≤400 for maintainability

Status: Production-ready · Latest Release: 1.0.0 · Python: 3.12+
