DiffScope

Function-level git commit analysis tool. DiffScope helps you analyze Git commits to identify which functions were modified, added, or deleted.

Features

  • Analyze GitHub commits at both file and function levels
  • Identify exactly which functions were changed in each commit
  • Detect function changes including signature, body, and docstring changes
  • Support multiple programming languages via tree-sitter
  • Simple API for integration into other tools

Installation

# Clone the repository
git clone https://github.com/yourusername/DiffScope.git
cd DiffScope

# Install dependencies
pip install -r requirements.txt

Usage

Basic Usage

from diffscope import analyze_commit

# Analyze a GitHub commit
result = analyze_commit("https://github.com/owner/repo/commit/sha")

# Print file-level changes
print(f"Files changed: {len(result.modified_files)}")
for file in result.modified_files:
    print(f"- {file.filename}: +{file.additions} -{file.deletions}")

# Print function-level changes
print(f"Functions changed: {len(result.modified_functions)}")
for function in result.modified_functions:
    print(f"- {function.name} in {function.file}: {function.change_type}")

GitHub Authentication

GitHub rate-limits unauthenticated API requests much more tightly than authenticated ones. To raise the limit, set a GitHub token in your environment:

# Linux/Mac
export GITHUB_TOKEN=your_token_here

# Windows PowerShell
$env:GITHUB_TOKEN="your_token_here"

# Windows CMD
set GITHUB_TOKEN=your_token_here
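
How DiffScope reads the token internally is not shown here. As a minimal sketch, assuming a tool picks the token up from the environment and sends it with each API request, the logic might look like this. `github_headers` is a hypothetical helper, not part of the DiffScope API.

```python
import os


def github_headers(env=None):
    """Build GitHub API request headers (hypothetical helper).

    A token is attached only when GITHUB_TOKEN is set and non-empty,
    so unauthenticated use still works, just with a lower rate limit.
    """
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Passing a dict instead of os.environ keeps the example reproducible.
print(github_headers({"GITHUB_TOKEN": "your_token_here"}))
print(github_headers({}))
```

Passing the environment in as a parameter makes the helper easy to test without touching real credentials.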

Running Tests

DiffScope's test suite is split into unit tests and integration tests.

Unit Tests

Run the unit tests (no GitHub API calls):

python -m pytest tests/unit

Integration Tests

Integration tests make real GitHub API calls and only run when you pass the --run-live-api flag:

# Run with a GitHub token to avoid rate limits
export GITHUB_TOKEN=your_token_here
python -m pytest tests/integration --run-live-api

You can also use the provided test helper:

# Run all tests including integration tests
python tests/run_tests.py --all --token=your_github_token_here

Testing with Verbose Output

To see detailed test output including function changes:

python -m pytest tests/integration/test_commit_analysis.py -v -s --run-live-api

Supported Languages

DiffScope currently supports function detection for:

  • Python
  • JavaScript
  • TypeScript
  • Java
  • C/C++
  • Go

Project Structure

src/
├── parsers/          # Function parsing using tree-sitter
├── core/             # Core analysis functionality
├── utils/            # Utility functions and tools
├── models.py         # Data models
└── __init__.py       # Main API

tests/
├── unit/             # Unit tests
├── integration/      # Integration tests
└── samples/          # Test data

Implementation Details

DiffScope analyzes Git commits in two phases: first at the file level, then at the function level. This section outlines the architecture and the data flow between the two.

Architecture Overview

DiffScope follows a modular architecture with clear separation of concerns:

  • Core Analysis Pipeline: Two-phase approach for efficient analysis

    • Phase 1: File-level analysis via GitHub API
    • Phase 2: Function-level analysis with tree-sitter parsing
  • Data Models: Three primary data structures

    • CommitAnalysisResult: Container for all analysis data
    • ModifiedFile: Represents file-level changes
    • ModifiedFunction: Represents function-level changes
  • Language Support: Tree-sitter integration for accurate parsing

    • Language-specific queries for function detection
    • Support for Python, JavaScript, TypeScript, Java, C/C++, Go
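
To make the three data structures concrete, here is a rough sketch of how they could be modeled. The field names come from the Basic Usage example above; the types, defaults, and the `"body"` value for `change_type` are assumptions, and the real definitions in models.py may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModifiedFile:
    """File-level change: path plus added/removed line counts."""
    filename: str
    additions: int = 0
    deletions: int = 0


@dataclass
class ModifiedFunction:
    """Function-level change inside one file."""
    name: str
    file: str
    change_type: str  # assumed to be a short label such as "body"


@dataclass
class CommitAnalysisResult:
    """Container holding both levels of analysis for one commit."""
    modified_files: List[ModifiedFile] = field(default_factory=list)
    modified_functions: List[ModifiedFunction] = field(default_factory=list)


result = CommitAnalysisResult(
    modified_files=[ModifiedFile("app.py", additions=3, deletions=1)],
    modified_functions=[ModifiedFunction("main", "app.py", "body")],
)
for fn in result.modified_functions:
    print(f"- {fn.name} in {fn.file}: {fn.change_type}")
```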

Data Flow

The analysis runs as a seven-step pipeline:

  1. Input Processing

    • Parse GitHub URL to extract repository and commit information
    • Authenticate with GitHub API using provided token
  2. File-Level Analysis (git_analyzer.py)

    • Fetch commit metadata and file changes from GitHub API
    • Identify modified, added, deleted, and renamed files
    • Perform language detection based on file extensions
  3. Function-Level Analysis (commit_analyzer.py)

    • For each file, retrieve content before and after changes
    • Filter files based on language support and binary detection
    • Process files differently based on their status (added/modified/deleted)
  4. Function Detection (function_detector.py & function_parser.py)

    • Parse code using tree-sitter with language-specific queries
    • Extract function metadata (name, position, content)
    • Compare functions between file versions to detect changes
  5. Diff Analysis (diff_utils.py)

    • Parse unified diff format to extract change information
    • Map line numbers between original and new file versions
    • Extract function-specific diffs for detailed change analysis
  6. Change Classification

    • Identify function change types:
      • Added, deleted, renamed functions
      • Signature, body, and docstring changes
    • Detect renamed functions using similarity metrics
  7. Result Generation

    • Compile comprehensive CommitAnalysisResult with all analysis data
    • Include both file and function-level changes
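
The first two steps of the pipeline can be sketched with the standard library alone. This is an illustration under assumptions, not DiffScope's code: the regular expression, the extension table, and the function names `parse_commit_url` and `detect_language` are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches https://github.com/<owner>/<repo>/commit/<hex sha>
COMMIT_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/commit/(?P<sha>[0-9a-fA-F]{7,40})/?$"
)

# Assumed mapping for the languages listed under Supported Languages.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".java": "java",
    ".c": "c",
    ".h": "c",
    ".cpp": "cpp",
    ".hpp": "cpp",
    ".go": "go",
}


def parse_commit_url(url):
    """Split a GitHub commit URL into (owner, repo, sha)."""
    match = COMMIT_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub commit URL: {url!r}")
    return match.group("owner"), match.group("repo"), match.group("sha")


def detect_language(filename):
    """Return a language name for a path, or None if unsupported."""
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_LANGUAGES.get(suffix)


print(parse_commit_url("https://github.com/owner/repo/commit/0565e2c"))
print(detect_language("src/core/git_analyzer.py"))
print(detect_language("logo.png"))
```

Returning None for unknown extensions lets the caller skip unsupported files before any content is fetched or parsed.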

Key Algorithms

  1. Function Change Detection:

    • Extract functions from both old and new versions
    • Match functions by name and location
    • Compare function content to classify changes
    • Use diff analysis to identify specific changes
  2. Renamed Function Detection:

    • Identify deleted and added functions across files
    • Compute similarity scores between function pairs
    • Match functions with high similarity scores
    • Apply heuristics to confirm renames vs. new implementations
  3. Diff Analysis and Line Mapping:

    • Parse GitHub patch format into structured hunks
    • Map line numbers between original and new files
    • Associate diff hunks with specific functions
    • Handle edge cases like overlapping functions
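
The line-mapping and rename ideas above can be illustrated in a few lines. This sketch assumes a single-file patch as returned by the GitHub API, function ranges that are already known (for example from a parser), and `difflib.SequenceMatcher` as a stand-in similarity metric; DiffScope's actual metric and heuristics are not specified here. All function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_line_numbers(patch):
    """Return the new-file line numbers of every added line in a patch."""
    added = []
    new_line = None
    for line in patch.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            new_line = int(match.group(3))  # hunk start in the new file
        elif new_line is None:
            continue  # ignore anything before the first hunk header
        elif line.startswith("+"):
            added.append(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers do not advance
        else:
            new_line += 1  # context line, present in both versions
    return added


def functions_touched(function_ranges, lines):
    """Names of functions whose inclusive (start, end) range holds a changed line."""
    return sorted(
        name
        for name, (start, end) in function_ranges.items()
        if any(start <= n <= end for n in lines)
    )


def similarity(old_body, new_body):
    """Similarity in [0, 1]; a high score hints at a rename rather than new code."""
    return SequenceMatcher(None, old_body, new_body).ratio()


patch = (
    "@@ -1,4 +1,5 @@\n"
    " def a():\n"
    "-    return 1\n"
    "+    x = 1\n"
    "+    return x\n"
    " \n"
    " def b():\n"
)
lines = added_line_numbers(patch)
print(lines)
print(functions_touched({"a": (1, 3), "b": (5, 6)}, lines))
```

Tracking added lines rather than whole hunk ranges avoids flagging a function that only appears in a hunk as unchanged context.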

Error Handling and Robustness

DiffScope handles the following failure cases:

  • Graceful degradation when GitHub API rate limits are reached
  • Robust handling of malformed patches and unexpected code structures
  • Skip analysis for unsupported languages and binary files
  • Detailed logging for diagnosing issues
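
As one possible reading of "graceful degradation" on rate limits, a client can inspect GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers and decide how long to wait before retrying. The sketch below assumes the headers arrive as a plain dict with exactly those key spellings; `seconds_until_reset` is a hypothetical helper, and DiffScope may handle this differently.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, in seconds.

    Returns 0.0 while requests remain in the current window. When the
    quota is exhausted, returns the time left until the reset timestamp
    (UTC epoch seconds) reported by GitHub.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


# A fixed "now" keeps the example deterministic.
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
print(seconds_until_reset(exhausted, now=1000))
print(seconds_until_reset({"X-RateLimit-Remaining": "5"}, now=1000))
```

A caller could sleep for the returned duration, or stop after the file-level phase and report partial results.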

Performance Optimizations

  • Tree-sitter for efficient code parsing
  • Two-phase analysis to avoid unnecessary processing
  • Targeted function analysis based on diff information
  • Progressive refinement from file-level to function-level details

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies: pip install -r requirements-dev.txt
  3. Run the tests: python -m pytest

Adding Tests

When adding features, please add corresponding tests:

  • Unit tests for isolated functionality
  • Integration tests for end-to-end workflows

See the test documentation for more details.

License

MIT License
