DiffScope

DiffScope is a function-level Git commit analysis tool. It analyzes a commit and identifies which functions were modified, added, or deleted.

Features

  • Analyze GitHub commits at both file and function levels
  • Identify exactly which functions were changed in each commit
  • Detect function changes including signature, body, and docstring changes
  • Support multiple programming languages through tree-sitter
  • Simple API for integration into other tools

Installation

# Clone the repository
git clone https://github.com/yourusername/DiffScope.git
cd DiffScope

# Install dependencies
pip install -r requirements.txt

Usage

Basic Usage

from diffscope import analyze_commit

# Analyze a GitHub commit
result = analyze_commit("https://github.com/owner/repo/commit/sha")

# Print file-level changes
print(f"Files changed: {len(result.modified_files)}")
for file in result.modified_files:
    print(f"- {file.filename}: +{file.additions} -{file.deletions}")

# Print function-level changes
print(f"Functions changed: {len(result.modified_functions)}")
for function in result.modified_functions:
    print(f"- {function.name} in {function.file}: {function.change_type}")
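
The result object can also be aggregated. The sketch below counts function-level changes per change type. It relies only on the modified_functions and change_type attributes used above. The stand-in objects and the change type strings ("body", "added") are assumptions made so the example runs without network access; the values DiffScope actually returns may differ.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(result):
    """Count function-level changes per change type."""
    return Counter(fn.change_type for fn in result.modified_functions)


# Stand-in for the object returned by analyze_commit (hypothetical values).
result = SimpleNamespace(modified_functions=[
    SimpleNamespace(name="load", file="io.py", change_type="body"),
    SimpleNamespace(name="save", file="io.py", change_type="body"),
    SimpleNamespace(name="main", file="cli.py", change_type="added"),
])

for change_type, count in summarize(result).most_common():
    print(f"{change_type}: {count}")
```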

GitHub Authentication

Unauthenticated GitHub API requests have a much lower rate limit than authenticated ones. To raise the limit, set a GitHub token in your environment:

# Linux/Mac
export GITHUB_TOKEN=your_token_here

# Windows PowerShell
$env:GITHUB_TOKEN="your_token_here"

# Windows CMD
set GITHUB_TOKEN=your_token_here
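
For illustration, this is one way a script could turn that environment variable into GitHub API request headers. The helper name github_headers is hypothetical, and how DiffScope itself reads the token is not shown in this document.

```python
import os


def github_headers(env=None):
    """Build GitHub API headers, adding a token only when one is configured."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        # GitHub accepts personal access tokens as a Bearer credential.
        headers["Authorization"] = f"Bearer {token}"
    return headers


print("Authenticated" if "Authorization" in github_headers() else "Anonymous")
```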

Running Tests

DiffScope's test suite contains both unit tests and integration tests.

Unit Tests

Run the unit tests (no GitHub API calls):

python -m pytest tests/unit

Integration Tests

Integration tests require the --run-live-api flag to enable tests that make real GitHub API calls:

# Run with a GitHub token to avoid rate limits
export GITHUB_TOKEN=your_token_here
python -m pytest tests/integration --run-live-api

You can also use the provided test helper:

# Run all tests including integration tests
python tests/run_tests.py --all --token=your_github_token_here

Testing with Verbose Output

To see detailed test output including function changes:

python -m pytest tests/integration/test_commit_analysis.py -v -s --run-live-api

Supported Languages

DiffScope currently supports function detection for:

  • Python
  • JavaScript
  • TypeScript
  • Java
  • C/C++
  • Go

Project Structure

src/
├── parsers/          # Function parsing using tree-sitter
├── core/             # Core analysis functionality
├── utils/            # Utility functions and tools
├── models.py         # Data models
└── __init__.py       # Main API

tests/
├── unit/             # Unit tests
├── integration/      # Integration tests
└── samples/          # Test data

Implementation Details

DiffScope analyzes Git commits at the function level. This section gives an overview of the implementation architecture and data flow.

Architecture Overview

DiffScope follows a modular architecture with clear separation of concerns:

  • Core Analysis Pipeline: Two-phase approach for efficient analysis

    • Phase 1: File-level analysis via GitHub API
    • Phase 2: Function-level analysis with tree-sitter parsing
  • Data Models: Three primary data structures

    • CommitAnalysisResult: Container for all analysis data
    • ModifiedFile: Represents file-level changes
    • ModifiedFunction: Represents function-level changes
  • Language Support: Tree-sitter integration for accurate parsing

    • Language-specific queries for function detection
    • Support for Python, JavaScript, TypeScript, Java, C/C++, Go
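
As a rough picture of the three data structures, the sketch below models them as dataclasses. Only the field names that appear in the Basic Usage example (filename, additions, deletions, name, file, change_type, modified_files, modified_functions) come from this document. The types, defaults, and the "body" value are assumptions, and the real classes in models.py likely carry more fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModifiedFile:
    """File-level change (fields taken from the Basic Usage example)."""
    filename: str
    additions: int = 0
    deletions: int = 0


@dataclass
class ModifiedFunction:
    """Function-level change; change_type is modeled as a plain string here."""
    name: str
    file: str
    change_type: str


@dataclass
class CommitAnalysisResult:
    """Container for file-level and function-level results."""
    modified_files: List[ModifiedFile] = field(default_factory=list)
    modified_functions: List[ModifiedFunction] = field(default_factory=list)


result = CommitAnalysisResult(
    modified_files=[ModifiedFile("app.py", additions=3, deletions=1)],
    modified_functions=[ModifiedFunction("load", "app.py", "body")],
)
for fn in result.modified_functions:
    print(f"- {fn.name} in {fn.file}: {fn.change_type}")
```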

Data Flow

The analysis follows a clear pipeline:

  1. Input Processing

    • Parse GitHub URL to extract repository and commit information
    • Authenticate with GitHub API using provided token
  2. File-Level Analysis (git_analyzer.py)

    • Fetch commit metadata and file changes from GitHub API
    • Identify modified, added, deleted, and renamed files
    • Perform language detection based on file extensions
  3. Function-Level Analysis (commit_analyzer.py)

    • For each file, retrieve content before and after changes
    • Filter files based on language support and binary detection
    • Process files differently based on their status (added/modified/deleted)
  4. Function Detection (function_detector.py & function_parser.py)

    • Parse code using tree-sitter with language-specific queries
    • Extract function metadata (name, position, content)
    • Compare functions between file versions to detect changes
  5. Diff Analysis (diff_utils.py)

    • Parse unified diff format to extract change information
    • Map line numbers between original and new file versions
    • Extract function-specific diffs for detailed change analysis
  6. Change Classification

    • Identify function change types:
      • Added, deleted, renamed functions
      • Signature, body, and docstring changes
    • Detect renamed functions using similarity metrics
  7. Result Generation

    • Compile comprehensive CommitAnalysisResult with all analysis data
    • Include both file and function-level changes
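
The first two steps can be sketched in a few lines. The code below parses a commit URL into owner, repository, and SHA, and guesses a language from the file extension. The function names, the regular expression, and the extension table are illustrative assumptions, not DiffScope's actual implementation.

```python
import re
from pathlib import PurePosixPath

# Matches URLs of the form https://github.com/<owner>/<repo>/commit/<sha>.
COMMIT_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/commit/(?P<sha>[0-9a-fA-F]{7,40})/?$"
)

# Hypothetical mapping covering the languages listed in this README.
EXTENSION_TO_LANGUAGE = {
    ".py": "python", ".js": "javascript", ".ts": "typescript",
    ".java": "java", ".c": "c", ".h": "c",
    ".cpp": "cpp", ".hpp": "cpp", ".go": "go",
}


def parse_commit_url(url):
    """Return (owner, repo, sha) or raise ValueError for other URLs."""
    match = COMMIT_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub commit URL: {url!r}")
    return match["owner"], match["repo"], match["sha"]


def detect_language(filename):
    """Return a language name, or None when the extension is not recognized."""
    return EXTENSION_TO_LANGUAGE.get(PurePosixPath(filename).suffix.lower())


print(parse_commit_url("https://github.com/owner/repo/commit/abc1234"))
print(detect_language("src/Main.java"))
```

Returning None for unknown extensions gives the later filtering step a simple way to skip unsupported files.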

Key Algorithms

  1. Function Change Detection:

    • Extract functions from both old and new versions
    • Match functions by name and location
    • Compare function content to classify changes
    • Use diff analysis to identify specific changes
  2. Renamed Function Detection:

    • Identify deleted and added functions across files
    • Compute similarity scores between function pairs
    • Match functions with high similarity scores
    • Apply heuristics to confirm renames vs. new implementations
  3. Diff Analysis and Line Mapping:

    • Parse GitHub patch format into structured hunks
    • Map line numbers between original and new files
    • Associate diff hunks with specific functions
    • Handle edge cases like overlapping functions
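
To make the second and third algorithms concrete, here is a minimal sketch under stated assumptions. It reads hunk headers from a unified diff to find changed lines in the new file, associates them with function line ranges, and pairs deleted and added functions by body similarity using difflib from the standard library. All names, the 0.8 threshold, and the greedy matching strategy are illustrative; the similarity metric and heuristics DiffScope uses are not specified here.

```python
import re
from difflib import SequenceMatcher

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_new_lines(patch):
    """Return new-file line numbers of every added line in a unified diff."""
    changed, new_line = [], None
    for line in patch.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(3))  # first new-file line of the hunk
        elif new_line is None:
            continue  # ignore anything before the first hunk
        elif line.startswith("+"):
            changed.append(new_line)
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            continue  # removed lines and "\ No newline" markers: no new line
        else:
            new_line += 1  # context line, present in both versions
    return changed


def functions_touched(changed, functions):
    """functions: (name, start_line, end_line) tuples for the new file."""
    return [name for name, start, end in functions
            if any(start <= n <= end for n in changed)]


def match_renames(deleted, added, threshold=0.8):
    """Greedily pair deleted and added functions (name -> body) by similarity."""
    pairs, unused = [], dict(added)
    for old_name, old_body in deleted.items():
        best_name, best_score = None, threshold
        for new_name, new_body in unused.items():
            score = SequenceMatcher(None, old_body, new_body).ratio()
            if score >= best_score:
                best_name, best_score = new_name, score
        if best_name is not None:
            pairs.append((old_name, best_name))
            del unused[best_name]  # each added function matches at most once
    return pairs


patch = ("@@ -1,3 +1,4 @@\n def a():\n-    return 1\n"
         "+    x = 1\n+    return x\n def b():")
print(functions_touched(changed_new_lines(patch), [("a", 1, 3), ("b", 4, 5)]))
```

Comparing bodies rather than full definitions keeps the renamed identifier itself from lowering the similarity score.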

Error Handling and Robustness

DiffScope handles the following failure cases:

  • Graceful degradation when GitHub API rate limits are reached
  • Robust handling of malformed patches and unexpected code structures
  • Skip analysis for unsupported languages and binary files
  • Detailed logging for diagnosing issues
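
As one example of the first point, a client can recognize an exhausted rate limit from GitHub's response headers and decide how long to wait. The helper below is a hypothetical sketch and not DiffScope's own logic. It relies on the x-ratelimit-remaining and x-ratelimit-reset headers that the GitHub REST API returns, where the reset value is a UTC epoch timestamp in seconds.

```python
import time


def seconds_until_reset(status_code, headers, now=None):
    """Return seconds to wait if the rate limit is exhausted, else None."""
    now = time.time() if now is None else now
    lowered = {key.lower(): value for key, value in headers.items()}
    if status_code not in (403, 429):
        return None  # not a rate-limit style response
    if lowered.get("x-ratelimit-remaining") != "0":
        return None  # rejected for another reason (e.g. permissions)
    reset_at = int(lowered.get("x-ratelimit-reset", "0"))
    return max(0.0, reset_at - now)


wait = seconds_until_reset(
    403,
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"},
    now=1700000000,
)
print(f"Rate limited, retry in {wait} seconds")
```

A caller could use the returned delay to pause, or to return the file-level results already collected instead of failing outright.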

Performance Optimizations

  • Tree-sitter for efficient code parsing
  • Two-phase analysis to avoid unnecessary processing
  • Targeted function analysis based on diff information
  • Progressive refinement from file-level to function-level details

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies: pip install -r requirements-dev.txt
  3. Run the tests: python -m pytest

Adding Tests

When adding features, please add corresponding tests:

  • Unit tests for isolated functionality
  • Integration tests for end-to-end workflows

See the test documentation for more details.

License

MIT License
