Function-level git commit analysis tool
DiffScope
Function-level git commit analysis tool. DiffScope helps you analyze Git commits to identify which functions were modified, added, or deleted.
Features
- Analyze GitHub commits at both file and function levels
- Identify exactly which functions were changed in each commit
- Detect function changes including signature, body, and docstring changes
- Supports multiple programming languages using tree-sitter
- Simple API for integration into other tools
Installation
# Clone the repository
git clone https://github.com/yourusername/DiffScope.git
cd DiffScope
# Install dependencies
pip install -r requirements.txt
Usage
Basic Usage
from diffscope import analyze_commit
# Analyze a GitHub commit
result = analyze_commit("https://github.com/owner/repo/commit/sha")
# Print file-level changes
print(f"Files changed: {len(result.modified_files)}")
for file in result.modified_files:
    print(f"- {file.filename}: +{file.additions} -{file.deletions}")
# Print function-level changes
print(f"Functions changed: {len(result.modified_functions)}")
for function in result.modified_functions:
    print(f"- {function.name} in {function.file}: {function.change_type}")
GitHub Authentication
To avoid rate limits, set a GitHub token in your environment:
# Linux/Mac
export GITHUB_TOKEN=your_token_here
# Windows PowerShell
$env:GITHUB_TOKEN="your_token_here"
# Windows CMD
set GITHUB_TOKEN=your_token_here
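How DiffScope reads the token internally is not shown in this README. As a rough sketch of the general idea, a client can pick up GITHUB_TOKEN from the environment and attach it to GitHub REST API requests. The helper name build_github_headers is hypothetical and not part of DiffScope's API.

```python
import os


def build_github_headers(env=os.environ):
    """Return HTTP headers for the GitHub REST API.

    Hypothetical helper for illustration; not DiffScope's own code.
    """
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # GitHub accepts a personal access token as a Bearer credential.
        headers["Authorization"] = f"Bearer {token}"
    return headers


if __name__ == "__main__":
    # Prints the header names only, so the token itself is never echoed.
    print(sorted(build_github_headers()))
```

Without the variable set, requests are sent unauthenticated and fall under GitHub's lower anonymous rate limit.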
Running Tests
DiffScope includes a comprehensive test suite with both unit tests and integration tests.
Unit Tests
Run the unit tests (no GitHub API calls):
python -m pytest tests/unit
Integration Tests
Integration tests require the --run-live-api flag to enable tests that make real GitHub API calls:
# Run with a GitHub token to avoid rate limits
export GITHUB_TOKEN=your_token_here
python -m pytest tests/integration --run-live-api
You can also use the provided test helper:
# Run all tests including integration tests
python tests/run_tests.py --all --token=your_github_token_here
Testing with Verbose Output
To see detailed test output including function changes:
python -m pytest tests/integration/test_commit_analysis.py -v -s --run-live-api
Supported Languages
DiffScope currently supports function detection for:
- Python
- JavaScript
- TypeScript
- Java
- C/C++
- Go
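Language detection is described later as being based on file extensions. A minimal sketch of such a lookup for the languages listed above might look like this. The extension table and the detect_language name are assumptions for illustration; the mapping DiffScope actually uses may differ.

```python
from pathlib import PurePosixPath

# Assumed extension table; DiffScope's real mapping may include more entries.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".ts": "typescript",
    ".java": "java",
    ".c": "c",
    ".h": "c",
    ".cpp": "cpp",
    ".hpp": "cpp",
    ".go": "go",
}


def detect_language(filename):
    """Return a language name for a repository path, or None if unsupported."""
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_TO_LANGUAGE.get(suffix)
```

Files for which the lookup returns None would simply be skipped during function-level analysis.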
Project Structure
src/
├── parsers/ # Function parsing using tree-sitter
├── core/ # Core analysis functionality
├── utils/ # Utility functions and tools
├── models.py # Data models
└── __init__.py # Main API
tests/
├── unit/ # Unit tests
├── integration/ # Integration tests
└── samples/ # Test data
Implementation Details
DiffScope analyzes Git commits at the function level. This section describes the implementation architecture and data flow.
Architecture Overview
DiffScope follows a modular architecture with clear separation of concerns:
- Core Analysis Pipeline: Two-phase approach for efficient analysis
  - Phase 1: File-level analysis via GitHub API
  - Phase 2: Function-level analysis with tree-sitter parsing
- Data Models: Three primary data structures
  - CommitAnalysisResult: Container for all analysis data
  - ModifiedFile: Represents file-level changes
  - ModifiedFunction: Represents function-level changes
- Language Support: Tree-sitter integration for accurate parsing
  - Language-specific queries for function detection
  - Support for Python, JavaScript, TypeScript, Java, C/C++, Go
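To make the three data structures concrete, here is a simplified sketch using dataclasses. The attribute names are the ones used in the Basic Usage example (modified_files, modified_functions, filename, additions, deletions, name, file, change_type). The types and defaults are assumptions, and the real classes in models.py likely carry more fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModifiedFile:
    """File-level change (assumed minimal shape)."""
    filename: str
    additions: int = 0
    deletions: int = 0


@dataclass
class ModifiedFunction:
    """Function-level change (assumed minimal shape)."""
    name: str
    file: str
    change_type: str


@dataclass
class CommitAnalysisResult:
    """Container for everything found in one commit."""
    modified_files: List[ModifiedFile] = field(default_factory=list)
    modified_functions: List[ModifiedFunction] = field(default_factory=list)
```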
Data Flow
The analysis follows a clear pipeline:
- Input Processing
  - Parse GitHub URL to extract repository and commit information
  - Authenticate with GitHub API using provided token
- File-Level Analysis (git_analyzer.py)
  - Fetch commit metadata and file changes from GitHub API
  - Identify modified, added, deleted, and renamed files
  - Perform language detection based on file extensions
- Function-Level Analysis (commit_analyzer.py)
  - For each file, retrieve content before and after changes
  - Filter files based on language support and binary detection
  - Process files differently based on their status (added/modified/deleted)
- Function Detection (function_detector.py & function_parser.py)
  - Parse code using tree-sitter with language-specific queries
  - Extract function metadata (name, position, content)
  - Compare functions between file versions to detect changes
- Diff Analysis (diff_utils.py)
  - Parse unified diff format to extract change information
  - Map line numbers between original and new file versions
  - Extract function-specific diffs for detailed change analysis
- Change Classification
  - Identify function change types:
    - Added, deleted, renamed functions
    - Signature, body, and docstring changes
  - Detect renamed functions using similarity metrics
- Result Generation
  - Compile comprehensive CommitAnalysisResult with all analysis data
  - Include both file and function-level changes
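The first step of the pipeline, turning a commit URL into repository and commit information, can be sketched with a regular expression. The function name parse_commit_url and the accepted URL shape are assumptions for illustration, not DiffScope's actual parser.

```python
import re

# Matches URLs of the form https://github.com/<owner>/<repo>/commit/<sha>.
_COMMIT_URL = re.compile(
    r"^https?://github\.com/"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+)/commit/(?P<sha>[0-9a-fA-F]{7,40})/?$"
)


def parse_commit_url(url):
    """Split a GitHub commit URL into (owner, repo, sha).

    Raises ValueError for anything that is not a commit URL.
    """
    match = _COMMIT_URL.match(url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub commit URL: {url!r}")
    return match.group("owner"), match.group("repo"), match.group("sha")
```

Rejecting malformed URLs up front keeps bad input from reaching the API calls in the later steps.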
Key Algorithms
- Function Change Detection:
  - Extract functions from both old and new versions
  - Match functions by name and location
  - Compare function content to classify changes
  - Use diff analysis to identify specific changes
- Renamed Function Detection:
  - Identify deleted and added functions across files
  - Compute similarity scores between function pairs
  - Match functions with high similarity scores
  - Apply heuristics to confirm renames vs. new implementations
- Diff Analysis and Line Mapping:
  - Parse GitHub patch format into structured hunks
  - Map line numbers between original and new files
  - Associate diff hunks with specific functions
  - Handle edge cases like overlapping functions
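As an illustration of renamed function detection, the sketch below pairs deleted and added functions by body similarity. It uses difflib.SequenceMatcher from the standard library as the similarity metric and a threshold of 0.8. Both are assumptions, since the README does not say which metric or cutoff DiffScope uses, and match_renames is a hypothetical name.

```python
from difflib import SequenceMatcher


def match_renames(deleted, added, threshold=0.8):
    """Pair deleted and added functions whose bodies are highly similar.

    deleted, added: dicts mapping function name -> function body text.
    Returns a list of (old_name, new_name, score) tuples.
    """
    candidates = []
    for old_name, old_body in deleted.items():
        for new_name, new_body in added.items():
            score = SequenceMatcher(None, old_body, new_body).ratio()
            if score >= threshold:
                candidates.append((score, old_name, new_name))

    # Greedy assignment: take the best-scoring pairs first and use each
    # function at most once.
    candidates.sort(reverse=True)
    used_old, used_new, renames = set(), set(), []
    for score, old_name, new_name in candidates:
        if old_name in used_old or new_name in used_new:
            continue
        used_old.add(old_name)
        used_new.add(new_name)
        renames.append((old_name, new_name, score))
    return renames
```

Only bodies are compared, because the name is by definition different after a rename. Pairs below the threshold stay classified as one deleted and one added function.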
Error Handling and Robustness
DiffScope implements comprehensive error handling:
- Graceful degradation when GitHub API rate limits are reached
- Robust handling of malformed patches and unexpected code structures
- Skip analysis for unsupported languages and binary files
- Detailed logging for diagnosing issues
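The README does not specify how rate limit degradation is implemented. One possible approach is to inspect GitHub's x-ratelimit-remaining and x-ratelimit-reset response headers and decide whether to wait or to fall back to partial results. The helper seconds_until_reset below is a hypothetical sketch of that check, not DiffScope's code.

```python
import time


def seconds_until_reset(status_code, headers, now=None):
    """Return how long to wait if a response signals an exhausted rate limit.

    Returns None when the response is not a rate-limit response.
    """
    if status_code not in (403, 429):
        return None
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    if lowered.get("x-ratelimit-remaining") != "0":
        return None
    reset_epoch = int(lowered.get("x-ratelimit-reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset_epoch - now)
```

A caller could, for example, return the file-level results it already has when the wait is too long, rather than failing the whole analysis.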
Performance Optimizations
- Tree-sitter for efficient code parsing
- Two-phase analysis to avoid unnecessary processing
- Targeted function analysis based on diff information
- Progressive refinement from file-level to function-level details
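Targeted function analysis can be illustrated as follows: read the hunk headers of a unified diff, collect the new-file line numbers that were added, and only examine functions whose line range overlaps them. This is a simplified sketch under assumptions. The function names are hypothetical, functions are given as (name, start, end) tuples, and pure deletions are not tracked because they have no line in the new file.

```python
import re

# Unified diff hunk header, e.g. "@@ -10,4 +10,6 @@ def foo():"
_HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_new_lines(patch):
    """Return the set of new-file line numbers added by a unified diff."""
    changed = set()
    new_line = None
    for line in patch.splitlines():
        header = _HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(3))
            continue
        if new_line is None:
            continue  # Skip anything before the first hunk.
        if line.startswith("+"):
            changed.add(new_line)
            new_line += 1
        elif not line.startswith(("-", "\\")):
            new_line += 1  # Context line: present in both versions.
    return changed


def functions_touched(functions, changed):
    """Return names of functions whose inclusive range contains a changed line."""
    return [
        name
        for name, start, end in functions
        if any(start <= number <= end for number in changed)
    ]
```

Functions outside every changed range never need to be compared in detail, which is where the saving comes from.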
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Clone the repository
- Install development dependencies: pip install -r requirements-dev.txt
- Run the tests: python -m pytest
Adding Tests
When adding features, please add corresponding tests:
- Unit tests for isolated functionality
- Integration tests for end-to-end workflows
See the test documentation for more details.
License