diff-code-change-range

Extract affected code structures from git diff output for Java, Kotlin, and Python files.

Overview

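This tool parses the output of git diff --full-index -U999999, reconstructs the complete before and after source of each changed file, and identifies which code structures (classes, methods, functions, members) the changes touch. The very large -U value makes git include the whole file as context, which is what allows the full source to be recovered from the diff alone. The result is structured YAML listing the affected code structures for both the before and after versions.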

Additionally, the tool can extract reference relationships between affected nodes (method calls, field accesses, type references, etc.) to support impact analysis and precise code review.

Installation

From PyPI

pip install diff-code-change-range

From Source

git clone <repository>
cd diff-code-change-range
pip install -e .

Requirements

  • Python 3.9 or higher
  • Dependencies:
    • unidiff>=0.7.0 - Parse unified diff format
    • tree-sitter>=0.20.0 - Parse source code
    • tree-sitter-java>=0.20.0 - Java grammar for tree-sitter
    • tree-sitter-kotlin>=0.3.0 - Kotlin grammar for tree-sitter
    • tree-sitter-python>=0.21.0 - Python grammar for tree-sitter
    • pyyaml>=6.0 - YAML output

Usage

Basic Usage

Read diff from stdin:

git diff --full-index -U999999 | diff-code-change-range

Read diff from file:

diff-code-change-range path/to/diff.patch

CLI Arguments

diff-code-change-range [-h] [-v] [diff_file]

Positional Arguments:
  diff_file         Path to diff file (default: read from stdin)

Optional Arguments:
  -h, --help        Show help message and exit
  -v, --version     Show version and exit

Example Usage

# Generate diff with full context and pipe to tool
git diff --full-index -U999999 HEAD~1 | diff-code-change-range

# Save output to file
git diff --full-index -U999999 > changes.patch
diff-code-change-range changes.patch > analysis.yaml

# Use as Python module
python -m diff_code_change_range < changes.patch
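
The same pipeline can be driven from a Python script using only the standard library. The following is a minimal sketch, assuming git and diff-code-change-range are both on PATH; the helper names build_diff_command and analyze_revision are illustrative and are not part of the package.

```python
import subprocess
from typing import List


def build_diff_command(revision: str = "HEAD~1") -> List[str]:
    """Return the git command that produces a full-context diff."""
    return ["git", "diff", "--full-index", "-U999999", revision]


def analyze_revision(revision: str = "HEAD~1") -> str:
    """Run git diff and feed its output to diff-code-change-range on stdin.

    Hypothetical helper: it only wraps the CLI invocation documented above.
    """
    diff = subprocess.run(
        build_diff_command(revision),
        capture_output=True, text=True, check=True,
    )
    result = subprocess.run(
        ["diff-code-change-range"],
        input=diff.stdout, capture_output=True, text=True, check=True,
    )
    return result.stdout  # YAML text written by the tool
```

Calling analyze_revision("HEAD~1") inside a git repository should return the same YAML that the shell pipeline prints.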

Python Example

Input Python file:

class Calculator:
    def __init__(self):
        self.result = 0
    
    @classmethod
    def create(cls):
        return cls()
    
    async def compute(self):
        return self.result

def main():
    calc = Calculator()

Output structure:

before: []
after:
  - name: calculator.py
    type: file
    line_range: [1, 13]
    children:
      - name: Calculator
        type: class
        line_range: [1, 10]
        children:
          - name: __init__
            type: method
            line_range: [2, 3]
            children:
              - name: result
                type: member
                line_range: [3, 3]
          - name: "@classmethod create"
            type: method
            line_range: [5, 7]
          - name: async compute
            type: method
            line_range: [9, 10]
      - name: main
        type: function
        line_range: [12, 13]
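
Once the YAML is loaded, the nested tree can be flattened into one path per node, which is often easier to feed into a review checklist. The sketch below works on a hand-written subset of the tree above, written as the plain dicts and lists a YAML loader would return; the walk function is an illustrative helper, not part of the package.

```python
from typing import Dict, Iterator, List, Tuple

# Hand-written subset of the "after" tree shown above.
after: List[Dict] = [
    {
        "name": "calculator.py", "type": "file", "line_range": [1, 13],
        "children": [
            {
                "name": "Calculator", "type": "class", "line_range": [1, 10],
                "children": [
                    {"name": "__init__", "type": "method", "line_range": [2, 3]},
                ],
            },
            {"name": "main", "type": "function", "line_range": [12, 13]},
        ],
    }
]


def walk(nodes: List[Dict], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, type) for every node, depth first."""
    for node in nodes:
        path = f"{prefix}/{node['name']}" if prefix else node["name"]
        yield path, node["type"]
        # Leaf nodes have no "children" key, so default to an empty list.
        yield from walk(node.get("children", []), path)


paths = list(walk(after))
```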

Reference Extraction

The reference module can extract relationships between affected code nodes:

from diff_code_change_range.reference import extract_references, AffectedScope, AffectedNode, NodeType

before_code = {
    "com/example/Service.kt": '''
class Service {
    fun process() {
        validate()
    }
    fun validate(): Boolean {
        return true
    }
}
'''
}

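after_code = {...}  # Placeholder: same path -> source mapping, for the after version

scope = AffectedScope(
    before=[...],  # Placeholder: AffectedNode tree for the before version
    after=[...],   # Placeholder: AffectedNode tree for the after version
)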

result = extract_references(before_code, after_code, scope)

# Access references
for ref in result.before_references:
    print(f"{ref.source} -> {ref.target} ({ref.type.value})")

# See what changed
for ref in result.added_references:
    print(f"Added: {ref.source} -> {ref.target}")

Supported reference types:

  • method_call - Method or function invocation
  • field_access - Field or property access
  • type_reference - Type usage in declarations
  • instantiation - Object creation
  • annotation - Annotation usage
  • inheritance - Class extends another class
  • implementation - Class implements interface
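
One way to think about added references is as a set difference between the after and before reference sets. The sketch below illustrates that idea with a stand-in record type; Ref is hypothetical and is not the package's reference class, and the package may compute its result differently.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """Hypothetical stand-in for a reference record (not the package's class)."""
    source: str
    target: str
    type: str


before = {Ref("Service.process", "Service.validate", "method_call")}
after = {
    Ref("Service.process", "Service.validate", "method_call"),
    Ref("Service.process", "Logger", "type_reference"),
}

# References present only in the after version were added by the change;
# references present only in the before version were removed.
added = after - before
removed = before - after
```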

Output Format

The tool outputs YAML with before and after root keys:

before:
  - name: src/Calculator.java
    type: file
    line_range: [1, 15]
    children:
      - name: Calculator
        type: class
        line_range: [1, 14]
        children:
          - name: add
            type: method
            line_range: [5, 7]

after:
  - name: src/Calculator.java
    type: file
    line_range: [1, 15]
    children:
      - name: Calculator
        type: class
        line_range: [1, 14]
        children:
          - name: add
            type: method
            line_range: [5, 7]

Node Types

  • file - Source file
  • class - Class declaration
  • interface - Interface declaration
  • object - Kotlin object declaration
  • enum - Enum declaration
  • function - Top-level function (Kotlin, Python)
  • method - Class method
  • member - Field/property, class variable, instance variable
  • variable - Module-level variable (Python)

Node Fields

  • name - The name of the code element
  • type - The type of node (see above)
  • line_range - Array of [start_line, end_line] (1-based, inclusive)
  • children - List of child nodes (for container types)
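
Because line_range is 1-based and inclusive, a consumer can map any source line to the innermost node that contains it. The following sketch models the four fields above with a dataclass; Node and innermost are illustrative names and are not the package's own types.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    type: str
    line_range: Tuple[int, int]  # (start_line, end_line), 1-based, inclusive
    children: List["Node"] = field(default_factory=list)

    def contains(self, line: int) -> bool:
        start, end = self.line_range
        return start <= line <= end


def innermost(nodes: List[Node], line: int) -> Optional[Node]:
    """Return the deepest node whose range contains the line, or None."""
    for node in nodes:
        if node.contains(line):
            # Prefer a matching child; fall back to the node itself.
            return innermost(node.children, line) or node
    return None


tree = [
    Node("src/Calculator.java", "file", (1, 15), [
        Node("Calculator", "class", (1, 14), [
            Node("add", "method", (5, 7)),
        ]),
    ])
]
```

With this tree, line 6 resolves to the add method, line 10 to the Calculator class, and line 16 to no node at all.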

Example

Input Diff

diff --git a/src/Calculator.java b/src/Calculator.java
index abc123..def456 100644
--- a/src/Calculator.java
+++ b/src/Calculator.java
@@ -1,8 +1,8 @@
 public class Calculator {
     private int result;
     
-    public void add(int a) {
-        result += a;
+    public void add(int a, int b) {
+        result = a + b;
     }
     
     public int getResult() {

Output

before:
  - name: src/Calculator.java
    type: file
    line_range: [1, 8]
    children:
      - name: Calculator
        type: class
        line_range: [1, 8]
        children:
          - name: add
            type: method
            line_range: [4, 6]

after:
  - name: src/Calculator.java
    type: file
    line_range: [1, 8]
    children:
      - name: Calculator
        type: class
        line_range: [1, 8]
        children:
          - name: add
            type: method
            line_range: [4, 6]
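
To see why add is reported with line_range [4, 6] on both sides, it helps to compute which line numbers the hunk touches and test them against each node's range. The sketch below does this with the standard library only. It illustrates the idea and is not the package's implementation, which lists unidiff as its diff parser.

```python
import re
from typing import List, Sequence, Set, Tuple

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(hunk: List[str]) -> Tuple[Set[int], Set[int]]:
    """Return (removed old-file lines, added new-file lines) for one hunk."""
    match = HUNK_HEADER.match(hunk[0])
    if match is None:
        raise ValueError("not a unified diff hunk header")
    old_no, new_no = int(match.group(1)), int(match.group(2))
    removed: Set[int] = set()
    added: Set[int] = set()
    for line in hunk[1:]:
        if line.startswith("\\"):  # "\ No newline at end of file" marker
            continue
        if line.startswith("-"):
            removed.add(old_no)
            old_no += 1
        elif line.startswith("+"):
            added.add(new_no)
            new_no += 1
        else:  # context line: present in both versions
            old_no += 1
            new_no += 1
    return removed, added


def overlaps(line_range: Sequence[int], lines: Set[int]) -> bool:
    """True if any changed line falls inside the inclusive range."""
    start, end = line_range
    return any(start <= n <= end for n in lines)


hunk = [
    "@@ -1,8 +1,8 @@",
    " public class Calculator {",
    "     private int result;",
    "     ",
    "-    public void add(int a) {",
    "-        result += a;",
    "+    public void add(int a, int b) {",
    "+        result = a + b;",
    "     }",
    "     ",
    "     public int getResult() {",
]
removed, added = changed_lines(hunk)
```

Here both sets are {4, 5}, which fall inside the [4, 6] range of add but outside any range that starts at getResult on line 8.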

Development

Setup Development Environment

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install in development mode
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=diff_code_change_range

# Run specific test file
pytest tests/test_diff_parser.py

Project Structure

.
├── src/
│   └── diff_code_change_range/
│       ├── __init__.py           # Package exports
│       ├── __main__.py           # Module entry point
│       ├── cli.py                # CLI implementation
│       ├── diff_parser.py        # Diff parsing module
│       ├── structure_extractor.py # Tree-sitter code parsing
│       ├── affected_marker.py    # Affected node detection
│       └── yaml_reporter.py      # YAML output generation
├── tests/
│   ├── fixtures/                 # Test diff files
│   ├── test_diff_parser.py
│   ├── test_structure_extractor.py
│   ├── test_affected_marker.py
│   ├── test_yaml_reporter.py
│   └── test_e2e.py
├── pyproject.toml
├── requirements.txt
├── requirements-dev.txt
└── README.md

Supported Languages

  • Java (.java files) - Full support for classes, interfaces, enums, methods, fields
  • Kotlin (.kt files) - Full support for classes, objects, functions, properties
  • Python (.py files) - Full support for classes, functions, methods, decorators, async functions, module-level and instance variables

Other file types are automatically skipped.
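
A pre-filter in a calling script can mirror this rule to avoid sending irrelevant paths through the tool. This is a small sketch under the assumption that matching is by exact, case-sensitive file extension; LANGUAGE_BY_SUFFIX and language_for are hypothetical names, not the package's API.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions listed under Supported Languages above.
LANGUAGE_BY_SUFFIX = {".java": "java", ".kt": "kotlin", ".py": "python"}


def language_for(path: str) -> Optional[str]:
    """Return the language for a diff path, or None if it would be skipped."""
    return LANGUAGE_BY_SUFFIX.get(PurePosixPath(path).suffix)
```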

Error Handling

The tool handles various error conditions gracefully:

  • Parse errors - Files that fail to parse are skipped with a warning to stderr
  • Binary files - Automatically detected and skipped
  • Non-Java/Kotlin/Python files - Silently skipped
  • Empty diffs - Produce empty output

Exit codes:

  • 0 - Success
  • 1 - Error (file not found, parse error, etc.)
  • 130 - Interrupted (Ctrl+C)
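
A wrapper script can translate these exit codes into readable failures. The sketch below assumes diff-code-change-range is on PATH; describe_exit and run_on_patch are illustrative helpers, not part of the package.

```python
import subprocess

# Exit codes as documented above.
EXIT_MEANINGS = {
    0: "success",
    1: "error (file not found, parse error, etc.)",
    130: "interrupted (Ctrl+C)",
}


def describe_exit(code: int) -> str:
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_on_patch(patch_path: str) -> str:
    """Run the CLI on a patch file and return its YAML output."""
    proc = subprocess.run(
        ["diff-code-change-range", patch_path],
        capture_output=True, text=True,
    )
    if proc.returncode != 0:
        # Warnings and errors are written to stderr, so include them.
        raise RuntimeError(f"{describe_exit(proc.returncode)}: {proc.stderr.strip()}")
    return proc.stdout
```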

License

MIT License

Contributing

Contributions are welcome! Please ensure tests pass before submitting pull requests.
