Extract structured JSON of Git changes with PR and issue tracker integration

git-json-changes

Extract structured JSON of Git changes between two references (tags, branches, commits) with optional PR and issue tracker integration.


Installation

uv add git-json-changes
# or globally:
uv tool install git-json-changes

Optional: GitHub CLI

For PR and GitHub Issues support, install and authenticate the GitHub CLI:

# Install gh (see https://cli.github.com/)
gh auth login

Authentication Setup

GitHub Authentication

For PR and GitHub Issues support, you need a GitHub personal access token:

  1. Go to https://github.com/settings/tokens
  2. Click Generate new token → Generate new token (classic)
  3. Name it (e.g., "git-json-changes")
  4. Select scopes:
    • repo (for private repositories)
    • public_repo (for public repositories only)
  5. Click Generate token and copy it immediately

Set the token as an environment variable:

export GITHUB_TOKEN="ghp_your_token_here"

Or pass it directly to the API:

result = generate_changes(..., github_token="ghp_your_token_here")

Jira Authentication

A. Personal Access Token (Server/Data Center) - Recommended

  1. Go to your Jira profile (click avatar) → Profile → Personal Access Tokens
  2. Click Create token, name it (e.g., "git-json-changes")
  3. Set expiry date (optional)
  4. Copy the token immediately

Set as environment variables:

export JIRA_URL="https://jira.company.com"
export JIRA_PERSONAL_TOKEN="your_token_here"

B. API Token (Cloud)

For Atlassian Cloud instances:

  1. Go to https://id.atlassian.com/manage-profile/security/api-tokens
  2. Click Create API token, name it
  3. Copy the token immediately
  4. Use your email as username with the token as password

Set as environment variables:

export JIRA_URL="https://company.atlassian.net"
export JIRA_PERSONAL_TOKEN="your_api_token_here"

Note: For Cloud API tokens, authentication uses Bearer token format automatically.
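
Before a long run it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of git-json-changes: the helper name missing_env_vars is hypothetical, and only the variable names come from this section.

```python
import os

# Variable names taken from the authentication sections above.
GITHUB_VARS = ("GITHUB_TOKEN",)
JIRA_VARS = ("JIRA_URL", "JIRA_PERSONAL_TOKEN")


def missing_env_vars(names, environ=None):
    """Return the names that are unset or blank in the given environment.

    Hypothetical helper for a preflight check; pass a plain dict as
    `environ` to test it without touching the real environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]
```

Calling missing_env_vars(JIRA_VARS) returns an empty list when both Jira variables are present, so a script can stop early with a clear message instead of failing midway through a fetch.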


Python API (Primary)

from git_json_changes import generate_changes

# Full output with all integrations
result = generate_changes(
    ref_from="v1.0.0",
    ref_to="v2.0.0",
    repo_path="/path/to/repo",      # local path, git URL, or None for cwd
    github_token=None,               # uses $GITHUB_TOKEN
    jira_url="https://jira.company.com",
    jira_token="...",                # uses $JIRA_PERSONAL_TOKEN if None
    fetch_prs=True,                  # fetch GitHub PRs
    fetch_github_issues=False,       # fetch GitHub Issues
    fetch_jira_from_prs=True,        # extract Jira refs from PR content
    issue_regex=r"[A-Z]+-\d+",       # regex to match issue keys
    diff_limit=50000,                # max bytes for diffs per commit
    pr_comment_limit=50000,          # max bytes for PR comments
    issue_limit=50000,               # max bytes for issue content
)

# result structure:
# {
#   "meta": {
#     "ref_from": "v1.0.0",
#     "ref_to": "v2.0.0",
#     "repository": "https://github.com/...",
#     "generated_at": "2025-12-11T...",
#     "stats": {
#       "commits": 42,
#       "prs": 12,
#       "pr_comments": 38,
#       "jira_issues": 8,
#       "jira_comments": 156,
#       "github_issues": 3
#     }
#   },
#   "pull_requests": {...},  # keyed by PR number
#   "commits": {...},        # keyed by full commit hash
#   "issues": {...}          # keyed by Jira key or "gh-<number>"
# }
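
To see what the default issue_regex picks up, the pattern can be tried on a commit message with the standard re module. This is only an illustration of the regex itself; how git-json-changes applies it internally (for example whether it de-duplicates keys) is an assumption, and extract_issue_keys is a hypothetical name.

```python
import re

ISSUE_REGEX = r"[A-Z]+-\d+"  # default value shown in the call above


def extract_issue_keys(text, pattern=ISSUE_REGEX):
    """Return unique matches in order of first appearance (hypothetical helper)."""
    seen = {}
    for key in re.findall(pattern, text):
        seen.setdefault(key, None)  # dicts preserve insertion order
    return list(seen)


message = "fix: resolve bug in parser\n\nFixes PROJ-456 and PROJ-123, see PROJ-456"
keys = extract_issue_keys(message)
```

Note that the default pattern also matches strings such as "UTF-8". If that produces noise, a narrower pattern like r"\bPROJ-\d+\b" (with your own project prefix) may be a better fit.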

Convenience Functions

from git_json_changes import (
    get_commits,
    get_pull_requests,
    get_jira_issues,
    get_github_issues,
)

# Get commits only (assumption: `repo` is a local repository path)
repo = "/path/to/repo"
commits = get_commits(repo, "v1.0", "v2.0", diff_limit=50000)

# Get Jira issues by keys
issues = get_jira_issues(
    ["PROJ-123", "PROJ-456"],
    jira_url="https://company.atlassian.net",
    jira_token="...",
)

CLI

git-json-changes v1.0.0 v2.0.0 -o changes.json

# With options
git-json-changes v1.0.0 v2.0.0 -o changes.json \
    --repo /path/to/repo \
    --jira-url https://company.atlassian.net \
    --jira-token $JIRA_TOKEN \
    --github-issues \
    --diff-limit 100000

Options

Option              Default          Description
-o, --output        Required         Output JSON file
-r, --repo          Current dir      Repository path or URL
--github-token      $GITHUB_TOKEN    GitHub token
--jira-url          $JIRA_URL        Jira instance URL
--jira-token        $JIRA_TOKEN      Jira API token
--issue-regex       [A-Z]+-\d+       Regex for issue keys
--github-issues     Off              Enable GitHub Issues
--diff-limit        50000            Max bytes for diffs
--pr-comment-limit  50000            Max bytes for PR comments
--issue-limit       50000            Max bytes for issue content
--no-prs            Off              Skip PR fetching
--no-jira           Off              Skip Jira integration
--no-jira-from-prs  Off              Skip Jira extraction from PR content
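
When the CLI is driven from a script (a release job, for instance), building the argument list in one place keeps the flags consistent. The sketch below uses only flags from the table above; build_command and run are hypothetical names, and it assumes the CLI exits non-zero on failure, which is the usual convention but is not stated here.

```python
import shlex
import subprocess


def build_command(ref_from, ref_to, output, repo=None, diff_limit=None,
                  github_issues=False, no_jira=False):
    """Build an argv list for git-json-changes from documented flags only."""
    cmd = ["git-json-changes", ref_from, ref_to, "-o", output]
    if repo is not None:
        cmd += ["--repo", repo]
    if diff_limit is not None:
        cmd += ["--diff-limit", str(diff_limit)]
    if github_issues:
        cmd.append("--github-issues")
    if no_jira:
        cmd.append("--no-jira")
    return cmd


def run(cmd):
    """Echo and execute the command; raises CalledProcessError on failure."""
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with refs or paths that contain spaces.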

Using Git URLs

You can pass a git URL to -r/--repo to clone and analyze remote repositories:

# SSH URL
git-json-changes v1.0.0 v2.0.0 -o output.json \
    -r git@github.com:owner/repo.git

# HTTPS URL
git-json-changes v1.0.0 v2.0.0 -o output.json \
    -r https://github.com/owner/repo.git

The repository will be cloned to a temporary directory and automatically cleaned up after analysis.
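
The clone-and-clean-up behaviour described above can be approximated with the standard library. This is a sketch of the general pattern, not the tool's actual code: the URL heuristic and both function names are hypothetical, and it assumes git is available on PATH.

```python
import re
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical heuristic: scp-like SSH syntax (user@host:path) or a URL scheme.
_GIT_URL = re.compile(r"^(?:[\w.-]+@[\w.-]+:|(?:https?|ssh|git)://)")


def looks_like_git_url(repo):
    """Guess whether `repo` is a remote URL rather than a local path."""
    return bool(_GIT_URL.match(repo))


@contextmanager
def temporary_clone(url):
    """Clone `url` into a temporary directory that is removed on exit."""
    with tempfile.TemporaryDirectory(prefix="git-json-changes-") as workdir:
        # A full clone (no --depth) so that both tags or refs can be resolved.
        subprocess.run(["git", "clone", "--quiet", url, workdir], check=True)
        yield workdir
```

Used as `with temporary_clone(url) as path: ...`, the directory is deleted even if the analysis inside the block raises.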


Output Structure

Top-Level Structure

The output is structured as dictionaries (not arrays) for O(1) lookup performance and bidirectional navigation:

{
  "meta": {
    "ref_from": "v1.0.0",
    "ref_to": "v2.0.0",
    "repository": "https://github.com/owner/repo.git",
    "generated_at": "2025-12-11T10:30:45.123456+00:00",
    "stats": {
      "commits": 42,
      "prs": 12,
      "pr_comments": 38,
      "jira_issues": 8,
      "jira_comments": 156,
      "github_issues": 3
    }
  },
  "pull_requests": {
    "123": {...},
    "124": {...}
  },
  "commits": {
    "abc123def456...": {...},
    "def789abc012...": {...}
  },
  "issues": {
    "PROJ-123": {...},
    "PROJ-456": {...},
    "gh-789": {...}
  }
}

Direct Access by ID

Access any entity directly by its ID:

# Get specific PR
pr = result['pull_requests'][123]

# Get specific commit
commit = result['commits']['abc123def456...']

# Get Jira issue
issue = result['issues']['PROJ-123']

# Get GitHub issue (prefixed with 'gh-')
gh_issue = result['issues']['gh-456']
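
One detail to keep in mind when reading a saved changes.json back in: JSON object keys are always strings, so a PR stored under 123 comes back under "123". The lookup helper below is a hypothetical convenience that accepts either form; whether the in-memory result uses integer keys is inferred from the examples above, not confirmed.

```python
import json


def get_pull_request(result, number):
    """Look up a PR by number whether the keys are ints or strings."""
    prs = result["pull_requests"]
    if number in prs:
        return prs[number]
    return prs[str(number)]


# Minimal stand-in for a result, shaped like the structure documented above.
in_memory = {"pull_requests": {123: {"number": 123, "title": "Add new feature"}}}

# Simulates writing the result to a file and loading it again.
from_file = json.loads(json.dumps(in_memory))
```

Here get_pull_request(in_memory, 123) and get_pull_request(from_file, 123) both return the same PR, while from_file["pull_requests"][123] alone would raise KeyError.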

Bidirectional References

The structure forms a navigable graph with bidirectional references:

Pull Request ←→ Commits ←→ Issues
     ↕                         ↕
  Commits                    PRs & Commits

Example navigation:

# Start with an issue, find all related work
issue = result['issues']['PROJ-123']
print(f"Issue: {issue['summary']}")

# Find commits
print(f"\nCommits ({len(issue['commits'])}):")
for commit_hash in issue['commits']:
    commit = result['commits'][commit_hash]
    print(f"  - {commit['short_hash']}: {commit['message'][:60]}")

# Find PRs
print(f"\nPull Requests ({len(issue['pull_requests'])}):")
for pr_number in issue['pull_requests']:
    pr = result['pull_requests'][pr_number]
    print(f"  - #{pr['number']}: {pr['title']}")

# Navigate from PR → commits → issues
pr = result['pull_requests'][123]
for commit_hash in pr['commits']:
    commit = result['commits'][commit_hash]
    for issue_id in commit['issues']:
        issue = result['issues'][issue_id]
        print(f"PR {pr['number']} → Commit {commit['short_hash']} → Issue {issue.get('key', issue_id)}")
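
Because references go both ways, a quick consistency check can catch IDs that do not resolve, for example when an issue could not be fetched. The following is a sketch under the assumption that every reference is expected to resolve; the tool itself may legitimately leave some dangling, and dangling_references is a hypothetical name.

```python
def dangling_references(result):
    """Return (kind, owner, target) tuples for references that do not resolve."""
    commits = result.get("commits", {})
    issues = result.get("issues", {})
    problems = []

    # Issue -> commit direction, plus the matching back-reference.
    for issue_id, issue in issues.items():
        for commit_hash in issue.get("commits", []):
            if commit_hash not in commits:
                problems.append(("issue->commit", issue_id, commit_hash))
            elif issue_id not in commits[commit_hash].get("issues", []):
                problems.append(("missing back-reference", commit_hash, issue_id))

    # Commit -> issue direction.
    for commit_hash, commit in commits.items():
        for issue_id in commit.get("issues", []):
            if issue_id not in issues:
                problems.append(("commit->issue", commit_hash, issue_id))
    return problems
```

An empty list means every issue/commit link in the result can be followed in both directions.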

Pull Request Structure

Each PR is keyed by its number and contains references to related commits and issues:

"pull_requests": {
  "123": {
    "number": 123,
    "title": "Add new feature",
    "author": "username",
    "state": "merged",
    "url": "https://github.com/owner/repo/pull/123",
    "body": "Description of the PR...",
    "merge_commit": "abc123def456...",
    "comments": [
      {
        "author": "reviewer",
        "date": "2025-12-10T15:30:00Z",
        "body": "LGTM!"
      }
    ],
    "commits": [
      "abc123def456...",
      "def789abc012..."
    ],
    "issues": [
      "PROJ-123",
      "PROJ-456"
    ]
  }
}

Navigate to related entities:

pr = result['pull_requests'][123]

# Get all commits in this PR
for commit_hash in pr['commits']:
    commit = result['commits'][commit_hash]
    print(f"Commit: {commit['short_hash']} - {commit['message']}")

# Get all issues referenced in this PR
for issue_id in pr['issues']:
    issue = result['issues'][issue_id]
    print(f"Issue: {issue.get('key', issue_id)} - {issue['summary']}")

Commit Structure

All commits are keyed by their full hash and contain references to their PR (if any) and related issues:

"commits": {
  "def456abc789...": {
    "hash": "def456abc789...",
    "short_hash": "def456a",
    "author": "Alice <alice@example.com>",
    "date": "2025-12-08T09:15:22+00:00",
    "message": "fix: resolve bug in parser\n\nFixes PROJ-456",
    "pr_number": 123,
    "issues": [
      "PROJ-456"
    ],
    "files": [
      {
        "path": "src/parser.py",
        "status": "modified",
        "additions": 3,
        "deletions": 1,
        "diff": "@@ -45,1 +45,3 @@\n-    old_code()\n+    new_code()\n+    additional_line()"
      },
      {
        "path": "tests/test_new.py",
        "status": "added",
        "additions": 120,
        "deletions": 0,
        "diff": "@@ -0,0 +1,120 @@\n+import unittest\n+..."
      }
    ]
  }
}

Notes:

  • pr_number is null for orphan commits (not in any PR), or contains the PR number if this commit is the merge commit for that PR
  • Currently, a commit can have at most one pr_number (we only track merge commits)
  • issues contains issue IDs (not full issue objects)
  • files array contains the full file change details
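
Given the pr_number rule in the notes above, orphan commits can be separated from PR-linked ones in a single pass. This is a small sketch over a result shaped like the example; split_commits is a hypothetical helper name.

```python
def split_commits(result):
    """Partition commit hashes into (linked_to_pr, orphans) by pr_number."""
    linked, orphans = [], []
    for commit_hash, commit in result["commits"].items():
        # pr_number is null (None in Python) for commits not tied to a PR.
        target = orphans if commit.get("pr_number") is None else linked
        target.append(commit_hash)
    return linked, orphans
```

Since only merge commits carry a pr_number, the orphan list will also include ordinary commits that were part of a PR branch but are not its merge commit.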

Navigate to related entities:

commit = result['commits']['def456abc789...']

# Get the PR this commit belongs to
if commit['pr_number']:
    pr = result['pull_requests'][commit['pr_number']]
    print(f"PR: #{pr['number']} - {pr['title']}")

# Get all issues referenced
for issue_id in commit['issues']:
    issue = result['issues'][issue_id]
    print(f"Issue: {issue.get('key', issue_id)} - {issue['summary']}")

File Change Types

The status field in file objects can be:

  • "added" - New file created
  • "deleted" - File removed
  • "modified" - File changed
  • "renamed" - File moved/renamed
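
The status values combine naturally with the additions and deletions counts to give a per-commit summary. The sketch below relies only on the fields shown in the commit example above; change_summary is a hypothetical helper.

```python
from collections import Counter


def change_summary(commit):
    """Count files per status and total added/deleted lines for one commit."""
    files = commit["files"]
    return {
        "statuses": dict(Counter(f["status"] for f in files)),
        "additions": sum(f["additions"] for f in files),
        "deletions": sum(f["deletions"] for f in files),
    }


# Trimmed copy of the commit example above (diff bodies omitted).
example_commit = {
    "files": [
        {"path": "src/parser.py", "status": "modified", "additions": 3, "deletions": 1},
        {"path": "tests/test_new.py", "status": "added", "additions": 120, "deletions": 0},
    ]
}
summary = change_summary(example_commit)
```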

Issue Structure

Issues are keyed by their unique ID and contain reverse references to all commits and PRs that mention them:

Jira Issues (keyed by Jira key):

"issues": {
  "PROJ-123": {
    "source": "jira",
    "key": "PROJ-123",
    "url": "https://jira.company.com/browse/PROJ-123",
    "summary": "Issue title",
    "status": "In Progress",
    "description": "Full description...",
    "comments": [
      {
        "author": "Jane Smith",
        "date": "2025-12-09T10:15:00.000+0000",
        "body": "Comment text..."
      }
    ],
    "commits": [
      "abc123def456...",
      "def789abc012..."
    ],
    "pull_requests": [
      123,
      124
    ]
  }
}

GitHub Issues (keyed with 'gh-' prefix):

"issues": {
  "gh-456": {
    "source": "github",
    "number": 456,
    "url": "https://github.com/owner/repo/issues/456",
    "summary": "Bug report",
    "status": "open",
    "description": "Issue description...",
    "comments": [...],
    "commits": [
      "xyz789..."
    ],
    "pull_requests": []
  }
}
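
In the two examples above, Jira entries carry a "key" field while GitHub entries carry "number" instead, so code that handles both sources should not assume "key" exists. A minimal sketch of source-aware helpers follows; both function names are hypothetical.

```python
def issue_label(issue):
    """Return a display label that works for both documented issue shapes."""
    if issue["source"] == "jira":
        return issue["key"]
    # GitHub entries have no "key"; mirror the 'gh-' prefix used for dict keys.
    return f"gh-{issue['number']}"


def issues_by_source(result):
    """Group issue IDs by their "source" field, preserving original order."""
    grouped = {}
    for issue_id, issue in result["issues"].items():
        grouped.setdefault(issue["source"], []).append(issue_id)
    return grouped
```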

Navigate to related entities:

issue = result['issues']['PROJ-123']

# Get all commits that reference this issue
for commit_hash in issue['commits']:
    commit = result['commits'][commit_hash]
    print(f"Commit: {commit['short_hash']} - {commit['message']}")

# Get all PRs that reference this issue
for pr_number in issue['pull_requests']:
    pr = result['pull_requests'][pr_number]
    print(f"PR: #{pr['number']} - {pr['title']}")

Data Limits

To prevent excessive output size, byte limits are applied:

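  • Diffs: 50,000 bytes per commit by default (smallest files first); change with --diff-limit
  • PR Comments: 50,000 bytes per PR by default (newest first); change with --pr-comment-limit
  • Issue Content: 50,000 bytes per issue by default (description + newest comments first); change with --issue-limit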

If content exceeds limits, it's truncated while preserving the most relevant data.
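
One plausible reading of "smallest files first" is a greedy selection under a byte budget. The sketch below illustrates that idea only; it is an assumption about the strategy, not the tool's actual implementation, and it drops whole items rather than cutting one partway. select_within_budget is a hypothetical name.

```python
def diff_size(item):
    """Size of an item's diff in bytes, measured as UTF-8."""
    return len(item["diff"].encode("utf-8"))


def select_within_budget(items, limit, size=diff_size):
    """Keep the smallest items first until the byte budget would be exceeded."""
    kept, used = [], 0
    for item in sorted(items, key=size):
        cost = size(item)
        if used + cost > limit:
            break  # everything after this is at least as large
        kept.append(item)
        used += cost
    return kept, used
```

Sorting by size first means many small diffs survive before a single large one consumes the whole budget. For a "newest first" limit, the same loop would apply with items sorted by date instead.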


License

Proprietary. See LICENSE file.
