Git history analyzer with LLM-powered narrative generation

GitView

GitView extracts your repository's git history and uses AI to generate compelling narratives about how your codebase evolved. Instead of manually reading through thousands of commits, get a comprehensive story of your project's journey.

Example run on this repository:

https://github.com/carstenbund/gitview/blob/main/output/history_story.md

Features

  • Comprehensive History Extraction: Extracts commit metadata, LOC changes, language breakdown, README evolution, comment analysis, and more
  • Smart Chunking: Automatically divides history into meaningful "phases" or "epochs" based on significant changes
  • LLM-Powered Summaries: Uses an LLM (Anthropic Claude, OpenAI GPT, or a local Ollama model) to generate narrative summaries for each phase
  • Global Story Generation: Combines phase summaries into executive summaries, timelines, technical retrospectives, and deletion stories
  • Multiple Output Formats: Generates markdown reports, JSON data, and timelines

Installation

Option 1: Install with pip (recommended)

This creates a gitview command in your PATH:

# Clone the repository
git clone https://github.com/carstenbund/gitview.git
cd gitview

# Install in editable mode with dependencies
pip install -e .

# The gitview command is now available system-wide
gitview --version
gitview --help

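How it works: pip install -e . reads pyproject.toml and setup.py, which declare a console-script entry point for gitview.cli:main. pip turns that entry point into a small gitview launcher in your environment's scripts directory (for example /usr/local/bin/gitview, the bin/ folder of an active virtualenv, or Scripts\gitview.exe on Windows).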
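To check what that entry point resolves to on your machine, the standard library's importlib.metadata can read it back. This is a minimal sketch, not part of GitView; find_console_script is a hypothetical helper, and the only project-specific input is the gitview command name from this README.

```python
"""Look up the target of a console-script entry point installed by pip."""
import sys
from importlib import metadata
from typing import Optional


def find_console_script(name: str) -> Optional[str]:
    """Return the "module:function" target of a console script, or None."""
    if sys.version_info >= (3, 10):
        # Python 3.10+: entry_points() accepts selection keywords directly.
        matches = list(metadata.entry_points(group="console_scripts", name=name))
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        groups = metadata.entry_points()
        matches = [ep for ep in groups.get("console_scripts", []) if ep.name == name]
    return matches[0].value if matches else None


if __name__ == "__main__":
    # After an install that declares the entry point described above,
    # this should print gitview.cli:main; otherwise it prints None.
    print(find_console_script("gitview"))
```

If this prints None even though pip reported success, the package was probably installed into a different Python environment than the one you are running.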

Option 2: Run directly from repo (no installation)

Use the executable wrapper in bin/:

# Clone the repository
git clone https://github.com/carstenbund/gitview.git
cd gitview

# Install dependencies only
pip install -r requirements.txt

# Run directly from the repo
./bin/gitview --version
./bin/gitview analyze

# Or add bin/ to your PATH
export PATH="$PWD/bin:$PATH"
gitview analyze

Option 3: Run as Python module

# Install dependencies
pip install -r requirements.txt

# Run as a module
python -m gitview.cli --help
python -m gitview.cli analyze

Verify Installation

Run the verification script to check everything is set up correctly:

python verify_installation.py

This will check:

  • Python version (3.8+ required)
  • All required dependencies
  • gitview command availability
  • LLM backend configuration (API keys, Ollama server)
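The kind of checks listed above can be approximated with the standard library alone. This is a rough, hypothetical sketch, not the contents of verify_installation.py: the module names are the import names of the dependencies listed under Requirements (gitpython imports as git), and it skips the Ollama server probe.

```python
"""Approximate installation checks (illustrative, not GitView's script)."""
import importlib.util
import os
import shutil
import sys

# Import names of the dependencies listed under Requirements.
REQUIRED_MODULES = ["git", "anthropic", "openai", "requests", "click", "rich", "pydantic"]


def check_environment() -> dict:
    """Return a mapping of check name to result."""
    # find_spec returns None when a top-level module cannot be imported.
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    return {
        "python_ok": sys.version_info >= (3, 8),
        "missing_modules": missing,
        "gitview_on_path": shutil.which("gitview") is not None,
        "anthropic_key_set": bool(os.environ.get("ANTHROPIC_API_KEY")),
        "openai_key_set": bool(os.environ.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for check, result in check_environment().items():
        print(f"{check}: {result}")
```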

Troubleshooting Installation

If the gitview command is not found after installation:

# Option 1: Run GitView as a Python module instead
python -m gitview.cli analyze

# Option 2: Reinstall in editable mode
pip uninstall gitview -y
pip install -e .

# Option 3: Check if it's in your PATH
which gitview  # Unix/Linux/Mac
where gitview  # Windows

Quick Start

# Using Anthropic Claude (default)
export ANTHROPIC_API_KEY="your-api-key-here"
gitview analyze

# Using OpenAI GPT
export OPENAI_API_KEY="your-api-key-here"
gitview analyze --backend openai

# Using local Ollama (no API key needed)
gitview analyze --backend ollama --model llama3

# Skip LLM summarization (just extract and chunk)
gitview analyze --skip-llm

Usage

Full Analysis Pipeline

The main command runs the complete pipeline: extract → chunk → summarize → story → output

gitview analyze [OPTIONS]

Options:
  -r, --repo PATH              Path to git repository (default: current directory)
  -o, --output PATH            Output directory (default: "output")
  -s, --strategy STRATEGY      Chunking strategy: fixed, time, or adaptive (default: adaptive)
  --chunk-size INTEGER         Chunk size for fixed strategy (default: 50)
  --max-commits INTEGER        Maximum commits to analyze
  --branch TEXT                Branch to analyze (default: HEAD)
  -b, --backend BACKEND        LLM backend: anthropic, openai, or ollama (auto-detected)
  -m, --model TEXT             Model identifier (uses backend defaults if not specified)
  --api-key TEXT               API key for the backend (defaults to env var)
  --ollama-url TEXT            Ollama API URL (default: http://localhost:11434)
  --repo-name TEXT             Repository name for output
  --skip-llm                   Skip LLM summarization (extract and chunk only)

Extract Only

Extract git history to a JSONL file without LLM processing:

gitview extract --repo /path/to/repo --output history.jsonl
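JSONL stores one JSON object per line, so the extracted file can also be loaded by your own scripts. A minimal sketch; the fields inside each commit record aren't documented here, so records are treated as plain dicts, and parse_jsonl and load_jsonl are hypothetical helper names.

```python
"""Read a JSONL history file: one JSON object per line."""
import json
from typing import Iterable, Iterator, List


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank line."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


def load_jsonl(path: str) -> List[dict]:
    """Load every record from a JSONL file into memory."""
    with open(path, encoding="utf-8") as fh:
        return list(parse_jsonl(fh))
```

For example, commits = load_jsonl("history.jsonl") gives you one dict per extracted commit.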

Chunk Only

Chunk an extracted JSONL file into phases:

gitview chunk history.jsonl --output ./phases --strategy adaptive

Chunking Strategies

GitView supports three chunking strategies:

1. Adaptive (Recommended)

Automatically splits history when significant changes occur:

  • LOC changes by >30%
  • Large deletions/additions detected
  • README rewrites
  • Major refactorings

gitview analyze --strategy adaptive
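The LOC criterion can be sketched as a single pass that starts a new phase whenever total lines of code drift more than 30% from where the current phase began. This is a simplified, hypothetical illustration: measuring against the phase's starting LOC is an assumption, and the other triggers (large deletions, README rewrites, refactorings) are left out.

```python
"""Start a new phase when total LOC shifts by more than a threshold."""
from typing import List


def adaptive_boundaries(loc: List[int], threshold: float = 0.30) -> List[int]:
    """Return the indices where phases start; loc[i] is total LOC after commit i."""
    if not loc:
        return []
    starts = [0]
    baseline = loc[0]
    for i in range(1, len(loc)):
        # Relative change versus the current phase's baseline (avoid dividing by 0).
        change = abs(loc[i] - baseline) / max(baseline, 1)
        if change > threshold:
            starts.append(i)
            baseline = loc[i]  # measure the new phase from here
    return starts
```

Given a commits list aligned with loc, each phase is then commits[a:b] for consecutive start indices a and b, with the last phase running to the end of the list.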

2. Fixed Size

Splits history into fixed-size chunks (e.g., 50 commits per phase):

gitview analyze --strategy fixed --chunk-size 50
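Fixed-size chunking is plain slicing; the last phase simply holds whatever is left over. A minimal sketch with a hypothetical chunk_fixed helper, not GitView's own code:

```python
"""Split an ordered commit list into fixed-size phases."""
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_fixed(commits: Sequence[T], chunk_size: int = 50) -> List[List[T]]:
    """Return consecutive slices of at most chunk_size commits each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(commits[i:i + chunk_size]) for i in range(0, len(commits), chunk_size)]
```

With 120 commits and the default size of 50, this yields phases of 50, 50, and 20 commits.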

3. Time-Based

Splits by time periods (week, month, quarter, year):

gitview analyze --strategy time --period quarter
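Time-based chunking amounts to grouping chronologically sorted commits by a calendar key. A sketch under stated assumptions: the input is a list of commit datetimes sorted oldest first, weeks follow ISO-8601 numbering, and period_key and chunk_by_period are hypothetical names.

```python
"""Group chronologically sorted commit dates into calendar periods."""
from datetime import datetime
from itertools import groupby
from typing import List, Tuple


def period_key(ts: datetime, period: str) -> Tuple[int, int]:
    """Map a timestamp to a (year, sub-period) key for the given period."""
    if period == "week":
        iso = ts.isocalendar()
        return (iso[0], iso[1])  # ISO year and ISO week number
    if period == "month":
        return (ts.year, ts.month)
    if period == "quarter":
        return (ts.year, (ts.month - 1) // 3 + 1)
    if period == "year":
        return (ts.year, 0)
    raise ValueError(f"unknown period: {period!r}")


def chunk_by_period(dates: List[datetime], period: str = "quarter") -> List[List[datetime]]:
    """Group consecutive dates sharing a period key; input must be sorted."""
    return [list(group) for _, group in groupby(dates, key=lambda d: period_key(d, period))]
```

Periods with no commits simply produce no phase, which keeps quiet stretches from showing up as empty chapters.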

Output Files

GitView generates several output files:

output/
├── repo_history.jsonl           # Raw commit data
├── phases/                       # Phase data
│   ├── phase_01.json
│   ├── phase_02.json
│   └── phase_index.json
├── history_story.md              # Main narrative report
├── timeline.md                   # Simple timeline
└── history_data.json             # Complete data in JSON

Main Report (history_story.md)

Contains:

  • Executive Summary: High-level overview for stakeholders
  • Timeline: Chronological phases with descriptive headings
  • Full Narrative: Complete story of the codebase evolution
  • Technical Evolution: Architectural journey and key decisions
  • Story of Deletions: What was removed and why
  • Phase Details: Detailed breakdown of each phase
  • Statistics: Comprehensive metrics

How It Works

Phase 1: Extract Raw History

Analyzes git commits and extracts:

  • Commit metadata (hash, author, date, message)
  • Lines of code changes (insertions/deletions)
  • File statistics
  • Language breakdown
  • README state and changes
  • Code comments and density
  • Detection of large changes, refactors, etc.
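Per-commit insertions and deletions are what git log --numstat reports. The sketch below parses the output of git log --numstat --format=%H into totals per commit. It is one way to obtain these numbers, not necessarily how GitView's extractor (which uses GitPython) does it, and the dict keys are made up for illustration.

```python
"""Parse `git log --numstat --format=%H` output into per-commit totals."""
import re
from typing import Dict, List

# A full commit hash: 40 hex chars (SHA-1) or 64 (SHA-256 repositories).
_HASH_RE = re.compile(r"^[0-9a-f]{40}([0-9a-f]{24})?$")


def parse_numstat(output: str) -> List[Dict]:
    """Return one dict per commit with insertion, deletion, and file counts."""
    commits: List[Dict] = []
    current = None
    for line in output.splitlines():
        if _HASH_RE.match(line):
            current = {"hash": line, "insertions": 0, "deletions": 0, "files": 0}
            commits.append(current)
        elif current is not None and line.count("\t") >= 2:
            added, deleted, _path = line.split("\t", 2)
            current["files"] += 1
            # Binary files are reported as "-\t-\tpath": count the file, not lines.
            if added != "-":
                current["insertions"] += int(added)
            if deleted != "-":
                current["deletions"] += int(deleted)
    return commits
```

Feed it the text from subprocess.run(["git", "log", "--numstat", "--format=%H"], capture_output=True, text=True).stdout, run inside the repository.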

Phase 2: Chunk into Epochs

Divides history into meaningful phases based on:

  • Significant LOC changes
  • Large deletions or additions
  • Language mix changes
  • README rewrites
  • Major refactorings
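One way to quantify a "language mix change" is to compute each language's share of lines of code at two points in time and take the total variation distance between the two distributions. This is a hypothetical sketch: the extension map is deliberately tiny, and the metric is a swapped-in choice, not necessarily what GitView measures.

```python
"""Measure how much a repository's language mix shifted between two snapshots."""
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical, deliberately small extension-to-language map.
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript", ".go": "Go", ".md": "Markdown"}


def language_shares(loc_by_file: Dict[str, int]) -> Dict[str, float]:
    """Return each language's fraction of total lines of code."""
    totals: Counter = Counter()
    for path, loc in loc_by_file.items():
        lang = EXT_TO_LANG.get(PurePosixPath(path).suffix.lower(), "Other")
        totals[lang] += loc
    grand = sum(totals.values())
    return {lang: n / grand for lang, n in totals.items()} if grand else {}


def language_shift(before: Dict[str, float], after: Dict[str, float]) -> float:
    """Total variation distance: 0.0 for identical mixes, 1.0 for disjoint ones."""
    langs = set(before) | set(after)
    return 0.5 * sum(abs(before.get(l, 0.0) - after.get(l, 0.0)) for l in langs)
```

A chunker could then open a new phase when the shift between the start of the current phase and the latest commit exceeds some threshold.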

Phase 3: Summarize Each Phase

Uses the configured LLM backend (Anthropic Claude by default) to generate narrative summaries for each phase, answering:

  • What were the main activities?
  • Why were changes made?
  • What was deleted/added and why?
  • How did documentation evolve?
  • What do commit messages reveal?
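A per-phase prompt can be assembled from the phase's commit messages plus the questions above. The prompt wording, the message cap, and build_phase_prompt itself are assumptions for illustration; GitView's actual prompts are not shown in this README.

```python
"""Assemble an illustrative summarization prompt for one phase."""
from typing import List

# The questions listed above; the exact wording GitView sends may differ.
PHASE_QUESTIONS = [
    "What were the main activities?",
    "Why were changes made?",
    "What was deleted/added and why?",
    "How did documentation evolve?",
    "What do commit messages reveal?",
]


def build_phase_prompt(phase_number: int, commit_messages: List[str],
                       loc_delta: int, max_messages: int = 50) -> str:
    """Return a prompt string; long phases are capped at max_messages messages."""
    shown = commit_messages[:max_messages]
    lines = [
        f"You are summarizing phase {phase_number} of a repository's history.",
        f"Net change in lines of code: {loc_delta:+d}",
        f"Commit messages ({len(shown)} of {len(commit_messages)}):",
    ]
    lines += [f"- {msg.strip()}" for msg in shown]
    lines += ["", "Write a short narrative that answers:"]
    lines += [f"{i}. {q}" for i, q in enumerate(PHASE_QUESTIONS, start=1)]
    return "\n".join(lines)
```

Capping the message list keeps very busy phases within the model's context window, at the cost of dropping later commit messages from the prompt.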

Phase 4: Generate Global Story

Combines phase summaries to create:

  • Executive summary for non-technical readers
  • Chronological timeline with meaningful headings
  • Technical retrospective
  • Story of code deletions and cleanups
  • Full detailed narrative

Examples

Analyze a Large Open Source Project

gitview analyze \
  --repo /path/to/large-project \
  --output ./project-analysis \
  --strategy adaptive \
  --repo-name "My Project"

Quick Analysis Without LLM

Perfect for quick exploration or when you don't have an API key:

gitview analyze --skip-llm --output ./quick-analysis

Extract and Process Later

# Extract once
gitview extract --repo /path/to/repo --output history.jsonl

# Experiment with different chunking strategies
gitview chunk history.jsonl --strategy adaptive --output ./adaptive-phases
gitview chunk history.jsonl --strategy fixed --chunk-size 25 --output ./fixed-phases

Architecture

┌─────────────────────┐
│   Git Repository    │
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  Extractor          │  Analyzes commits, extracts metadata
│  (extractor.py)     │  Output: repo_history.jsonl
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  Chunker            │  Splits into meaningful phases
│  (chunker.py)       │  Strategies: adaptive, fixed, time
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  Summarizer         │  LLM summarizes each phase
│  (summarizer.py)    │  Anthropic, OpenAI, or Ollama
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  StoryTeller        │  Generates global narratives
│  (storyteller.py)   │  Multiple story formats
└──────────┬──────────┘
           │
           v
┌─────────────────────┐
│  Writer             │  Outputs markdown, JSON, etc.
│  (writer.py)        │
└─────────────────────┘

Requirements

  • Python 3.8+
  • Git repository with commit history
  • One of the following LLM backends:
    • Anthropic Claude (requires API key)
    • OpenAI GPT (requires API key)
    • Ollama (runs locally, no API key needed)
  • Dependencies: gitpython, anthropic, openai, requests, click, rich, pydantic

LLM Backend Configuration

GitView supports three LLM backends with automatic detection based on environment variables:
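Assuming detection simply checks which key is set, with Anthropic taking precedence because it is the default and Ollama as the no-key fallback, the logic could look like this. The precedence order and the detect_backend name are assumptions, not taken from GitView's source.

```python
"""Pick an LLM backend from environment variables (assumed precedence)."""
import os
from typing import Mapping, Optional


def detect_backend(env: Mapping[str, str] = os.environ, explicit: Optional[str] = None) -> str:
    """Return "anthropic", "openai", or "ollama"; an explicit --backend wins."""
    if explicit:
        return explicit
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    # No hosted-API key found: fall back to a local Ollama server.
    return "ollama"
```

Passing --backend explicitly sidesteps any ambiguity when both keys are exported.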

Anthropic Claude (Default)

Get an API key from Anthropic

export ANTHROPIC_API_KEY="your-api-key-here"
gitview analyze

Example models:

  • claude-sonnet-4-5-20250929 (default)
  • claude-3-opus-20240229 (more powerful)
  • claude-3-haiku-20240307 (faster)

OpenAI GPT

Get an API key from OpenAI

export OPENAI_API_KEY="your-api-key-here"
gitview analyze --backend openai

Example models:

  • gpt-4 (default)
  • gpt-4-turbo-preview (faster)
  • gpt-3.5-turbo (cheaper)

Ollama (Local)

Install Ollama and pull a model:

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a model
ollama pull llama3

# Start Ollama server
ollama serve

# Use with GitView (no API key needed)
gitview analyze --backend ollama --model llama3
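A local model is reached through Ollama's HTTP API: POST /api/generate with "stream": false returns a single JSON object whose "response" field holds the generated text. A minimal standard-library sketch (GitView itself lists requests as a dependency); build_generate_request and generate are hypothetical helper names.

```python
"""Call a local Ollama server's /api/generate endpoint (non-streaming)."""
import json
import urllib.request


def build_generate_request(prompt: str, model: str = "llama3",
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build the POST request; stream=False asks for one JSON reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3",
             base_url: str = "http://localhost:11434", timeout: float = 300.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    req = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

The generous timeout reflects that local models can take a while to answer long prompts, especially on CPU-only machines.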

Popular Ollama models:

  • llama3 (default, balanced)
  • mistral (fast, good quality)
  • codellama (optimized for code)
  • mixtral (large, powerful)

Custom Configuration

# Specify custom model
gitview analyze --backend anthropic --model claude-3-opus-20240229

# Use custom Ollama URL
gitview analyze --backend ollama --ollama-url http://192.168.1.100:11434

# Pass API key directly (instead of env var)
gitview analyze --backend openai --api-key "your-key"
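The --api-key option "defaults to env var", which suggests a simple precedence: an explicit flag wins, then the backend's environment variable, and Ollama needs no key at all. A sketch of that rule, with resolve_api_key as a hypothetical helper rather than GitView's implementation:

```python
"""Resolve an API key from the CLI flag or the backend's environment variable."""
import os
from typing import Mapping, Optional

ENV_VARS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def resolve_api_key(backend: str, cli_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key to use, or None for backends that need no key."""
    if backend == "ollama":
        return None  # local server, no key required
    if cli_key:
        return cli_key  # an explicit --api-key overrides the environment
    var = ENV_VARS.get(backend)
    if var is None:
        raise ValueError(f"unknown backend: {backend!r}")
    key = env.get(var)
    if not key:
        raise RuntimeError(f"no API key: pass --api-key or set {var}")
    return key
```

Keys passed on the command line can end up in shell history, so the environment variable is usually the safer default.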

Use Cases

  • Technical Documentation: Automatically generate project history documentation
  • Onboarding: Help new developers understand codebase evolution
  • Retrospectives: Review what worked and what didn't
  • Project Reports: Create compelling narratives for stakeholders
  • Code Archaeology: Understand why code evolved the way it did
  • Cleanup Planning: Identify what to remove based on deletion history

Contributing

Contributions welcome! Please open an issue or submit a pull request.

License

MIT License - see LICENSE file for details

Download files

Source Distribution

gitview-0.1.1.tar.gz (34.8 kB)

Built Distribution

gitview-0.1.1-py3-none-any.whl (33.3 kB)

File details

Details for the file gitview-0.1.1.tar.gz.

File metadata

  • Download URL: gitview-0.1.1.tar.gz
  • Size: 34.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for gitview-0.1.1.tar.gz
Algorithm Hash digest
SHA256 d2fe356f498e1b8de508279b4f111bde47dd108c6f74ecfa6a29e262e66b9f84
MD5 635199028b953725da78b378a6ff3b33
BLAKE2b-256 fec01d28be55d5d7c8168a8390a1392862f31b76e12a7a24e0565e20df9337c8

File details

Details for the file gitview-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: gitview-0.1.1-py3-none-any.whl
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for gitview-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ecd9fe593e6fb547c9ac7b3e36a484b603ba402368518bfd612f5b0061386d54
MD5 25666e23483ad97efabbb7796c2620e5
BLAKE2b-256 4c0de43f00b0d8519b97f3fb22eebf8169a7e7127f27e17c07e2f3b1e1019904
