Fetch any GitHub repository into a single LLM-ready digest — metadata, directory tree, file contents, and built-in AI analysis.

RepoDigest

Fetch any GitHub repository via the GitHub API and produce an LLM-ready text digest and a structured JSON file — including the repo's "about" metadata, full directory tree, and all file contents. Then optionally pass the digest directly to an LLM for deep analysis.

Installation

pip install ghdigest          # core — fetch + CLI + web UI
pip install "ghdigest[llm]"   # + Groq, Mistral, OpenRouter, Gemini analysis (quoted so shells like zsh do not expand the brackets)

PyPI: https://pypi.org/project/ghdigest/

Features

Repo "about" metadata: description, homepage, topics, stars, forks, watchers, license, language
Full recursive directory tree rendered as ASCII art
All text file contents concatenated with clear separators
Automatic filtering of binary files and noise directories
Configurable max file size (from 100 KB up to no limit; 500 KB by default)
Web UI with live progress bar, results tabs, and download buttons
Browse User Repos panel — list all public repos for any GitHub user, with one-click ingest
LLM Analysis panel — send the digest to Groq, Mistral, OpenRouter, or Gemini with 5 preset prompts
CLI, importable Python library, and FastAPI server
GitHub token support (raises rate limit from 60 to 5,000 req/h)
Outputs both .txt (LLM digest) and .json (structured data)

Quick Start

# Start the web UI
uvicorn github_ingest.server:app --reload --port 8001

# Open in browser
http://localhost:8001

Web UI

Visit http://localhost:8001 after starting the server.

Repo Fetch panel

Enter any owner/repo or full GitHub URL
Optional: GitHub token, branch, max file size (including No limit)
Live progress bar while ingesting
Results: about card (description, topics, stats), summary, directory tree
Switch between TXT digest / JSON / File tree tabs
Copy to clipboard or download .txt / .json

Browse User Repos panel

Enter any GitHub username and hit Browse
Filter by type: Own repos (no forks) or All (includes forks)
Sort by: last updated, created, last pushed, or name A–Z
Each repo card shows: name, description, language (colour-coded), stars, topics, license, and time since last update
Click Ingest → on any card to instantly populate the ingest form and fetch that repo's full digest
Token entered here is automatically copied to the ingest form

LLM Analysis panel (below results)

Select a provider, paste your API key, pick a model
Choose an analysis type or write a custom prompt
Response streams live into the browser
Copy the analysis to clipboard

LLM Providers

Every provider below has a free tier. None requires a credit card, except Gemini, which may require billing to be enabled in some regions.

Provider	Free?	Context	Sign-up
Groq ⭐ recommended	Free, no card	128k tokens	console.groq.com
Mistral	Free, no card	32k tokens	console.mistral.ai
OpenRouter	Free models, no card	up to 131k	openrouter.ai/keys
Gemini	Free (may need billing)	1M tokens	aistudio.google.com/apikey
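
The context column matters because the digest of a large repository can exceed a provider's window. The following is a rough pre-check sketch, not part of ghdigest: `estimate_tokens`, `fit_digest`, and the reserve size are hypothetical names and values, the context sizes are copied from the table above, and the 4-characters-per-token ratio is a crude rule of thumb rather than a real tokenizer.

```python
import math

# Context windows in tokens, copied from the provider table above.
CONTEXT_TOKENS = {
    "groq": 128_000,
    "mistral": 32_000,
    "openrouter": 131_000,  # "up to 131k": depends on the chosen model
    "gemini": 1_000_000,
}

CHARS_PER_TOKEN = 4  # assumption: crude average, real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fit_digest(digest: str, provider: str, reserve_tokens: int = 4_000) -> str:
    """Truncate the digest so `reserve_tokens` remain for the prompt and reply."""
    budget = CONTEXT_TOKENS[provider] - reserve_tokens
    if estimate_tokens(digest) <= budget:
        return digest
    return digest[: budget * CHARS_PER_TOKEN]
```

Truncating from the end is the simplest choice; lowering the max file size before ingesting is likely a better way to shrink a digest, since it drops whole files instead of cutting one in half.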

Analysis presets

Preset	What it covers
Summary	Purpose, architecture, key modules, dependencies, entry points
Architecture	Design patterns, module interactions, data flow, coupling
Security Review	Vulnerabilities, input validation, secrets, findings by severity
Onboarding Guide	Setup, structure tour, how to run, common tasks, gotchas
Code Quality	Score 1–10, organisation, docs, tests, duplication, top improvements
Custom	Write your own prompt

CLI Usage

# Basic — writes owner_repo.txt and owner_repo.json in current directory
python -m github_ingest owner/repo

# Full GitHub URL works too
python -m github_ingest https://github.com/owner/repo

# Authenticate (recommended — raises rate limit from 60 to 5,000 req/h)
python -m github_ingest owner/repo --token ghp_xxxxxxxxxxxx

# Or use an environment variable
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
python -m github_ingest owner/repo

# Specific branch
python -m github_ingest owner/repo --branch dev

# Custom output directory
python -m github_ingest owner/repo --output ./digests

# Limit file size (skip files larger than 100 KB)
python -m github_ingest owner/repo --max-file-size 102400

# No file size limit
python -m github_ingest owner/repo --max-file-size 0

# Only produce the JSON file
python -m github_ingest owner/repo --no-txt

# Print the text digest to stdout
python -m github_ingest owner/repo --stdout

# Suppress all progress output
python -m github_ingest owner/repo --quiet

Output files are named {owner}_{repo}.txt and {owner}_{repo}.json.
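
To make the naming rule concrete, here is a minimal sketch of how an `owner/repo` string or a full GitHub URL could be mapped to the two output file names. `parse_repo_spec` and `output_names` are hypothetical helpers written for illustration; the CLI's actual parsing may differ, for example in how it treats a trailing `.git`.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_repo_spec(spec: str) -> Tuple[str, str]:
    """Split 'owner/repo' or a full GitHub URL into (owner, repo)."""
    spec = spec.strip()
    # For URLs keep only the path part, e.g. '/owner/repo'.
    path = urlparse(spec).path if "://" in spec else spec
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected 'owner/repo' or a GitHub URL, got {spec!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):  # assumption: clone URLs are accepted too
        repo = repo[: -len(".git")]
    return owner, repo


def output_names(spec: str) -> Tuple[str, str]:
    """Return the (.txt, .json) file names following {owner}_{repo}."""
    owner, repo = parse_repo_spec(spec)
    stem = f"{owner}_{repo}"
    return f"{stem}.txt", f"{stem}.json"


print(output_names("https://github.com/owner/repo"))
```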

Python Library Usage

from github_ingest import ingest, to_txt, to_json, to_dict

# Fetch the repo
result = ingest("owner/repo", token="ghp_xxx")

# Access structured data
print(result.about["description"])
print(result.about["topics"])
print(result.about["stars"])
print(result.tree)                     # ["README.md", "src/main.py", ...]
print(result.files["README.md"])       # raw file content

# Render outputs
txt_digest  = to_txt(result)           # LLM-ready plain text
json_string = to_json(result)          # pretty-printed JSON string
data_dict   = to_dict(result)          # plain Python dict

# No file size limit
result = ingest("owner/repo", max_file_size=0)

LLM analysis from Python

from github_ingest import ingest, to_txt
from github_ingest.analyzer import analyze_stream

result = ingest("owner/repo", token="ghp_xxx")
digest = to_txt(result)

# Stream analysis using Groq (free, no card)
for chunk in analyze_stream(
    digest=digest,
    api_key="gsk_...",
    provider="groq",
    prompt_type="summary",
    model_name="llama-3.3-70b-versatile",
):
    print(chunk, end="", flush=True)

List all repos for a user

from github_ingest import fetch_user_repos

# All public repos the user created (no forks), sorted by last updated
repos = fetch_user_repos("some-user", token="ghp_xxx")

for repo in repos:
    print(repo["name"], "|", repo["description"])

# Include forks, sorted by last updated
repos = fetch_user_repos("some-user", token="ghp_xxx", repo_type="all", sort="updated")

Each item contains: name, full_name, description, html_url, language, stars, forks, fork, topics, license, updated_at, created_at, visibility, default_branch.
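
The documented sort values (`updated`, `created`, `pushed`, `full_name`) do not include star count, so ordering by stars has to happen on the client. A small sketch, assuming each item is a plain dict with the `stars` and `fork` keys listed above; `top_repos` is a hypothetical helper and the sample values are made up.

```python
def top_repos(repos, n=5, include_forks=False):
    """Return the n most-starred repos, optionally dropping forks."""
    candidates = [r for r in repos if include_forks or not r["fork"]]
    return sorted(candidates, key=lambda r: r["stars"], reverse=True)[:n]


# Sample data shaped like the items described above (values are made up).
sample = [
    {"name": "a", "stars": 3, "fork": False},
    {"name": "b", "stars": 10, "fork": True},
    {"name": "c", "stars": 7, "fork": False},
]
print([r["name"] for r in top_repos(sample)])  # ['c', 'a']
```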

Pair with ingest() to process an entire profile in one loop:

from github_ingest import fetch_user_repos, ingest, to_txt

repos = fetch_user_repos("some-user", token="ghp_xxx")

for repo in repos:
    result = ingest(repo["full_name"], token="ghp_xxx")
    digest = to_txt(result)
    # → pass to LLM, save to disk, etc.

Individual fetchers

from github_ingest import fetch_repo_about, fetch_repo_tree, fetch_blob_content

about   = fetch_repo_about("owner", "repo", token="ghp_xxx")
blobs   = fetch_repo_tree("owner", "repo", sha="<commit-sha>", token="ghp_xxx")
content = fetch_blob_content("owner", "repo", blob_sha="<sha>", token="ghp_xxx")
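
For background on what a blob fetch involves: GitHub's Git blobs endpoint returns file contents as a JSON payload with a `content` field and an `encoding` field, normally `base64`. The sketch below decodes such a payload and is an illustration only. Whether `fetch_blob_content` returns decoded text or the raw payload is not stated here, and `decode_blob` is a hypothetical name.

```python
import base64
from typing import Optional


def decode_blob(payload: dict) -> Optional[str]:
    """Decode a Git blob payload; return None if it is not valid UTF-8 text."""
    raw = payload["content"]
    if payload.get("encoding") == "base64":
        # GitHub wraps base64 output in newlines; b64decode ignores them.
        data = base64.b64decode(raw)
    else:
        data = raw.encode("utf-8")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # treat as binary


payload = {"content": "aGVs\nbG8=\n", "encoding": "base64"}
print(decode_blob(payload))  # hello
```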

FastAPI Server

Start

uvicorn github_ingest.server:app --reload --port 8001

Set GITHUB_TOKEN in the environment to apply a default token to all ingest requests:

export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
uvicorn github_ingest.server:app --reload --port 8001

Endpoints

Method	Path	Description
`GET`	`/`	Web UI
`GET`	`/health`	Liveness check
`POST`	`/ingest`	Ingest repo → JSON
`POST`	`/ingest/txt`	Ingest repo → plain-text digest
`GET`	`/ingest?repo=owner/repo&fmt=json`	Ingest via query params
`GET`	`/users/{username}/repos`	List all public repos for a user
`GET`	`/analyze/providers`	List providers, models, prompt types
`POST`	`/analyze`	Analyze digest → SSE stream
`POST`	`/analyze/test`	Test an API key with a minimal request

POST /ingest

{
  "repo": "owner/repo",
  "token": "ghp_xxxxxxxxxxxx",
  "branch": "main",
  "max_file_size": 524288
}

Set max_file_size to 0 for no limit. If token is omitted, the server falls back to the GITHUB_TOKEN environment variable.
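
A minimal client sketch for this endpoint using only the standard library. It builds the request without sending it, so it can be inspected first. `build_ingest_request` is a hypothetical helper, the base URL assumes the uvicorn command shown earlier, and treating `branch` as optional is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8001"  # assumption: server started on port 8001


def build_ingest_request(repo, token=None, branch=None, max_file_size=524288):
    """Build (but do not send) a POST /ingest request."""
    body = {"repo": repo, "max_file_size": max_file_size}
    if token:
        body["token"] = token  # otherwise the server uses GITHUB_TOKEN
    if branch:
        body["branch"] = branch  # assumption: may be omitted
    return urllib.request.Request(
        f"{BASE_URL}/ingest",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ingest_request("owner/repo", branch="main")

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```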

GET /users/{username}/repos

GET /users/assem-elqersh/repos
GET /users/assem-elqersh/repos?type=all&sort=updated&token=ghp_xxx

Query param	Default	Options
`type`	`owner`	`owner` (no forks), `all` (includes forks)
`sort`	`updated`	`updated`, `created`, `pushed`, `full_name`
`token`	—	GitHub PAT (falls back to `GITHUB_TOKEN` env var)

Returns an array of repo objects with: name, full_name, description, html_url, language, stars, forks, fork, topics, license, updated_at, created_at, visibility, default_branch.

POST /analyze

{
  "digest": "<your .txt digest>",
  "api_key": "gsk_...",
  "provider": "groq",
  "prompt_type": "summary",
  "model": "llama-3.3-70b-versatile"
}

provider options: groq · mistral · openrouter · gemini
prompt_type options: summary · architecture · security · onboarding · quality · custom
Returns a Server-Sent Events stream: data: {"text": "..."} chunks, ending with data: [DONE].
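
A client has to reassemble the stream from those `data:` lines. The sketch below parses an iterable of already-decoded lines and stops at `[DONE]`. `iter_sse_text` is a hypothetical helper, and it assumes every non-terminal event carries a `text` key as shown above; the shape of error events is not documented here.

```python
import json


def iter_sse_text(lines):
    """Yield text chunks from an iterable of SSE lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)["text"]


# Simulated stream; in practice the lines come from the HTTP response body.
stream = [
    'data: {"text": "Hello"}',
    "",
    'data: {"text": ", world"}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(stream)))  # Hello, world
```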

POST /analyze/test

{ "api_key": "gsk_...", "provider": "groq", "model": "llama-3.3-70b-versatile" }

Sends a minimal request to verify the key works before sending the full digest.

Interactive API docs

http://localhost:8001/docs

Output Format

`.txt` digest

================================================================
REPOSITORY: owner/repo  [main]
================================================================

ABOUT
-----
Description : A great project
Homepage    : https://example.com
Language    : Python
License     : MIT
Topics      : python, llm, tools
Stats       : Stars: 1,234 | Forks: 56 | Watchers: 89 | Open issues: 12

================================================================
SUMMARY
================================================================
Files ingested : 42
Files skipped  : 8

================================================================
DIRECTORY STRUCTURE
================================================================
repo/
├── README.md
├── src/
│   ├── main.py
│   └── utils.py
└── tests/
    └── test_main.py

================================================================
FILES
================================================================
──── README.md ────────────────────────────────────────────────
# My Project
...
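
The DIRECTORY STRUCTURE section can be reproduced from a flat list of paths such as `result.tree`. The sketch below is one way to do it and is not ghdigest's own renderer: `render_tree` is a hypothetical function, and sorting entries alphabetically is an assumption.

```python
def render_tree(paths, root="repo"):
    """Render a flat list of file paths as an ASCII tree."""
    # Build a nested dict: directories map to dicts, files to empty dicts.
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})

    lines = [f"{root}/"]

    def walk(node, prefix):
        names = sorted(node)
        for i, name in enumerate(names):
            last = i == len(names) - 1
            children = node[name]
            label = name + "/" if children else name
            lines.append(prefix + ("└── " if last else "├── ") + label)
            if children:
                walk(children, prefix + ("    " if last else "│   "))

    walk(tree, "")
    return "\n".join(lines)


print(render_tree(["README.md", "src/main.py", "src/utils.py", "tests/test_main.py"]))
```

With these four paths the output matches the tree in the sample digest above.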

`.json` output

{
  "repository": "owner/repo",
  "branch": "main",
  "about": {
    "description": "A great project",
    "topics": ["python", "llm"],
    "stars": 1234,
    "license": "MIT"
  },
  "summary": { "files_ingested": 42, "files_skipped": 8 },
  "tree": ["README.md", "src/main.py"],
  "files": { "README.md": "# My Project\n..." },
  "skipped": ["assets/logo.png"]
}
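
Because the JSON output keeps every file under `files`, it is easy to post-process. A short sketch, assuming the structure shown above; `extension_counts` and `largest_files` are hypothetical helpers, and sizes are measured in characters of the stored text rather than bytes on disk.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_counts(data: dict) -> Counter:
    """Count ingested files per extension."""
    return Counter(PurePosixPath(p).suffix or "(none)" for p in data["files"])


def largest_files(data: dict, n: int = 3):
    """Return the n largest files as (path, character count) pairs."""
    sizes = {path: len(text) for path, text in data["files"].items()}
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]


# In practice: data = json.loads(Path("owner_repo.json").read_text())
data = {
    "files": {
        "README.md": "# My Project\n...",
        "src/main.py": "print('hi')",
    }
}
print(extension_counts(data))
print(largest_files(data))
```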

GitHub Rate Limits

Scenario	Unauthenticated	With token
Limit	60 req/hour	5,000 req/hour
Small repo (~20 files)	Works	Works
Medium repo (~200 files)	Hits limit	Works
Large repo (1000+ files)	Fails	Works

Get a free token at github.com/settings/tokens/new — no scopes needed for public repos.
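
When the limit is hit, GitHub reports it in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers; the latter is a UTC epoch timestamp. The sketch below turns those headers into a wait time. `seconds_until_reset` is a hypothetical helper, not something ghdigest is known to expose, and it assumes the headers mapping uses this exact casing.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before retrying, or 0.0 if requests remain."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)


# Example with fixed values so the result is deterministic.
headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000000"}
print(seconds_until_reset(headers, now=1699999000))
```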

Filtered by Default

Binary extensions: images, audio, video, archives, compiled objects, fonts, office docs, databases

Noise directories: .git, node_modules, __pycache__, .venv, venv, dist, build, .next, .pytest_cache, and more

Files over 500 KB are skipped by default. Override with --max-file-size <bytes>; pass 0 to disable the limit.
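
The three rules combine roughly as in the sketch below. It illustrates the described behaviour and is not ghdigest's implementation: `should_skip` is a hypothetical function, the two sets are small subsets of the real lists, and "500 KB" is read as 500 × 1024 bytes, by analogy with 100 KB = 102400 in the CLI examples.

```python
from pathlib import PurePosixPath

# Illustrative subsets only; the actual lists are longer.
BINARY_EXTENSIONS = {".png", ".jpg", ".mp3", ".mp4", ".zip", ".so", ".ttf", ".docx", ".sqlite"}
NOISE_DIRS = {".git", "node_modules", "__pycache__", ".venv", "venv",
              "dist", "build", ".next", ".pytest_cache"}
DEFAULT_MAX_FILE_SIZE = 500 * 1024  # assumption: 500 KB as 500 * 1024 bytes


def should_skip(path, size, max_file_size=DEFAULT_MAX_FILE_SIZE):
    """Return True if a file would be filtered out of the digest."""
    p = PurePosixPath(path)
    # Any parent directory on the noise list excludes the file.
    if any(part in NOISE_DIRS for part in p.parts[:-1]):
        return True
    if p.suffix.lower() in BINARY_EXTENSIONS:
        return True
    # max_file_size == 0 means "no limit".
    if max_file_size and size > max_file_size:
        return True
    return False
```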

License

MIT License. See the LICENSE file for details.

Release history

Version	Released
1.1.0 (this version)	Apr 15, 2026
1.0.1	Apr 15, 2026
1.0.0	Apr 15, 2026

Download files

Source Distribution

ghdigest-1.1.0.tar.gz (36.9 kB)

Uploaded Apr 15, 2026 Source

Built Distribution

ghdigest-1.1.0-py3-none-any.whl (40.3 kB)

Uploaded Apr 15, 2026 Python 3

File details

Details for the file ghdigest-1.1.0.tar.gz.

File metadata

Download URL: ghdigest-1.1.0.tar.gz
Upload date: Apr 15, 2026
Size: 36.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for ghdigest-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`71a3bbb90a880a8663e1613000e82798557d79771f514abb8f89f6a388cfab53`
MD5	`eeaf9a210aaf1ee85b9befafcf75f781`
BLAKE2b-256	`6c51a72f21e0d711491d2b22d6a4a4362320f471104f4549b12e7025e36ea28b`

File details

Details for the file ghdigest-1.1.0-py3-none-any.whl.

File metadata

Download URL: ghdigest-1.1.0-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 40.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for ghdigest-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fc62c00d0fde00bf6266a43d8a0345921be749d36ca77b8fb875a6af29cc1282`
MD5	`832fcdaaa106bbea50e6f6ad16cd01d2`
BLAKE2b-256	`34aad3ccc54e28e654a60318bcad8397bf277d470c6cd2d0c4ef130ec43444ea`
