Evidence-based repository summarizer that generates comprehensive documentation cards from any GitHub project.
RepoCards
Evidence-based repository summarizer that works on any GitHub project.
RepoCards automatically analyzes GitHub repositories and generates comprehensive documentation cards in both Markdown and JSON formats. It is suited to understanding new projects, building developer tools, and creating automated documentation systems.
Quick Start
Installation
pip install repocards
Basic Usage
Command Line
Analyze any GitHub repository:
repocards summarize https://github.com/owner/repo
Save outputs to files:
repocards summarize https://github.com/owner/repo --out-dir _out
# Creates: _out/card.md and _out/card.json
Customize output filenames:
repocards summarize https://github.com/owner/repo --out-dir _out --out-stem myproject
# Creates: _out/myproject.md and _out/myproject.json
Python API
Use programmatically in your code:
import repocards
# Get markdown string
markdown = repocards.get_repo_info("https://github.com/owner/repo")
# Get pydantic object
card = repocards.get_repo_info("https://github.com/owner/repo", mode="pydantic")
print(card.title)
# Save to files
path = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="markdown_file",
    out_dir="./output",
)
Command Options
- --out-dir PATH – Target directory for output files (auto-creates if needed)
- --out-stem NAME – Base filename without extension (e.g., myproject → myproject.md + myproject.json)
- --out-md PATH – Exact path for Markdown output
- --out-json PATH – Exact path for JSON output
- --max-files N – Maximum number of files to fetch (default: 160)
What RepoCards Extracts
RepoCards analyzes your repository and automatically extracts:
📊 Quick Facts
- Primary programming languages and their usage
- Detected ecosystems (Python, Node.js, CMake, etc.)
- License and topics
🔧 Capabilities
- Package names extracted from installation commands
- Entry points (CLI commands defined in manifests)
- API/CLI availability detection
- Dockerfile presence and containerization support
- OS support inferred from commands (Linux/macOS/Windows)
- Model weights and dataset links (for ML projects)
📝 Commands by Category
Auto-discovered shell commands organized by:
- Install – Package managers and dependencies
- Setup – Environment configuration
- Build – Compilation and build steps
- Run – Execution commands
- Test – Testing frameworks
- Lint – Code quality tools
All commands are categorized by OS (Linux/macOS/Windows/Generic) with source attribution.
🚀 Canonical Quickstart
Auto-generated step-by-step quickstart guide per OS, intelligently selecting the most relevant commands from documentation and CI workflows.
🔗 Additional Information
- Overview from README
- Python API usage examples
- Helpful links (documentation, wikis, releases)
- Notable files and directories
- Imaging-specific signals (for medical/scientific imaging projects)
Output Format
Markdown Card
The generated Markdown file includes:
- Repository metadata (license, topics, languages)
- Overview extracted from README
- Quick facts about languages and ecosystems
- Capability facts (packages, entry points, OS support, etc.)
- Canonical quickstart commands organized by OS
- Python API examples (if found)
- Helpful links with source attribution
- Notable files and directories
JSON Card Structure
{
  "repo_url": "https://github.com/owner/repo",
  "ref": "main",
  "title": "owner/repo",
  "meta": {
    "license": "MIT",
    "topics": ["python", "data-science"],
    "languages": {"Python": 50000, "JavaScript": 10000}
  },
  "markdown": "...", // Full markdown content
  "extras": {
    "ecosystems": ["python", "node"],
    "capabilities": {
      "entrypoints": ["myapp = mypackage.cli:main"],
      "provides_api": true,
      "provides_cli": true,
      "dockerfile_present": true,
      "package_names": ["numpy", "pandas"],
      "os_support": ["linux", "macos"],
      "model_weight_links": ["https://huggingface.co/..."],
      "dataset_links": ["https://zenodo.org/..."],
      "buckets_by_os": {
        "install": {
          "linux": [{"cmd": "apt install...", "source": ".github/..."}],
          "macos": [...],
          "windows": [...],
          "generic": [...]
        },
        "build": {...},
        "run": {...},
        "test": {...},
        "lint": {...}
      }
    },
    "quickstart": {
      "linux": [{"cmd": "pip install .", "source": "README.md"}],
      "macos": [...],
      "windows": [...],
      "generic": [...]
    },
    "imaging": {
      "imaging_score": 0.80,
      "python_libs": ["pydicom", "nibabel"],
      "file_types": [".dcm", ".nii"],
      "tasks": ["segmentation", "registration"],
      "modalities": ["CT", "MRI"]
    }
  }
}
Note: All commands and links include provenance (source file path) for transparency.
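As a rough illustration of consuming the JSON card, the sketch below pulls the quickstart commands for one OS together with their source files. It assumes only the layout shown above. The sample_card dictionary is a hand-written stand-in rather than real RepoCards output, and the fallback to the generic bucket is an assumption of this example, not documented behavior.

```python
import json


def quickstart_commands(card: dict, os_name: str) -> list[tuple[str, str]]:
    """Return (command, source) pairs for one OS.

    Falls back to the "generic" bucket when the OS has no entries
    (an assumption made for this example).
    """
    quickstart = card.get("extras", {}).get("quickstart", {})
    entries = quickstart.get(os_name) or quickstart.get("generic") or []
    return [(entry["cmd"], entry["source"]) for entry in entries]


# Hand-written stand-in that follows the documented layout (not real output).
sample_card = json.loads("""
{
  "repo_url": "https://github.com/owner/repo",
  "extras": {
    "quickstart": {
      "linux": [{"cmd": "pip install .", "source": "README.md"}],
      "generic": [{"cmd": "pytest", "source": ".github/workflows/ci.yml"}]
    }
  }
}
""")

for cmd, source in quickstart_commands(sample_card, "linux"):
    print(f"{cmd}  (from {source})")
```

To run this against a real card, replace sample_card with the result of json.load on a file such as _out/card.json.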
Programmatic Usage
Simple API
RepoCards provides a simple API for programmatic access:
import repocards
# Get markdown string (default) - token auto-loaded from .env
markdown = repocards.get_repo_info("https://github.com/owner/repo")
print(markdown)
# Get JSON string
json_str = repocards.get_repo_info("https://github.com/owner/repo", mode="json")
# Get pydantic object for structured access
card = repocards.get_repo_info("https://github.com/owner/repo", mode="pydantic")
print(card.title)
print(card.meta["license"])
print(card.extras["ecosystems"])
# Write to markdown file
path = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="markdown_file",
    out_dir="./output",
)
print(f"Wrote to: {path}")
# Write to JSON file
path = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="json_file",
    out_dir="./output",
)
print(f"Wrote to: {path}")
# Control file fetching limit
card = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="pydantic",
    max_files=100,
)
Available Modes
| Mode | Returns | Description |
|---|---|---|
| "markdown" | str | Markdown content (default) |
| "json" | str | JSON string |
| "pydantic" | RepoCard | Pydantic model object |
| "markdown_file" | str | Writes file, returns path |
| "json_file" | str | Writes file, returns path |
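For batch analysis, one possible pattern is a small wrapper that calls the same function for several repositories and keeps going when one of them fails. The sketch below is not part of RepoCards: summarize_many is a hypothetical helper, and it takes the fetch function as a parameter so that it can be exercised without network access. Which exceptions get_repo_info raises is not documented here, so the helper catches Exception broadly.

```python
from typing import Any, Callable


def summarize_many(
    urls: list[str],
    fetch: Callable[..., Any],
    mode: str = "markdown",
) -> tuple[dict[str, Any], dict[str, Exception]]:
    """Call fetch(url, mode=mode) for each URL.

    Returns (cards, errors): successful results keyed by URL, and the
    exception raised for each URL that failed.
    """
    cards: dict[str, Any] = {}
    errors: dict[str, Exception] = {}
    for url in urls:
        try:
            cards[url] = fetch(url, mode=mode)
        except Exception as exc:  # keep going if a single repository fails
            errors[url] = exc
    return cards, errors
```

With the package installed, a call might look like summarize_many(urls, repocards.get_repo_info, mode="json").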
GitHub Authentication
Authentication is automatic! Just create a .env file in your project root:
# .env file
GITHUB_TOKEN=ghp_your_token_here
The token is automatically loaded from .env or environment variables.
Rate Limits:
- Without token: 60 requests/hour
- With token: 5,000 requests/hour
Get a GitHub token:
- Go to https://github.com/settings/tokens
- Generate a new token (classic) with repo scope
- Add it to your .env file
Alternative: Export as environment variable
export GITHUB_TOKEN="ghp_your_token_here"
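To make the lookup concrete, here is a minimal sketch of how a token could be resolved from the environment or a .env file and turned into GitHub API request headers. It is an illustration, not RepoCards' actual loader: the function names are hypothetical, and checking the environment variable before the .env file is an assumption of this example.

```python
import os
from pathlib import Path
from typing import Optional


def load_github_token(env_path: str = ".env") -> Optional[str]:
    """Return GITHUB_TOKEN from the environment, else from a .env file."""
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        return token
    path = Path(env_path)
    if not path.is_file():
        return None
    for line in path.read_text().splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines without KEY=VALUE
        key, _, value = line.partition("=")
        if key.strip() == "GITHUB_TOKEN":
            return value.strip().strip('"').strip("'") or None
    return None


def auth_headers(token: Optional[str]) -> dict[str, str]:
    """Build GitHub REST API headers; without a token, no Authorization is sent."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Requests sent without the Authorization header fall under the unauthenticated rate limit listed above.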
How It Works
Intelligent File Selection
RepoCards fetches a curated subset of repository files:
- Documentation (README, docs/, etc.)
- Package manifests (pyproject.toml, package.json, CMakeLists.txt, etc.)
- CI workflows (.github/workflows/)
- Example scripts and demos
- Docker configurations
This selective approach keeps analysis fast while gathering comprehensive information.
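A selection step of this kind can be sketched as a ranked pattern match over the repository's file paths, capped at a maximum count. The patterns and their order below are hypothetical and chosen only to mirror the categories listed above. RepoCards' real selection rules are not shown in this document.

```python
import fnmatch

# Hypothetical priority patterns, earliest = most important.
PRIORITY_PATTERNS = [
    "README*",
    "docs/*",
    "pyproject.toml",
    "package.json",
    "CMakeLists.txt",
    ".github/workflows/*",
    "examples/*",
    "demo*",
    "Dockerfile*",
    "docker-compose*",
]


def select_files(paths: list[str], max_files: int = 160) -> list[str]:
    """Keep paths that match a priority pattern, best-ranked first, capped at max_files."""
    ranked = []
    for path in paths:
        for rank, pattern in enumerate(PRIORITY_PATTERNS):
            # fnmatchcase: "*" also matches "/" so "docs/*" covers nested files
            if fnmatch.fnmatchcase(path, pattern):
                ranked.append((rank, path))
                break
    ranked.sort()
    return [path for _, path in ranked[:max_files]]


paths = [
    "src/main.py",
    "README.md",
    ".github/workflows/ci.yml",
    "pyproject.toml",
    "docs/usage.md",
    "Dockerfile",
]
print(select_files(paths, max_files=3))
```

The max_files parameter plays the same role as the --max-files option: once the cap is reached, lower-priority files are dropped.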
Command Harvesting
Commands are extracted from:
- Fenced shell blocks in Markdown (bash, sh, etc.)
- Shell prompts ($-prefixed lines in documentation)
- CI workflows (run: steps in GitHub Actions)
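The first two sources can be approximated with a simple line scanner over Markdown text. The sketch below is a simplified stand-in for whatever RepoCards does internally: the set of shell language tags is an assumption, and CI workflow parsing is left out because it needs a YAML parser.

```python
FENCE = "`" * 3  # built programmatically so the example can live inside a fence
SHELL_LANGS = {"bash", "sh", "shell", "console", "zsh"}  # assumed tag list


def harvest_commands(markdown: str) -> list[str]:
    """Collect commands from fenced shell blocks and $-prefixed lines."""
    commands = []
    in_shell_block = False
    in_other_block = False
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            if in_shell_block or in_other_block:
                in_shell_block = in_other_block = False  # closing fence
            else:
                lang = line[len(FENCE):].strip().lower()
                in_shell_block = lang in SHELL_LANGS
                in_other_block = not in_shell_block
            continue
        if in_other_block:
            continue  # ignore non-shell code such as Python snippets
        if line.startswith("$ "):
            commands.append(line[2:].strip())
        elif in_shell_block and line and not line.startswith("#"):
            commands.append(line)
    return commands


doc = "\n".join([
    "Install:",
    FENCE + "bash",
    "pip install repocards",
    "# a comment",
    FENCE,
    "Then run:",
    "$ repocards summarize https://github.com/owner/repo",
])
print(harvest_commands(doc))
```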
OS Classification
Commands are automatically classified by operating system:
- Linux: apt, dnf, pacman package managers
- macOS: brew, CMake OSX flags
- Windows: choco, winget, msbuild, PowerShell
- Generic: Cross-platform commands
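A keyword-based classifier along these lines might look like the sketch below. The hint lists come from the bullets above, but the matching logic (whole-token matches plus a substring check for CMake OSX flags) is an assumption made for illustration.

```python
import re

# Hints taken from the list above; the matching strategy is an assumption.
OS_HINTS = {
    "linux": {"apt", "apt-get", "dnf", "pacman"},
    "macos": {"brew"},
    "windows": {"choco", "winget", "msbuild", "powershell"},
}


def classify_os(command: str) -> str:
    """Return "linux", "macos", "windows", or "generic" for a shell command."""
    lowered = command.lower()
    if "cmake_osx" in lowered:  # e.g. -DCMAKE_OSX_ARCHITECTURES=...
        return "macos"
    tokens = set(re.findall(r"[a-z][a-z0-9_-]*", lowered))
    for os_name, hints in OS_HINTS.items():
        if tokens & hints:
            return os_name
    return "generic"


for cmd in ["sudo apt-get install -y cmake", "brew install cmake", "pip install repocards"]:
    print(f"{classify_os(cmd):8s} {cmd}")
```

Matching on whole tokens rather than substrings avoids false positives such as treating a package named aptitude as a use of apt.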
Package Name Extraction
Intelligently parses installation commands to extract package names:
- Filters out -r requirements.txt and similar flags
- Removes URLs, local paths, and version specifiers
- Strips extras (e.g., package[dev] → package)
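The filtering rules above can be sketched for pip commands as follows. This is a minimal illustration, not the project's parser: it covers pip only, the list of options that take a value is partial, and the function name is hypothetical.

```python
import re
import shlex

# pip options that consume the following argument (partial, illustrative list).
OPTIONS_WITH_VALUE = {
    "-r", "--requirement", "-c", "--constraint", "-e", "--editable",
    "-i", "--index-url", "--extra-index-url", "-f", "--find-links",
}


def extract_pip_packages(command: str) -> list[str]:
    """Return bare package names from a `pip install ...` command."""
    tokens = shlex.split(command)
    if "install" not in tokens:
        return []
    packages = []
    skip_next = False
    for arg in tokens[tokens.index("install") + 1:]:
        if skip_next:  # value belonging to the previous option
            skip_next = False
            continue
        if arg in OPTIONS_WITH_VALUE:
            skip_next = True
            continue
        if arg.startswith("-"):
            continue  # other flags such as --upgrade
        if "://" in arg or "/" in arg or arg.startswith((".", "~")):
            continue  # URLs and local paths
        # Cut at the first extras bracket, version specifier, or marker.
        name = re.split(r"[\[<>=!~;@ ]", arg, maxsplit=1)[0]
        if name:
            packages.append(name)
    return packages


print(extract_pip_packages('pip install -r requirements.txt "package[dev]>=1.2" numpy'))
```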
Python Code Detection
Extracts Python API examples from fenced code blocks:
- Validates code contains real Python (imports/definitions/calls)
- Limits to relevant, instructive snippets
- Filters out empty or trivial examples
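One way to approximate this validation is to parse each candidate snippet with Python's ast module and look for imports, definitions, or calls. The sketch below is an assumption about how such a check could work. The min_statements threshold is a hypothetical knob for rejecting trivial snippets.

```python
import ast

# Node types that suggest the snippet demonstrates real API usage.
INTERESTING = (
    ast.Import,
    ast.ImportFrom,
    ast.FunctionDef,
    ast.AsyncFunctionDef,
    ast.ClassDef,
    ast.Call,
)


def looks_like_real_python(snippet: str, min_statements: int = 2) -> bool:
    """Heuristic: the snippet parses as Python, is not trivial, and
    contains at least one import, definition, or call."""
    try:
        tree = ast.parse(snippet)
    except SyntaxError:
        return False  # shell commands, pseudo-code, truncated examples
    if len(tree.body) < min_statements:
        return False  # empty or one-line snippets
    return any(isinstance(node, INTERESTING) for node in ast.walk(tree))


example = "import repocards\ncard = repocards.get_repo_info('https://github.com/owner/repo')"
print(looks_like_real_python(example))
print(looks_like_real_python("pip install repocards"))
```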
Domain-Specific Analysis
Imaging Analyzer (optional, gated by relevance):
- Detects medical/scientific imaging projects
- Identifies Python libraries (pydicom, nibabel, SimpleITK, etc.)
- Recognizes file formats (.dcm, .nii, .mha, etc.)
- Classifies tasks (segmentation, registration, etc.)
- Identifies modalities (MRI, CT, PET, etc.)
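The detection steps can be illustrated with a simple vocabulary scan over README text. The vocabularies below contain only the examples named above, and the function is a hypothetical sketch. It does not attempt to reproduce the imaging_score shown in the JSON card, whose formula is not described here.

```python
import re

# Small illustrative vocabularies drawn from the examples above.
IMAGING_LIBS = {"pydicom", "nibabel", "simpleitk"}
IMAGING_EXTS = {".dcm", ".nii", ".mha"}
TASKS = {"segmentation", "registration"}
MODALITIES = {"mri", "ct", "pet"}


def imaging_signals(text: str) -> dict[str, list[str]]:
    """Return the imaging-related terms found in the text, lowercased and sorted."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9_]+", lowered))
    return {
        "python_libs": sorted(IMAGING_LIBS & words),
        "file_types": sorted(ext for ext in IMAGING_EXTS if ext in lowered),
        "tasks": sorted(TASKS & words),
        "modalities": sorted(MODALITIES & words),
    }


readme = "Load CT volumes with SimpleITK or nibabel (.nii, .dcm) for segmentation."
signals = imaging_signals(readme)
print(signals)
# A relevance gate could be as simple as: any(signals.values())
```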
Development Setup
Clone and install in editable mode:
git clone https://github.com/qchapp/repocards
cd repocards
pip install -e .
Run tests:
pytest tests/
Design Philosophy
General-Purpose
Works on any GitHub repository without per-project configuration or YAML rules.
Evidence-Based
Every extracted command and fact includes source file attribution. No invented or assumed information.
Agent-Ready
Structured JSON output with machine-readable facts enables:
- Automated documentation systems
- Developer tools and IDE integrations
- AI agents that need to understand codebases
- Repository analysis pipelines
Reliable
- Verbatim commands from actual documentation
- No hallucination or inference beyond what's in the repository
- Clear provenance for all extracted information
Use Cases
- 📚 Documentation Generation: Automatically create comprehensive repo cards
- 🤖 AI/Agent Tools: Provide structured repository information to AI systems
- 🔍 Code Discovery: Quickly understand unfamiliar projects
- 📊 Repository Analysis: Batch analyze multiple repositories
- 🛠️ Developer Tooling: Build IDE extensions or CLI tools that need repo metadata
- 🏥 Domain Analysis: Identify imaging, ML, or other domain-specific projects
License
MIT
Contributing
Contributions welcome! Please feel free to submit issues or pull requests.
File details
Details for the file repocards-0.1.1.tar.gz.
File metadata
- Download URL: repocards-0.1.1.tar.gz
- Upload date:
- Size: 32.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 969b3cf08e5ec60c0ed06365a6a8d9a6e4eeea8cc4bedbb9616387ace1741a21 |
| MD5 | 2ef8be2c0f01a9467fbac9b6d9d9b19e |
| BLAKE2b-256 | 10a8086650c6bb0244c2314d98185edb6ea20566f8573cea78a3e30f62dadc8f |
File details
Details for the file repocards-0.1.1-py3-none-any.whl.
File metadata
- Download URL: repocards-0.1.1-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53ba160d7d4b1bcdaa14c121c417aa464912227401634845be8ef8f462bc183a |
| MD5 | 5dd276ef22d6f6d7abc853909253e6fd |
| BLAKE2b-256 | 054028cb92c4c57c6098a7010d755d8e5c4ad97ed8a9537c6617debddbe4c03a |