
RepoCards

Evidence-based repository summarizer that works on any GitHub project.

RepoCards automatically analyzes GitHub repositories and generates comprehensive documentation cards in both Markdown and JSON formats. Perfect for understanding new projects, building developer tools, or creating automated documentation systems.


Quick Start

Installation

pip install repocards

Basic Usage

Command Line

Analyze any GitHub repository:

repocards summarize https://github.com/owner/repo

Save outputs to files:

repocards summarize https://github.com/owner/repo --out-dir _out
# Creates: _out/card.md and _out/card.json

Customize output filenames:

repocards summarize https://github.com/owner/repo --out-dir _out --out-stem myproject
# Creates: _out/myproject.md and _out/myproject.json

Python API

Use programmatically in your code:

import repocards

# Get markdown string
markdown = repocards.get_repo_info("https://github.com/owner/repo")

# Get pydantic object
card = repocards.get_repo_info("https://github.com/owner/repo", mode="pydantic")
print(card.title)

# Save to files
path = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="markdown_file",
    out_dir="./output"
)

Command Options

  • --out-dir PATH – Target directory for output files (auto-creates if needed)
  • --out-stem NAME – Base filename without extension (e.g., myproject → myproject.md + myproject.json)
  • --out-md PATH – Exact path for Markdown output
  • --out-json PATH – Exact path for JSON output
  • --max-files N – Maximum number of files to fetch (default: 160)

What RepoCards Extracts

RepoCards analyzes your repository and automatically extracts:

📊 Quick Facts

  • Primary programming languages and their usage
  • Detected ecosystems (Python, Node.js, CMake, etc.)
  • License and topics

🔧 Capabilities

  • Package names extracted from installation commands
  • Entry points (CLI commands defined in manifests)
  • API/CLI availability detection
  • Dockerfile presence and containerization support
  • OS support inferred from commands (Linux/macOS/Windows)
  • Model weights and dataset links (for ML projects)

📝 Commands by Category

Auto-discovered shell commands organized by:

  • Install – Package managers and dependencies
  • Setup – Environment configuration
  • Build – Compilation and build steps
  • Run – Execution commands
  • Test – Testing frameworks
  • Lint – Code quality tools

All commands are categorized by OS (Linux/macOS/Windows/Generic) with source attribution.

🚀 Canonical Quickstart

Auto-generated step-by-step quickstart guide per OS, intelligently selecting the most relevant commands from documentation and CI workflows.

🔗 Additional Information

  • Overview from README
  • Python API usage examples
  • Helpful links (documentation, wikis, releases)
  • Notable files and directories
  • Imaging-specific signals (for medical/scientific imaging projects)

Output Format

Markdown Card

The generated Markdown file includes:

  • Repository metadata (license, topics, languages)
  • Overview extracted from README
  • Quick facts about languages and ecosystems
  • Capability facts (packages, entry points, OS support, etc.)
  • Canonical quickstart commands organized by OS
  • Python API examples (if found)
  • Helpful links with source attribution
  • Notable files and directories

JSON Card Structure

{
  "repo_url": "https://github.com/owner/repo",
  "ref": "main",
  "title": "owner/repo",
  "meta": {
    "license": "MIT",
    "topics": ["python", "data-science"],
    "languages": {"Python": 50000, "JavaScript": 10000}
  },
  "markdown": "...", // Full markdown content
  "extras": {
    "ecosystems": ["python", "node"],
    "capabilities": {
      "entrypoints": ["myapp = mypackage.cli:main"],
      "provides_api": true,
      "provides_cli": true,
      "dockerfile_present": true,
      "package_names": ["numpy", "pandas"],
      "os_support": ["linux", "macos"],
      "model_weight_links": ["https://huggingface.co/..."],
      "dataset_links": ["https://zenodo.org/..."],
      "buckets_by_os": {
        "install": {
          "linux": [{"cmd": "apt install...", "source": ".github/..."}],
          "macos": [...],
          "windows": [...],
          "generic": [...]
        },
        "build": {...},
        "run": {...},
        "test": {...},
        "lint": {...}
      }
    },
    "quickstart": {
      "linux": [{"cmd": "pip install .", "source": "README.md"}],
      "macos": [...],
      "windows": [...],
      "generic": [...]
    },
    "imaging": {
      "imaging_score": 0.80,
      "python_libs": ["pydicom", "nibabel"],
      "file_types": [".dcm", ".nii"],
      "tasks": ["segmentation", "registration"],
      "modalities": ["CT", "MRI"]
    }
  }
}

Note: All commands and links include provenance (source file path) for transparency.
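Once a card has been generated, the structure above can be navigated like any JSON document. The sketch below builds a minimal card that mirrors the documented fields (the values themselves are illustrative) and prints install commands with their provenance:

```python
import json

# A minimal card mirroring the documented JSON structure; values are illustrative.
card = json.loads("""
{
  "title": "owner/repo",
  "meta": {"license": "MIT"},
  "extras": {
    "capabilities": {
      "buckets_by_os": {
        "install": {
          "linux": [{"cmd": "apt install build-essential",
                     "source": ".github/workflows/ci.yml"}]
        }
      }
    }
  }
}
""")

# Every command entry carries its source file for provenance.
for entry in card["extras"]["capabilities"]["buckets_by_os"]["install"]["linux"]:
    print(f"{entry['cmd']}  (from {entry['source']})")
```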


Programmatic Usage

Simple API

RepoCards provides a simple API for programmatic access:

import repocards

# Get markdown string (default) - token auto-loaded from .env
markdown = repocards.get_repo_info("https://github.com/owner/repo")
print(markdown)

# Get JSON string
json_str = repocards.get_repo_info("https://github.com/owner/repo", mode="json")

# Get pydantic object for structured access
card = repocards.get_repo_info("https://github.com/owner/repo", mode="pydantic")
print(card.title)
print(card.meta["license"])
print(card.extras["ecosystems"])

# Write to markdown file
path = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="markdown_file",
    out_dir="./output"
)
print(f"Wrote to: {path}")

# Write to JSON file
path = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="json_file",
    out_dir="./output"
)
print(f"Wrote to: {path}")

# Control file fetching limit
card = repocards.get_repo_info(
    "https://github.com/owner/repo",
    mode="pydantic",
    max_files=100
)

Available Modes

Mode              Returns     Description
"markdown"        str         Markdown content (default)
"json"            str         JSON string
"pydantic"        RepoCard    Pydantic model object
"markdown_file"   str         Writes file, returns path
"json_file"       str         Writes file, returns path

GitHub Authentication

Authentication is automatic! Just create a .env file in your project root:

# .env file
GITHUB_TOKEN=ghp_your_token_here

The token is automatically loaded from .env or environment variables.

Rate Limits:

  • Without token: 60 requests/hour
  • With token: 5,000 requests/hour

Get a GitHub token:

  1. Go to https://github.com/settings/tokens
  2. Generate a new token (classic) with repo scope
  3. Add it to your .env file

Alternative: Export as environment variable

export GITHUB_TOKEN="ghp_your_token_here"
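The documented lookup order (environment variable first, then a .env file) can be sketched as follows. Note that load_github_token is a hypothetical helper for illustration, not part of the repocards API:

```python
import os

def load_github_token(env_path=".env"):
    """Return a GitHub token from GITHUB_TOKEN, falling back to a .env file."""
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        return token
    if os.path.exists(env_path):
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                if line.startswith("GITHUB_TOKEN="):
                    # Accept both quoted and unquoted values.
                    return line.split("=", 1)[1].strip().strip('"')
    return None
```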

How It Works

Intelligent File Selection

RepoCards fetches a curated subset of repository files:

  • Documentation (README, docs/, etc.)
  • Package manifests (pyproject.toml, package.json, CMakeLists.txt, etc.)
  • CI workflows (.github/workflows/)
  • Example scripts and demos
  • Docker configurations

This selective approach keeps analysis fast while gathering comprehensive information.
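The selection step amounts to filtering a repository's file listing against a curated set of patterns. A minimal sketch, with illustrative patterns rather than RepoCards' actual rules:

```python
from fnmatch import fnmatch

# Illustrative include patterns, not RepoCards' actual selection rules.
PATTERNS = [
    "README*", "docs/*", "pyproject.toml", "package.json",
    "CMakeLists.txt", ".github/workflows/*", "Dockerfile*", "examples/*",
]

def select_files(paths):
    """Keep only paths matching the curated documentation/manifest patterns."""
    return [p for p in paths if any(fnmatch(p, pat) for pat in PATTERNS)]
```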

Command Harvesting

Commands are extracted from:

  • Fenced shell blocks in Markdown (bash, sh, etc.)
  • Shell prompts ($-prefixed lines in documentation)
  • CI workflows (run: steps in GitHub Actions)
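The first two sources can be sketched as follows (CI workflow parsing is omitted here, and these regexes are a simplification of whatever RepoCards does internally):

```python
import re

FENCE = "`" * 3  # triple backtick, spelled out to keep this snippet readable

def harvest_shell_commands(markdown_text):
    """Collect commands from fenced shell blocks and $-prefixed prompt lines."""
    commands = []
    # Fenced blocks tagged bash/sh/shell.
    block_re = re.compile(FENCE + r"(?:bash|sh|shell)\n(.*?)" + FENCE, re.DOTALL)
    for block in block_re.findall(markdown_text):
        commands.extend(line.strip() for line in block.splitlines() if line.strip())
    # $-prefixed prompt lines outside fenced blocks.
    for line in markdown_text.splitlines():
        if line.strip().startswith("$ "):
            commands.append(line.strip()[2:])
    return commands
```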

OS Classification

Commands are automatically classified by operating system:

  • Linux: apt, dnf, pacman package managers
  • macOS: brew, CMake OSX flags
  • Windows: choco, winget, msbuild, PowerShell
  • Generic: Cross-platform commands
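A first-token heuristic along these lines captures the idea (the rules shown are illustrative, not RepoCards' exact classifier):

```python
LINUX_TOOLS = {"apt", "apt-get", "dnf", "pacman"}
MACOS_TOOLS = {"brew"}
WINDOWS_TOOLS = {"choco", "winget", "msbuild", "powershell"}

def classify_os(cmd):
    """Heuristically classify a command by target OS from its first token."""
    tokens = cmd.strip().split()
    if not tokens:
        return "generic"
    first = tokens[0].lower()
    if first == "sudo" and len(tokens) > 1:  # look past sudo
        first = tokens[1].lower()
    if first in LINUX_TOOLS:
        return "linux"
    if first in MACOS_TOOLS:
        return "macos"
    if first in WINDOWS_TOOLS:
        return "windows"
    return "generic"
```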

Package Name Extraction

Intelligently parses installation commands to extract package names:

  • Filters out -r requirements.txt and similar flags
  • Removes URLs, local paths, and version specifiers
  • Strips extras (e.g., package[dev] → package)
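The filtering steps above can be sketched for the pip case (a simplified illustration, not RepoCards' actual parser):

```python
import re

def extract_package_names(install_cmd):
    """Pull package names from a pip install command."""
    tokens = install_cmd.split()[2:]  # drop "pip install"
    names, skip_next = [], False
    for tok in tokens:
        if skip_next:                  # argument of -r/--requirement
            skip_next = False
            continue
        if tok in ("-r", "--requirement"):
            skip_next = True
            continue
        if tok.startswith("-") or "://" in tok or tok.startswith((".", "/")):
            continue                   # flags, URLs, local paths
        name = re.split(r"[\[<>=!~;]", tok)[0]  # strip extras and version specifiers
        if name:
            names.append(name)
    return names
```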

Python Code Detection

Extracts Python API examples from fenced code blocks:

  • Validates code contains real Python (imports/definitions/calls)
  • Limits to relevant, instructive snippets
  • Filters out empty or trivial examples
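Validation of this kind can be done with the standard-library ast module; a minimal sketch of the "contains real Python" check:

```python
import ast

def is_instructive_python(snippet):
    """True if a code block parses as Python and contains an import, definition, or call."""
    try:
        tree = ast.parse(snippet)
    except SyntaxError:
        return False
    interesting = (ast.Import, ast.ImportFrom, ast.FunctionDef, ast.ClassDef, ast.Call)
    return any(isinstance(node, interesting) for node in ast.walk(tree))
```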

Domain-Specific Analysis

Imaging Analyzer (optional, gated by relevance):

  • Detects medical/scientific imaging projects
  • Identifies Python libraries (pydicom, nibabel, SimpleITK, etc.)
  • Recognizes file formats (.dcm, .nii, .mha, etc.)
  • Classifies tasks (segmentation, registration, etc.)
  • Identifies modalities (MRI, CT, PET, etc.)
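The gating idea — only emit imaging output when imaging signals are actually present — can be sketched like this. The signal sets and the relevance rule here are illustrative assumptions, not RepoCards' actual scoring:

```python
IMAGING_LIBS = {"pydicom", "nibabel", "SimpleITK", "itk", "monai"}
IMAGING_EXTS = {".dcm", ".nii", ".mha", ".nrrd"}

def imaging_signals(python_libs, file_types):
    """Collect imaging signals from a repo's libraries and file types."""
    lib_hits = sorted(set(python_libs) & IMAGING_LIBS)
    ext_hits = sorted(set(file_types) & IMAGING_EXTS)
    return {"python_libs": lib_hits, "file_types": ext_hits,
            "relevant": bool(lib_hits or ext_hits)}
```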

Development Setup

Clone and install in editable mode:

git clone https://github.com/qchapp/repocards
cd repocards
pip install -e .

Run tests:

pytest tests/

Design Philosophy

General-Purpose

Works on any GitHub repository without per-project configuration or YAML rules.

Evidence-Based

Every extracted command and fact includes source file attribution. No invented or assumed information.

Agent-Ready

Structured JSON output with machine-readable facts enables:

  • Automated documentation systems
  • Developer tools and IDE integrations
  • AI agents that need to understand codebases
  • Repository analysis pipelines

Reliable

  • Verbatim commands from actual documentation
  • No hallucination or inference beyond what's in the repository
  • Clear provenance for all extracted information

Use Cases

  • 📚 Documentation Generation: Automatically create comprehensive repo cards
  • 🤖 AI/Agent Tools: Provide structured repository information to AI systems
  • 🔍 Code Discovery: Quickly understand unfamiliar projects
  • 📊 Repository Analysis: Batch analyze multiple repositories
  • 🛠️ Developer Tooling: Build IDE extensions or CLI tools that need repo metadata
  • 🏥 Domain Analysis: Identify imaging, ML, or other domain-specific projects

License

MIT


Contributing

Contributions welcome! Please feel free to submit issues or pull requests.
