
GitHub Repository Catalog: Fetching, indexing, and organizing READMEs.


RepoDex


Your GitHub Repository Catalog: A comprehensive tool for fetching, indexing, and organizing READMEs from GitHub repositories.

Scripts

This repository contains several Python scripts for managing repository information:

README Fetching & Indexing:

  • gh_repo_fetch_index_shane.py: Fetches READMEs from your personal GitHub repositories (public and private, excluding forks) and generates an index file (github-project-readmes-shane/README.md) with repository metadata and statistics.
  • gh_repo_fetch_index_cello.py: Fetches READMEs from the CelloCommunications organization repositories authored by you (based on the first commit) and generates a similar index file (github-project-readmes-cello/README.md).
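A minimal sketch of the personal fetch step follows. It assumes two data sources:

  • the repository list comes from `gh repo list <owner> --json name,isFork,isPrivate,createdAt,updatedAt`;
  • each README comes from the REST endpoint `repos/{owner}/{repo}/readme`, which returns the file base64-encoded in a `content` field.

The helper names are hypothetical and are not taken from gh_repo_fetch_index_shane.py.

```python
import base64
import json
import subprocess


def list_repos(owner: str) -> list[dict]:
    """List an owner's repositories (public and private) via the GitHub CLI."""
    out = subprocess.run(
        ["gh", "repo", "list", owner, "--limit", "1000",
         "--json", "name,isFork,isPrivate,createdAt,updatedAt"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def exclude_forks(repos: list[dict]) -> list[dict]:
    """Keep only repositories that are not forks."""
    return [r for r in repos if not r.get("isFork", False)]


def decode_readme(payload: dict) -> str:
    """Decode the base64 `content` field of a README API response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # b64decode ignores the newlines GitHub inserts into long base64 strings.
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_readme(owner: str, repo: str) -> str:
    """Fetch and decode a repository's README through `gh api`."""
    out = subprocess.run(
        ["gh", "api", f"repos/{owner}/{repo}/readme"],
        check=True, capture_output=True, text=True,
    ).stdout
    return decode_readme(json.loads(out))
```

`gh repo list` also accepts `--source` to drop forks server-side. Filtering in Python, as above, keeps the full list available for statistics.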

Release Management and Repository Maintenance:

  • gh_repo_release_latest_shane.py: Reads github-project-readmes-shane/repositories.csv, finds the latest semantic version tag for each personal repository, and creates a GitHub Release with auto-generated notes if one doesn't already exist for that tag.
  • gh_repo_release_initial_shane.py: Reads github-project-readmes-shane/repositories.csv and creates an initial v0.1.0 tag and release for any personal repository that currently has no tags.
  • gh_repo_release_latest_cello.py: Fetches repositories authored by you in the CelloCommunications org, finds the latest semantic version tag for each, and creates a GitHub Release with auto-generated notes if one doesn't already exist. Includes a --dry-run option.
  • gh_repo_release_initial_cello.py: Fetches repositories authored by you in the CelloCommunications org and creates an initial v0.1.0 tag and release for any that currently have no tags. Includes a --dry-run option.
  • gh_repo_update_readmes.py: Reads repository names from a CSV file (e.g., github-project-readmes-shane/repositories.csv), finds corresponding local README files, and updates the remote READMEs using the GitHub API.
  • gh_repo_setup_secret.py: Sets up a GitHub Personal Access Token as a repository secret (e.g., GH_PAT for workflows).
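The latest-tag and initial-release logic can be sketched as below. The release scripts depend on packaging. This stdlib-only sketch instead treats only plain vMAJOR.MINOR.PATCH (or MAJOR.MINOR.PATCH) tags as semantic versions and compares them as integer tuples, so pre-release tags such as v1.2.3-rc1 are ignored. The function names are hypothetical.

The sketch relies on two GitHub CLI behaviors:

  • `gh release view` exits non-zero when no release exists for a tag;
  • `gh release create` creates a missing tag from the default branch.

```python
from __future__ import annotations

import re
import subprocess

# Plain semantic version tags such as "v1.2.3" or "1.2.3"; suffixes are skipped.
SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")
INITIAL_TAG = "v0.1.0"


def latest_semver_tag(tags: list[str]) -> str | None:
    """Return the highest MAJOR.MINOR.PATCH tag, or None if there is none."""
    best: tuple[tuple[int, ...], str] | None = None
    for tag in tags:
        match = SEMVER_TAG.match(tag)
        if not match:
            continue
        key = tuple(int(part) for part in match.groups())
        if best is None or key > best[0]:
            best = (key, tag)
    return best[1] if best else None


def plan_release(repo: str, tags: list[str]) -> list[str]:
    """Build a `gh release create` command, or [] when nothing applies.

    Untagged repositories get the initial v0.1.0 release; tagged ones use
    their latest semantic version tag. Repos whose tags are all non-semver
    return [].
    """
    tag = INITIAL_TAG if not tags else latest_semver_tag(tags)
    if tag is None:
        return []
    return ["gh", "release", "create", tag, "--repo", repo, "--generate-notes"]


def release_exists(repo: str, tag: str) -> bool:
    """True if `gh release view` finds a release for the tag."""
    result = subprocess.run(
        ["gh", "release", "view", tag, "--repo", repo],
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

A caller would run the planned command only when `release_exists(repo, tag)` is False. A `--dry-run` mode, like the one offered by the Cello scripts, could simply print the command instead of running it.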

Features

  • Fetches READMEs from GitHub repositories
  • Filters organization repositories by authorship (author of the first commit)
  • Generates a Markdown index with links to all fetched READMEs
  • Includes creation and update dates for each repository
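Filtering by first-commit author requires the oldest commit in each repository. The sketch below assumes commit records shaped like the REST `GET /repos/{owner}/{repo}/commits` response, with `author.login` and `commit.author.date` fields. The function names are hypothetical. Scanning the whole history is illustrative only and is not necessarily how the scripts locate the first commit.

```python
from __future__ import annotations

from datetime import datetime


def _commit_date(commit: dict) -> datetime:
    """Parse the ISO 8601 author date of a REST API commit record."""
    # GitHub returns UTC timestamps such as "2024-01-31T12:00:00Z".
    return datetime.fromisoformat(commit["commit"]["author"]["date"].replace("Z", "+00:00"))


def first_commit_author(commits: list[dict]) -> str | None:
    """Return the login of the author of the oldest commit, if known."""
    if not commits:
        return None
    oldest = min(commits, key=_commit_date)
    # `author` is null when the commit email is not linked to a GitHub account.
    author = oldest.get("author") or {}
    return author.get("login")


def authored_by(commits: list[dict], login: str) -> bool:
    """True when `login` made the first commit (case-insensitive match)."""
    author = first_commit_author(commits)
    return author is not None and author.lower() == login.lower()
```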

Performance Optimization

GitHub's GraphQL API is used to reduce the number of API calls needed to collect repository data. See optimization.md for details on:

  • The original vs. optimized approach
  • Technical implementation with GraphQL
  • Performance benefits
  • Implementation considerations
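The project's actual query is described in optimization.md and is not reproduced here. For illustration only, a single GraphQL request can return each organization repository's dates and README text together through the `object(expression: "HEAD:README.md")` field.

Several parts of the sketch are hypothetical:

  • the query itself and the `parse_repos` helper;
  • the assumption that every README is named README.md;
  • the omission of pagination via `pageInfo`.

```python
from __future__ import annotations

import json
import subprocess

# Hypothetical query: one page of up to 100 repositories with metadata and README text.
QUERY = """
query($org: String!) {
  organization(login: $org) {
    repositories(first: 100) {
      nodes {
        name
        createdAt
        updatedAt
        object(expression: "HEAD:README.md") { ... on Blob { text } }
      }
    }
  }
}
"""


def run_query(org: str) -> dict:
    """Run QUERY via `gh api graphql`; fields other than `query` become variables."""
    out = subprocess.run(
        ["gh", "api", "graphql", "-f", f"query={QUERY}", "-f", f"org={org}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def parse_repos(response: dict) -> list[dict]:
    """Flatten the response into name/date/README records (README may be None)."""
    nodes = response["data"]["organization"]["repositories"]["nodes"]
    return [
        {
            "name": node["name"],
            "created": node["createdAt"],
            "updated": node["updatedAt"],
            # `object` is null when HEAD:README.md does not exist.
            "readme": (node.get("object") or {}).get("text"),
        }
        for node in nodes
    ]
```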

Development Approach

Terminal Testing First

IMPORTANT: Before implementing any scripting solution, ALWAYS test your approach directly in the terminal first. This principle was critical to discovering the GraphQL optimization in this project.

For example, before implementing the GraphQL solution in Python:

  1. A basic GraphQL query was tested directly with the GitHub CLI:

    gh api graphql -f query='query { organization(login: "CelloCommunications") { ... } }'
    
  2. Once the query worked, it was refined interactively:

    gh api graphql -f query='...' | jq '.data.organization.repositories.nodes[] | select(...)'
    
  3. Only after confirming the approach worked in the terminal was it implemented in Python.

This terminal-first approach allows you to:

  • Verify API responses without writing complex code
  • Iterate quickly on query structure
  • Identify potential issues early
  • Understand exactly what data you're working with
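Step 3 of the workflow above, moving a terminal-tested query into Python, might look like the sketch below. It rebuilds the same `gh api graphql` command and replaces the jq `select(...)` stage with a Python predicate. The predicate used in the usage note is only an example, since the actual filter is elided above. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess
from collections.abc import Callable, Iterator


def graphql_argv(query: str, **variables: str) -> list[str]:
    """Build the same `gh api graphql` command that was tested in the terminal."""
    argv = ["gh", "api", "graphql", "-f", f"query={query}"]
    for name, value in variables.items():
        argv += ["-f", f"{name}={value}"]  # extra fields become GraphQL variables
    return argv


def select_nodes(response: dict, keep: Callable[[dict], bool]) -> Iterator[dict]:
    """Python counterpart of jq's `.data.organization.repositories.nodes[] | select(...)`."""
    for node in response["data"]["organization"]["repositories"]["nodes"]:
        if keep(node):
            yield node


def run(query: str, **variables: str) -> dict:
    """Execute the command and parse its JSON output."""
    result = subprocess.run(graphql_argv(query, **variables),
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)
```

For example, `select_nodes(run(query, org="CelloCommunications"), lambda n: not n["isArchived"])` would keep non-archived repositories, provided the query requests the `isArchived` field.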

Installation

You can install RepoDex directly from PyPI:

# Using pip
pip install repodex

# Using uv (recommended)
uv pip install repodex

Usage

As a Command-Line Tool

After installation, you can use the command-line tools directly:

# Fetch/Index personal READMEs
repodex-fetch-shane

# Fetch/Index organization READMEs authored by you
repodex-fetch-cello

# Create releases for latest tags on personal repos (if missing)
repodex-release-latest-shane

# Create initial v0.1.0 release for untagged personal repos
repodex-release-initial-shane

# Create latest releases for organization repos (Dry Run)
repodex-release-latest-cello --dry-run

# Create initial v0.1.0 release for untagged organization repos (Dry Run)
repodex-release-initial-cello --dry-run

# Update README files across multiple repositories using a CSV list
repodex-update-readmes -c path/to/your/repositories.csv -d path/to/local/readmes/

# Setup GitHub PAT as a repository secret
repodex-setup-secret
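The README update step can be sketched with the REST contents endpoint (`PUT /repos/{owner}/{repo}/contents/{path}`). It expects base64-encoded content plus the `sha` of the file being replaced.

The following are assumptions, not the documented behavior of gh_repo_update_readmes.py:

  • the CSV column `name`;
  • the `<repo>.md` local file naming;
  • the commit message;
  • the helper names.

```python
from __future__ import annotations

import base64
import csv
import io
import json
import subprocess
from pathlib import Path


def read_repo_names(csv_text: str, column: str = "name") -> list[str]:
    """Read repository names from CSV text (the column name is an assumption)."""
    return [row[column] for row in csv.DictReader(io.StringIO(csv_text)) if row.get(column)]


def update_payload(readme_text: str, sha: str, message: str = "docs: update README") -> dict:
    """Request body for PUT /repos/{owner}/{repo}/contents/README.md."""
    return {
        "message": message,
        "content": base64.b64encode(readme_text.encode("utf-8")).decode("ascii"),
        "sha": sha,  # blob SHA of the README being replaced
    }


def push_readme(owner: str, repo: str, readme_dir: Path, sha: str) -> None:
    """Upload `<readme_dir>/<repo>.md` (hypothetical naming) as the repo's README.md."""
    text = (readme_dir / f"{repo}.md").read_text(encoding="utf-8")
    subprocess.run(
        ["gh", "api", "-X", "PUT", f"repos/{owner}/{repo}/contents/README.md", "--input", "-"],
        input=json.dumps(update_payload(text, sha)),
        check=True, capture_output=True, text=True,
    )
```

The current `sha` can be read from the `sha` field returned by `GET /repos/{owner}/{repo}/contents/README.md`.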

From Source

Clone this repository:

git clone https://github.com/shaneholloman/repodex.git
cd repodex

From the repository root, run the desired script in the tools/ directory:

# Fetch/Index personal READMEs
python3 tools/gh_repo_fetch_index_shane.py

# Fetch/Index Cello READMEs authored by you
python3 tools/gh_repo_fetch_index_cello.py

# Create releases for latest tags on personal repos (if missing)
python3 tools/gh_repo_release_latest_shane.py

# Create initial v0.1.0 release for untagged personal repos
python3 tools/gh_repo_release_initial_shane.py

# Create latest releases for Cello repos (Dry Run)
python3 tools/gh_repo_release_latest_cello.py --dry-run

# Create initial v0.1.0 release for untagged Cello repos (Dry Run)
python3 tools/gh_repo_release_initial_cello.py --dry-run

# Update README files across multiple repositories using a CSV list
python3 tools/gh_repo_update_readmes.py -c path/to/your/repositories.csv -d path/to/local/readmes/

# Setup GitHub PAT as a repository secret
python3 tools/gh_repo_setup_secret.py
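Storing a PAT as a repository secret can be done with `gh secret set`, which reads the value from standard input when `--body` is not given. In the minimal sketch below, the environment variable `GH_PAT_VALUE` and the function names are hypothetical.

```python
from __future__ import annotations

import os
import subprocess


def secret_set_argv(repo: str, name: str = "GH_PAT") -> list[str]:
    """Command that stores a repository secret; the value is read from stdin."""
    return ["gh", "secret", "set", name, "--repo", repo]


def set_pat_secret(repo: str, token_env: str = "GH_PAT_VALUE") -> None:
    """Store the token held in `token_env` as a repository secret.

    Passing the token on stdin keeps it out of the process list and shell history.
    """
    token = os.environ.get(token_env)
    if not token:
        raise RuntimeError(f"set {token_env} to the personal access token first")
    subprocess.run(secret_set_argv(repo), input=token, text=True, check=True)
```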

The fetcher scripts will:

  1. Fetch repositories from GitHub
  2. Filter by authorship (for the Cello script)
  3. Download READMEs from each repository
  4. Generate an index file with links to all fetched READMEs
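Step 4 can be sketched as a function that renders a Markdown table of repositories with their creation and update dates. The table layout, the repository count used as a simple statistic, and the `<name>.md` link targets are assumptions for illustration, not the exact format the scripts produce.

```python
from __future__ import annotations


def render_index(title: str, repos: list[dict]) -> str:
    """Render a Markdown index linking each saved README with its dates.

    Each record needs `name`, `created`, and `updated` (ISO 8601 strings);
    READMEs are assumed to be saved as `<name>.md` next to the index.
    """
    lines = [
        f"# {title}",
        "",
        f"Total repositories: {len(repos)}",
        "",
        "| Repository | Created | Updated |",
        "| --- | --- | --- |",
    ]
    # Same-format UTC ISO strings sort correctly as plain strings.
    for repo in sorted(repos, key=lambda r: r["updated"], reverse=True):
        # Keep only the date part of timestamps like "2024-01-31T12:00:00Z".
        lines.append(
            f"| [{repo['name']}]({repo['name']}.md) "
            f"| {repo['created'][:10]} | {repo['updated'][:10]} |"
        )
    return "\n".join(lines) + "\n"
```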

Requirements

  • Python 3.10+ (required for the X | Y union type-hint syntax)
  • GitHub CLI (gh) installed and authenticated
  • The packaging Python library (pip install packaging), used by the release scripts
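A preflight check for these requirements might look like the sketch below. The `check_requirements` function is hypothetical. It accepts an injectable `which` lookup and version tuple so it can be exercised without touching the real environment. When `gh` is found, it runs `gh auth status`, which exits non-zero if the CLI is not logged in.

```python
from __future__ import annotations

import importlib.util
import shutil
import subprocess
import sys
from collections.abc import Callable


def check_requirements(
    which: Callable[[str], str | None] = shutil.which,
    version_info: tuple[int, ...] = tuple(sys.version_info[:2]),
) -> list[str]:
    """Return a list of unmet requirements (empty when everything is in place)."""
    problems: list[str] = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    if which("gh") is None:
        problems.append("GitHub CLI (gh) not found on PATH")
    elif subprocess.run(["gh", "auth", "status"], capture_output=True).returncode != 0:
        problems.append("gh is not authenticated; run `gh auth login`")
    if importlib.util.find_spec("packaging") is None:
        problems.append("packaging is not installed (needed by the release scripts)")
    return problems
```

Calling `check_requirements()` with no arguments checks the current interpreter and PATH.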

Output

The scripts will create subdirectories with fetched READMEs and an index file:

  • github-project-readmes-shane/: For personal repositories
  • github-project-readmes-cello/: For organization repositories

Each directory contains:

  • Individual README files, each named after its repository
  • An index file (README.md) with links to all READMEs and repository metadata

License

MIT


Download files


Source Distribution

repodex-0.1.0.tar.gz (201.9 kB)

Uploaded Source

Built Distribution


repodex-0.1.0-py3-none-any.whl (38.5 kB)

Uploaded Python 3

File details

Details for the file repodex-0.1.0.tar.gz.

File metadata

  • Download URL: repodex-0.1.0.tar.gz
  • Size: 201.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.10

File hashes

Hashes for repodex-0.1.0.tar.gz:

  • SHA256: 19501c2328ba5ca7325b49e298a91c689873b0d58cd574b9a15ae487d0b74597
  • MD5: 146469c6a9bd90d89ed535f2e6b3d2ca
  • BLAKE2b-256: 7b70b5af6594646f709ace2d88051908f8f7aebdbcc8695237472457c513cfc8


File details

Details for the file repodex-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: repodex-0.1.0-py3-none-any.whl
  • Size: 38.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.10

File hashes

Hashes for repodex-0.1.0-py3-none-any.whl:

  • SHA256: 316f3d45e2c9c5f7975c5cefb95b29a5fced6de852efddd3d712e8ab38208b98
  • MD5: 826dd111d6bd1b55bc11f8df22bd3d56
  • BLAKE2b-256: 96582f457eb99e4b4eb6d2aa846074ef70202797ac9266956f6345bdcd28dc45

