GitHub Repository Catalog: Fetching, indexing, and organizing READMEs.

RepoDex

[!CAUTION] This project is currently for personal use only. A public-facing version with improved documentation, examples, and user-friendly interfaces is planned for future development. See the Package Transformation Plan for details on how this will evolve.

For now, just don't use it!

Update GitHub README Collections

Your GitHub Repository Catalog: A comprehensive tool for fetching, indexing, and organizing READMEs from GitHub repositories.

Scripts

This repository contains several Python scripts for managing repository information:

README Fetching & Indexing:

  • gh_repo_fetch_index_shane.py: Fetches READMEs from your personal GitHub repositories (public and private, excluding forks) and generates an index file (github-project-readmes-shane/README.md) with repository metadata and statistics.
  • gh_repo_fetch_index_cello.py: Fetches READMEs from the CelloCommunications organization repositories authored by you (based on the first commit) and generates a similar index file (github-project-readmes-cello/README.md).

Release Management & Maintenance (personal and organization repos):

  • gh_repo_release_latest_shane.py: Reads github-project-readmes-shane/repositories.csv, finds the latest semantic version tag for each personal repository, and creates a GitHub Release with auto-generated notes if one doesn't already exist for that tag.
  • gh_repo_release_initial_shane.py: Reads github-project-readmes-shane/repositories.csv and creates an initial v0.1.0 tag and release for any personal repository that currently has no tags.
  • gh_repo_release_latest_cello.py: Fetches repositories authored by you in the CelloCommunications org, finds the latest semantic version tag for each, and creates a GitHub Release with auto-generated notes if one doesn't already exist. Includes a --dry-run option.
  • gh_repo_release_initial_cello.py: Fetches repositories authored by you in the CelloCommunications org and creates an initial v0.1.0 tag and release for any that currently have no tags. Includes a --dry-run option.
  • gh_repo_update_readmes.py: Reads repository names from a CSV file (e.g., github-project-readmes-shane/repositories.csv), finds corresponding local README files, and updates the remote READMEs using the GitHub API.
  • gh_repo_setup_secret.py: Sets up a GitHub Personal Access Token as a repository secret (e.g., GH_PAT for workflows).
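The "latest semantic version tag" step used by the release scripts can be sketched as follows. This is a minimal, standard-library-only illustration, not the project's actual code: the Requirements section lists the packaging library for the release scripts, whereas this sketch substitutes a plain regular expression. The function names and the assumed tag shape (an optional "v" followed by MAJOR.MINOR.PATCH) are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: tags look like "v1.2.3" or "1.2.3"; anything else is ignored.
_SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_semver_tag(tags: list[str]) -> str | None:
    """Return the highest semantic-version tag, or None if no tag qualifies."""
    best_key = None
    best_tag = None
    for tag in tags:
        match = _SEMVER_TAG.match(tag)
        if match is None:
            continue  # skip non-version tags such as "nightly"
        # Compare numerically so that 0.10.2 sorts above 0.9.9.
        key = (int(match[1]), int(match[2]), int(match[3]))
        if best_key is None or key > best_key:
            best_key, best_tag = key, tag
    return best_tag


def initial_tag_needed(tags: list[str]) -> bool:
    """A repository with no tags at all is a candidate for an initial v0.1.0."""
    return len(tags) == 0
```

Comparing the parsed integers rather than the raw strings matters here: a plain string sort would rank "v0.9.9" above "v0.10.2".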

Features

  • Fetches READMEs from GitHub repositories
  • Filters by authorship (first commit author)
  • Generates a Markdown index with links to all fetched READMEs
  • Includes creation and update dates for each repository

Performance Optimization

A significant optimization has been implemented using GitHub's GraphQL API to reduce API calls and improve efficiency. See optimization.md for details on:

  • The original vs. optimized approach
  • Technical implementation with GraphQL
  • Performance benefits
  • Implementation considerations

Development Approach

Terminal Testing First

IMPORTANT: Before implementing any scripting solution, ALWAYS test your approach directly in the terminal first. This principle was critical to discovering the GraphQL optimization in this project.

For example, before implementing the GraphQL solution in Python:

  1. A basic GraphQL query was tested directly with the GitHub CLI:

    gh api graphql -f query='query { organization(login: "CelloCommunications") { ... } }'
    
  2. Once the query worked, it was refined interactively:

    gh api graphql -f query='...' | jq '.data.organization.repositories.nodes[] | select(...)'
    
  3. Only after confirming the approach worked in the terminal was it implemented in Python.

This terminal-first approach allows you to:

  • Verify API responses without writing complex code
  • Iterate quickly on query structure
  • Identify potential issues early
  • Understand exactly what data you're working with
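Once a query works in the terminal, moving it into Python might look like the sketch below. The query text is hypothetical (the real query is elided above), and the fork filter is only a stand-in for the elided jq select(...) expression. The response path data.organization.repositories.nodes matches the jq example, and name, isFork, createdAt, and updatedAt are fields of GitHub's GraphQL Repository type. Only the first 100 repositories are requested; pagination is not handled.

```python
from __future__ import annotations

import json
import subprocess

# Hypothetical query, written for illustration only.
QUERY = """
query($org: String!) {
  organization(login: $org) {
    repositories(first: 100) {
      nodes { name isFork createdAt updatedAt }
    }
  }
}
"""


def build_command(org: str) -> list[str]:
    """Build the same kind of `gh api graphql` call that was tested by hand."""
    # Fields other than `query` are passed to the API as GraphQL variables.
    return ["gh", "api", "graphql", "-f", f"query={QUERY}", "-f", f"org={org}"]


def extract_repos(payload: dict) -> list[dict]:
    """Pull repository nodes out of the response, dropping forks."""
    nodes = payload["data"]["organization"]["repositories"]["nodes"]
    return [node for node in nodes if not node["isFork"]]


def fetch_repos(org: str) -> list[dict]:
    """Run the query through the GitHub CLI and parse its JSON output."""
    result = subprocess.run(
        build_command(org), capture_output=True, text=True, check=True
    )
    return extract_repos(json.loads(result.stdout))
```

Keeping extract_repos separate from the subprocess call means a JSON response saved from a terminal session can be used to check the parsing logic without touching the network.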

Installation

You can install RepoDex directly from PyPI:

# Using UV (the only supported method)
uv tool install repodex

IMPORTANT: This project follows a UV-native workflow, and pip is not supported. See UV Workflow for details.

Usage

As a Command-Line Tool

After installation, you can use the command-line tools directly:

# Fetch/Index personal READMEs
repodex-fetch-shane

# Fetch/Index organization READMEs authored by you
repodex-fetch-cello

# Create releases for latest tags on personal repos (if missing)
repodex-release-latest-shane

# Create initial v0.1.0 release for untagged personal repos
repodex-release-initial-shane

# Create latest releases for organization repos (Dry Run)
repodex-release-latest-cello --dry-run

# Create initial v0.1.0 release for untagged organization repos (Dry Run)
repodex-release-initial-cello --dry-run

# Update README files across multiple repositories using a CSV list
repodex-update-readmes -c path/to/your/repositories.csv -d path/to/local/readmes/

# Setup GitHub PAT as a repository secret
repodex-setup-secret
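For readers curious how a --dry-run option like the one above is commonly wired up, here is a minimal sketch. It is not RepoDex's implementation: the positional repo and tag arguments and all function names are hypothetical, chosen to keep the example self-contained. The gh release create TAG --repo OWNER/NAME --generate-notes invocation is standard GitHub CLI usage.

```python
from __future__ import annotations

import argparse
import subprocess


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Create a release for a tag (sketch).")
    parser.add_argument("repo", help="repository as OWNER/NAME (hypothetical argument)")
    parser.add_argument("tag", help="existing tag, e.g. v0.1.0 (hypothetical argument)")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="print the command instead of running it",
    )
    return parser.parse_args(argv)


def release_command(repo: str, tag: str) -> list[str]:
    """Build the GitHub CLI call that creates a release with generated notes."""
    return ["gh", "release", "create", tag, "--repo", repo, "--generate-notes"]


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)
    command = release_command(args.repo, args.tag)
    if args.dry_run:
        # Nothing is sent to GitHub; the planned action is only reported.
        print("DRY RUN:", " ".join(command))
        return 0
    return subprocess.run(command).returncode
```

Calling main(["octocat/hello-world", "v0.1.0", "--dry-run"]) prints the planned command and returns 0 without contacting GitHub.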

From Source

Clone this repository:

git clone https://github.com/shaneholloman/repodex.git
cd repodex

Run the desired script from the tools/ directory:

# Fetch/Index personal READMEs
python3 tools/gh_repo_fetch_index_shane.py

# Fetch/Index Cello READMEs authored by you
python3 tools/gh_repo_fetch_index_cello.py

# Create releases for latest tags on personal repos (if missing)
python3 tools/gh_repo_release_latest_shane.py

# Create initial v0.1.0 release for untagged personal repos
python3 tools/gh_repo_release_initial_shane.py

# Create latest releases for Cello repos (Dry Run)
python3 tools/gh_repo_release_latest_cello.py --dry-run

# Create initial v0.1.0 release for untagged Cello repos (Dry Run)
python3 tools/gh_repo_release_initial_cello.py --dry-run

# Update README files across multiple repositories using a CSV list
python3 tools/gh_repo_update_readmes.py -c path/to/your/repositories.csv -d path/to/local/readmes/

# Setup GitHub PAT as a repository secret
python3 tools/gh_repo_setup_secret.py

The fetcher scripts will:

  1. Fetch repositories from GitHub
  2. Filter by authorship (for the Cello script)
  3. Download READMEs from each repository
  4. Generate an index file with links to all fetched READMEs
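The last step, generating the index, can be illustrated with a short sketch. The dictionary keys (name, created, updated), the table layout, and the per-repository file naming are assumptions for illustration; the real index format is whatever the fetcher scripts emit.

```python
from __future__ import annotations


def build_index(title: str, repos: list[dict[str, str]]) -> str:
    """Render a Markdown index with one table row per fetched README."""
    lines = [
        f"# {title}",
        "",
        f"Total repositories: {len(repos)}",
        "",
        "| Repository | Created | Updated |",
        "| --- | --- | --- |",
    ]
    # Sort case-insensitively so the index order is stable between runs.
    for repo in sorted(repos, key=lambda r: r["name"].lower()):
        # Assumption: each README is saved next to the index as <name>.md.
        link = f"[{repo['name']}]({repo['name']}.md)"
        lines.append(f"| {link} | {repo['created']} | {repo['updated']} |")
    return "\n".join(lines) + "\n"
```

Returning a string instead of writing a file keeps the rendering step easy to check in isolation; the caller decides where the index is saved.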

Requirements

  • Python 3.10+ (required for the X | Y union type hint syntax)
  • GitHub CLI (gh) installed and authenticated
  • packaging library for Python, used by the release scripts (install with uv add packaging)
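
A quick pre-flight check for the first two requirements could look like the sketch below. The function name and messages are hypothetical, and it only verifies that gh is on PATH; whether gh is authenticated can be confirmed separately with gh auth status.

```python
from __future__ import annotations

import shutil
import sys


def check_environment(python_version: tuple[int, ...], gh_path: str | None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if python_version < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if gh_path is None:
        problems.append("GitHub CLI (gh) was not found on PATH")
    return problems


def current_problems() -> list[str]:
    """Check the interpreter and PATH of the running process."""
    return check_environment(sys.version_info[:2], shutil.which("gh"))
```

Passing the version and path in as arguments keeps check_environment free of side effects, so it can be exercised with arbitrary values.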

Output

The scripts will create subdirectories with fetched READMEs and an index file:

  • github-project-readmes-shane/: For personal repositories
  • github-project-readmes-cello/: For organization repositories

Each directory contains:

  • Individual README files renamed according to repository
  • An index file (README.md, as named in the script descriptions above) with links to all READMEs and metadata

License

MIT

