GitHub Repository Catalog: Fetching, indexing, and organizing READMEs.
RepoDex
[!CAUTION] This project is currently for personal use only. A public-facing version with improved documentation, examples, and user-friendly interfaces is planned for future development. See the Package Transformation Plan for details on how this will evolve.
For now, just don't use it!
Your GitHub Repository Catalog: A comprehensive tool for fetching, indexing, and organizing READMEs from GitHub repositories.
Scripts
This repository contains several Python scripts for managing repository information:
README Fetching & Indexing:
- gh_repo_fetch_index_shane.py: Fetches READMEs from your personal GitHub repositories (public and private, excluding forks) and generates an index file (github-project-readmes-shane/README.md) with repository metadata and statistics.
- gh_repo_fetch_index_cello.py: Fetches READMEs from the CelloCommunications organization repositories authored by you (based on the first commit) and generates a similar index file (github-project-readmes-cello/README.md).
Release Management (for personal repos):
- gh_repo_release_latest_shane.py: Reads github-project-readmes-shane/repositories.csv, finds the latest semantic version tag for each personal repository, and creates a GitHub Release with auto-generated notes if one doesn't already exist for that tag.
- gh_repo_release_initial_shane.py: Reads github-project-readmes-shane/repositories.csv and creates an initial v0.1.0 tag and release for any personal repository that currently has no tags.
- gh_repo_release_latest_cello.py: Fetches repositories authored by you in the CelloCommunications org, finds the latest semantic version tag for each, and creates a GitHub Release with auto-generated notes if one doesn't already exist. Includes a --dry-run option.
- gh_repo_release_initial_cello.py: Fetches repositories authored by you in the CelloCommunications org and creates an initial v0.1.0 tag and release for any that currently have no tags. Includes a --dry-run option.
- gh_repo_update_readmes.py: Reads repository names from a CSV file (e.g., github-project-readmes-shane/repositories.csv), finds corresponding local README files, and updates the remote READMEs using the GitHub API.
- gh_repo_setup_secret.py: Sets up a GitHub Personal Access Token as a repository secret (e.g., GH_PAT for workflows).
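As a rough illustration of the "latest semantic version tag" step, the sketch below picks the highest vMAJOR.MINOR.PATCH tag from a list of tag names. It is not the project's code: the release scripts are stated to use the packaging library, while this version uses only the standard library so it stays self-contained, and the function names latest_semver_tag and initial_tag_needed are hypothetical.

```python
import re

# Matches tags such as "v1.2.3" or "1.2.3". Pre-release suffixes are
# deliberately ignored in this simplified sketch.
_SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_semver_tag(tags):
    """Return the highest semantic-version tag, or None if none qualify."""
    versioned = []
    for tag in tags:
        match = _SEMVER_TAG.match(tag)
        if match:
            # Compare numerically so that 0.10.1 sorts above 0.9.0.
            key = tuple(int(part) for part in match.groups())
            versioned.append((key, tag))
    if not versioned:
        return None
    return max(versioned, key=lambda item: item[0])[1]


def initial_tag_needed(tags):
    """A repository with no tags at all is a candidate for v0.1.0."""
    return len(tags) == 0
```

Comparing integer tuples rather than raw strings is the important detail: a plain string sort would rank v0.9.0 above v0.10.1.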
Features
- Fetches READMEs from GitHub repositories
- Filters by authorship (first commit author)
- Generates a Markdown index with links to all fetched READMEs
- Includes creation and update dates for each repository
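The authorship filter listed above could be sketched as follows, assuming each repository has already been reduced to a dictionary that records the login of its first commit's author. The record shape and the first_commit_author key are assumptions made for illustration, not the project's actual data model.

```python
def authored_by(repos, login):
    """Keep repositories whose first commit was made by `login`.

    `repos` is a list of dicts with a hypothetical "first_commit_author"
    key. GitHub logins are case-insensitive, so compare in lowercase.
    """
    wanted = login.lower()
    return [
        repo
        for repo in repos
        if (repo.get("first_commit_author") or "").lower() == wanted
    ]
```

Repositories with no recorded first-commit author (for example, empty repositories) are simply dropped by this filter.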
Performance Optimization
A significant optimization has been implemented using GitHub's GraphQL API to reduce API calls and improve efficiency. See optimization.md for details on:
- The original vs. optimized approach
- Technical implementation with GraphQL
- Performance benefits
- Implementation considerations
Development Approach
Terminal Testing First
IMPORTANT: Before implementing any scripting solution, ALWAYS test your approach directly in the terminal first. This principle was critical to discovering the GraphQL optimization in this project.
For example, before implementing the GraphQL solution in Python:
- Basic GraphQL query was tested directly with GitHub CLI:
  gh api graphql -f query='query { organization(login: "CelloCommunications") { ... } }'
- Once the query worked, it was refined interactively:
  gh api graphql -f query='...' | jq '.data.organization.repositories.nodes[] | select(...)'
- Only after confirming the approach worked in the terminal was it implemented in Python.
This terminal-first approach allows you to:
- Verify API responses without writing complex code
- Iterate quickly on query structure
- Identify potential issues early
- Understand exactly what data you're working with
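Once a query has been confirmed in the terminal, carrying it over to Python might look like the sketch below. The query body is an assumption (the README elides the real one), and the helper names run_query and repo_nodes are hypothetical. The sketch shells out to the same gh api graphql command and then walks the same path as the jq expression above.

```python
import json
import subprocess

# Assumed query shape; the fields the project actually requests are not
# shown in this README.
QUERY = """
query($org: String!, $count: Int!) {
  organization(login: $org) {
    repositories(first: $count) {
      nodes { name createdAt updatedAt isFork }
    }
  }
}
"""


def run_query(org, count=50):
    """Run the query through the GitHub CLI and return the parsed JSON."""
    result = subprocess.run(
        [
            "gh", "api", "graphql",
            "-f", f"query={QUERY}",   # -f passes a raw string field
            "-f", f"org={org}",
            "-F", f"count={count}",   # -F lets gh send the value as a number
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def repo_nodes(response):
    """Extract repository nodes (.data.organization.repositories.nodes)."""
    return response["data"]["organization"]["repositories"]["nodes"]
```

Keeping the extraction in its own function means it can be checked against a saved terminal response without calling the API again.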
Installation
You can install RepoDex directly from PyPI:
# Using UV (the only supported method)
uv tool install repodex
IMPORTANT: This project strictly follows a UV-native workflow. The use of pip is strictly prohibited. See UV Workflow for details.
Usage
As a Command-Line Tool
After installation, you can use the command-line tools directly:
# Fetch/Index personal READMEs
repodex-fetch-shane
# Fetch/Index organization READMEs authored by you
repodex-fetch-cello
# Create releases for latest tags on personal repos (if missing)
repodex-release-latest-shane
# Create initial v0.1.0 release for untagged personal repos
repodex-release-initial-shane
# Create latest releases for organization repos (Dry Run)
repodex-release-latest-cello --dry-run
# Create initial v0.1.0 release for untagged organization repos (Dry Run)
repodex-release-initial-cello --dry-run
# Update README files across multiple repositories using a CSV list
repodex-update-readmes -c path/to/your/repositories.csv -d path/to/local/readmes/
# Setup GitHub PAT as a repository secret
repodex-setup-secret
From Source
Clone this repository:
git clone https://github.com/shaneholloman/repodex.git
cd repodex
Run the desired script from the tools/ directory:
# Fetch/Index personal READMEs
python3 tools/gh_repo_fetch_index_shane.py
# Fetch/Index Cello READMEs authored by you
python3 tools/gh_repo_fetch_index_cello.py
# Create releases for latest tags on personal repos (if missing)
python3 tools/gh_repo_release_latest_shane.py
# Create initial v0.1.0 release for untagged personal repos
python3 tools/gh_repo_release_initial_shane.py
# Create latest releases for Cello repos (Dry Run)
python3 tools/gh_repo_release_latest_cello.py --dry-run
# Create initial v0.1.0 release for untagged Cello repos (Dry Run)
python3 tools/gh_repo_release_initial_cello.py --dry-run
# Update README files across multiple repositories using a CSV list
python3 tools/gh_repo_update_readmes.py -c path/to/your/repositories.csv -d path/to/local/readmes/
# Setup GitHub PAT as a repository secret
python3 tools/gh_repo_setup_secret.py
The fetcher scripts will:
- Fetch repositories from GitHub
- Filter by authorship (for the Cello script)
- Download READMEs from each repository
- Generate an index file with links to all fetched READMEs
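The final step, generating the index, can be sketched as a small function that renders repository metadata into Markdown. The layout, the build_index name, and the dictionary keys (name, created, updated, readme_file) are illustrative assumptions; the index the scripts actually produce may be organized differently.

```python
def build_index(title, repos):
    """Render a Markdown index with one table row per repository.

    Each item in `repos` is a dict with hypothetical keys: "name",
    "created", "updated", and "readme_file" (path to the saved README).
    """
    lines = [
        f"# {title}",
        "",
        f"Total repositories: {len(repos)}",
        "",
        "| Repository | Created | Updated | README |",
        "|---|---|---|---|",
    ]
    # Sort case-insensitively so the index order is stable between runs.
    for repo in sorted(repos, key=lambda r: r["name"].lower()):
        lines.append(
            f"| {repo['name']} | {repo['created']} | {repo['updated']} "
            f"| [README]({repo['readme_file']}) |"
        )
    return "\n".join(lines) + "\n"
```

Returning the text rather than writing a file keeps the rendering easy to inspect; the caller decides where the index is saved.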
Requirements
- Python 3.10+ (required for the | type hint syntax)
- GitHub CLI (gh) installed and authenticated
- packaging library for Python (install with uv add packaging) for release scripts
Output
The scripts will create subdirectories with fetched READMEs and an index file:
- github-project-readmes-shane/: For personal repositories
- github-project-readmes-cello/: For organization repositories
Each directory contains:
- Individual README files renamed according to repository
- An index.md file with links to all READMEs and metadata
License
MIT
File details
Details for the file repodex-0.1.1.tar.gz.
File metadata
- Download URL: repodex-0.1.1.tar.gz
- Upload date:
- Size: 832.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf8a0ef8e6ca823ef645ff98657bc27dcdc54bb7adbe366c04cc4ef0378d85ac |
| MD5 | 55f7b45aba83e3a5091e3d68ea2376ee |
| BLAKE2b-256 | e8229bd47624e0a156c3468ce7d1b5e8bb8ba1457d872aa29467263dfc594874 |
File details
Details for the file repodex-0.1.1-py3-none-any.whl.
File metadata
- Download URL: repodex-0.1.1-py3-none-any.whl
- Upload date:
- Size: 38.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c29443a16e6e89d2755caeba0751254d05a91afc636e08d5c61053e821b34214 |
| MD5 | 66b287e911c9b659b615c46cc47b1a10 |
| BLAKE2b-256 | 5e5b739cfe7b9f8582fcc3a5147458ae533d298ed61de07f774bf3bb342e998d |