GHS: Semantic Search for GitHub Stars
A command-line tool to semantically search your starred GitHub repositories.
WHY? If you are like me and star repositories as a way to bookmark them, but later struggle to recall a specific tool or library because GitHub's built-in search does not support semantic similarity, then this tool is for you.
Features
- Unified command-line interface with intuitive subcommands
- Fetches all starred repositories from your GitHub profile
- Parallel processing with 5 concurrent workers for fast README fetching
- Intelligent rate limit handling - automatically detects and waits for GitHub API limits to reset
- Extracts and parses README files (supports .md, .txt, and plain README)
- Generates embeddings using a lightweight sentence-transformer model (all-MiniLM-L6-v2)
- Stores data efficiently using sqlite-vec for fast vector similarity search
- Smart refresh command to sync added/removed stars
- Semantic search to find repositories by meaning, not just keywords
- Real-time progress feedback showing currently processing repositories
Installation
Option 1: Install from PyPI (Recommended)
# Install the package
pip install github-stars-search
# For CPU-only PyTorch (faster, no CUDA overhead):
pip install github-stars-search --extra-index-url https://download.pytorch.org/whl/cpu
After installation, the tool will be available as the ghs command.
Option 2: Install from Source
# Clone the repository
git clone https://github.com/yourusername/github-stars-organizer.git
cd github-stars-organizer
# Install in development mode
pip install -e .
# For CPU-only PyTorch:
pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu
Setup
1. Create a GitHub Personal Access Token:
   - Go to https://github.com/settings/tokens
   - Create a new token with the public_repo scope
   - Copy the token

2. Configure environment:

cp .env.example .env
# Edit .env and add your GitHub token
Usage
The tool provides a unified CLI with four main commands:
Fetch - Initial Indexing
Fetch and index all your starred repositories:
ghs fetch
This will:
- Check your GitHub API rate limit status
- Fetch all your starred repositories from GitHub
- Download and parse their READMEs in parallel (5 concurrent workers)
- Generate embeddings using the all-MiniLM-L6-v2 model (384-dimensional)
- Store everything in a local SQLite database with vector search capabilities
- Skip repositories that are already stored
Rate Limiting: The tool automatically monitors GitHub API rate limits and will pause with a clear message if limits are reached, then resume when they reset.
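The pause-and-resume behavior described above can be sketched as follows. This is illustrative logic only, not the tool's actual implementation; the function name, threshold, and parameters are assumptions. GitHub reports the remaining request budget and the reset time (a UNIX timestamp) in the X-RateLimit-Remaining and X-RateLimit-Reset response headers:

```python
from datetime import datetime, timezone

def seconds_until_reset(remaining: int, reset_epoch: float, threshold: int = 1) -> float:
    """Return how long to sleep before calling the GitHub API again.

    remaining   -- requests left in the current rate-limit window
    reset_epoch -- UNIX timestamp when the window resets
                   (from the X-RateLimit-Reset response header)
    """
    if remaining > threshold:
        return 0.0  # plenty of budget left, no need to wait
    now = datetime.now(timezone.utc).timestamp()
    return max(0.0, reset_epoch - now)  # never return a negative wait
```

For example, `seconds_until_reset(0, time.time() + 30)` returns roughly 30 seconds, while a healthy remaining budget returns 0 immediately.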
Search - Semantic Search
Search your stars using natural language queries:
ghs search "your search query"
Examples:
ghs search "machine learning frameworks"
ghs search "web scraping tools"
ghs search "rust web server"
ghs search "react component libraries" --limit 5
Options:
-l, --limit N: Number of results to return (default: 10)
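Under the hood, results are ranked by vector similarity between the query embedding and each repository's embedding. A minimal sketch of cosine-similarity ranking over toy 3-dimensional vectors (the real tool uses 384-dimensional MiniLM embeddings stored in sqlite-vec; the repository names and vectors below are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, repos, limit=10):
    """Rank (name, embedding) pairs by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in repos]
    return [name for _, name in sorted(scored, reverse=True)[:limit]]

repos = [
    ("scrapy",  [0.9, 0.1, 0.0]),  # toy embedding: "web scraping"
    ("pytorch", [0.1, 0.9, 0.1]),  # toy embedding: "machine learning"
    ("actix",   [0.0, 0.2, 0.9]),  # toy embedding: "rust web server"
]
print(search([0.8, 0.0, 0.4], repos, limit=2))  # → ['scrapy', 'actix']
```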
Refresh - Sync Changes
Synchronize your database with your current GitHub stars (adds new stars, removes unstarred repositories):
ghs refresh
This command:
- Fetches your current starred repositories
- Compares with the local database
- Adds newly starred repositories
- Removes repositories you've unstarred
- Shows a summary of changes
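The add/remove comparison above boils down to a set difference between current and stored repository IDs. A minimal sketch (function and variable names are illustrative, not the tool's actual API):

```python
def diff_stars(current_ids, stored_ids):
    """Compute which repository IDs to add to and remove from the database."""
    current, stored = set(current_ids), set(stored_ids)
    to_add = current - stored     # newly starred, not yet in the database
    to_remove = stored - current  # unstarred since the last sync
    return sorted(to_add), sorted(to_remove)

print(diff_stars([1, 2, 4], [1, 2, 3]))  # → ([4], [3])
```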
Stats - Database Statistics
Show database statistics:
ghs stats
Displays:
- Total repositories indexed
- Number of repositories with embeddings
- Number of repositories with README files
- README coverage percentage
Command Quick Reference
ghs fetch # Initial fetch and index
ghs search "query" # Search repositories
ghs search "query" --limit 5 # Limit results
ghs refresh # Sync added/removed stars
ghs stats # Show statistics
How It Works
- GitHub API: Uses PyGithub to fetch your starred repositories with intelligent rate limit handling
- Parallel README Fetching: Downloads READMEs using 5 concurrent workers with shared rate limit detection
- README Extraction: Uses GitHub's dedicated README API endpoint for efficient fetching
- Embeddings: Uses sentence-transformers (all-MiniLM-L6-v2) to generate 384-dim vectors
- Vector Search: Stores embeddings in sqlite-vec for fast similarity search using cosine distance
- Smart Sync: Refresh command intelligently adds/removes repositories based on current stars
- Rate Limit Protection: Automatically detects rate limits, displays clear wait times, and resumes when ready
Database Schema
The tool creates a stars.db SQLite database with:
repositories table:
- Repository metadata (id, name, description, URL, stars, language)
- README content and type
- Timestamps
vec_repositories table (virtual):
- Vector embeddings for semantic search
- Linked to repositories via repo_id
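A minimal sketch of the relational half of that schema using the standard-library sqlite3 module. The column names here are illustrative guesses from the metadata listed above, not the tool's exact DDL; the vec_repositories virtual table needs the sqlite-vec extension loaded and is shown only as a comment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real tool uses stars.db on disk
conn.execute("""
    CREATE TABLE repositories (
        id          INTEGER PRIMARY KEY,  -- GitHub repository id
        name        TEXT NOT NULL,
        description TEXT,
        url         TEXT,
        stars       INTEGER,
        language    TEXT,
        readme      TEXT,                 -- README content
        readme_type TEXT,                 -- e.g. 'md' or 'txt'
        updated_at  TEXT                  -- timestamp
    )
""")
# With the sqlite-vec extension loaded, the embedding table would look like:
#   CREATE VIRTUAL TABLE vec_repositories USING vec0(embedding float[384]);
# and each embedding row is linked back to repositories via repo_id.
conn.execute(
    "INSERT INTO repositories (id, name, stars) VALUES (?, ?, ?)",
    (1, "webpolis/ghs", 42),
)
row = conn.execute("SELECT name, stars FROM repositories").fetchone()
print(row)  # → ('webpolis/ghs', 42)
```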
File details
Details for the file github_stars_search-0.1.1.tar.gz.
File metadata
- Download URL: github_stars_search-0.1.1.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9c9d2083f66865ae9cc107ec5849f2ea416311ac23ebc14648eddaed0224063c |
| MD5 | e484a88d1c49f3e6a499c3a5b920340c |
| BLAKE2b-256 | 3c316b181a6728d0711f63ec35c30c466514dee806e8c75d66d2e40b7a0b229b |
Provenance
The following attestation bundles were made for github_stars_search-0.1.1.tar.gz:

Publisher: release.yml on webpolis/ghs

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: github_stars_search-0.1.1.tar.gz
- Subject digest: 9c9d2083f66865ae9cc107ec5849f2ea416311ac23ebc14648eddaed0224063c
- Sigstore transparency entry: 659443086
- Permalink: webpolis/ghs@29b1d85a1d1d257a36ce2ca3081fc0ec2fbcaeaf
- Branch / Tag: refs/heads/main
- Owner: https://github.com/webpolis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@29b1d85a1d1d257a36ce2ca3081fc0ec2fbcaeaf
- Trigger Event: workflow_dispatch
File details
Details for the file github_stars_search-0.1.1-py3-none-any.whl.
File metadata
- Download URL: github_stars_search-0.1.1-py3-none-any.whl
- Upload date:
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 788aec94e719c49c0de2e0958eae997919c6eef98def370ae253d7dfd3b12550 |
| MD5 | eadef87666cb5dfbace333a662d62905 |
| BLAKE2b-256 | 639d706652979014c90d401e0b511b5d18f851e7b3dda14084c9e5844e54111b |
|
Provenance
The following attestation bundles were made for github_stars_search-0.1.1-py3-none-any.whl:

Publisher: release.yml on webpolis/ghs

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: github_stars_search-0.1.1-py3-none-any.whl
- Subject digest: 788aec94e719c49c0de2e0958eae997919c6eef98def370ae253d7dfd3b12550
- Sigstore transparency entry: 659443099
- Permalink: webpolis/ghs@29b1d85a1d1d257a36ce2ca3081fc0ec2fbcaeaf
- Branch / Tag: refs/heads/main
- Owner: https://github.com/webpolis
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@29b1d85a1d1d257a36ce2ca3081fc0ec2fbcaeaf
- Trigger Event: workflow_dispatch