
A command-line tool to semantically search your starred GitHub repositories


GHS: Semantic Search for GitHub Stars


Why? If, like me, you star repositories as a way to bookmark them but later struggle to recall a specific tool or library, because GitHub's archaic search feature does not support semantic similarity search, then this tool is for you.

Features

  • Unified command-line interface with intuitive subcommands
  • Fetches all starred repositories from your GitHub profile
  • Parallel processing with 5 concurrent workers for fast README fetching
  • Intelligent rate limit handling - automatically detects and waits for GitHub API limits to reset
  • Extracts and parses README files (supports .md, .txt, and plain README)
  • Generates embeddings using a lightweight sentence-transformer model (all-MiniLM-L6-v2)
  • Stores data efficiently using sqlite-vec for fast vector similarity search
  • Smart refresh command to sync added/removed stars
  • Semantic search to find repositories by meaning, not just keywords
  • Real-time progress feedback showing currently processing repositories
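The README formats listed above can be told apart by file extension. Below is a minimal, hypothetical classifier, assuming the stored README type is derived from the filename GitHub returns. The labels "markdown", "text" and "plain" are illustrative and are not the tool's actual values.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical labels for the README "type" column; the tool's real values may differ.
README_TYPES = {".md": "markdown", ".txt": "text", "": "plain"}

def readme_type(filename: str) -> Optional[str]:
    """Classify a README by extension; return None for unsupported files."""
    path = PurePosixPath(filename)
    if path.stem.lower() != "readme":
        return None  # not a README at all
    # .get() returns None for README variants outside .md, .txt and plain README.
    return README_TYPES.get(path.suffix.lower())

if __name__ == "__main__":
    for name in ["README.md", "docs/readme.TXT", "README", "README.rst"]:
        print(name, "->", readme_type(name))
```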

Installation

Option 1: Install from PyPI (Recommended)

# Install the package
pip install github-stars-search

# For CPU-only PyTorch (smaller, faster install without the CUDA libraries):
pip install github-stars-search --extra-index-url https://download.pytorch.org/whl/cpu

After installation, the tool will be available as the ghs command.

Option 2: Install from Source

# Clone the repository
git clone https://github.com/webpolis/ghs.git
cd ghs

# Install in development mode
pip install -e .

# For CPU-only PyTorch:
pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu

Setup

  1. Create a GitHub Personal Access Token:

  2. Configure environment:

cp .env.example .env
# Edit .env and add your GitHub token

Usage

The tool provides a unified CLI with four main commands:

Fetch - Initial Indexing

Fetch and index all your starred repositories:

ghs fetch

This will:

  1. Check your GitHub API rate limit status
  2. Fetch all your starred repositories from GitHub
  3. Download and parse their READMEs in parallel (5 concurrent workers)
  4. Generate embeddings using the all-MiniLM-L6-v2 model (384-dimensional)
  5. Store everything in a local SQLite database with vector search capabilities
  6. Skip repositories that are already stored
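The fetch flow above (skip repositories already stored, then download READMEs with 5 concurrent workers) can be sketched with the standard library's ThreadPoolExecutor. This is a minimal illustration, not the tool's code: fetch_new_readmes and the injected fetch_readme callable are hypothetical names. In practice the callable might wrap PyGithub's Repository.get_readme(); here a fake function stands in so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Set

MAX_WORKERS = 5  # the README describes 5 concurrent workers

def fetch_new_readmes(
    repo_names: Iterable[str],
    already_stored: Set[str],
    fetch_readme: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Download READMEs only for repositories not yet in the database."""
    # Skip anything already indexed so re-running fetch does no repeated work.
    pending = [name for name in repo_names if name not in already_stored]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # map() yields results in input order, so zip() pairs them correctly.
        return dict(zip(pending, pool.map(fetch_readme, pending)))

if __name__ == "__main__":
    fake_fetch = lambda name: f"# README of {name}"  # stand-in for a real API call
    print(fetch_new_readmes(["a/x", "b/y", "c/z"], {"b/y"}, fake_fetch))
```

Threads suit this workload because README downloads are network-bound. If fetch_readme raises, the exception surfaces when its result is consumed from map().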

Rate Limiting: The tool automatically monitors GitHub API rate limits and will pause with a clear message if limits are reached, then resume when they reset.
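GitHub's REST API reports the remaining quota and the reset time in the X-RateLimit-Remaining and X-RateLimit-Reset response headers (the latter in UTC epoch seconds). Below is a minimal sketch of turning those headers into a pause. It is not the tool's implementation: seconds_until_reset, wait_if_limited and the 1-second safety margin are hypothetical, and sleep/clock are injectable so the logic can be checked without actually waiting.

```python
import time
from typing import Callable, Mapping

def seconds_until_reset(headers: Mapping[str, str], now: float, margin: float = 1.0) -> float:
    """How long to pause before the next call, based on GitHub's rate-limit headers."""
    # A missing header is treated as "not limited" in this sketch.
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_epoch = float(headers["X-RateLimit-Reset"])  # UTC epoch seconds
    # Add a small safety margin so the retry does not land a moment too early.
    return max(0.0, reset_epoch - now) + margin

def wait_if_limited(
    headers: Mapping[str, str],
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.time,
) -> float:
    """Sleep until the limit resets (if it is exhausted) and return the delay used."""
    delay = seconds_until_reset(headers, clock())
    if delay > 0:
        print(f"GitHub rate limit reached; waiting {delay:.0f}s for it to reset...")
        sleep(delay)
    return delay
```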

Search - Semantic Search

Search your stars using natural language queries:

ghs search "your search query"

Examples:

ghs search "machine learning frameworks"
ghs search "web scraping tools"
ghs search "rust web server"
ghs search "react component libraries" --limit 5

Options:

  • -l, --limit N: Number of results to return (default: 10)
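sqlite-vec performs the ranking inside the database. The pure-Python sketch below only illustrates the math behind a search and its --limit option: embed the query, compute the cosine distance to every stored embedding, and keep the closest N. The function names and toy 3-dimensional vectors are hypothetical; real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0.0 = same direction, 1.0 = orthogonal, 2.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / norm

def top_k(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    limit: int = 10,  # mirrors the --limit default
) -> List[Tuple[str, float]]:
    """Rank stored repositories by cosine distance to the query, closest first."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:limit]

if __name__ == "__main__":
    # Toy vectors standing in for 384-dimensional README embeddings.
    stored = {"rust/web": [1.0, 0.0, 0.0], "py/ml": [0.0, 1.0, 0.0], "rust/cli": [0.8, 0.6, 0.0]}
    print(top_k([1.0, 0.0, 0.0], stored, limit=2))
```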

Refresh - Sync Changes

Synchronize your database with your current GitHub stars (adds new stars, removes unstarred repositories):

ghs refresh

This command:

  1. Fetches your current starred repositories
  2. Compares with the local database
  3. Adds newly starred repositories
  4. Removes repositories you've unstarred
  5. Shows a summary of changes
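Steps 2 to 4 amount to two set differences. Below is a minimal sketch, assuming repositories are keyed by their full name (the tool may key on the numeric repository id instead). SyncPlan and plan_refresh are hypothetical names.

```python
from typing import Iterable, NamedTuple, Set

class SyncPlan(NamedTuple):
    to_add: Set[str]     # starred on GitHub but missing locally
    to_remove: Set[str]  # stored locally but no longer starred
    unchanged: int       # present in both; left untouched

def plan_refresh(current_stars: Iterable[str], stored: Iterable[str]) -> SyncPlan:
    """Compare current GitHub stars with the local database using set differences."""
    current, local = set(current_stars), set(stored)
    return SyncPlan(
        to_add=current - local,
        to_remove=local - current,
        unchanged=len(current & local),
    )

if __name__ == "__main__":
    plan = plan_refresh(["a/x", "b/y", "c/z"], ["b/y", "c/z", "d/old"])
    print(f"Added: {len(plan.to_add)}, removed: {len(plan.to_remove)}, unchanged: {plan.unchanged}")
```

Only the repositories in to_add need a README download and a new embedding, which is presumably why a refresh is cheaper than re-running the full fetch.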

Stats - Database Statistics

Show database statistics:

ghs stats

Displays:

  • Total repositories indexed
  • Number of repositories with embeddings
  • Number of repositories with README files
  • README coverage percentage
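The four figures above reduce to a few COUNT queries. Below is a sketch using Python's built-in sqlite3 module. It assumes README text lives in a readme_content column and that each row in vec_repositories is one embedding. database_stats and demo_connection are hypothetical, and a plain table stands in for the sqlite-vec virtual table so the example runs without the extension.

```python
import sqlite3
from typing import Dict

def database_stats(conn: sqlite3.Connection) -> Dict[str, float]:
    """Compute the four figures that ghs stats reports (column names assumed)."""
    total = conn.execute("SELECT COUNT(*) FROM repositories").fetchone()[0]
    with_embeddings = conn.execute("SELECT COUNT(*) FROM vec_repositories").fetchone()[0]
    with_readme = conn.execute(
        "SELECT COUNT(*) FROM repositories "
        "WHERE readme_content IS NOT NULL AND readme_content != ''"
    ).fetchone()[0]
    # Guard against division by zero on an empty database.
    coverage = 100.0 * with_readme / total if total else 0.0
    return {
        "total": total,
        "with_embeddings": with_embeddings,
        "with_readme": with_readme,
        "readme_coverage_pct": round(coverage, 1),
    }

def demo_connection() -> sqlite3.Connection:
    """Build a throwaway in-memory database with 4 repos, 3 of them with READMEs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE repositories (id INTEGER PRIMARY KEY, name TEXT, readme_content TEXT);
        CREATE TABLE vec_repositories (repo_id INTEGER PRIMARY KEY);  -- plain stand-in
    """)
    conn.executemany(
        "INSERT INTO repositories VALUES (?, ?, ?)",
        [(1, "a/x", "# A"), (2, "b/y", None), (3, "c/z", "# C"), (4, "d/w", "# D")],
    )
    conn.executemany("INSERT INTO vec_repositories VALUES (?)", [(1,), (2,), (3,), (4,)])
    return conn

if __name__ == "__main__":
    print(database_stats(demo_connection()))
```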

Command Quick Reference

ghs fetch                      # Initial fetch and index
ghs search "query"             # Search repositories
ghs search "query" --limit 5   # Limit results
ghs refresh                    # Sync added/removed stars
ghs stats                      # Show statistics

How It Works

  1. GitHub API: Uses PyGithub to fetch your starred repositories with intelligent rate limit handling
  2. Parallel README Fetching: Downloads READMEs using 5 concurrent workers with shared rate limit detection
  3. README Extraction: Uses GitHub's dedicated README API endpoint for efficient fetching
  4. Embeddings: Uses sentence-transformers (all-MiniLM-L6-v2) to generate 384-dim vectors
  5. Vector Search: Stores embeddings in sqlite-vec for fast similarity search using cosine distance
  6. Smart Sync: Refresh command intelligently adds/removes repositories based on current stars
  7. Rate Limit Protection: Automatically detects rate limits, displays clear wait times, and resumes when ready
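Step 2 mentions rate limit detection shared across the parallel workers. One way to get that behaviour with the standard library alone is a lock-protected "resume at" timestamp that any worker can push forward and that every worker checks before its next request. RateLimitGate is a hypothetical helper, not a class from the tool or from PyGithub. Its clock and sleep are injectable so the logic can be tested without waiting.

```python
import threading
import time
from typing import Callable

class RateLimitGate:
    """Hypothetical shared pause point for all worker threads."""

    def __init__(
        self,
        clock: Callable[[], float] = time.time,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._lock = threading.Lock()
        self._resume_at = 0.0  # epoch seconds before which no worker should call the API
        self._clock = clock
        self._sleep = sleep

    def block_until(self, reset_epoch: float) -> None:
        """Called by whichever worker first sees the limit exhausted."""
        with self._lock:
            # Keep the latest reset time if several workers report at once.
            self._resume_at = max(self._resume_at, reset_epoch)

    def wait(self) -> float:
        """Called before each request; sleeps if a pause is in effect."""
        with self._lock:
            delay = self._resume_at - self._clock()
        # Sleep outside the lock so other workers can still read the gate.
        if delay > 0:
            self._sleep(delay)
        return max(delay, 0.0)
```

In this design a worker calls gate.wait() before each README request and gate.block_until(reset_epoch) when a response shows the quota is exhausted, so one worker's discovery pauses all five.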

Database Schema

The tool creates a stars.db SQLite database with:

repositories table:

  • Repository metadata (id, name, description, URL, stars, language)
  • README content and type
  • Timestamps

vec_repositories table (virtual):

  • Vector embeddings for semantic search
  • Linked to repositories via repo_id
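Below is a sketch of this two-table layout using only Python's built-in sqlite3 module. The column names are assumptions based on the fields listed above. Because the vec0 virtual table needs the sqlite-vec extension loaded, a regular table holding raw float32 BLOBs stands in for vec_repositories here, and the corresponding sqlite-vec declaration appears only as a comment. create_schema, serialize_f32 and deserialize_f32 are hypothetical names.

```python
import sqlite3
import struct
from typing import List, Sequence

EMBEDDING_DIM = 384  # all-MiniLM-L6-v2 output size

def serialize_f32(vector: Sequence[float]) -> bytes:
    """Pack floats as raw float32 bytes, the compact BLOB format sqlite-vec accepts."""
    return struct.pack(f"{len(vector)}f", *vector)

def deserialize_f32(blob: bytes) -> List[float]:
    """Inverse of serialize_f32 (4 bytes per float32)."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))

def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS repositories (
            id INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL,
            description TEXT,
            url TEXT,
            stars INTEGER,
            language TEXT,
            readme_content TEXT,
            readme_type TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- Stand-in for the sqlite-vec virtual table. With the extension loaded
        -- it would be something like:
        --   CREATE VIRTUAL TABLE vec_repositories USING vec0(embedding float[384]);
        CREATE TABLE IF NOT EXISTS vec_repositories (
            repo_id INTEGER PRIMARY KEY REFERENCES repositories(id),
            embedding BLOB NOT NULL
        );
    """)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO repositories (id, full_name) VALUES (1, 'a/x')")
    conn.execute("INSERT INTO vec_repositories VALUES (1, ?)", (serialize_f32([0.1] * EMBEDDING_DIM),))
    blob = conn.execute("SELECT embedding FROM vec_repositories WHERE repo_id = 1").fetchone()[0]
    print(len(blob), "bytes per embedding")  # 384 floats x 4 bytes
```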

Download files


Source Distribution

github_stars_search-0.1.1.tar.gz (12.5 kB)

Built Distribution


github_stars_search-0.1.1-py3-none-any.whl (13.6 kB)

File details

Details for the file github_stars_search-0.1.1.tar.gz.

File metadata

  • Download URL: github_stars_search-0.1.1.tar.gz
  • Upload date:
  • Size: 12.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for github_stars_search-0.1.1.tar.gz:

  • SHA256: 9c9d2083f66865ae9cc107ec5849f2ea416311ac23ebc14648eddaed0224063c
  • MD5: e484a88d1c49f3e6a499c3a5b920340c
  • BLAKE2b-256: 3c316b181a6728d0711f63ec35c30c466514dee806e8c75d66d2e40b7a0b229b


Provenance

The following attestation bundles were made for github_stars_search-0.1.1.tar.gz:

Publisher: release.yml on webpolis/ghs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file github_stars_search-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for github_stars_search-0.1.1-py3-none-any.whl:

  • SHA256: 788aec94e719c49c0de2e0958eae997919c6eef98def370ae253d7dfd3b12550
  • MD5: eadef87666cb5dfbace333a662d62905
  • BLAKE2b-256: 639d706652979014c90d401e0b511b5d18f851e7b3dda14084c9e5844e54111b


Provenance

The following attestation bundles were made for github_stars_search-0.1.1-py3-none-any.whl:

Publisher: release.yml on webpolis/ghs

