🎯 Good First Issue Finder
Discover, rank, and get personalized recommendations for "good first issue" contributions in Canonical's GitHub repositories.
Features
- Scrape all open "good first issue" issues across 76+ Canonical repos (206 issues at the time of writing)
- Rank using a weighted heuristic (freshness, competition, availability, popularity, activity, linked PRs)
- Match issues to your developer profile using GPT-5.5 (single API call, ~$0.002)
- Cache & diff between runs: see what's new since the last run
- Auto-refresh with --watch mode or cron scheduling
- Beautiful TUI: browse, filter, and match interactively in the terminal
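The run-to-run diff can be pictured as a set comparison over issue URLs. This is a minimal sketch, assuming the cache is a JSON list of issue URLs; the file name .cache/last_run.json and the helpers load_previous(), save_current(), and diff_runs() are hypothetical and are not the tool's actual cache format or API.

```python
import json
from pathlib import Path

# Hypothetical cache location and format (a JSON list of issue URLs);
# the real tool's .cache/ layout may differ.
CACHE_FILE = Path(".cache") / "last_run.json"


def load_previous(cache_file: Path = CACHE_FILE) -> set[str]:
    """Return the issue URLs saved by the previous run, or an empty set on the first run."""
    if not cache_file.exists():
        return set()
    return set(json.loads(cache_file.read_text()))


def save_current(urls: set[str], cache_file: Path = CACHE_FILE) -> None:
    """Persist this run's issue URLs so the next run can diff against them."""
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(sorted(urls), indent=2))


def diff_runs(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (issues new since the last run, issues gone since the last run)."""
    return current - previous, previous - current
```

A run would call load_previous(), scrape, report diff_runs(previous, current), and finish with save_current(current). Keying on the URL rather than the title keeps the diff stable when an issue is retitled.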
Quick Start
Prerequisites
- Python 3.12+
- gh CLI authenticated (gh auth login)
- OpenAI API key (for LLM matching only)
Install
```bash
pip install .
# Or for development (editable install with test dependencies)
pip install -e ".[dev]"
```
Run
```bash
# 1. Scrape and rank all issues
gfi-scrape
# 2. Interactive TUI (browse, filter, match)
export OPENAI_API_KEY='sk-...'
gfi-tui
# 3. Or use the headless matcher
gfi-match
```
Project Structure
```text
├── pyproject.toml
├── README.md
├── docs/
│   ├── architecture.md              # System design & data flow
│   └── scoring.md                   # Ranking heuristic explained
├── src/gfi_scraper/
│   ├── __init__.py
│   ├── scrape_good_first_issues.py  # Scraper + ranker + cache
│   ├── match_issues.py              # LLM-powered matcher
│   └── tui.py                       # Interactive terminal UI
├── tests/
│   └── test_all.py                  # 96 unit tests
├── .cache/                          # Run-to-run diff cache (gitignored)
└── good_first_issues.csv            # Latest scraped results
```
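The scraper's job (fetching open "good first issue" issues through the gh CLI) can be approximated with one paginated GraphQL search. This is a sketch under stated assumptions: the query fields, the default org name canonical, and the helpers build_gh_command(), iter_json_documents(), parse_issues(), and fetch() are illustrative and are not taken from scrape_good_first_issues.py. It relies on documented gh behavior: gh api graphql treats extra -f fields as GraphQL variables, and --paginate follows $endCursor, printing one JSON document per page.

```python
import json
import subprocess
from collections.abc import Iterator

# Illustrative query (an assumption, not the tool's real one). gh's --paginate
# requires the $endCursor variable and pageInfo { hasNextPage endCursor }.
QUERY = """
query($q: String!, $endCursor: String) {
  search(query: $q, type: ISSUE, first: 100, after: $endCursor) {
    pageInfo { hasNextPage endCursor }
    nodes {
      ... on Issue {
        title
        url
        createdAt
        updatedAt
        comments { totalCount }
        assignees { totalCount }
        repository { nameWithOwner stargazerCount }
      }
    }
  }
}
"""


def build_gh_command(org: str) -> list[str]:
    """Build the gh invocation; extra -f fields become GraphQL variables."""
    search = f'org:{org} label:"good first issue" is:issue is:open'
    return ["gh", "api", "graphql", "--paginate",
            "-f", f"query={QUERY}", "-f", f"q={search}"]


def iter_json_documents(text: str) -> Iterator[dict]:
    """Yield each JSON document from gh's concatenated per-page output."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace between documents; raw_decode does not do this itself.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        doc, pos = decoder.raw_decode(text, pos)
        yield doc


def parse_issues(payload: dict) -> list[dict]:
    """Flatten one page of search results into plain dicts."""
    issues = []
    for node in payload["data"]["search"]["nodes"]:
        if not node:  # defensive: hits that don't match the Issue fragment come back as {}
            continue
        issues.append({
            "repo": node["repository"]["nameWithOwner"],
            "title": node["title"],
            "url": node["url"],
            "created_at": node["createdAt"],
            "updated_at": node["updatedAt"],
            "comments": node["comments"]["totalCount"],
            "assignees": node["assignees"]["totalCount"],
            "stars": node["repository"]["stargazerCount"],
        })
    return issues


def fetch(org: str = "canonical") -> list[dict]:
    """Run gh (already authenticated via gh auth login) and parse every page."""
    out = subprocess.run(build_gh_command(org), capture_output=True,
                         text=True, check=True).stdout
    return [issue for page in iter_json_documents(out) for issue in parse_issues(page)]
```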
Usage
Scraper
```bash
# Basic run
gfi-scrape
# Custom org
gfi-scrape --org ubuntu
# Auto-refresh every 4 hours
gfi-scrape --watch --interval 4
# Generate crontab entry
gfi-scrape --cron
```
TUI
```bash
gfi-tui
```
| Key | Action |
|---|---|
| b | Browse all issues (paginated) |
| n | What's new (since last run) |
| f | Filter by keyword |
| d | Detail view of a specific issue |
| m | Match to your profile (LLM) |
| s | Stats overview |
| q | Quit |
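Keyword filtering (f) and paginated browsing (b) are simple list operations. A minimal sketch, assuming each issue is a dict with title, repo, and body keys; filter_issues() and paginate() are hypothetical names, not functions from tui.py.

```python
import math


def filter_issues(issues: list[dict], keyword: str,
                  fields: tuple[str, ...] = ("title", "repo", "body")) -> list[dict]:
    """Case-insensitive substring match against the given fields."""
    needle = keyword.strip().casefold()
    if not needle:
        return list(issues)  # an empty filter keeps everything
    return [
        issue for issue in issues
        if any(needle in str(issue.get(field, "")).casefold() for field in fields)
    ]


def paginate(items: list, page: int, page_size: int = 20) -> tuple[list, int]:
    """Return (items on the 0-based page, total page count)."""
    total_pages = max(1, math.ceil(len(items) / page_size))
    page = min(max(page, 0), total_pages - 1)  # clamp out-of-range requests
    start = page * page_size
    return items[start:start + page_size], total_pages
```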
Matcher (headless)
```bash
gfi-match --top 15
```
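Because matching uses a single API call, the profile and all candidate issues have to travel in one prompt. The sketch below shows one way to build that prompt and map the model's reply back to issues; the prompt wording, the JSON reply format, and build_prompt()/parse_matches() are assumptions, and sending the prompt (for example with the OpenAI Python SDK) is left out.

```python
import json


def build_prompt(profile: str, issues: list[dict], top: int = 15) -> str:
    """Pack the profile and every candidate issue into one prompt (assumed wording)."""
    lines = [
        "You are matching a developer to beginner-friendly GitHub issues.",
        f"Developer profile:\n{profile.strip()}",
        f"Return the {top} best matches from the numbered list below.",
        'Reply with JSON only: [{"id": <number>, "reason": "<one sentence>"}]',
        "",
    ]
    for i, issue in enumerate(issues, start=1):
        # One short line per issue keeps the whole list small enough for a single call.
        lines.append(f"{i}. [{issue['repo']}] {issue['title']} (score {issue['score']:.0f})")
    return "\n".join(lines)


def parse_matches(reply: str, issues: list[dict]) -> list[tuple[dict, str]]:
    """Map the model's JSON reply back to issues, ignoring out-of-range ids."""
    matches = []
    for pick in json.loads(reply):
        index = int(pick["id"]) - 1
        if 0 <= index < len(issues):
            matches.append((issues[index], pick.get("reason", "")))
    return matches
```

Numbering the issues and asking for ids back keeps the reply short and lets parse_matches() discard any id the model invents.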
How Scoring Works
Each issue is scored 0–100 using a weighted composite:
| Signal | Weight | Logic |
|---|---|---|
| Freshness | 25% | Exponential decay (half-life: 180 days) |
| Competition | 25% | Fewer comments = higher score (cap: 10) |
| Availability | 20% | No assignees = 100, decays per assignee |
| Popularity | 15% | Repo stars, log-scaled |
| Activity | 10% | Staleness gate (updated within 1 year?) |
| PR Status | 5% | Open PR = competition penalty |
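The weighted composite can be written directly from the table. This is a sketch, not the code in scrape_good_first_issues.py: the weights, the 180-day half-life, the comment cap of 10, and the one-year staleness window come from the table, while the linear comment falloff, the per-assignee decay factor (0.5), and the star count that earns a full popularity score (10,000) are assumptions, since the table does not specify them.

```python
import math

# Weights in percent, from the table above (they sum to 100).
WEIGHTS = {"freshness": 25, "competition": 25, "availability": 20,
           "popularity": 15, "activity": 10, "pr_status": 5}

HALF_LIFE_DAYS = 180     # from the table
COMMENT_CAP = 10         # from the table
STALE_AFTER_DAYS = 365   # "updated within 1 year?"
ASSIGNEE_DECAY = 0.5     # assumption: each assignee halves availability
STAR_SCALE = 10_000      # assumption: this many stars earns a full popularity score


def freshness(age_days: float) -> float:
    """Exponential decay: 100 when new, 50 at one half-life."""
    return 100 * 0.5 ** (age_days / HALF_LIFE_DAYS)


def competition(comments: int) -> float:
    """Assumed linear falloff in comment count, reaching 0 at the cap."""
    return 100 * (1 - min(comments, COMMENT_CAP) / COMMENT_CAP)


def availability(assignees: int) -> float:
    """100 with no assignees, decaying per assignee."""
    return 100 * ASSIGNEE_DECAY ** assignees


def popularity(stars: int) -> float:
    """Log-scaled stars, capped at 100."""
    return min(100.0, 100 * math.log10(stars + 1) / math.log10(STAR_SCALE + 1))


def activity(days_since_update: float) -> float:
    """Staleness gate: full marks if updated within a year, else nothing."""
    return 100.0 if days_since_update <= STALE_AFTER_DAYS else 0.0


def pr_status(has_open_pr: bool) -> float:
    """An open linked PR signals competition."""
    return 0.0 if has_open_pr else 100.0


def score(age_days: float, comments: int, assignees: int, stars: int,
          days_since_update: float, has_open_pr: bool) -> float:
    """Weighted composite on a 0-100 scale."""
    signals = {
        "freshness": freshness(age_days),
        "competition": competition(comments),
        "availability": availability(assignees),
        "popularity": popularity(stars),
        "activity": activity(days_since_update),
        "pr_status": pr_status(has_open_pr),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items()) / 100
```

Under these assumptions, an issue that is 180 days old, with 5 comments, 1 assignee, no stars, a recent update, and an open PR scores (25 × 50 + 25 × 50 + 20 × 50 + 15 × 0 + 10 × 100 + 5 × 0) / 100 = 45.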
Testing
```bash
python3 -m pytest tests/ -v
```
96 tests covering: scoring functions, body extraction, CSV round-trips, caching/diffing, GraphQL parsing, LLM prompt building, TUI helpers, integration, and edge cases.
Cost
- Scraping: Free (uses the gh CLI with your GitHub token)
- LLM matching: ~$0.002 per run (single GPT-5.5 call, ~10k tokens)
License
MIT
File details
Details for the file gfi_scraper-0.1.0.tar.gz.
File metadata
- Download URL: gfi_scraper-0.1.0.tar.gz
- Size: 30.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1051a6f9bd6d1464b2c6d35d4ac766e77ae20c4474e47642f270aae0b396022 |
| MD5 | 450a1b7c1858f638dba35c9f2adaeb05 |
| BLAKE2b-256 | 2335410861470c1559990add55797af6df5093b152110507f12ed19c29a12cdd |
Provenance
The following attestation bundles were made for gfi_scraper-0.1.0.tar.gz:
- Publisher: publish.yml on iamsharduld/gfi-scraper
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gfi_scraper-0.1.0.tar.gz
- Subject digest: f1051a6f9bd6d1464b2c6d35d4ac766e77ae20c4474e47642f270aae0b396022
- Sigstore transparency entry: 1592063011
- Permalink: iamsharduld/gfi-scraper@f28ebaea73862b1f57ca34e0479b8f8a5912f1d3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/iamsharduld
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f28ebaea73862b1f57ca34e0479b8f8a5912f1d3
- Trigger Event: release
File details
Details for the file gfi_scraper-0.1.0-py3-none-any.whl.
File metadata
- Download URL: gfi_scraper-0.1.0-py3-none-any.whl
- Size: 21.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7837c0abeea6d4817a0d7cacb225bbfdf7916da26ad0ae7fc4b1b1811e653b8a |
| MD5 | b78958f71cd34437f9fe326b0c2a3926 |
| BLAKE2b-256 | 8f8d97cbc473513dedf222435315e3e41155340c921f74fa96fd5778b239652c |
Provenance
The following attestation bundles were made for gfi_scraper-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on iamsharduld/gfi-scraper
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gfi_scraper-0.1.0-py3-none-any.whl
- Subject digest: 7837c0abeea6d4817a0d7cacb225bbfdf7916da26ad0ae7fc4b1b1811e653b8a
- Sigstore transparency entry: 1592063026
- Permalink: iamsharduld/gfi-scraper@f28ebaea73862b1f57ca34e0479b8f8a5912f1d3
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/iamsharduld
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f28ebaea73862b1f57ca34e0479b8f8a5912f1d3
- Trigger Event: release