A CLI tool that helps first-time open source contributors analyze GitHub issues against local repositories.
OSS Issue Analyzer
A CLI tool that helps first-time open source contributors analyze GitHub issues against their local cloned repositories. It indexes code, estimates difficulty, and helps contributors pick issues they can realistically solve.
Features
- Local Code Indexing - Parse and index Python, JavaScript, and TypeScript code
- GitHub Issue Integration - Fetch issues directly from GitHub
- Difficulty Estimation - Heuristic-based scoring for issue complexity
- Hybrid Retrieval - Semantic + keyword search against indexed code
- Contributing Signals - Identifies test files, documentation, and isolated changes
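To make the hybrid retrieval idea concrete, here is a minimal sketch that blends a semantic (cosine similarity) score with a keyword-overlap score. The function names, the `alpha` weight of 0.7, and the toy vectors are assumptions made for illustration; the tool's actual ranking may work differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Fraction of query tokens that also appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_score(query, query_vec, text, text_vec, alpha=0.7):
    """Weighted blend of semantic and keyword scores (alpha is a hypothetical weight)."""
    semantic = cosine(query_vec, text_vec)
    keyword = keyword_overlap(query, text)
    return alpha * semantic + (1 - alpha) * keyword
```

A blend like this lets exact identifier matches (for example a function name quoted in the issue) lift a code unit even when its embedding is only loosely similar to the issue text.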
Installation
pip install oss-issue-analyzer
Or install in development mode:
pip install -e .
Usage
1. Index a Repository
cd /path/to/repo
oss-issue-analyzer index .
This creates a .oss-index/ folder in the repository root containing vector embeddings.
2. Analyze an Issue
# Using issue number (run from the cloned repo directory)
oss-issue-analyzer analyze 123
# Using a GitHub URL
oss-issue-analyzer analyze https://github.com/owner/repo/issues/123
The tool automatically detects the GitHub remote from the local git repository.
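As a rough illustration of remote detection, the sketch below turns a GitHub remote URL into an `owner/repo` string. It assumes the URL comes from something like `git remote get-url origin`; the function name and regular expression are hypothetical and not taken from the tool's source.

```python
import re

# Matches both SSH (git@github.com:owner/repo.git) and HTTPS remote URLs.
_REMOTE_RE = re.compile(
    r"github\.com[:/](?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"
)


def parse_github_remote(url):
    """Return 'owner/repo' for a GitHub remote URL, or None if it does not match."""
    match = _REMOTE_RE.search(url.strip())
    if match is None:
        return None
    return f"{match.group('owner')}/{match.group('repo')}"
```

If detection fails, for example because the remote is not hosted on GitHub, the repository can be passed explicitly with `--gh-repo`.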
3. Use a Local Issue File
oss-issue-analyzer analyze ./issue.md
Commands
index
Index a local repository for code analysis.
oss-issue-analyzer index <repo_path> [OPTIONS]
Options:
--embedder Embedding model (nomic, minilm) [default: minilm]
--force Force re-index from scratch
analyze
Analyze a GitHub issue against the indexed codebase.
oss-issue-analyzer analyze <issue_ref> [OPTIONS]
Arguments:
issue_ref Issue number, URL, or path to local markdown file
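Since `issue_ref` accepts three different forms, a dispatcher along the following lines could tell them apart. This is a sketch under assumptions: the function name, the return shape, and the fallback of treating anything else as a file path are illustrative, not the tool's documented behavior.

```python
import re
from pathlib import Path

_ISSUE_URL_RE = re.compile(r"^https?://github\.com/([^/]+)/([^/]+)/issues/(\d+)/?$")


def classify_issue_ref(ref):
    """Classify a reference as ('number', n), ('url', (owner_repo, n)) or ('file', Path)."""
    if ref.isdigit():
        return ("number", int(ref))
    match = _ISSUE_URL_RE.match(ref)
    if match:
        owner, repo, number = match.groups()
        return ("url", (f"{owner}/{repo}", int(number)))
    # Assumption: anything that is neither a number nor an issue URL is a local path.
    return ("file", Path(ref))
```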
Options:
--repo Path to indexed repository
--db-path Path to index database
--embedder Embedding model [default: minilm]
--limit Number of code units to retrieve [default: 10]
--gh-repo GitHub repo (owner/repo) - auto-detected if not provided
Output Example
╭─────────────── Issue: Fix tokenizer performance ───────────────╮
│ Difficulty: EASY (conf: 88%) │
│ Relative: Easier than 75% │
│ │
│ Files involved: │
│ → src/tokenizer.py │
│ → tests/test_tokenizer.py │
│ │
│ Suggested approach: │
│ 1. Start in src/tokenizer.py -> Tokenizer.encode │
│ 2. Bug is in the batch processing logic │
│ 3. Test: pytest tests/test_tokenizer.py │
│ │
│ Contributor signals: │
│ > Test file exists - changes are verifiable │
│ > Has documentation │
│ > Isolated change possible │
╰────────────────────────────────────────────────────────────────╯
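The contributor signals in the panel above could be derived from the list of involved files with simple heuristics. The sketch below is hypothetical: the naming rules cover Python-style test files only, and the "one non-test file" threshold for an isolated change is an assumption rather than the tool's actual rule.

```python
from pathlib import PurePosixPath


def is_test_file(path):
    """Heuristic: the file sits under a tests/ directory or is named test_*.py / *_test.py."""
    p = PurePosixPath(path)
    name = p.name
    return "tests" in p.parts[:-1] or name.startswith("test_") or name.endswith("_test.py")


def contributor_signals(files):
    """Derive simple signals from the files an issue is expected to touch."""
    tests = [f for f in files if is_test_file(f)]
    sources = [f for f in files if not is_test_file(f)]
    return {
        "has_tests": bool(tests),
        # Hypothetical threshold: at most one non-test file counts as an isolated change.
        "isolated_change": len(sources) <= 1,
    }
```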
Configuration
Environment Variables
- GITHUB_TOKEN - GitHub personal access token, used for higher API rate limits
- HF_TOKEN - Hugging Face token for faster embedding downloads
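For reference, a client might pick up `GITHUB_TOKEN` roughly as follows. This is a minimal sketch: the function name is hypothetical, and it only assumes the standard GitHub REST API convention of sending the token in an `Authorization: Bearer` header. Without a token the request is simply unauthenticated.

```python
import os


def github_headers(env=None):
    """Build GitHub API request headers, adding a token only when one is set."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```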
Data Storage
Index data is stored in the .oss-index/ folder in the repository root:
- index.lance/code_units.lance - Vector embeddings
- index.lance/repositories.lance - Repository metadata
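A quick way to check whether a repository has already been indexed is to look for the paths listed above. The helper below is a hypothetical sketch that only inspects the filesystem; it does not open or validate the index data itself.

```python
from pathlib import Path

# Index paths named in this README, relative to the repository root.
EXPECTED_TABLES = (
    ".oss-index/index.lance/code_units.lance",
    ".oss-index/index.lance/repositories.lance",
)


def missing_index_tables(repo_root):
    """Return the expected index paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_TABLES if not (root / rel).exists()]
```

An empty result suggests the index is present; otherwise running `oss-issue-analyzer index .` (optionally with `--force`) should rebuild it.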
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest
License
MIT