
A CLI tool that helps first-time open source contributors analyze issues from GitHub, GitLab, and Bitbucket against local repositories.


OSS Issue Analyzer

A CLI tool that helps first-time open source contributors analyze issues from GitHub, GitLab, and Bitbucket against their local cloned repositories. It indexes code plus selected project text assets, estimates difficulty using AI or heuristics, and helps contributors pick issues they can realistically solve.

Features

  • Multi-Platform Support - Works with GitHub, GitLab, and Bitbucket repositories
  • Mixed Repository Indexing - Parse code and index selected config, workflow, and documentation files
  • Expanded Language Support - Index Python, JavaScript, TypeScript, Go, Rust, Java, C, and C++
  • Issue Integration - Fetch issues directly from GitHub, GitLab, or Bitbucket
  • Bulk Issue Scanning - Quick heuristic scoring (~80% accurate) for ALL issues using parallel processing
  • AI-Powered Scoring - Supports multiple LLM providers (OpenAI, Anthropic, Google, Azure OpenAI) for intelligent difficulty estimation and suggestions
  • Heuristic Fallback - Rule-based scoring when AI is unavailable
  • Hybrid Retrieval - Semantic + keyword search against indexed code
  • Contributing Signals - Identifies test files, documentation, and isolated changes
  • Dependency-Aware Scoring - Parses core dependency manifests and flags dependency-hell risk factors
  • Issue Comments Context - Includes issue comments (prioritized by maintainer input and popularity) to understand expected practices
  • Smart Caching - Minimizes API calls and costs: scanning 50 issues and deep-analyzing one takes 1 AI call instead of 50, a 98% reduction in AI costs

Installation

pip install oss-issue-analyzer

Or install in development mode:

pip install -e .

Quick Start

# 1. Index your repository
cd /path/to/repo
oss-issue-analyzer index .

# 2. (Optional) Set up AI provider for smarter analysis
oss-issue-analyzer setup

# 3. Bulk scan issues (FREE - uses quick heuristics)
oss-issue-analyzer list-issues

# 4. Deep analyze selected issue (1 AI call only)
oss-issue-analyzer analyze 123

Usage

1. Index a Repository

cd /path/to/repo
oss-issue-analyzer index .

This creates a .oss-index/ folder in the repository root containing vector embeddings for code and selected project text assets.

Supported code languages: Python, JavaScript, TypeScript, Go, Rust, Java, C, and C++.

When using mixed indexing, the tool also indexes dependency and build manifests such as pyproject.toml, requirements.txt, package.json, Cargo.toml, go.mod, pom.xml, Gradle files, CMakeLists.txt, Conan manifests, and vcpkg.json.

Options:

oss-issue-analyzer index <repo_path> [OPTIONS]

Options:
  --embedder    Embedding model (nomic, minilm) [default: minilm]
  --index-mode  Index mode (mixed, code-only) [default: mixed]
  --force       Force re-index from scratch

2. Set Up AI Provider (Optional but Recommended)

Configure an AI provider to get smarter difficulty analysis and suggestions:

# List available providers based on your .env
oss-issue-analyzer setup --list

# Interactive setup
oss-issue-analyzer setup

# Direct setup with provider and API key
oss-issue-analyzer setup --provider openai --api-key sk-... --test

# Clear saved configuration
oss-issue-analyzer setup --clear

Supported Providers:

Provider            Environment Variable                           Default Model
OpenAI              OPENAI_API_KEY                                 gpt-4o-mini
Anthropic (Claude)  ANTHROPIC_API_KEY                              claude-3-haiku-20240307
Google (Gemini)     GOOGLE_API_KEY                                 gemini-1.5-flash
Azure OpenAI        AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT   (deployment name)
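
As a rough illustration of how the table could map onto `setup --list`, the sketch below checks which providers have all of their required environment variables set. Only the variable names and provider identifiers come from this README; the `REQUIRED_ENV` mapping and the `available_providers` helper are hypothetical names for this example and are not the tool's internals.

```python
import os

# Required environment variables per provider, taken from the table above.
# The dictionary and function names are illustrative, not part of the tool.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
}


def available_providers(env=None):
    """Return providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, variables in REQUIRED_ENV.items()
        if all(env.get(var) for var in variables)
    ]


if __name__ == "__main__":
    # Prints whichever providers are configured in the current environment.
    print(available_providers())
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching real credentials.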

3. List and Analyze Issues (Bulk Scan)

Scan ALL open issues with quick heuristic scoring (FREE, ~80% accurate), then deep-analyze only the ones you're interested in:

# Bulk scan (uses quick heuristics, NO AI calls)
oss-issue-analyzer list-issues

# Filter and sort
oss-issue-analyzer list-issues --filter-difficulty easy
oss-issue-analyzer list-issues --sort difficulty
oss-issue-analyzer list-issues --filter-label "good first issue"

# Interactive mode (select and analyze immediately)
oss-issue-analyzer list-issues --interactive

# Deep analysis (1 AI call for selected issue)
oss-issue-analyzer analyze 123

# Specify platform explicitly
oss-issue-analyzer list-issues --platform gitlab
oss-issue-analyzer analyze 123 --platform bitbucket

Cost Comparison:

Approach            Platform API Calls   AI API Calls  Cost
Analyze each issue  50 + comments        50            $$$
Bulk scan + select  1-2 + 1 (selected)   1             $

Options:

oss-issue-analyzer list-issues [OPTIONS]

Options:
  --repo OWNER/REPO                     # Repository (auto-detected from git)
  --platform github|gitlab|bitbucket    # Platform [default: auto-detect]
  --state open|all|closed               # Filter by state [default: open]
  --sort difficulty|number|created      # Sort results
  --filter-difficulty easy|medium|hard
  --filter-label TEXT                   # e.g., "good first issue"
  --limit N                             # Max issues to show [default: 0=all]
  --cache-ttl HOURS                     # Cache duration [default: 1]
  --no-cache                            # Force re-fetch
  --workers N                           # Parallel workers [default: auto]
  --json                                # JSON output
  --interactive                         # Select and analyze immediately

Output Example:

╭─────────── List of Issues (repo: owner/repo, 47 open) ───────────╮
│ #    Title                    Difficulty  Conf  Labels           │
│ 123  Fix parser crash         EASY        82%   good-first-issue │
│ 124  Add new feature          HARD        75%   enhancement      │
│ 125  Update README            EASY        90%   docs             │
╰──────────────────────────────────────────────────────────────────╯

Tip: Run 'oss-issue-analyzer analyze <number>' for detailed AI analysis
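
To make the bulk-scan idea concrete, here is a minimal sketch of scoring many issues in parallel with a thread pool, in the spirit of the `--workers` option. The `quick_score` rules are a toy stand-in invented for this example (the tool's actual heuristics are not described in this README), and the issue dictionaries are a hypothetical shape.

```python
from concurrent.futures import ThreadPoolExecutor


def quick_score(issue):
    """Toy stand-in for a heuristic scorer; the real rules are not documented here."""
    labels = {label.lower() for label in issue.get("labels", [])}
    if "good first issue" in labels or "docs" in labels:
        return "easy"
    if "enhancement" in labels:
        return "hard"
    return "medium"


def bulk_scan(issues, workers=4):
    """Score every issue in parallel, preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        difficulties = list(pool.map(quick_score, issues))
    return [
        {"number": issue["number"], "difficulty": difficulty}
        for issue, difficulty in zip(issues, difficulties)
    ]


if __name__ == "__main__":
    sample = [
        {"number": 123, "labels": ["good first issue"]},
        {"number": 124, "labels": ["enhancement"]},
    ]
    print(bulk_scan(sample))
```

Because no AI call is involved, a scan like this costs only the platform API requests needed to list the issues.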

4. Analyze an Issue

# Using issue number (run from the cloned repo directory)
oss-issue-analyzer analyze 123

# Using platform URLs
oss-issue-analyzer analyze https://github.com/owner/repo/issues/123
oss-issue-analyzer analyze https://gitlab.com/owner/repo/-/issues/123
oss-issue-analyzer analyze https://bitbucket.org/owner/repo/issues/123

# Using platform prefix
oss-issue-analyzer analyze github:owner/repo#123
oss-issue-analyzer analyze gitlab:owner/repo#123
oss-issue-analyzer analyze bitbucket:owner/repo#123

# Force AI provider
oss-issue-analyzer analyze 123 --ai-provider openai

# Disable AI and use heuristics only
oss-issue-analyzer analyze 123 --no-ai

# Specify platform explicitly
oss-issue-analyzer analyze 123 --platform gitlab

The tool automatically detects the platform from the git remote URL.
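
Platform auto-detection of this kind can be sketched as a small parser over the remote URL. This is an illustration under assumptions, not the tool's implementation: `detect_platform`, `KNOWN_HOSTS`, and the regular expression are hypothetical, and only the three public hosts are recognized.

```python
import re

# Hosts for the three supported platforms. Self-hosted instances are not
# covered by this sketch.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "bitbucket.org": "bitbucket",
}

# Matches HTTPS (https://host/owner/repo.git) and SSH
# (git@host:owner/repo.git or ssh://git@host/owner/repo.git) remotes.
REMOTE_RE = re.compile(
    r"^(?:https?://(?:[^@/]+@)?|ssh://git@|git@)"
    r"(?P<host>[^/:]+)[/:]"
    r"(?P<path>.+?)(?:\.git)?/?$"
)


def detect_platform(remote_url):
    """Return (platform, 'owner/repo'), or None if the host is not recognized."""
    match = REMOTE_RE.match(remote_url.strip())
    if not match:
        return None
    platform = KNOWN_HOSTS.get(match.group("host").lower())
    if platform is None:
        return None
    return platform, match.group("path")


if __name__ == "__main__":
    print(detect_platform("git@gitlab.com:group/project.git"))
```

The URL itself could come from `git remote get-url origin`; when detection fails, passing `--platform` explicitly is the documented way to override it.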

Options:

oss-issue-analyzer analyze <issue_ref> [OPTIONS]

Arguments:
  issue_ref        Issue number, URL, platform-prefixed reference (e.g. github:owner/repo#123), or path to a local markdown file

Options:
  --repo           Path to indexed repository
  --db-path        Path to index database
  --embedder       Embedding model [default: minilm]
  --limit          Number of indexed units to retrieve [default: 10]
  --gh-repo        Repository (owner/repo) - auto-detected if not provided
  --platform       Platform: github, gitlab, bitbucket [default: auto-detect]
  --ai-provider    AI provider to use (openai, anthropic, google, azure_openai)
  --no-ai          Disable AI scoring, use heuristics only
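
Since `issue_ref` accepts several shapes, a sketch of how such a string might be classified can help when scripting around the CLI. The `IssueRef` dataclass and `parse_issue_ref` function are hypothetical helpers written for this example; the accepted formats mirror the commands shown in this section, and nested GitLab groups are deliberately not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueRef:
    kind: str                       # "number", "url", "prefixed", or "file"
    number: Optional[int] = None
    platform: Optional[str] = None
    repo: Optional[str] = None      # "owner/repo"
    path: Optional[str] = None      # local markdown file


HOSTS = {"github.com": "github", "gitlab.com": "gitlab", "bitbucket.org": "bitbucket"}

# e.g. https://gitlab.com/owner/repo/-/issues/123 (the "-/" part is GitLab-only)
URL_RE = re.compile(
    r"^https?://(?P<host>github\.com|gitlab\.com|bitbucket\.org)/"
    r"(?P<repo>[^/]+/[^/]+)/(?:-/)?issues/(?P<number>\d+)/?$"
)
# e.g. github:owner/repo#123
PREFIX_RE = re.compile(
    r"^(?P<platform>github|gitlab|bitbucket):(?P<repo>[^/#]+/[^/#]+)#(?P<number>\d+)$"
)


def parse_issue_ref(ref):
    """Classify an issue reference string (illustrative helper, not the tool's API)."""
    ref = ref.strip()
    if ref.isdecimal():
        return IssueRef(kind="number", number=int(ref))
    m = URL_RE.match(ref)
    if m:
        return IssueRef("url", int(m["number"]), HOSTS[m["host"]], m["repo"])
    m = PREFIX_RE.match(ref)
    if m:
        return IssueRef("prefixed", int(m["number"]), m["platform"], m["repo"])
    if ref.lower().endswith(".md"):
        return IssueRef(kind="file", path=ref)
    raise ValueError(f"Unrecognized issue reference: {ref!r}")


if __name__ == "__main__":
    print(parse_issue_ref("github:owner/repo#123"))
```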

5. Use Local Issue File

oss-issue-analyzer analyze ./issue.md

The markdown file should start with a # Title heading.
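
For reference, a file in that shape could be split into a title and body roughly as follows. This is a minimal sketch assuming the first non-blank line is the `# Title` heading and everything after it is the issue body; `parse_issue_markdown` is a hypothetical helper, not a function exposed by the tool.

```python
def parse_issue_markdown(text):
    """Split markdown text into (title, body), requiring a leading '# ' heading."""
    # Drop a UTF-8 byte order mark if the file was saved with one.
    lines = text.lstrip("\ufeff").splitlines()
    # Skip blank lines before the heading.
    index = 0
    while index < len(lines) and not lines[index].strip():
        index += 1
    if index == len(lines) or not lines[index].startswith("# "):
        raise ValueError("Issue file must start with a '# Title' heading")
    title = lines[index][2:].strip()
    body = "\n".join(lines[index + 1:]).strip()
    return title, body


if __name__ == "__main__":
    sample = "# Fix parser crash\n\nThe parser fails on empty input.\n"
    print(parse_issue_markdown(sample))
```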

How AI Scoring Works

When an AI provider is configured, the tool:

  1. Fetches GitHub issue comments (up to 7, prioritized by maintainer input and reaction count)
  2. Retrieves relevant code units using hybrid search (semantic + keyword)
  3. Builds a context-rich prompt including:
    • Issue title, body, type, and error patterns
    • GitHub issue comments with community/maintainer insights
    • Retrieved code units with signatures and docstrings
    • Heuristic scoring results for reference
  4. Sends to LLM for intelligent analysis
  5. Falls back to heuristics if AI is unavailable
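
Step 1 above can be pictured as a sort followed by a cut-off. The sketch below puts maintainer comments first, orders the rest by reaction count, and keeps at most seven. The `Comment` fields, in particular the `is_maintainer` flag, are assumptions made for this example; the README does not say how the tool identifies maintainers or weighs the two signals.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    body: str
    is_maintainer: bool  # hypothetical flag; maintainer detection is not documented
    reactions: int


def prioritize_comments(comments, limit=7):
    """Maintainer comments first, then by reaction count; keep at most `limit`."""
    # False sorts before True, so `not is_maintainer` puts maintainers first.
    ranked = sorted(comments, key=lambda c: (not c.is_maintainer, -c.reactions))
    return ranked[:limit]


if __name__ == "__main__":
    thread = [
        Comment("alice", "I hit this too", False, 10),
        Comment("maintainer", "Please add a regression test", True, 1),
    ]
    for comment in prioritize_comments(thread):
        print(comment.author, comment.reactions)
```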

Without AI, the tool uses rule-based heuristics to estimate difficulty based on code complexity, file types, dependency complexity, and issue metadata.
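
The hybrid search mentioned in step 2 has to merge a semantic ranking with a keyword ranking. This README does not say how the tool combines them; one common, simple option is reciprocal rank fusion, sketched here purely as an illustration. The function name, the unit identifiers, and the constant `k=60` (a conventional default for this technique) are assumptions, not details of the tool.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each item scores 1 / (k + rank) in every list it appears in; the
    scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for stable output.
    return sorted(scores, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    semantic = ["tokenizer.encode", "tokenizer.decode", "utils.batch"]
    keyword = ["utils.batch", "tokenizer.encode", "cli.main"]
    print(reciprocal_rank_fusion([semantic, keyword]))
```

Items that rank well in both lists rise to the top, which is the behavior a combined semantic and keyword search is after.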

Output Example

AI-Powered Analysis

╭─────────────── Issue: Fix tokenizer performance ───────────────╮
│ Difficulty: EASY (conf: 88%) [AI]                              │
│ Relative: Easier than 75%                                      │
│                                                                │
│ Relevant files:                                                │
│   → src/tokenizer.py                                           │
│   → tests/test_tokenizer.py                                    │
│                                                                │
│ Suggested approach:                                            │
│   1. Start in src/tokenizer.py -> Tokenizer.encode             │
│   2. The batch processing logic needs optimization             │
│   3. Test: pytest tests/test_tokenizer.py                      │
│                                                                │
│ Contributor signals:                                           │
│  > Test file exists - changes are verifiable                   │
│  > Has documentation                                           │
│  > Isolated change possible                                    │
╰────────────────────────────────────────────────────────────────╯

Heuristic Analysis (No AI)

╭─────────────── Issue: Fix tokenizer performance ───────────────╮
│ Difficulty: EASY (conf: 88%)                                   │
│ Relative: Easier than 75%                                      │
│                                                                │
│ Relevant files:                                                │
│   → src/tokenizer.py                                           │
│   → tests/test_tokenizer.py                                    │
│                                                                │
│ Suggested approach:                                            │
│   1. Start in src/tokenizer.py -> Tokenizer.encode             │
│   2. Bug is in the batch processing logic                      │
│   3. Test: pytest tests/test_tokenizer.py                      │
│                                                                │
│ Contributor signals:                                           │
│  > Test file exists - changes are verifiable                   │
│  > Has documentation                                           │
│  > Isolated change possible                                    │
╰────────────────────────────────────────────────────────────────╯

Configuration

Environment Variables

Create a .env file in your project root (see .env.example for template):

Variable                 Description
GITHUB_TOKEN             GitHub personal access token (raises API rate limits)
GITLAB_TOKEN             GitLab personal access token for API access
BITBUCKET_USERNAME       Bitbucket username
BITBUCKET_APP_PASSWORD   Bitbucket app password for API access
HF_TOKEN                 Hugging Face token for faster embedding downloads
OPENAI_API_KEY           OpenAI API key
OPENAI_MODEL             OpenAI model (default: gpt-4o-mini)
ANTHROPIC_API_KEY        Anthropic API key
ANTHROPIC_MODEL          Anthropic model (default: claude-3-haiku-20240307)
GOOGLE_API_KEY           Google Gemini API key
AZURE_OPENAI_API_KEY     Azure OpenAI API key
AZURE_OPENAI_ENDPOINT    Azure OpenAI endpoint URL
AZURE_OPENAI_DEPLOYMENT  Azure OpenAI deployment name
AI_ENABLED               Enable/disable AI scoring (true/false)
AI_TIMEOUT_SECONDS       AI request timeout in seconds (default: 30)

Configuration File

Provider preferences are saved to ~/.config/oss-issue-analyzer/config.json.

Cache Storage

Analysis results are cached in .oss-issue-analyzer-cache/ in the repository root:

  • issues/ - Issue lists with quick scores (fresh for 1 hour by default)
  • analysis/ - Full AI analysis for individual issues (cached indefinitely)
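
The two cache policies differ only in their time-to-live, which a single freshness check can express. The sketch below is an assumption about how such a check might look, not the tool's code: `is_fresh` and `ISSUE_LIST_TTL_SECONDS` are hypothetical names, and `ttl_seconds=None` is used here to model "cached indefinitely".

```python
import time

ISSUE_LIST_TTL_SECONDS = 60 * 60  # matches the documented 1-hour default


def is_fresh(cached_at, ttl_seconds, now=None):
    """Return True if an entry written at `cached_at` (epoch seconds) is still usable.

    ttl_seconds=None means the entry never expires.
    """
    if ttl_seconds is None:
        return True
    now = time.time() if now is None else now
    return (now - cached_at) < ttl_seconds


if __name__ == "__main__":
    written = time.time() - 30 * 60  # cached half an hour ago
    print(is_fresh(written, ISSUE_LIST_TTL_SECONDS))  # issue list entry
    print(is_fresh(written, None))                    # analysis entry
```

A `--cache-ttl` value given in hours would translate to `hours * 3600` seconds in this model, and `--no-cache` corresponds to skipping the check altogether.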

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run specific test files
pytest tests/test_quick_scorer.py
pytest tests/test_cache.py
pytest tests/test_bulk_processor.py
pytest tests/test_ai_scorer.py

License

MIT
