Forkscout: Powerful GitHub repository fork analysis tool that discovers valuable features across forks, ranks them by impact, and can create pull requests to integrate improvements back to upstream projects
Forkscout
A powerful GitHub repository fork analysis tool that automatically discovers valuable features across all forks of a repository, ranks them by impact, and can create pull requests to integrate the best improvements back to the upstream project.
Features
- Fork Discovery: Automatically finds and catalogs all public forks of a repository
- Feature Analysis: Identifies meaningful changes and improvements in each fork
- Smart Ranking: Scores features based on code quality, community engagement, and impact
- Report Generation: Creates comprehensive markdown reports with feature summaries
- Automated PRs: Can automatically create pull requests for high-value features
- Caching: Intelligent caching system to avoid redundant API calls
Installation
Prerequisites
- Python 3.12 or higher
- uv package manager
Install uv
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or with pip
pip install uv
Install Forkscout
From PyPI (Recommended)
# Install with pip
pip install forkscout
# Or with uv
uv add forkscout
From Source (Development)
# Clone the repository
git clone https://github.com/Romamo/forkscout.git
cd forkscout
# Install dependencies
uv sync
# Install in development mode
uv pip install -e .
Quick Start
- Set up your GitHub token:
cp .env.example .env
# Edit .env and add your GitHub token
- Analyze a repository:
uv run forkscout analyze https://github.com/pallets/click
- Generate a report:
uv run forkscout analyze https://github.com/psf/requests --output report.md
- Auto-create PRs for high-value features:
uv run forkscout analyze https://github.com/Textualize/rich --auto-pr --min-score 80
Configuration
Create a forkscout.yaml configuration file:
github:
  token: ${GITHUB_TOKEN}
scoring:
  code_quality_weight: 0.3
  community_engagement_weight: 0.2
  test_coverage_weight: 0.2
  documentation_weight: 0.15
  recency_weight: 0.15
analysis:
  min_score_threshold: 70.0
  max_forks_to_analyze: 100
  excluded_file_patterns:
    - "*.md"
    - "*.txt"
    - ".github/*"
# Commit counting configuration
commit_count:
  max_count_limit: 100  # Maximum commits to count per fork (0 = unlimited)
  display_limit: 5  # Maximum commits to show in display
  use_unlimited_counting: false  # Enable unlimited counting by default
  timeout_seconds: 30  # Timeout for commit counting operations
cache:
  duration_hours: 24
  max_size_mb: 100
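The five scoring weights in this example add up to 1.0. A minimal sketch of how such weights could combine per-factor scores into one 0-100 score is shown here. The function name `weighted_score`, the factor keys, and the requirement that the weights sum to 1.0 are assumptions made for illustration, not Forkscout's documented API.

```python
# Weights copied from the example forkscout.yaml; the key names are hypothetical.
SCORING_WEIGHTS = {
    "code_quality": 0.3,
    "community_engagement": 0.2,
    "test_coverage": 0.2,
    "documentation": 0.15,
    "recency": 0.15,
}


def weighted_score(factors: dict[str, float],
                   weights: dict[str, float] = SCORING_WEIGHTS) -> float:
    """Combine per-factor scores (each 0-100) into one weighted score."""
    # Assumption: the weights are expected to sum to 1.0.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("scoring weights should sum to 1.0")
    # Factors missing from the input count as 0.
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())
```

With this sketch, a feature scored at least 70 would pass the `min_score_threshold` shown in the example configuration.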
Usage Examples
Basic Analysis
forkscout analyze https://github.com/pallets/click
Fork Analysis Commands
# Show all forks with compact commit status
forkscout show-forks https://github.com/psf/requests
# Show forks with recent commits in a separate column
forkscout show-forks https://github.com/Textualize/rich --show-commits 3
# Show detailed fork information with exact commit counts
forkscout show-forks https://github.com/pytest-dev/pytest --detail
Commit Counting Options
# Basic exact commit counting (default: count up to 100 commits)
forkscout show-forks https://github.com/newmarcel/KeepingYouAwake --detail
# Unlimited commit counting for maximum accuracy (slower)
forkscout show-forks https://github.com/aarigs/pandas-ta --detail --max-commits-count 0
# Fast processing with lower commit limit
forkscout show-forks https://github.com/NoMore201/googleplay-api --detail --max-commits-count 50
# Custom display limit for commit messages
forkscout show-forks https://github.com/sanila2007/youtube-bot-telegram --show-commits 3 --commit-display-limit 10
# Focus on active forks only
forkscout show-forks https://github.com/maliayas/github-network-ninja --detail --ahead-only
Understanding Commit Status Format
The fork tables display commit status in a compact "+X -Y" format:
- +5 -2 means 5 commits ahead, 2 commits behind
- +3 means 3 commits ahead, up-to-date
- -1 means 1 commit behind, no new commits
- Empty cell means completely up-to-date
- Unknown means status could not be determined
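The format rules above can be sketched as a small function. The name `format_commit_status` and the use of `None` for an undetermined count are assumptions for illustration; Forkscout's internal implementation may differ.

```python
from typing import Optional


def format_commit_status(ahead: Optional[int], behind: Optional[int]) -> str:
    """Render ahead/behind counts in the compact "+X -Y" style."""
    # None stands in for "could not be determined" (hypothetical convention).
    if ahead is None or behind is None:
        return "Unknown"
    parts = []
    if ahead > 0:
        parts.append(f"+{ahead}")
    if behind > 0:
        parts.append(f"-{behind}")
    # An empty string corresponds to the empty cell for up-to-date forks.
    return " ".join(parts)
```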
With Custom Configuration
forkscout analyze https://github.com/virattt/ai-hedge-fund --config my-config.yaml
Automated PR Creation
forkscout analyze https://github.com/xgboosted/pandas-ta-classic --auto-pr --min-score 85
Verbose Output
forkscout analyze https://github.com/pallets/click --verbose
Troubleshooting
Common Issues
Commit counts showing "+1" for all forks:
- This was a bug in earlier versions. Update to the latest version.
- Use the --detail flag for accurate commit counting.
Slow performance with commit counting:
- Use --max-commits-count 50 for faster processing
- Limit forks with --max-forks 25
- Use --ahead-only to skip inactive forks
"Unknown" commit counts:
- Usually indicates private/deleted forks or API rate limiting
- Check GitHub token configuration
- Try with --verbose for detailed error information
For comprehensive troubleshooting, see docs/COMMIT_COUNTING_TROUBLESHOOTING.md.
Development
Setup Development Environment
# Clone and setup
git clone https://github.com/Romamo/forkscout.git
cd forkscout
uv sync --dev
# Install pre-commit hooks
uv run pre-commit install
Running Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=src --cov-report=html
# Run only unit tests
uv run pytest tests/unit/
# Run only integration tests
uv run pytest tests/integration/
Code Quality
# Format code
uv run black src/ tests/
# Lint code
uv run ruff check src/ tests/
# Type checking
uv run mypy src/
Evaluation Criteria
Forkscout uses a sophisticated evaluation system to analyze commits and determine their value for the main repository. This section explains how the system makes decisions about commit categorization, impact assessment, and value determination.
Commit Categorization
The system categorizes each commit into one of the following types based on commit message patterns and file changes:
Category Types and Patterns
🚀 Feature - New functionality or enhancements
- Message patterns: feat:, feature, implement, new, add, introduce, create, build, support for, enable
- Examples:
  - feat: add user authentication system
  - implement OAuth2 login flow
  - add support for PostgreSQL database
🐛 Bugfix - Error corrections and issue resolutions
- Message patterns: fix:, bug, patch, hotfix, repair, resolve, correct, address, issue, problem, error
- Examples:
  - fix: resolve memory leak in data processing
  - correct validation error in user input
  - patch security vulnerability in auth module
🔧 Refactor - Code improvements without functional changes
- Message patterns: refactor:, clean, improve, restructure, reorganize, simplify, extract, rename, move
- Examples:
  - refactor: extract common validation logic
  - improve code organization in user module
  - simplify database connection handling
📚 Documentation - Documentation updates and improvements
- Message patterns: docs:, documentation, readme, comment, comments, docstring, guide, tutorial, example
- File patterns: README.*, *.md, *.rst, docs/, *.txt
- Examples:
  - docs: update installation instructions
  - add API documentation for user endpoints
  - improve code comments in core modules
🧪 Test - Test additions and improvements
- Message patterns: test:, tests, testing, spec, unittest, pytest, coverage, mock, fixture, assert
- File patterns: test_*.py, *_test.py, tests/, *.test.js, *.spec.js
- Examples:
  - test: add unit tests for user service
  - improve test coverage for authentication
  - add integration tests for API endpoints
🔨 Chore - Maintenance and build-related changes
- Message patterns: chore:, maintenance, upgrade, dependency, dependencies, version, config, configuration, setup
- File patterns: requirements.txt, package.json, pyproject.toml, setup.py, Dockerfile, .github/, .gitignore
- Examples:
  - chore: update dependencies to latest versions
  - upgrade Python to 3.12
  - configure CI/CD pipeline
⚡ Performance - Performance optimizations
- Message patterns: perf:, performance, speed, fast, optimize, optimization, efficient, cache, caching, memory
- Examples:
  - perf: optimize database query performance
  - improve memory usage in data processing
  - add caching layer for API responses
🔒 Security - Security-related changes
- Message patterns: security:, secure, vulnerability, auth, authentication, authorization, encrypt, decrypt, hash
- File patterns: *auth*.py, *security*.py, *crypto*.py
- Examples:
  - security: fix SQL injection vulnerability
  - implement secure password hashing
  - add rate limiting to API endpoints
❓ Other - Changes that don't fit standard categories
- Used when commit patterns don't match any specific category
- Often indicates complex or unclear changes
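As a rough illustration of message-based categorization, the sketch below checks for a conventional commit prefix first and then falls back to keyword matching. It uses only a handful of the keywords listed above, and the keyword order (which stands in for a priority between categories) is an assumption; the README does not state the actual priority. The names `PREFIXES`, `KEYWORDS`, and `categorize_message` are hypothetical.

```python
import re

# Conventional commit prefixes mapped to category names.
PREFIXES = {
    "feat": "feature", "fix": "bugfix", "refactor": "refactor",
    "docs": "documentation", "test": "test", "chore": "chore",
    "perf": "performance", "security": "security",
}

# A subset of the documented keywords; the list order is an assumed priority.
KEYWORDS = [
    ("bugfix", ("bug", "patch", "hotfix", "resolve", "correct")),
    ("security", ("secure", "vulnerability", "encrypt")),
    ("performance", ("performance", "optimize", "caching")),
    ("feature", ("implement", "add", "introduce", "support for")),
]


def categorize_message(message: str) -> str:
    """Return a category name for a commit message, or "other"."""
    text = message.strip().lower()
    # Step 1: conventional prefix such as "feat:" or "fix(scope):".
    prefix = re.match(r"([a-z]+)(\([^)]*\))?:", text)
    if prefix and prefix.group(1) in PREFIXES:
        return PREFIXES[prefix.group(1)]
    # Step 2: whole-word keyword search in priority order.
    for category, words in KEYWORDS:
        if any(re.search(rf"\b{re.escape(word)}\b", text) for word in words):
            return category
    return "other"
```

Matching whole words keeps a keyword such as `add` from firing on unrelated words like `address`.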
Impact Assessment
The system evaluates the potential impact of each commit using multiple factors:
File Criticality Rules
Files are assessed for criticality based on their role in the project:
🔴 Critical Files (Score: 1.0)
- Core application files: main.py, index.js, app.py, server.py
- Entry points: __init__.py, setup.py, pyproject.toml, package.json
- Files explicitly listed in project's critical files
🟠 High Criticality (Score: 0.8-0.9)
- Security files: *auth*.py, *security*.py, *crypto*.py, *permission*.py
- Configuration files: config.*, settings.*, .env*, Dockerfile, docker-compose.yml
🟡 Medium-High Criticality (Score: 0.7)
- Database/model files: *model*.py, *schema*.py, *migration*.py, *database*.py
🟢 Medium Criticality (Score: 0.6)
- API/interface files: *api*.py, *endpoint*.py, *route*.py, *controller*.py
🔵 Low Criticality (Score: 0.1-0.2)
- Test files: test_*.py, *_test.py, tests/, *.test.js, *.spec.js
- Documentation: README.*, *.md, *.rst, docs/
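The criticality tiers above can be approximated with glob matching on file names. This is a sketch under several assumptions: where the README gives a score range (0.8-0.9, 0.1-0.2) a single value is picked, tiers are checked in the order listed with the first match winning, and files matching no pattern get a default of 0.5. None of these choices, nor the names `TIERS` and `file_criticality`, are confirmed by the source.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# (score, file-name patterns) in the order the tiers are documented.
TIERS = [
    (1.0, ["main.py", "index.js", "app.py", "server.py",
           "__init__.py", "setup.py", "pyproject.toml", "package.json"]),
    (0.8, ["*auth*.py", "*security*.py", "*crypto*.py", "*permission*.py",
           "config.*", "settings.*", ".env*", "Dockerfile", "docker-compose.yml"]),
    (0.7, ["*model*.py", "*schema*.py", "*migration*.py", "*database*.py"]),
    (0.6, ["*api*.py", "*endpoint*.py", "*route*.py", "*controller*.py"]),
    (0.2, ["test_*.py", "*_test.py", "*.test.js", "*.spec.js",
           "README.*", "*.md", "*.rst"]),
]


def file_criticality(path: str, default: float = 0.5) -> float:
    """Return a criticality score between 0.0 and 1.0 for a changed file."""
    parsed = PurePosixPath(path)
    for score, patterns in TIERS:
        if any(fnmatchcase(parsed.name, pattern) for pattern in patterns):
            return score
    # Directory patterns (tests/, docs/) are checked on the parent folders.
    if {"tests", "docs"} & set(parsed.parts[:-1]):
        return 0.2
    return default  # assumed score for files that match no pattern
```

Because the first match wins, a file such as `test_auth.py` would score 0.8 here; a real implementation may well order the checks differently.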
Change Magnitude Calculation
The system calculates change magnitude based on:
- Lines changed: Additions + deletions (weighted 70%)
- Files changed: Number of modified files (weighted 30%)
- Size bonuses: Large changes (>500 lines) get 1.5x multiplier, medium changes (>200 lines) get 1.2x multiplier
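A minimal sketch of the magnitude calculation follows. The 70/30 weighting and the 1.5x and 1.2x multipliers come from the list above; the normalization caps (`max_lines`, `max_files`) and the final clamp to 1.0 are assumptions, since the README does not say how raw counts are scaled.

```python
def change_magnitude(lines_changed: int, files_changed: int,
                     max_lines: int = 1000, max_files: int = 20) -> float:
    """Estimate change magnitude on a 0.0-1.0 scale."""
    # Assumed normalization: scale each count against a cap.
    lines_part = min(lines_changed / max_lines, 1.0)
    files_part = min(files_changed / max_files, 1.0)
    score = 0.7 * lines_part + 0.3 * files_part
    # Size bonuses from the documented rules.
    if lines_changed > 500:
        score *= 1.5
    elif lines_changed > 200:
        score *= 1.2
    return min(score, 1.0)  # assumed clamp so the result stays in range
```

For example, 300 changed lines across 4 files gives 0.7 × 0.3 + 0.3 × 0.2 = 0.27, which the medium-size multiplier raises to 0.324.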
Quality Factors
Test Coverage Factor
- Measures proportion of test files in the change
- Bonus points for including any test files
- Score: 0.0 (no tests) to 1.0 (comprehensive test coverage)
Documentation Factor
- Measures proportion of documentation files
- Bonus points for including any documentation
- Score: 0.0 (no docs) to 1.0 (comprehensive documentation)
Code Organization Factor
- Evaluates focus and coherence of changes
- Bonus for focused changes (≤3 files)
- Penalty for scattered changes (>10 files)
- Considers average changes per file
Commit Quality Factor
- Message length and descriptiveness
- Conventional commit format bonus
- Penalty for merge commits
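To make the proportion-plus-bonus idea concrete, here is a sketch of the test coverage factor. The file patterns are the test patterns documented earlier; the bonus size of 0.2 and the names `is_test_file` and `coverage_factor` are assumptions, because the README does not give the exact bonus.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

TEST_PATTERNS = ("test_*.py", "*_test.py", "*.test.js", "*.spec.js")


def is_test_file(path: str) -> bool:
    """True if the path looks like a test file or lives under tests/."""
    parsed = PurePosixPath(path)
    in_tests_dir = "tests" in parsed.parts[:-1]
    return in_tests_dir or any(fnmatchcase(parsed.name, p) for p in TEST_PATTERNS)


def coverage_factor(changed_files: list[str], bonus: float = 0.2) -> float:
    """Proportion of test files in a change, plus a bonus if any are present."""
    if not changed_files:
        return 0.0
    test_count = sum(is_test_file(path) for path in changed_files)
    if test_count == 0:
        return 0.0
    # Assumed bonus value; the result is clamped to the documented 0.0-1.0 range.
    return min(test_count / len(changed_files) + bonus, 1.0)
```

The documentation factor could follow the same shape with the documentation file patterns swapped in.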
Impact Level Determination
The system combines all factors to determine overall impact:
- 🔴 Critical (Score ≥ 0.8): Major changes to critical files with high quality
- 🟠 High (Score ≥ 0.6): Significant changes to important files
- 🟡 Medium (Score ≥ 0.3): Moderate changes with reasonable scope
- 🟢 Low (Score < 0.3): Minor changes or low-impact files
Value Assessment for Main Repository
The system determines whether each commit could be valuable for the main repository:
"Yes" - Valuable for Main Repository
Automatic "Yes" Categories:
- Bugfixes: Error corrections benefit all users
- Security fixes: Critical for all installations
- Performance improvements: Speed benefits everyone
- Documentation: Helps all users understand the project
- Tests: Improve reliability for everyone
Conditional "Yes" Examples:
- Features: Substantial new functionality (>50 lines changed)
- Refactoring: Significant code improvements
- Dependency updates: Security or compatibility improvements
Example "Yes" Commits:
✅ fix: resolve memory leak in data processing loop
✅ security: patch SQL injection vulnerability in user queries
✅ perf: optimize database connection pooling (40% faster)
✅ feat: add comprehensive input validation system
✅ docs: add troubleshooting guide for common errors
✅ test: add integration tests for payment processing
"No" - Not Relevant for Main Repository
Typical "No" Scenarios:
- Fork-specific configurations or customizations
- Environment-specific changes
- Personal preferences or styling
- Changes that break compatibility
- Experimental or incomplete features
Example "No" Commits:
❌ chore: update personal development environment setup
❌ feat: add company-specific branding and logos
❌ config: change database from PostgreSQL to MongoDB for our use case
❌ style: reformat code according to personal preferences
❌ feat: add integration with internal company API
"Unclear" - Needs Further Review
Typical "Unclear" Scenarios:
- Small features that might be too specific
- Refactoring without clear benefits
- Complex changes that do multiple things
- Changes with insufficient context
- Experimental or unfinished work
Example "Unclear" Commits:
❓ refactor: minor code cleanup in utility functions
❓ feat: add small convenience method for date formatting
❓ fix: workaround for edge case in specific environment
❓ update: misc changes and improvements
❓ feat: experimental feature for advanced users
Decision Trees and Logic Flow
Commit Categorization Flow
1. Check commit message for conventional commit prefix (feat:, fix:, etc.)
   ├─ If found → Use prefix category with high confidence (0.9)
   └─ If not found → Continue to pattern matching
2. Analyze commit message for category keywords
   ├─ Multiple matches → Use highest priority match
   └─ No matches → Continue to file analysis
3. Analyze changed files for category patterns
   ├─ Strong file pattern match (>80% files) → Use file category
   └─ Weak or mixed patterns → Continue to combination logic
4. Combine message and file analysis
   ├─ Message and files agree → Boost confidence (+0.2)
   ├─ Message confidence > File confidence → Use message category
   ├─ File confidence > Message confidence → Use file category
   └─ Equal confidence → Default to message category or OTHER
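Step 4 of this flow, combining the message-based and file-based results, might look like the sketch below. Each result is modeled as a `(category, confidence)` pair. Applying the +0.2 boost to the higher of the two confidences, capping at 1.0, and letting the message category win ties are assumptions; the function name `combine_analyses` is hypothetical.

```python
def combine_analyses(message: tuple[str, float],
                     files: tuple[str, float]) -> tuple[str, float]:
    """Merge message-based and file-based (category, confidence) results."""
    message_category, message_confidence = message
    file_category, file_confidence = files
    if message_category == file_category:
        # Agreement: boost confidence by 0.2 (capped at 1.0 by assumption).
        boosted = max(message_confidence, file_confidence) + 0.2
        return message_category, min(boosted, 1.0)
    if file_confidence > message_confidence:
        return file_category, file_confidence
    # Message wins when its confidence is higher or equal (assumed tie-break).
    return message_category, message_confidence
```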
Impact Assessment Flow
1. Calculate Change Magnitude
   ├─ Count lines changed (additions + deletions)
   ├─ Count files changed
   └─ Apply size multipliers for large changes
2. Assess File Criticality
   ├─ Check against critical file patterns
   ├─ Calculate weighted average by change size
   └─ Return criticality score (0.0 to 1.0)
3. Evaluate Quality Factors
   ├─ Test coverage: Proportion of test files
   ├─ Documentation: Proportion of doc files
   ├─ Code organization: Focus and coherence
   └─ Commit quality: Message and format quality
4. Determine Impact Level
   ├─ Combine: 40% magnitude + 40% criticality + 20% quality
   ├─ Score ≥ 0.8 → Critical
   ├─ Score ≥ 0.6 → High
   ├─ Score ≥ 0.3 → Medium
   └─ Score < 0.3 → Low
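The final step of this flow is a weighted sum followed by threshold checks. The sketch below assumes all three inputs are already on a 0.0-1.0 scale; the function name `impact_level` and the lowercase level labels are illustrative choices.

```python
def impact_level(magnitude: float, criticality: float,
                 quality: float) -> tuple[float, str]:
    """Combine the three factors (each 0.0-1.0) into a score and a level."""
    # Documented weighting: 40% magnitude, 40% criticality, 20% quality.
    score = 0.4 * magnitude + 0.4 * criticality + 0.2 * quality
    if score >= 0.8:
        level = "critical"
    elif score >= 0.6:
        level = "high"
    elif score >= 0.3:
        level = "medium"
    else:
        level = "low"
    return score, level
```

For example, magnitude 0.5, criticality 0.8, and quality 0.5 give 0.2 + 0.32 + 0.1 = 0.62, which falls in the High band.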
Value Assessment Flow
1. Check Category Type
   ├─ Bugfix/Security/Performance → Automatic "Yes"
   ├─ Docs/Test → Automatic "Yes"
   └─ Feature/Refactor/Chore → Continue evaluation
2. Analyze Change Scope
   ├─ Substantial changes (>50 lines) → Likely "Yes"
   ├─ Small changes (<20 lines) → Likely "Unclear"
   └─ Medium changes → Continue evaluation
3. Check for Fork-Specific Indicators
   ├─ Personal/company-specific terms → "No"
   ├─ Environment-specific configs → "No"
   └─ Generic improvements → Continue evaluation
4. Final Assessment
   ├─ Clear benefit to all users → "Yes"
   ├─ Clearly fork-specific → "No"
   └─ Uncertain or context-dependent → "Unclear"
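The value assessment flow can be condensed into a small decision function. This sketch treats the scope results in step 2 as tentative (the flow calls them "likely"), so the fork-specific check is applied before a scope-based answer is returned. How fork-specific indicators are detected is not described in the README, so it is passed in as a plain boolean; the names `AUTO_YES` and `assess_value` are hypothetical.

```python
# Categories the flow marks as an automatic "Yes".
AUTO_YES = {"bugfix", "security", "performance", "documentation", "test"}


def assess_value(category: str, lines_changed: int,
                 fork_specific: bool = False) -> str:
    """Return "yes", "no", or "unclear" for a commit's value upstream."""
    # Step 1: some categories are valuable regardless of size.
    if category in AUTO_YES:
        return "yes"
    # Step 3: fork-specific changes are not relevant upstream.
    if fork_specific:
        return "no"
    # Steps 2 and 4: substantial generic changes count as "yes"; small and
    # medium changes stay "unclear" in this simplified version.
    return "yes" if lines_changed > 50 else "unclear"
```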
Troubleshooting Common Questions
"Why was my commit categorized as 'Other'?"
Possible reasons:
- Commit message doesn't match known patterns
- Mixed file types that don't clearly indicate category
- Generic or unclear commit message
Solutions:
- Use conventional commit format: feat:, fix:, docs:, etc.
- Write descriptive commit messages with clear action words
- Focus commits on single types of changes
"Why is the impact level lower than expected?"
Common causes:
- Changes affect low-criticality files (tests, docs)
- Small change magnitude (few lines/files changed)
- Poor commit quality (short message, merge commit)
- Low quality factors (no tests or docs included)
To increase impact:
- Include changes to core application files
- Add tests and documentation with your changes
- Write descriptive commit messages
- Make focused, substantial changes
"Why was my feature marked as 'Unclear' for main repo value?"
Typical reasons:
- Feature appears too specific or niche
- Insufficient context to determine general usefulness
- Small or experimental change
- Complex commit that does multiple things
To improve assessment:
- Write clear commit messages explaining the benefit
- Include documentation explaining the feature
- Make focused commits that do one thing well
- Consider if the feature would help other users
"The system missed an important security fix"
Possible issues:
- Commit message doesn't include security keywords
- Files don't match security patterns
- Change appears as refactoring or other category
Improvements:
- Use security-related keywords: security, vulnerability, auth, secure
- Use conventional commit format: security: fix vulnerability in...
- Include security-related files in the change
"My documentation update was categorized as 'Chore'"
Common causes:
- Files don't match documentation patterns
- Commit message uses maintenance-related words
- Mixed changes including config files
Solutions:
- Use doc-specific keywords: docs, documentation, readme
- Focus commits on documentation files only
- Use conventional commit format: docs: update installation guide
Understanding Explanation Output
When using the --explain flag, you'll see structured output with clear separation between factual descriptions and system assessments:
📝 Description: Added user authentication middleware to handle JWT tokens
⚖️ Assessment: Value for main repo: YES
   Category: 🚀 Feature | Impact: 🔴 High
   Reasoning: Large changes affecting critical security files with test coverage
Key sections:
- 📝 Description: Factual description of what changed
- ⚖️ Assessment: System's evaluation and judgment
- Category: Determined commit type with confidence
- Impact: Assessed impact level with reasoning
- Value: Whether this could help the main repository
This separation helps you distinguish between objective facts about the commit and the system's subjective assessment of its value.
Visual Formatting Guide
The system uses consistent visual indicators to help you quickly scan results:
Category Icons:
- 🚀 Feature - New functionality
- 🐛 Bugfix - Error corrections
- 🔧 Refactor - Code improvements
- 📚 Documentation - Docs and guides
- 🧪 Test - Testing improvements
- 🔨 Chore - Maintenance tasks
- ⚡ Performance - Speed optimizations
- 🔒 Security - Security fixes
- ❓ Other - Uncategorized changes
Impact Level Colors:
- 🔴 Critical - Major system changes
- 🟠 High - Significant improvements
- 🟡 Medium - Moderate changes
- 🟢 Low - Minor modifications
Value Assessment:
- ✅ Yes - Valuable for main repository
- ❌ No - Fork-specific only
- ❓ Unclear - Needs further review
Complexity Indicators:
- ⚠️ Complex commits that do multiple things are flagged for careful review
- Simple, focused commits are preferred for easier integration
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests for your changes
- Ensure all tests pass (uv run pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.