A comprehensive tool for analyzing GitHub organization statistics
GitHub Organization Statistics Tool
A comprehensive, open-source tool for analyzing GitHub organization statistics, including repository metrics, contributor activity, code quality insights, and multi-organization analysis with GitHub Apps.
NEW in v1.1.0: Multi-Organization Analysis
Analyze multiple organizations in a single command! Use the new --org-ids parameter to process multiple GitHub organizations simultaneously, combining all data into unified output files while maintaining organization attribution.
# Analyze multiple organizations in one command
github-org-stats --org-ids "org1:install_id1,org2:install_id2,org3:install_id3" --format all
Features
Revolutionary Multi-Organization Analysis
- Single Command Multi-Org: Analyze multiple organizations in one run with --org-ids
- Unified Output: All repository data combined into single files with organization attribution
- Enhanced Excel Reports: Additional "Organization_Breakdown" sheet for multi-org analysis
- Smart Authentication: Automatic GitHub App token management across organizations
- Efficient Processing: Intelligent distribution of API limits across organizations
Core Analysis Features
- Repository Analysis: Comprehensive metrics including stars, forks, issues, languages, and activity
- Contributor Insights: Detailed contributor analysis with bot filtering capabilities
- Code Quality Metrics: Language statistics, dependency analysis, and security insights
- GitHub App Integration: Enterprise-grade authentication for analyzing multiple organizations
- Flexible Output: JSON, CSV, and Excel formats with rich formatting
- Advanced Filtering: Include/exclude forks, archived repos, empty repos, and bot accounts
- Rate Limit Management: Intelligent rate limiting and retry mechanisms
- Error Handling: Robust error handling with detailed logging and recovery
Advanced Features
- Language Name Sanitization: Intelligent handling of problematic language names (C#, C++, F#) to prevent Excel column conflicts
- Dependency Analysis: Detect and analyze dependencies from package.json, requirements.txt, Gemfile, pom.xml, build.gradle, Cargo.toml, and go.mod
- Submodule Detection: Identify and catalog Git submodules
- GitHub Actions Integration: Analyze workflow configurations and recent runs
- Branch Protection Analysis: Check default branch protection settings
- Release Tracking: Monitor latest releases and version information
- Security Insights: Collaborator analysis, team permissions, and admin detection
- Bot Detection: Advanced bot account filtering with configurable patterns
- Performance Optimization: Adaptive batch sizing and memory management
Installation
Prerequisites
- Python 3.7 or higher
- pip package manager
Quick Install
git clone https://github.com/zoharbabin/github-org-stats.git
cd github-org-stats
pip install -e .
Install from PyPI
pip install github-org-stats
Quick Start
Multi-Organization Analysis (Recommended)
Analyze multiple organizations in a single command:
# Set environment variables
export GITHUB_APP_ID=12345
export GITHUB_PRIVATE_KEY_PATH=/path/to/private-key.pem
# Analyze multiple organizations
github-org-stats \
--org-ids "org1:install_id1,org2:install_id2,org3:install_id3" \
--include-forks \
--include-archived \
--exclude-bots \
--max-repos 6000 \
--days-back 365 \
--format all \
--output-dir ./multi_org_reports
Single Organization Analysis
Analyze a single GitHub organization:
# With personal access token
python github_org_stats.py --org your-org --token ghp_your_token_here
# With GitHub App
python github_org_stats.py \
--org your-org \
--app-id 12345 \
--private-key /path/to/private-key.pem \
--installation-id 67890
Advanced Single Organization Usage
# Generate all output formats with comprehensive analysis
python github_org_stats.py \
--org your-org \
--token ghp_token \
--format all \
--exclude-bots \
--include-forks \
--include-archived \
--max-repos 1000 \
--days-back 365 \
--output-dir ./reports
Command Line Arguments
Authentication Options
- --token: GitHub personal access token
- --app-id: GitHub App ID for authentication
- --private-key: Path to GitHub App private key file
- --installation-id: GitHub App installation ID (supports multiple: "org1:id1,org2:id2" or single: "12345")
- --installation-ids: Alias for --installation-id
Scope Options
- --org: GitHub organization name to analyze (single organization mode)
- --org-ids: NEW. Multiple organizations with installation IDs in the format "org1:id1,org2:id2" (multi-organization mode)
- --repos: Specific repositories to analyze (space-separated list)
- --days-back: Number of days to look back for activity (default: 30)
Multi-Organization Mode
The new --org-ids parameter enables analyzing multiple organizations in a single command run:
- Format: "org1:installation_id1,org2:installation_id2,org3:installation_id3"
- All data is combined into unified output files
- Excel output includes an additional "Organization_Breakdown" sheet
- Each repository record includes an "organization" field
- Cannot be used together with --org (choose single or multi-organization mode)
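The "org:installation_id" format above is simple to validate before a run. The following is a minimal sketch of such a parser, not the tool's own implementation; the function name parse_org_ids and its error messages are assumptions made for illustration.

```python
from typing import Dict


def parse_org_ids(value: str) -> Dict[str, int]:
    """Parse 'org1:111,org2:222' into {'org1': 111, 'org2': 222}.

    Hypothetical helper: the real tool may validate differently.
    """
    mapping: Dict[str, int] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        org, sep, install_id = pair.partition(":")
        org, install_id = org.strip(), install_id.strip()
        if not sep or not org or not install_id.isdecimal():
            raise ValueError(f"expected 'org:installation_id', got {pair!r}")
        if org in mapping:
            raise ValueError(f"organization listed twice: {org!r}")
        mapping[org] = int(install_id)
    return mapping


if __name__ == "__main__":
    print(parse_org_ids("org1:111,org2:222,org3:333"))
```

Rejecting duplicates and non-numeric IDs early gives a clear error before any API call is made.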
Output Options
- --output-dir: Output directory for reports (default: output)
- --format: Output format, one of json, csv, excel, all (default: excel)
- --config: Configuration file path (JSON format)
Logging Options
- --log-level: Logging level, one of DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
- --log-file: Log file path (default: console only)
Analysis Options
- --include-forks: Include forked repositories in analysis
- --include-archived: Include archived repositories in analysis
- --max-repos: Maximum number of repositories to analyze (default: 100)
- --exclude-bots: Exclude bot accounts from contributor analysis and commit statistics
- --include-empty: Include repositories with no commits in the specified timeframe
Authentication
Personal Access Token
For individual use or small-scale analysis:
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token with these permissions:
  - repo: Full control of private repositories
  - read:org: Read organization membership
  - read:user: Read user profile data
# Using token directly
python github_org_stats.py --org your-org --token ghp_your_token_here
# Using environment variable
export GITHUB_TOKEN=ghp_your_token_here
python github_org_stats.py --org your-org --token $GITHUB_TOKEN
GitHub App Authentication
For enterprise use, multi-organization analysis, and higher rate limits:
Setup GitHub App
- Go to GitHub Settings → Developer settings → GitHub Apps
- Create a new GitHub App with these permissions:
  - Repository permissions:
    - Contents: Read
    - Issues: Read
    - Metadata: Read
    - Pull requests: Read
    - Actions: Read
  - Organization permissions:
    - Members: Read
    - Administration: Read
- Generate and download a private key
- Install the app on target organizations
- Note the App ID and Installation IDs
Using GitHub App
# Single organization
python github_org_stats.py \
--org your-org \
--app-id 12345 \
--private-key /path/to/private-key.pem \
--installation-id 67890
# Multiple organizations
python github_org_stats.py \
--org your-org \
--app-id 12345 \
--private-key /path/to/private-key.pem \
--installation-id "org1:111,org2:222,org3:333"
Environment Variables
export GITHUB_APP_ID=12345
export GITHUB_PRIVATE_KEY_PATH=/path/to/private-key.pem
python github_org_stats.py --org your-org
Output Formats
Excel Output (Default)
Professional Excel workbook with multiple sheets:
- Repository_Data: Complete repository information with all metrics and organization attribution
- Summary: High-level statistics across all analyzed organizations
- Organization_Breakdown: Per-organization statistics (multi-org mode only)
- Contributors: Top contributors analysis with contribution counts
- Languages: Language distribution and code statistics
- Errors: Error tracking and debugging information
JSON Output
Structured JSON with complete data hierarchy:
Single Organization:
{
"organizations": ["your-org"],
"analysis_mode": "single-organization",
"analyzed_at": "2025-05-29T22:30:00",
"total_repositories": 150,
"repositories": [...]
}
Multi-Organization:
{
"organizations": ["org1", "org2", "org3"],
"analysis_mode": "multi-organization",
"analyzed_at": "2025-05-29T22:30:00",
"total_repositories": 450,
"repositories": [
{
"organization": "org1",
"name": "repo1",
"full_name": "org1/repo1",
...
}
]
}
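Because every repository record carries an "organization" field, the combined JSON can be split back out per organization with a few lines of code. The sketch below uses a made-up sample that follows the layout shown above; the helper name repos_per_org is an assumption, and a real report would be read from a file in the output directory.

```python
import json
from collections import Counter

# Made-up sample that follows the multi-organization layout shown above.
SAMPLE_REPORT = """
{
  "organizations": ["org1", "org2"],
  "analysis_mode": "multi-organization",
  "total_repositories": 3,
  "repositories": [
    {"organization": "org1", "name": "repo1", "full_name": "org1/repo1"},
    {"organization": "org1", "name": "repo2", "full_name": "org1/repo2"},
    {"organization": "org2", "name": "repo3", "full_name": "org2/repo3"}
  ]
}
"""


def repos_per_org(report: dict) -> Counter:
    """Count repository records per value of the "organization" field."""
    return Counter(
        repo.get("organization", "unknown")
        for repo in report.get("repositories", [])
    )


report = json.loads(SAMPLE_REPORT)
counts = repos_per_org(report)
for org in report["organizations"]:
    print(f"{org}: {counts[org]} repositories")
```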
CSV Output
Flattened data suitable for spreadsheet analysis and data processing tools, with organization column for multi-org analysis.
Configuration File
Use a JSON configuration file for complex setups:
python github_org_stats.py --config config/example_config.json --org your-org
Example configuration:
{
"authentication": {
"app_id": 12345,
"private_key_path": "/path/to/private-key.pem",
"installation_mappings": {
"org1": 67890,
"org2": 11111
}
},
"analysis": {
"days_back": 60,
"max_repos": 200,
"include_forks": false,
"exclude_bots": true
},
"output": {
"format": "excel",
"output_dir": "./reports"
}
}
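To illustrate how the "installation_mappings" block can be consumed, here is a minimal sketch that loads a configuration of this shape and looks up the installation ID for one organization. The function name installation_id_for is hypothetical, and the tool's own loader may behave differently.

```python
import json

# Trimmed copy of the example configuration above.
EXAMPLE_CONFIG = """
{
  "authentication": {
    "app_id": 12345,
    "private_key_path": "/path/to/private-key.pem",
    "installation_mappings": {"org1": 67890, "org2": 11111}
  }
}
"""


def installation_id_for(config: dict, org: str) -> int:
    """Return the installation ID configured for org, or raise KeyError."""
    mappings = config.get("authentication", {}).get("installation_mappings", {})
    try:
        return int(mappings[org])
    except KeyError:
        raise KeyError(f"no installation ID configured for {org!r}") from None


config = json.loads(EXAMPLE_CONFIG)
print(installation_id_for(config, "org1"))
```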
Advanced Features
Language Name Sanitization
Intelligent handling of problematic programming language names that can cause issues in Excel exports:
- C# → CSharp: Prevents conflicts with C language statistics
- C++ → CPlusPlus: Avoids Excel column name parsing issues
- F# → FSharp: Ensures proper Excel compatibility
Benefits:
- Eliminates Excel column name conflicts
- Preserves accurate language statistics and byte counts
- Maintains data integrity across all output formats
- Automatic transformation with comprehensive logging
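The three replacements listed above amount to a key-renaming pass over each repository's language statistics. A minimal sketch, assuming a plain dictionary of language names to byte counts; the names LANGUAGE_REPLACEMENTS and sanitize_languages are illustrative and not taken from the tool's source.

```python
from typing import Dict

# Only the three replacements documented in this README.
LANGUAGE_REPLACEMENTS = {"C#": "CSharp", "C++": "CPlusPlus", "F#": "FSharp"}


def sanitize_languages(languages: Dict[str, int]) -> Dict[str, int]:
    """Rename problematic language keys; byte counts are left untouched."""
    return {
        LANGUAGE_REPLACEMENTS.get(name, name): byte_count
        for name, byte_count in languages.items()
    }


print(sanitize_languages({"C#": 1500000, "C": 800000, "C++": 500000}))
```

Names without an entry in the mapping, such as C, pass through unchanged.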
Example:
// Before sanitization
"languages": {"C#": 1500000, "C": 800000, "C++": 500000}
// After sanitization
"languages": {"CSharp": 1500000, "C": 800000, "CPlusPlus": 500000}
Dependency Analysis
Automatically detects and analyzes dependencies from:
- Node.js: package.json
- Python: requirements.txt
- Ruby: Gemfile
- Java: pom.xml
- Gradle: build.gradle
- Rust: Cargo.toml
- Go: go.mod
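One way to picture the detection step is a lookup from manifest file name to ecosystem, applied to the file paths of a repository. The sketch below is an assumption about the approach, not the tool's code; MANIFESTS mirrors the list above, and detect_manifests is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# File names and ecosystem labels taken from the list above.
MANIFESTS = {
    "package.json": "Node.js",
    "requirements.txt": "Python",
    "Gemfile": "Ruby",
    "pom.xml": "Java",
    "build.gradle": "Gradle",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
}


def detect_manifests(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group repository file paths by the ecosystem their manifest belongs to."""
    found: Dict[str, List[str]] = {}
    for path in paths:
        ecosystem = MANIFESTS.get(PurePosixPath(path).name)
        if ecosystem is not None:
            found.setdefault(ecosystem, []).append(path)
    return found


print(detect_manifests(["package.json", "services/api/requirements.txt", "README.md"]))
```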
Bot Detection
Advanced bot account filtering with configurable patterns:
- GitHub Actions bots
- Dependabot and Renovate
- Code quality bots (CodeCov, SonarCloud)
- Security bots (Snyk, WhiteSource)
- Custom bot patterns
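Pattern-based filtering of this kind can be sketched with a handful of regular expressions matched against the account login. The patterns below are illustrative guesses based on the categories above, not the tool's actual defaults, and is_bot is a hypothetical helper.

```python
import re
from typing import Iterable

# Illustrative patterns only; the tool's built-in list may differ.
DEFAULT_BOT_PATTERNS = [
    r"\[bot\]$",          # GitHub App accounts such as "dependabot[bot]"
    r"^github-actions",
    r"^dependabot",
    r"^renovate",
    r"^codecov",
    r"^sonarcloud",
    r"^snyk",
    r"^whitesource",
]


def is_bot(login: str, extra_patterns: Iterable[str] = ()) -> bool:
    """Return True if the login matches a default or caller-supplied pattern."""
    patterns = list(DEFAULT_BOT_PATTERNS) + list(extra_patterns)
    return any(re.search(p, login, re.IGNORECASE) for p in patterns)


for login in ("dependabot[bot]", "octocat", "renovate-bot"):
    print(login, is_bot(login))
```

Passing extra_patterns is one way custom bot patterns could be layered on top of the defaults.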
GitHub Actions Integration
- Workflow count and status
- Recent workflow runs
- Action configuration analysis
Security Analysis
- Branch protection settings
- Collaborator permissions
- Team access analysis
- Admin user identification
Testing
Run the comprehensive test suite:
cd tests
python test_github_org_stats.py
Run specific test categories:
python test_github_org_stats.py --category auth
python test_github_org_stats.py --category data
python test_github_org_stats.py --category excel
Development Setup
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Install in development mode: pip install -e .[dev]
- Make your changes
- Run tests: python -m pytest tests/
- Run linting: black . && flake8
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
Development Dependencies
Development dependencies are defined in pyproject.toml and can be installed with:
pip install -e .[dev]
Code Style
This project uses:
- Black for code formatting
- Flake8 for linting
- MyPy for type checking
Troubleshooting
Common Issues
Authentication Errors
Error: Authentication required
Solution: Ensure you provide either --token or both --app-id and --private-key
Rate Limit Issues
Rate limit exceeded
Solution:
- Use GitHub App authentication for higher limits
- Reduce the --max-repos value
- Reduce --days-back to shrink the activity window and the number of API calls
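When wrapping the tool or calling the API from your own scripts, a common complement to these settings is retrying with exponential backoff. The sketch below shows the general technique only; RateLimitError is a stand-in exception defined here, and call_with_backoff is not part of github-org-stats.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for whatever exception your API client raises when throttled."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is injectable so the waiting behaviour can be tested without real delays.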
Permission Errors
403 Forbidden
Solution:
- Verify token has required permissions (repo, read:org, read:user)
- For GitHub Apps, ensure proper installation and permissions
Memory Issues
MemoryError or system slowdown
Solution:
- Reduce the --max-repos value
- Use --format json or --format csv instead of Excel
- Process organizations in smaller batches
Debug Mode
Enable debug logging for detailed troubleshooting:
python github_org_stats.py \
--org your-org \
--token your-token \
--log-level DEBUG \
--log-file debug.log
Performance Optimization
For large organizations:
# Optimize for speed
python github_org_stats.py \
--org large-org \
--token your-token \
--max-repos 500 \
--days-back 30 \
--exclude-bots \
--format json
Usage Examples
Multi-Organization Analysis (Single Command)
Analyze multiple organizations in a single run using the new --org-ids parameter:
# Using environment variables (recommended)
export GITHUB_APP_ID=12345
export GITHUB_PRIVATE_KEY_PATH=/secure/enterprise-key.pem
python github_org_stats.py \
--org-ids "kaltura:68242466,kaltura-ps:68357040" \
--include-forks \
--include-archived \
--exclude-bots \
--include-empty \
--max-repos 6000 \
--days-back 365 \
--format all \
--log-level INFO \
--log-file multi_org_analysis.log \
--output-dir ./multi_org_reports
Multi-Organization Analysis with Explicit Parameters
python github_org_stats.py \
--org-ids "first-org:11111,second-org:22222,third-org:33333" \
--app-id 12345 \
--private-key /secure/enterprise-key.pem \
--include-forks \
--include-archived \
--exclude-bots \
--include-empty \
--max-repos 9000 \
--days-back 365 \
--format all \
--log-level INFO \
--log-file multi_org_analysis.log \
--output-dir ./multi_org_reports
Single Organization Analysis (Legacy Mode)
For analyzing a single organization:
python github_org_stats.py \
--org enterprise-org \
--installation-id 67890 \
--include-forks \
--include-archived \
--exclude-bots \
--include-empty \
--max-repos 3000 \
--days-back 365 \
--format all \
--log-level INFO \
--log-file enterprise_analysis.log \
--output-dir ./enterprise_reports
Enterprise Analysis with Explicit Parameters
python github_org_stats.py \
--org enterprise-org \
--app-id 12345 \
--private-key /secure/enterprise-key.pem \
--installation-id 67890 \
--include-forks \
--include-archived \
--exclude-bots \
--include-empty \
--max-repos 3000 \
--days-back 365 \
--format all \
--log-level INFO \
--log-file enterprise_analysis.log \
--output-dir ./enterprise_reports
Quick Overview
python github_org_stats.py \
--org your-org \
--token ghp_token \
--max-repos 10 \
--days-back 7 \
--format json
Comprehensive Analysis with Personal Access Token
python github_org_stats.py \
--org your-org \
--token ghp_token \
--include-forks \
--include-archived \
--exclude-bots \
--include-empty \
--max-repos 1000 \
--days-back 365 \
--format all \
--log-level INFO \
--log-file comprehensive_analysis.log \
--output-dir ./comprehensive_reports
Large Scale Analysis (High Repository Count)
For organizations with many repositories:
python github_org_stats.py \
--org large-org \
--installation-id 99999 \
--include-forks \
--include-archived \
--exclude-bots \
--include-empty \
--max-repos 5000 \
--days-back 365 \
--format all \
--log-level INFO \
--log-file large_org_analysis.log \
--output-dir ./large_org_reports
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: This comprehensive README and example configurations
- Issues: Report bugs or request features via GitHub Issues
- Discussions: Join the conversation in GitHub Discussions
Acknowledgments
- Thanks to all contributors who have helped improve this tool
- Built with PyGithub for GitHub API access
- Inspired by the need for comprehensive GitHub organization analysis
- Special thanks to the open-source community for feedback and contributions
Made with ❤️ by the open-source community
Version 1.1.0 | Changelog | Contributing Guidelines
What's New in v1.1.0
- Multi-Organization Analysis: Analyze multiple GitHub organizations in a single command
- Enhanced Authentication: Better environment variable support and GitHub App integration
- Improved Excel Output: Organization breakdown sheets and enhanced reporting
- Better Error Handling: More robust authentication and API error management
- Updated Documentation: Comprehensive examples and usage guides
Ready to analyze your GitHub organizations? Get started with the Quick Start guide!