A Python utility for analyzing GitHub repository structures
GitHub Repository Structure Analyzer
A Python utility for cloning GitHub repositories and generating a JSON-friendly representation of their directory structure. This tool provides a clean way to analyze repository layouts while respecting .gitignore rules and offering customizable exclude patterns for different project types.
Features
- Clone GitHub repositories with progress visualization
- Generate nested JSON representation of repository structure
- Support for private repositories via GitHub tokens
- Respect `.gitignore` patterns
- Configurable depth limit for directory traversal
- Predefined exclude patterns for common project types (Python, Node.js, Java)
- Progress bar visualization during cloning using `alive-progress`
- Handles symlinks and special files appropriately
Installation
Prerequisites
Make sure you have Python 3.6+ installed. Then install the required dependencies:
```bash
pip install -r requirements.txt
```
Usage
Basic Usage
```python
from repo_structure import get_repo_structure

# Simple example with default settings
structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder"
)
```
Advanced Usage
```python
import json

from repo_structure import get_repo_structure, PROJECT_EXCLUDES

# Using project-specific excludes and a depth limit
structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder",
    token="your_github_token",  # For private repos
    max_depth=2,  # Limit directory traversal depth
    exclude_patterns=PROJECT_EXCLUDES["python"]  # Use Python-specific excludes
)

# Print the structure as formatted JSON
print(json.dumps(structure, indent=2))
```
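Because the result is an ordinary nested dictionary, it can be post-processed with plain Python. The helper below is a small illustrative sketch, not part of `repo_structure`: `iter_paths` is a hypothetical name, and it assumes only the `name`/`type`/`children` shape described under Output Format.

```python
from typing import Any, Dict, Iterator


def iter_paths(node: Dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the slash-separated path of every file in a structure dict.

    Hypothetical helper: assumes directories carry "children" and files do not.
    """
    path = prefix + node["name"]
    if node["type"] == "file":
        yield path
        return
    # Directories: recurse into each child, extending the path prefix
    for child in node.get("children", []):
        yield from iter_paths(child, path + "/")


example = {
    "name": "repo_name",
    "type": "directory",
    "children": [
        {"name": "src", "type": "directory",
         "children": [{"name": "main.py", "type": "file"}]},
    ],
}

for file_path in iter_paths(example):
    print(file_path)  # repo_name/src/main.py
```

With a real result you would pass `structure` instead of the hand-written `example`.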
Private Repositories
For private repositories, you'll need to provide a GitHub personal access token:
```python
from repo_structure import get_repo_structure

structure = get_repo_structure(
    github_url="https://github.com/oguzhancetinkaya/private-repo.git",
    clone_path="private_repo_folder",
    token="your_github_personal_access_token"
)
```
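How the token reaches Git is an implementation detail this README does not document. A common approach for HTTPS clones is to place the token in the credential part of the URL; the sketch below shows that idea with the standard library only. `build_authenticated_url` is a hypothetical helper, not necessarily what `get_repo_structure` does internally.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def build_authenticated_url(github_url: str, token: Optional[str]) -> str:
    """Return github_url with token embedded as HTTPS credentials (hypothetical)."""
    if not token:
        return github_url  # public repository: use the URL unchanged
    parts = urlsplit(github_url)
    if parts.scheme != "https":
        raise ValueError("token authentication requires an https:// URL")
    # Put the token in the user part of the network location: token@host[:port]
    netloc = f"{token}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Reading the token from an environment variable (for example `GITHUB_TOKEN`, a naming convention rather than anything this package requires) keeps it out of source code. Note that `git clone` records the URL it was given as the `origin` remote, so a token embedded this way ends up in the clone's `.git/config`.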
Exclude Patterns
The tool comes with predefined exclude patterns for different project types:
- `DEFAULT_EXCLUDES`: basic patterns such as `.git`, `node_modules`, `venv`
- `PYTHON_EXCLUDES`: Python-specific patterns
- `NODE_EXCLUDES`: Node.js-specific patterns
- `JAVA_EXCLUDES`: Java-specific patterns
You can also provide your own custom exclude patterns:
```python
from repo_structure import get_repo_structure

custom_excludes = [
    "*.log",
    "temp",
    "custom_folder"
]

structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder",
    exclude_patterns=custom_excludes
)
```
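The README does not specify how exclude patterns are matched. Assuming shell-style wildcards applied to each file or directory name (which is how entries such as `"*.log"` and `"temp"` read), the check could look like the sketch below; `is_excluded` is a hypothetical name. It uses `fnmatch.fnmatchcase` so matching is case-sensitive on every platform.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_excluded(name: str, patterns: Iterable[str]) -> bool:
    """Return True if a file or directory name matches any exclude pattern."""
    # Each pattern must match the whole name: "temp" excludes "temp" but not "template"
    return any(fnmatchcase(name, pattern) for pattern in patterns)


custom_excludes = ["*.log", "temp", "custom_folder"]
print(is_excluded("debug.log", custom_excludes))  # True
print(is_excluded("template", custom_excludes))   # False
```

Under this assumption a literal name such as `temp` excludes only an entry with exactly that name, while `*.log` covers every name ending in `.log`.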
Output Format
The tool generates a nested dictionary structure that represents the repository layout:
```json
{
  "name": "repo_name",
  "type": "directory",
  "children": [
    {
      "name": "src",
      "type": "directory",
      "children": [
        {
          "name": "main.py",
          "type": "file"
        }
      ]
    }
  ]
}
```
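A traversal that produces this shape can be approximated with `os.scandir`. The sketch below is illustrative only: `build_tree` is a hypothetical name; it assumes the root sits at depth 0 and that directories at `max_depth` are listed without their contents; it matches exclude patterns against entry names with `fnmatchcase`; and it simply skips symlinks and special files. The real module's depth convention and symlink handling may differ, and it also applies `.gitignore` rules, which this sketch leaves out.

```python
import os
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, Optional


def build_tree(
    path: str,
    exclude_patterns: Iterable[str] = (),
    max_depth: Optional[int] = None,
    _depth: int = 0,
) -> Dict[str, Any]:
    """Return a {"name", "type", "children"} dict for the directory at path."""
    patterns = list(exclude_patterns)
    node: Dict[str, Any] = {
        "name": os.path.basename(os.path.abspath(path)),
        "type": "directory",
        "children": [],
    }
    # Depth limit reached: keep the directory but do not list its contents
    if max_depth is not None and _depth >= max_depth:
        return node
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda entry: entry.name)
    for entry in entries:
        if any(fnmatchcase(entry.name, p) for p in patterns):
            continue  # excluded by name
        if entry.is_symlink():
            continue  # skip symlinks so the walk cannot loop or leave the repo
        if entry.is_dir():
            node["children"].append(
                build_tree(entry.path, patterns, max_depth, _depth + 1)
            )
        elif entry.is_file():
            node["children"].append({"name": entry.name, "type": "file"})
        # Anything else (sockets, FIFOs, device files) is left out
    return node
```

For example, `json.dumps(build_tree("local_repo_folder", [".git"], max_depth=2), indent=2)` would print a cloned folder in the format above.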
Components
repo_structure.py
The main module containing:
- Directory traversal logic
- Repository cloning functionality
- Exclude pattern definitions
- `.gitignore` integration
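Full `.gitignore` semantics (negation, directory-only patterns, `**`, nested ignore files) are involved, and the README does not say how the module implements them. As a deliberately simplified illustration, the sketch below parses the common cases and checks a repository-relative path against them; `load_gitignore_patterns` and `is_ignored` are hypothetical names. A complete matcher would need to follow the rules in Git's `gitignore` documentation.

```python
from fnmatch import fnmatchcase
from typing import List


def load_gitignore_patterns(text: str) -> List[str]:
    """Extract usable patterns from .gitignore contents (simplified)."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("!"):
            continue  # negated patterns are not supported in this sketch
        # Drop a trailing slash; this sketch treats "build/" like "build"
        patterns.append(line.rstrip("/"))
    return patterns


def is_ignored(rel_path: str, patterns: List[str]) -> bool:
    """Check a repository-relative, slash-separated path against the patterns."""
    name = rel_path.rsplit("/", 1)[-1]
    for pattern in patterns:
        if "/" in pattern:
            # Patterns containing a slash are anchored at the repository root.
            # Caveat: fnmatch's "*" also matches "/", unlike Git's.
            if fnmatchcase(rel_path, pattern.lstrip("/")):
                return True
        elif fnmatchcase(name, pattern):
            # Slash-free patterns match the entry name at any depth
            return True
    return False
```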
clone_progress.py
A helper module that provides:
- Custom progress handler for Git operations
- Integration with the `alive-progress` library
- Visual feedback during repository cloning
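The README does not show the handler's code. Git reports progress as cumulative counts (for example, objects received so far out of a total), so a handler mainly has to turn those counts into a fraction for the bar. The sketch below is hypothetical: `ProgressTracker` and `on_fraction` are illustrative names, and the `update(op_code, cur_count, max_count, message)` signature mirrors GitPython's `RemoteProgress.update`, on the unconfirmed assumption that the module is built on GitPython.

```python
from typing import Callable, Optional, Union

Number = Union[int, float, str]


class ProgressTracker:
    """Convert cumulative progress counts into a 0.0-1.0 fraction (hypothetical)."""

    def __init__(self, on_fraction: Callable[[float], None]) -> None:
        self.on_fraction = on_fraction

    def update(
        self,
        op_code: int,
        cur_count: Number,
        max_count: Optional[Number] = None,
        message: str = "",
    ) -> None:
        # Without a known total there is no meaningful fraction to report
        if not max_count:
            return
        fraction = float(cur_count) / float(max_count)
        self.on_fraction(min(fraction, 1.0))  # clamp in case counts overshoot
```

In `alive-progress`'s manual mode (`with alive_bar(manual=True) as bar:`), calling the bar with a fraction sets its position, so that bar object could serve as `on_fraction`.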
Contributing
Contributions are welcome! Some areas for potential improvement:
- Additional project-type exclude patterns
- Support for other Git hosting services
- Enhanced progress reporting
- Additional output formats
- Performance optimizations for large repositories
License
This project is open-source. See the LICENSE file for details.
Thank you for using github-repo-structure! If you have any questions or suggestions, don’t hesitate to create an issue or open a discussion. Happy coding!
Download files
File details
Details for the file github_repo_structure-0.1.0.tar.gz.
File metadata
- Download URL: github_repo_structure-0.1.0.tar.gz
- Upload date:
- Size: 7.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7720f2467e87889693af336a4eb87f4beb5518a139bdcbd90ecd60f9fede41b4 |
| MD5 | aadc291bcb584848f5cc97c7b5c052cf |
| BLAKE2b-256 | f3c2128ff473ae7221fac52428d1fe648660598f19fbb46bb277e56d95dee66f |
File details
Details for the file github_repo_structure-0.1.0-py3-none-any.whl.
File metadata
- Download URL: github_repo_structure-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b051d4bff5694910ae0b6393ce21a0171f67ac1609f2206db47c6c0a061d6ff |
| MD5 | 8866bdb288179f9bd0d4b09fe3331c49 |
| BLAKE2b-256 | c5ae1f9d011e58361bfe32a122a9b8bc2ec55b27d5137e357485ebe976699a18 |