A Python utility for analyzing GitHub repository structures

GitHub Repository Structure Analyzer

A Python utility for cloning GitHub repositories and generating a nested, JSON-serializable representation of their directory structure. It respects .gitignore rules and offers customizable exclude patterns for different project types.

Features

  • Clone GitHub repositories with a progress bar (rendered with alive-progress)
  • Generate a nested, JSON-serializable representation of the repository structure
  • Support private repositories via GitHub personal access tokens
  • Respect .gitignore patterns
  • Limit directory traversal depth with a configurable max_depth
  • Use predefined exclude patterns for common project types (Python, Node.js, Java) or supply your own
  • Handle symlinks and special files

Installation

Prerequisites

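Make sure you have Python 3.6+ installed. The package is published on PyPI as github-repo-structure:

pip install github-repo-structure

To work from a source checkout instead, install the required dependencies:

pip install -r requirements.txt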

Usage

Basic Usage

from repo_structure import get_repo_structure

# Simple example with default settings
structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder"
)

Advanced Usage

from repo_structure import get_repo_structure, PROJECT_EXCLUDES
import json

# Using project-specific excludes and depth limit
structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder",
    token="your_github_token",  # For private repos
    max_depth=2,  # Limit directory traversal depth
    exclude_patterns=PROJECT_EXCLUDES["python"]  # Use Python-specific excludes
)

# Print the structure as formatted JSON
print(json.dumps(structure, indent=2))

Private Repositories

For private repositories, you'll need to provide a GitHub personal access token:

structure = get_repo_structure(
    github_url="https://github.com/oguzhancetinkaya/private-repo.git",
    clone_path="private_repo_folder",
    token="your_github_personal_access_token"
)
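The README does not say how the token reaches Git. A common approach for HTTPS clones is to embed the token as the user-info part of the clone URL. The sketch below assumes that approach; build_authenticated_url is a hypothetical helper, not part of the package's API.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def build_authenticated_url(github_url: str, token: Optional[str]) -> str:
    """Embed a token as the user-info part of an HTTPS clone URL.

    Hypothetical helper: this is one common way to authenticate HTTPS
    clones, not necessarily what get_repo_structure does internally.
    """
    if not token:
        return github_url  # public repository: clone URL is used unchanged
    parts = urlsplit(github_url)
    if parts.scheme != "https":
        raise ValueError("token authentication expects an https:// URL")
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Any credentials already present in the URL are replaced by the token.
    return urlunsplit((parts.scheme, f"{token}@{host}", parts.path,
                       parts.query, parts.fragment))


print(build_authenticated_url("https://github.com/username/repo.git", "TOKEN"))
# prints https://TOKEN@github.com/username/repo.git
```

Keep in mind that Git records the clone URL, including any embedded credentials, as the remote URL in the cloned repository's .git/config, so a token passed this way ends up on disk.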

Exclude Patterns

The tool comes with predefined exclude patterns for different project types:

  • DEFAULT_EXCLUDES: Basic patterns like .git, node_modules, venv
  • PYTHON_EXCLUDES: Python-specific patterns
  • NODE_EXCLUDES: Node.js-specific patterns
  • JAVA_EXCLUDES: Java-specific patterns

You can also provide your own custom exclude patterns:

custom_excludes = [
    "*.log",
    "temp",
    "custom_folder"
]

structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder",
    exclude_patterns=custom_excludes
)
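How patterns such as "*.log" and "temp" are matched is not specified here. One plausible reading is shell-style globbing against each entry's base name, sketched below with the standard library's fnmatch module; is_excluded is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_excluded(name: str, patterns: Iterable[str]) -> bool:
    """Return True if a file or directory name matches any exclude pattern.

    Patterns are treated as shell-style globs matched against the whole
    base name (case-sensitively, via fnmatchcase): "*.log" matches
    "debug.log", while "temp" matches only an entry named exactly "temp".
    """
    return any(fnmatchcase(name, pattern) for pattern in patterns)


custom_excludes = ["*.log", "temp", "custom_folder"]
print(is_excluded("debug.log", custom_excludes))   # True
print(is_excluded("temporary", custom_excludes))   # False
print(is_excluded("main.py", custom_excludes))     # False
```

fnmatchcase is used instead of fnmatch so that results do not depend on the operating system's case rules.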

Output Format

The tool generates a nested dictionary structure that represents the repository layout:

{
  "name": "repo_name",
  "type": "directory",
  "children": [
    {
      "name": "src",
      "type": "directory",
      "children": [
        {
          "name": "main.py",
          "type": "file"
        }
      ]
    }
  ]
}
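Because every node carries name and type keys, and directories carry a children list, the result is straightforward to walk recursively. A minimal sketch that flattens the structure into slash-separated paths (list_paths is a hypothetical helper, not part of the package):

```python
from typing import Any, Dict, List


def list_paths(node: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten a nested {"name", "type", "children"} structure into paths.

    Nodes without a "children" key (files, or possibly directories cut
    off by a depth limit) are treated as leaves.
    """
    path = f"{prefix}/{node['name']}" if prefix else node["name"]
    paths = [path]
    for child in node.get("children", []):
        paths.extend(list_paths(child, path))
    return paths


structure = {
    "name": "repo_name",
    "type": "directory",
    "children": [
        {
            "name": "src",
            "type": "directory",
            "children": [{"name": "main.py", "type": "file"}],
        }
    ],
}

for p in list_paths(structure):
    print(p)  # prints repo_name, repo_name/src, repo_name/src/main.py, one per line
```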

Components

repo_structure.py

The main module containing:

  • Directory traversal logic
  • Repository cloning functionality
  • Exclude pattern definitions
  • .gitignore integration
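The traversal code itself is not shown here. The following is an illustrative sketch of how a depth-limited walk with exclude patterns and basic .gitignore support could produce the output format above. load_gitignore_patterns and build_tree are hypothetical names, the .gitignore handling is deliberately simplified (no negation, anchoring, or path-based patterns), and the max_depth semantics are an assumption.

```python
import os
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional


def load_gitignore_patterns(root: str) -> List[str]:
    """Read glob patterns from the top-level .gitignore (simplified sketch).

    Blank lines, comments ("#") and negations ("!") are skipped, and
    leading/trailing slashes are stripped, so anchored or directory-only
    patterns are treated as plain name globs.
    """
    path = os.path.join(root, ".gitignore")
    if not os.path.isfile(path):
        return []
    patterns: List[str] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue
            pattern = line.strip("/")
            if pattern:
                patterns.append(pattern)
    return patterns


def build_tree(path: str, patterns: List[str],
               max_depth: Optional[int] = None, depth: int = 0) -> Dict[str, Any]:
    """Return a nested {"name", "type", "children"} dict for the directory at path.

    Assumed depth semantics: the root is depth 0, and directories at
    depth >= max_depth are listed with an empty children list.
    """
    node: Dict[str, Any] = {
        "name": os.path.basename(os.path.abspath(path)),
        "type": "directory",
        "children": [],
    }
    if max_depth is not None and depth >= max_depth:
        return node  # depth limit reached: list the directory but do not descend
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if any(fnmatchcase(entry.name, p) for p in patterns):
                continue  # excluded by a default, custom, or .gitignore pattern
            if entry.is_dir(follow_symlinks=False):
                node["children"].append(
                    build_tree(entry.path, patterns, max_depth, depth + 1)
                )
            else:
                # Files, and symlinks (never followed, avoiding cycles), become leaves.
                node["children"].append({"name": entry.name, "type": "file"})
    return node
```

A call such as build_tree("local_repo_folder", [".git"] + load_gitignore_patterns("local_repo_folder"), max_depth=2) would return a dictionary in the format shown under Output Format. For faithful .gitignore semantics (negation, anchoring, "**"), a dedicated matcher such as the pathspec package's gitwildmatch patterns is a better fit than plain fnmatch.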

clone_progress.py

A helper module that provides:

  • Custom progress handler for Git operations
  • Integration with the alive-progress library
  • Visual feedback during repository cloning
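The progress handler is not shown either. Assuming the module builds on GitPython, the usual hook for Git progress is RemoteProgress.update(op_code, cur_count, max_count=None, message=''). The standalone sketch below (PercentProgress is a hypothetical name) shows only the arithmetic: converting counts into a 0.0 to 1.0 fraction, which alive-progress accepts in manual mode (alive_bar(manual=True), then bar(fraction)).

```python
from typing import List, Union


class PercentProgress:
    """Hypothetical handler turning Git progress counts into fractions.

    update() mirrors the signature of GitPython's RemoteProgress.update;
    a real handler would subclass git.RemoteProgress and pass each
    fraction to an alive-progress bar opened with alive_bar(manual=True).
    """

    def __init__(self) -> None:
        self.fractions: List[float] = []

    def update(self, op_code: int, cur_count: Union[str, float],
               max_count: Union[str, float, None] = None,
               message: str = "") -> None:
        if not max_count:
            return  # no total reported for this step, so no fraction to compute
        fraction = min(float(cur_count) / float(max_count), 1.0)
        self.fractions.append(fraction)  # a real handler would call bar(fraction) here


progress = PercentProgress()
progress.update(0, 50, 200)
progress.update(0, 200, 200)
print(progress.fractions)  # [0.25, 1.0]
```

Each Git stage, such as receiving objects or resolving deltas, reports its own counts, so the fraction restarts at each stage unless the handler combines them.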

Contributing

Contributions are welcome! Some areas for potential improvement:

  • Additional project-type exclude patterns
  • Support for other Git hosting services
  • Enhanced progress reporting
  • Additional output formats
  • Performance optimizations for large repositories

License

This project is open-source. See the LICENSE file for details.

Thank you for using github-repo-structure! If you have any questions or suggestions, don’t hesitate to create an issue or open a discussion. Happy coding!

Download files

Source Distribution

github_repo_structure-0.1.0.tar.gz (7.3 kB)

Built Distribution

github_repo_structure-0.1.0-py3-none-any.whl (7.2 kB)

File details

Details for the file github_repo_structure-0.1.0.tar.gz.

File metadata

  • Download URL: github_repo_structure-0.1.0.tar.gz
  • Size: 7.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for github_repo_structure-0.1.0.tar.gz

Algorithm     Hash digest
SHA256        7720f2467e87889693af336a4eb87f4beb5518a139bdcbd90ecd60f9fede41b4
MD5           aadc291bcb584848f5cc97c7b5c052cf
BLAKE2b-256   f3c2128ff473ae7221fac52428d1fe648660598f19fbb46bb277e56d95dee66f

File details

Details for the file github_repo_structure-0.1.0-py3-none-any.whl.

File hashes

Hashes for github_repo_structure-0.1.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        5b051d4bff5694910ae0b6393ce21a0171f67ac1609f2206db47c6c0a061d6ff
MD5           8866bdb288179f9bd0d4b09fe3331c49
BLAKE2b-256   c5ae1f9d011e58361bfe32a122a9b8bc2ec55b27d5137e357485ebe976699a18
