
YAML-driven GitHub repository synchronization tool


GitHub Repository Downloader

A Python tool that downloads all repositories and branches from GitHub organizations and user accounts and keeps them synchronized, driven by declarative YAML configuration.

Overview

This tool provides:

  • A YAML-driven CLI for advanced, scriptable workflows
  • An interactive mode that guides you through export/apply flows
  • A status reporting system to track local vs remote state

Use YAML to describe what repositories and branches you want locally, and let the tool do the rest.

Features

Core Features

  • 🔍 Automatically discovers all repositories from GitHub organizations and user accounts
  • 👤 Auto-detects whether an account is an organization or individual user
  • 🌿 Downloads all branches from each repository
  • 🔄 Automatically refreshes already cloned branches with latest changes
  • 🎯 Interactive CLI with filtering options
  • 📊 Progress tracking and detailed summary report
  • ❌ Comprehensive failure tracking and reporting
  • 🔑 Supports GitHub Personal Access Tokens for accessing private repos and avoiding API rate limits
  • ⚡ Shallow cloning for faster downloads
  • 📁 Organized output structure

New YAML-Driven CLI

  • 📝 Declarative YAML configuration - Define what you want, not how to get it
  • 🏢 Multi-organization support - Manage multiple orgs in a single config
  • 🎨 Custom output layouts - Control exactly where repos are cloned
  • 📊 YAML status reports - Human and machine-readable status tracking
  • 🔧 Per-branch sync modes - CHECK_ONLY, PULL_CHANGES, PULL_AND_RESET
  • 🚫 Downstream-only - No accidental pushes or remote modifications
  • 🔄 Config reusability - Version control and share your sync configurations

Installation

  1. Install Python 3.8 or higher (if not already installed)

  2. Install via pip (recommended when available):

    pip3 install git-repo-sync
    # or
    pip install git-repo-sync
    
  3. Install from source (clone this repo):

    # from the repo root
    pip install -r requirements.txt
    pip install .
    

After installation, the git-repo-sync CLI should be available on your PATH.

Usage - YAML-Driven CLI

CLI Command Structure

graph LR
    A[Git Repo Sync] --> B[Interactive Mode]
    A --> C[CLI Commands]
    
    B --> B1[git-repo-sync interactive]
    B1 --> B2[Menu-driven interface]
    
    C --> C1[export-config]
    C --> C2[apply-config]
    C --> C3[status]
    C --> C4[cleanup]
    C --> C5[diff-config]
    
    C1 --> C1A[Fetch from GitHub]
    C1A --> C1B[Generate YAML]
    
    C2 --> C2A[--mode sync]
    C2 --> C2B[--mode status]
    C2 --> C2C[--mode cleanup]
    
    C2A --> C2D[Clone/Pull repos]
    C2B --> C2E[Check status only]
    C2C --> C2F[Remove unmapped]
    
    C3 --> C2B
    C4 --> C2C
    C5 --> C5A[Compare Local vs Remote]
    C5A --> C5B[Generate Diff YAML]
    
    style A fill:#e1f5ff
    style B fill:#fff4e6
    style C fill:#fff4e6
    style C1B fill:#c8e6c9
    style C2D fill:#c8e6c9
    style C2E fill:#ffe0b2
    style C2F fill:#ffccbc
    style C5B fill:#e1bee7

Quick Reference

| Task | Command | Token Needed? |
| --- | --- | --- |
| Start with a guided menu | git-repo-sync interactive | No |
| Export repos from GitHub to YAML | git-repo-sync export-config --org <name> --working-dir <dir> --output <file.yaml> [--token $TOKEN] | For private repos |
| Sync repos from YAML config | git-repo-sync apply-config --config <file.yaml> --mode sync [--token $TOKEN] | For private repos |
| Check status without changes | git-repo-sync status --config <file.yaml> [--token $TOKEN] | For private repos |
| Preview sync without changes | git-repo-sync apply-config --config <file.yaml> --mode sync --dry-run [--token $TOKEN] | For private repos |
| Check for new repos/branches | git-repo-sync diff-config --config <file.yaml> [--token $TOKEN] | For private repos |
| Clean up extra directories | git-repo-sync cleanup --config <file.yaml> --dry-run | Usually no |

Note: Cleanup is currently experimental and does not yet delete directories; see the cleanup section below for details.

Interactive Mode

The easiest way to get started with the new CLI:

git-repo-sync interactive

This provides a menu-driven interface with two main options:

  1. Export/Update configuration from GitHub
    • Generate a fresh configuration (export from scratch)
    • Generate updates for an existing configuration (diff mode)
  2. Apply existing configuration
    • Status check, sync, or cleanup operations

Command-Line Interface

For scripting and automation, use the CLI directly. The tool provides five main commands:


📤 export-config - Generate YAML Configuration from GitHub

Creates a YAML configuration file by discovering all repositories and branches from GitHub organization(s) or user account(s).

Syntax:

git-repo-sync export-config \
  --org <org_or_username> \
  --working-dir <base_directory> \
  --output <config_file.yaml> \
  [--token <github_token>] \
  [--status] \
  [--status-output <status_file>]

Flags:

| Flag | Alias | Required | Description |
| --- | --- | --- | --- |
| --org | -o | Yes | GitHub organization or username to export from. Can be specified multiple times to export from multiple accounts in one config file. |
| --working-dir | -w | Yes | Base directory where repositories will be cloned. Can be relative (e.g., ./_github) or absolute (e.g., /home/user/repos). |
| --output | -c | Yes | Path where the YAML configuration file will be saved (e.g., myorg-config.yaml). |
| --token | -t | No | GitHub Personal Access Token. Required for private repos and to avoid rate limits. See Creating a Token. |
| --status | - | No | Generate a status report immediately after export showing what would need to be synced. |
| --status-output | - | No | Custom path for the status report file (default: <working-dir>/status.txt). |

Examples:

# Export single organization (public repos only)
git-repo-sync export-config \
  --org facebook \
  --working-dir ./_github \
  --output facebook-config.yaml

# Export with authentication for private repos + status report
git-repo-sync export-config \
  --org mycompany \
  --working-dir /home/user/work/repos \
  --output mycompany.yaml \
  --token ghp_xxxxxxxxxxxx \
  --status

# Export from multiple organizations into one config file
git-repo-sync export-config \
  --org organization1 \
  --org organization2 \
  --org personal-username \
  --working-dir ./_all_repos \
  --output multi-org-config.yaml \
  --token ghp_xxxxxxxxxxxx

# Export from user account (auto-detects user vs org)
git-repo-sync export-config \
  --org torvalds \
  --working-dir ./linux-repos \
  --output torvalds-repos.yaml
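
One way to auto-detect user versus organization is GitHub's REST API: the `GET /users/{username}` response carries a `type` field ("User" or "Organization"). The sketch below shows only the mapping from that field to the config's `type` value. `account_type` is a hypothetical helper, not part of git-repo-sync, and how the tool itself performs detection is not documented here.

```python
from typing import Any, Mapping


def account_type(user_payload: Mapping[str, Any]) -> str:
    """Map the `type` field of a GitHub user payload to the config value."""
    # GitHub reports "Organization" for organizations and "User" for individuals
    return "org" if user_payload.get("type") == "Organization" else "user"


print(account_type({"login": "torvalds", "type": "User"}))
```

The payload is passed in as a plain mapping so the mapping logic can be exercised without network access.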

Important Note on Private Repositories:

  • For your own user account: When authenticated with a token, the tool will fetch all your repositories (public + private)
  • For other users: Only public repositories are accessible (GitHub API limitation)
  • For organizations: All repositories you have access to are fetched (public + private, if you're a member)

🔍 diff-config - Check for New Items

Compares your existing local configuration against the current state on GitHub to find new repositories or branches that are missing from your config.

Syntax:

git-repo-sync diff-config \
  --config <existing_config.yaml> \
  --output <diff_file.yaml> \
  [--token <github_token>]

Flags:

| Flag | Alias | Required | Description |
| --- | --- | --- | --- |
| --config | -c | Yes | Path to your existing YAML configuration file. |
| --output | -o | No | Path where the diff report will be saved (default: config_diff.yaml). |
| --token | -t | No | GitHub Personal Access Token. |

Examples:

# Check for new repos/branches
git-repo-sync diff-config --config myorg.yaml

# Save diff to custom path
git-repo-sync diff-config --config myorg.yaml --output updates.yaml

Output: The command generates a partial YAML file containing only the missing repositories and branches. You can copy-paste sections from this file directly into your main configuration.
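
Conceptually, the diff is a set difference between what GitHub reports and what the config already lists. The following is a minimal sketch of that idea, assuming each side has been reduced to a mapping of repository name to branch names. `missing_items` and the sample data are hypothetical and do not reflect the tool's internal implementation.

```python
from typing import Dict, Set

RepoBranches = Dict[str, Set[str]]


def missing_items(local: RepoBranches, remote: RepoBranches) -> RepoBranches:
    """Return repos and branches that exist remotely but not in the local config."""
    diff: RepoBranches = {}
    for repo, remote_branches in remote.items():
        # A repo absent from the config contributes all of its branches
        new_branches = remote_branches - local.get(repo, set())
        if new_branches or repo not in local:
            diff[repo] = new_branches
    return diff


local = {"my-repo": {"main", "develop"}}
remote = {"my-repo": {"main", "develop", "hotfix"}, "new-repo": {"main"}}
print(missing_items(local, remote))
```

Repositories that are fully covered by the config do not appear in the result, which mirrors the "only the missing items" behavior described above.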


🔄 apply-config - Apply YAML Configuration

Performs sync, status checking, or cleanup operations based on your YAML configuration file.

Syntax:

git-repo-sync apply-config \
  --config <config_file.yaml> \
  --mode <status|sync|cleanup> \
  [--token <github_token>] \
  [--dry-run] \
  [--remove-unmapped] \
  [--yes]

Flags:

| Flag | Alias | Required | Description |
| --- | --- | --- | --- |
| --config | -c | Yes | Path to your YAML configuration file. |
| --mode | - | Yes | Operation mode: status (check only), sync (clone/pull repos), or cleanup (report unmapped dirs; deletion is not yet implemented). |
| --token | -t | No | GitHub Personal Access Token. Required for private repositories. Not needed for public repos or if using git credential storage. |
| --dry-run | - | No | Preview what would happen without making any changes. Highly recommended for first run. |
| --remove-unmapped | - | No | (With cleanup mode) Intended to remove directories not in the config. Currently a no-op: cleanup only reports what would be removed (see the cleanup section). |
| --yes | -y | No | Skip confirmation prompts for destructive operations. Use with caution. |

Examples:

# Safe first run - see what would be synced without making changes
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode sync \
  --dry-run

# Actually perform the sync (public repos)
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode sync

# Sync with authentication for private repos
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode sync \
  --token $GITHUB_TOKEN

# Check status of all repos without making changes
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode status

# Check status with authentication (for private repos)
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode status \
  --token $GITHUB_TOKEN

# Preview what directories would be cleaned up
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode cleanup \
  --dry-run

# Request removal of unmapped directories (currently a no-op; see the cleanup section)
git-repo-sync apply-config \
  --config myorg-config.yaml \
  --mode cleanup \
  --remove-unmapped \
  --yes
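
When driving these commands from a Python script, it can help to build the argument list in one place. This is a sketch that uses only the flags documented above. `apply_config_argv` is a hypothetical helper name, and the token is deliberately left out on the assumption that GITHUB_TOKEN is set in the environment.

```python
from typing import List


def apply_config_argv(config: str, mode: str, dry_run: bool = False) -> List[str]:
    """Build the argument list for `git-repo-sync apply-config`."""
    if mode not in ("status", "sync", "cleanup"):
        raise ValueError(f"unsupported mode: {mode!r}")
    argv = ["git-repo-sync", "apply-config", "--config", config, "--mode", mode]
    if dry_run:
        argv.append("--dry-run")
    return argv


# Example: preview first, then run the real sync only if the preview succeeded.
# import subprocess
# subprocess.run(apply_config_argv("myorg-config.yaml", "sync", dry_run=True), check=True)
# subprocess.run(apply_config_argv("myorg-config.yaml", "sync"), check=True)
```

Passing a list to `subprocess.run` avoids shell quoting issues with paths that contain spaces.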

📊 status - Quick Status Check (Shortcut Command)

Convenience command that's equivalent to apply-config --mode status.

Syntax:

git-repo-sync status --config <config_file.yaml> [--token <github_token>]

Flags:

| Flag | Alias | Required | Description |
| --- | --- | --- | --- |
| --config | -c | Yes | Path to your YAML configuration file. |
| --token | -t | No | GitHub Personal Access Token. Required for private repositories. |

Examples:

# Check status of all configured repositories (public repos)
git-repo-sync status --config myorg-config.yaml

# Check status with authentication (for private repos)
git-repo-sync status --config myorg-config.yaml --token $GITHUB_TOKEN

# This is exactly the same as:
git-repo-sync apply-config --config myorg-config.yaml --mode status --token $GITHUB_TOKEN

🧹 cleanup - Remove Unmapped Directories (Shortcut Command)

Convenience command that's equivalent to apply-config --mode cleanup.

Current status: Cleanup mode is not yet fully implemented in the CLI. It does not delete directories, even when --remove-unmapped is passed, and should currently be treated as an experimental/status-style command rather than a destructive cleanup.

Syntax:

git-repo-sync cleanup \
  --config <config_file.yaml> \
  [--token <github_token>] \
  [--dry-run] \
  [--remove-unmapped] \
  [--yes]

Flags:

| Flag | Alias | Required | Description |
| --- | --- | --- | --- |
| --config | -c | Yes | Path to your YAML configuration file. |
| --token | -t | No | GitHub Personal Access Token (only needed if checking remote info for private repos). |
| --dry-run | - | No | Show what would be removed without actually deleting. |
| --remove-unmapped | - | No | Intended to remove directories not in the config (currently a no-op; see note above). |
| --yes | -y | No | Skip confirmation prompts. |

Examples:

# See what would be cleaned up (safe)
git-repo-sync cleanup --config myorg-config.yaml --dry-run

# Request removal of unmapped directories (currently a no-op; see note above)
git-repo-sync cleanup --config myorg-config.yaml --remove-unmapped

# This is exactly the same as:
git-repo-sync apply-config --config myorg-config.yaml --mode cleanup --remove-unmapped


When Do You Need a Token?

Whether you need the --token flag depends on the command:

| Command | Token Needed? | Why? |
| --- | --- | --- |
| export-config | Yes (for private repos) | Fetches repository list and branch information from GitHub API |
| apply-config --mode sync | Yes (for private repos) | Clones/pulls private repositories; public repos need no token, and clone/pull can also authenticate through git credential storage |
| status | Yes (for private repos) | Fetches remote branch information to compare with local state |
| cleanup | Usually No | Only operates on local directories; token only needed if checking remote info |

Authentication notes:

  • For API operations (like export-config and remote status checks), a token (via --token or GITHUB_TOKEN) is required to access private repositories.
  • For git clone/pull operations, authentication is handled by git itself (for example via git config credential.helper, SSH keys, or your OS keychain). The tool does not embed your token into clone URLs or write it to YAML/status files.

Creating a GitHub Token

To access private repositories or avoid API rate limits:

  1. Go to GitHub Settings → Developer settings → Personal access tokens → Tokens (classic)
  2. Click Generate new token (classic)
  3. Select scopes:
    • repo (Full control of private repositories)
    • read:org (Read organization membership - if using organizations)
  4. Generate and copy the token (starts with ghp_)
  5. Store it securely - you won't be able to see it again

Once created, you can set it as the GITHUB_TOKEN environment variable (for example export GITHUB_TOKEN=ghp_xxxxxxxxxxxx). The CLI and interactive mode will automatically use this token when --token is not provided on the command line.

Using the token:

# Set as environment variable (recommended)
export GITHUB_TOKEN=ghp_xxxxxxxxxxxx
git-repo-sync export-config --org myorg --working-dir ./_github --output config.yaml --token $GITHUB_TOKEN

# Or pass directly (less secure, visible in shell history)
git-repo-sync export-config --org myorg --working-dir ./_github --output config.yaml --token ghp_xxxxxxxxxxxx
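
The precedence described above (an explicit --token wins, otherwise GITHUB_TOKEN is used) can be sketched as follows. `resolve_token` is a hypothetical function written for illustration. Treating an empty variable as unset is an assumption, not documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_token(cli_token: Optional[str] = None,
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer an explicit --token value, else fall back to GITHUB_TOKEN."""
    if cli_token:
        return cli_token
    # Assumption: an empty variable is treated the same as an unset one
    return env.get("GITHUB_TOKEN") or None


print(resolve_token("ghp_cli", {"GITHUB_TOKEN": "ghp_env"}))
```

A result of `None` means the run is unauthenticated, which is enough for public repositories but subject to lower API rate limits.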

Common Workflows

Workflow 1: First-time setup for an organization

# Step 1: Export configuration from GitHub
git-repo-sync export-config \
  --org mycompany \
  --working-dir ./repos \
  --output mycompany.yaml \
  --token $GITHUB_TOKEN

# Step 2: Review and edit the generated YAML (optional)
vim mycompany.yaml  # Disable repos/branches you don't need

# Step 3: Preview what will be synced
git-repo-sync apply-config \
  --config mycompany.yaml \
  --mode sync \
  --dry-run \
  --token $GITHUB_TOKEN

# Step 4: Actually sync the repositories (with auth for private repos)
git-repo-sync apply-config \
  --config mycompany.yaml \
  --mode sync \
  --token $GITHUB_TOKEN

Workflow 2: Daily sync to update local repos

# Check status first (for private repos, add --token)
git-repo-sync status --config mycompany.yaml --token $GITHUB_TOKEN

# Just run sync - it will pull latest changes
git-repo-sync apply-config \
  --config mycompany.yaml \
  --mode sync \
  --token $GITHUB_TOKEN

Workflow 3: Handling new repositories (The Diff Workflow)

# Step 1: Check if there are new repos or branches on GitHub
git-repo-sync diff-config --config mycompany.yaml --output updates.yaml --token $GITHUB_TOKEN

# Step 2: Review updates.yaml
# It contains only the new items found on remote.

# Step 3: Copy the desired sections from updates.yaml into mycompany.yaml

# Step 4: Run sync to download the new items
git-repo-sync apply-config --config mycompany.yaml --mode sync --token $GITHUB_TOKEN

Workflow 4: Managing multiple organizations

# Export all organizations into one config file
git-repo-sync export-config \
  --org company1 \
  --org company2 \
  --org personal-account \
  --working-dir ./all-repos \
  --output combined.yaml \
  --token $GITHUB_TOKEN

# Sync everything at once (with auth for private repos)
git-repo-sync apply-config \
  --config combined.yaml \
  --mode sync \
  --token $GITHUB_TOKEN

Workflow 5: Safe cleanup of old repositories

# Step 1: See what would be removed
git-repo-sync cleanup --config mycompany.yaml --dry-run

# Step 2: Review the output carefully

# Step 3: Request removal of unmapped directories
# (currently a no-op: cleanup does not yet delete anything; see the cleanup section)
git-repo-sync cleanup --config mycompany.yaml --remove-unmapped

Workflow 6: Using in CI/CD or automation scripts

#!/bin/bash
set -e  # Exit on error

# Check if repos are up to date
if ! git-repo-sync status --config production.yaml; then
  echo "⚠️  Repositories are out of sync!"
  exit 1
fi

echo "✅ All repositories are synchronized"

YAML Configuration

Example configuration file:

version: 1

global:
  working_directory: ./_github
  status_report_file: status/status.txt

organizations:
  - name: myorg
    type: org  # "org" for organizations, "user" for individual accounts
    base_output_dir: myorg
    repositories:
      - name: my-repo
        output_dir: null  # use default layout; set a path here to override
        enabled: true
        http_url: https://github.com/myorg/my-repo
        visibility: public
        about: "My repository"
        
        branches:
          - name: main
            enabled: true
            sync_mode: PULL_CHANGES
            comment: "Production branch"
          
          - name: develop
            enabled: true
            sync_mode: PULL_AND_RESET
            comment: "Mirror remote exactly"
          
          - name: old-feature
            enabled: false
            sync_mode: CHECK_ONLY
            comment: "Disabled, won't sync"
  
  - name: username
    type: user  # Individual GitHub user account
    base_output_dir: personal
    repositories:
      - name: dotfiles
        output_dir: null
        enabled: true
        http_url: https://github.com/username/dotfiles
        visibility: public
        about: "Personal dotfiles"
        
        branches:
          - name: main
            enabled: true
            sync_mode: PULL_CHANGES
            comment: "Personal configs"

Configuration Notes:

  • type: Specifies whether the account is an org (organization) or user (individual account).

    • Defaults to org if not specified (for backward compatibility).
    • Auto-detected when using export-config command.
  • output_dir: Optional subdirectory within the organization's base directory.

    • output_dir: null (or omitted): use default layout → working_directory / base_output_dir / repo_name
    • output_dir: "subdir" (relative): adds subdirectory → working_directory / base_output_dir / subdir / repo_name
    • output_dir: "path/to/dir" (relative, multi-level): → working_directory / base_output_dir / path / to / dir / repo_name
    • output_dir: "/absolute/path" (absolute): overrides all → /absolute/path / repo_name
    • Important: The repository name always appears as the final directory before branches.

Path Resolution Examples:

| working_directory | base_output_dir | output_dir | repo_name | Final Path |
| --- | --- | --- | --- | --- |
| /home/user/repos | MR901 | null | xyz | /home/user/repos/MR901/xyz/ |
| /home/user/repos | MR901 | posts | xyz | /home/user/repos/MR901/posts/xyz/ |
| /home/user/repos | MR901 | abc/pqr | xyz | /home/user/repos/MR901/abc/pqr/xyz/ |
| /home/user/repos | MR901 | /absolute | xyz | /absolute/xyz/ |
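
The resolution rules above can be written as a small function. This is a sketch of the documented behavior, not the tool's own code. `resolve_repo_path` is a hypothetical name, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_repo_path(working_directory: str, base_output_dir: str,
                      output_dir: Optional[str], repo_name: str) -> PurePosixPath:
    """Apply the documented output_dir rules and return the repository directory."""
    if output_dir is None:
        # Default layout: working_directory / base_output_dir / repo_name
        parent = PurePosixPath(working_directory) / base_output_dir
    elif PurePosixPath(output_dir).is_absolute():
        # An absolute output_dir overrides working_directory and base_output_dir
        parent = PurePosixPath(output_dir)
    else:
        # A relative output_dir (one or more levels) nests under the base directory
        parent = PurePosixPath(working_directory) / base_output_dir / output_dir
    # The repository name is always the final component
    return parent / repo_name


print(resolve_repo_path("/home/user/repos", "MR901", "abc/pqr", "xyz"))
```

Calling it with the four rows of the table reproduces the listed final paths (without the trailing slash).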

Sync Modes:

  • CHECK_ONLY - Only compute status, no file changes
  • PULL_CHANGES - Safe pull from remote, no aggressive cleanup
  • PULL_AND_RESET - Mirror remote (reset + delete extra files)
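
For intuition, each mode can be approximated by a sequence of standard git commands. The mapping below is an illustration only: the commands git-repo-sync actually runs are not documented here, and `git_commands_for` is a hypothetical helper. The commands themselves (`git fetch`, `git pull --ff-only`, `git reset --hard`, `git clean -fd`) are ordinary git.

```python
from typing import List


def git_commands_for(sync_mode: str, branch: str, remote: str = "origin") -> List[List[str]]:
    """Return git argv lists that approximate each documented sync mode."""
    fetch = ["git", "fetch", remote, branch]
    if sync_mode == "CHECK_ONLY":
        # Update remote-tracking refs, then compare; the working tree is untouched
        return [fetch, ["git", "status", "--short", "--branch"]]
    if sync_mode == "PULL_CHANGES":
        # Fast-forward only, so local commits are never rewritten
        return [["git", "pull", "--ff-only", remote, branch]]
    if sync_mode == "PULL_AND_RESET":
        # Discard local changes and untracked files to mirror the remote branch
        return [fetch,
                ["git", "reset", "--hard", f"{remote}/{branch}"],
                ["git", "clean", "-fd"]]
    raise ValueError(f"unknown sync_mode: {sync_mode!r}")


for argv in git_commands_for("PULL_AND_RESET", "develop"):
    print(" ".join(argv))
```

The sketch makes the key difference visible: PULL_AND_RESET is the only mode that discards local changes and untracked files, so it suits branches kept as exact mirrors.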

Further Documentation

  • Getting started guide: docs/getting_started.md
  • CLI reference: docs/cli_reference.md
  • YAML configuration schema & status format: docs/YAML_SCHEMA.md
  • Product requirements / architecture: docs/PRD.rst
  • Examples: examples/
