Preserving any git repository to the Internet Archive
iagitbetter
iagitbetter is a Python tool for preserving any git repository to the Internet Archive. An improved version of iagitup with support for all git providers, it downloads the complete repository, creates git bundles, optionally uploads all files while preserving their structure, and archives everything to archive.org.
- This project is heavily based on iagitup by Giovanni Damiola; credit to them (and to tubeup by Bibliotheca Anonoma, parts of which were adapted and modified here)
Features
- Works with all git providers (GitHub, GitLab, Gitea, etc.)
- Archive all repositories from a user or organization, with filtering options
- Self-hosted git instance support (GitLab, Gitea, Forgejo, Gogs, Gerrit, etc.)
- Downloads the entire repository and, with --all-files, uploads its full file structure
- Preserves provider directories like .github/, .gitlab/, .gitea/
- Download repository releases with assets from supported providers
- Clone and archive all branches of a repository with proper directory structure
- Automatically fetches repository metadata from git provider APIs (when available)
- API token authentication for private and self-hosted repositories
- Includes stars, forks, programming language, license, topics, and more metadata
- Creates git bundles
- Archives repository wikis into git bundles
- Archives Git Large File Storage (LFS) objects automatically
- Uses the first commit date as the repo creation date
- Pass additional metadata using --metadata=<key:value>
Installation
Requires Python 3.9 or newer
pip install iagitbetter
Once installed, the package provides a console script named iagitbetter. You can also install from source by cloning the repository and running pip install . in its root directory.
Configuration
ia configure
You'll be prompted to enter your Internet Archive account's email and password.
Usage
iagitbetter <url> [options]
Basic Arguments
- <url> – Git repository URL or user/organization profile URL to archive
Options
- --metadata=<key:value> – custom metadata to add to the IA item
- --all-files – upload all repository files in addition to the git bundle (by default, only the bundle is uploaded)
- --include-wiki – clone and archive the repository wiki (if it exists)
- --quiet / -q – suppress verbose output
- --version – show version information
- --no-update-check – skip checking for updates
- --no-info-file – skip creating the repository info JSON file
- --no-repo-info – skip adding repository information to the Internet Archive item description
Release Options
- --releases [N] – download releases from the repository (GitHub, GitLab, Codeberg, Gitea). Optionally specify the number of releases to download (e.g., --releases 5 for the 5 most recent releases)
- --all-releases – download all releases (default: latest release only)
- --latest-release – download only the latest release (default when --releases is used)
Branch Options
- --all-branches – clone and archive all branches of the repository
- --branch <name> – clone and archive a specific branch of the repository
User/Org Archiving Options
- --skip-forks – skip forked repositories when archiving profiles
- --skip-archived – skip archived repositories when archiving profiles
- --skip-private – skip private repositories when archiving profiles
- --max-repos <number> – maximum number of repositories to archive from a profile
Self-Hosted Instance Options
- --git-provider-type {github,gitlab,gitea,bitbucket,gitee,gogs,sourceforge,gerrit,launchpad,gist} – specify the git provider type for self-hosted instances
- --api-url <url> – custom API URL for self-hosted instances (e.g., https://git.example.com/api/v1)
- --api-token <token> – API token for authentication with repositories
- --api-username <username> – username for Bitbucket App Passwords (used with --api-token for basic auth)
Supported Git Providers
See supportedproviders.md for detailed information about each provider
Automatic Metadata Collection
For supported providers, iagitbetter automatically fetches:
- Repository description
- Star count, fork count, watcher count
- Primary programming language
- License information
- Topics/tags
- Creation and last update dates
- Default branch name
- Repository size and statistics
- Homepage URL
- Issue and wiki availability
- User/organization avatar
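To give a concrete picture of this step, here is a minimal sketch that flattens a GitHub REST API repository object (the JSON returned by GET https://api.github.com/repos/{owner}/{repo}) into the fields listed above. It is not iagitbetter's code: the function name extract_metadata and the output keys are hypothetical, and other providers (GitLab, Gitea, etc.) use different field names.

```python
from typing import Any


def extract_metadata(repo: dict[str, Any]) -> dict[str, Any]:
    """Flatten a GitHub REST API repository object into archive-friendly fields.

    Hypothetical helper: the output key names are illustrative, not iagitbetter's.
    """
    license_info = repo.get("license") or {}  # "license" is null when GitHub detects none
    owner = repo.get("owner") or {}
    return {
        "description": repo.get("description") or "",
        "stars": repo.get("stargazers_count", 0),
        "forks": repo.get("forks_count", 0),
        # On GitHub, "watchers_count" mirrors stars; "subscribers_count" counts watchers.
        "watchers": repo.get("subscribers_count", 0),
        "language": repo.get("language"),
        "license": license_info.get("spdx_id"),
        "topics": list(repo.get("topics") or []),
        "created_at": repo.get("created_at"),
        "updated_at": repo.get("updated_at"),
        "default_branch": repo.get("default_branch"),
        "size_kb": repo.get("size", 0),  # GitHub reports repository size in KB
        "homepage": repo.get("homepage") or None,
        "has_issues": bool(repo.get("has_issues")),
        "has_wiki": bool(repo.get("has_wiki")),
        "avatar_url": owner.get("avatar_url"),
    }
```

Every lookup tolerates a missing field, so a partial API response still yields a usable dict; this fits the behaviour described under Troubleshooting, where a repository is archived even if metadata can't be fetched.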
Release Support
For providers that support releases (GitHub, GitLab, Gitea, etc.), iagitbetter can:
- Download the latest release, a given number of recent releases, or all releases
- Include release assets and attachments
- Download source code archives (zip/tar.gz)
- Save release metadata and descriptions
- Organize releases in a {owner}-{repo}_releases/ folder
Examples
Basic Repository Archiving
# Archive GitHub repository
iagitbetter https://github.com/user/repository
# Archive GitLab repository
iagitbetter https://gitlab.com/user/repository
# Archive BitBucket repository
iagitbetter https://bitbucket.org/user/repository
# Archive from any git provider
iagitbetter https://git.example.com/user/repository.git
# Archive from Gitee
iagitbetter https://gitee.com/user/repository
# Archive from Gogs instance
iagitbetter --git-provider-type gogs https://gogs.example.com/user/repository
# Archive from SourceForge
iagitbetter https://sourceforge.net/p/project/code/
# Archive from GitHub Gist
iagitbetter https://gist.github.com/username/gist_id
User/Org Archiving
Archive all repositories from a user or organization profile:
# Archive all public repositories from a GitHub user
iagitbetter https://github.com/torvalds
# Archive all repositories from a GitLab organization
iagitbetter https://gitlab.com/gitlab-org
# Archive from Codeberg user
iagitbetter https://codeberg.org/username
# Archive from Gitea user
iagitbetter https://gitea.com/username
# Archive from Bitbucket workspace
iagitbetter https://bitbucket.org/atlassian
User/Org Archiving with Filters
# Skip forked repositories
iagitbetter https://github.com/username --skip-forks
# Skip archived repositories
iagitbetter https://github.com/username --skip-archived
# Skip private repositories
iagitbetter https://github.com/username --api-token TOKEN --skip-private
# Combine multiple filters
iagitbetter https://github.com/username --skip-forks --skip-archived
# Limit number of repositories to archive
iagitbetter https://github.com/username --max-repos 10
# Archive first 5 non-fork repositories
iagitbetter https://github.com/username --skip-forks --max-repos 5
User/Org Archiving with Additional Features
# Archive all repos with their releases
iagitbetter https://github.com/username --releases --all-releases
# Archive all repos with all branches
iagitbetter https://github.com/username --all-branches
# Combine profile archiving with multiple features
iagitbetter https://github.com/username --skip-forks --releases --all-branches
# Quiet mode for profile archiving
iagitbetter https://github.com/username --skip-forks --quiet
Self-Hosted User/Org Archiving
# Archive all repos from self-hosted GitLab user
iagitbetter https://gitlab.example.com/username \
--git-provider-type gitlab \
--api-token glpat-xxxxxxxxxxxxx
# Archive all repos from self-hosted Gitea organization
iagitbetter https://git.example.com/organization \
--git-provider-type gitea \
--api-token your_token_here
# Self-hosted with filters
iagitbetter https://gitlab.example.com/team \
--git-provider-type gitlab \
--api-token TOKEN \
--skip-forks \
--skip-archived \
--max-repos 20
Self-Hosted Repositories
# Self-hosted GitLab (auto-detection)
iagitbetter https://gitlab.example.com/user/repository
# Self-hosted GitLab with API configuration
iagitbetter --git-provider-type gitlab \
--api-url https://gitlab.example.com/api/v4 \
https://gitlab.example.com/user/repository
# Self-hosted Gitea/Forgejo with authentication
iagitbetter --git-provider-type gitea \
--api-token your_token_here \
https://git.example.com/user/repository
# Private repository on self-hosted instance
iagitbetter --git-provider-type gitlab \
--api-url https://gitlab.example.com/api/v4 \
--api-token glpat-xxxxxxxxxxxxx \
https://gitlab.example.com/user/private-repo
Release Archiving
# Archive repository with latest release
iagitbetter --releases https://github.com/user/repo
# Archive repository with specific number of releases (e.g., 5 most recent)
iagitbetter --releases 5 https://github.com/user/repo
# Archive repository with specific number of releases (e.g., 10 most recent)
iagitbetter --releases 10 https://github.com/user/repo
# Archive repository with all releases
iagitbetter --releases --all-releases https://github.com/user/repo
# Explicitly specify latest release only
iagitbetter --releases --latest-release https://github.com/user/repo
Branch Archiving
# Archive all branches of a repository
iagitbetter --all-branches https://github.com/user/repo
# Archive a specific branch
iagitbetter --branch test https://github.com/user/repo
# Archive all branches AND all releases
iagitbetter --all-branches --releases --all-releases https://github.com/user/repo
Advanced Usage
# Archive with custom metadata
iagitbetter --metadata="collection:software,topic:python" https://github.com/user/repo
# All files mode (upload repository files and bundle)
iagitbetter --all-files https://github.com/user/repo
# Archive repository with wiki
iagitbetter --include-wiki https://github.com/user/repo
# Quiet mode with all features
iagitbetter --quiet --all-branches --releases --all-releases --include-wiki https://github.com/user/repo
# Self-hosted with all features
iagitbetter --git-provider-type gitlab \
--api-token glpat-xxxxxxxxxxxxx \
--all-branches \
--releases --all-releases \
--include-wiki \
https://gitlab.example.com/user/repo
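The --metadata example above packs two key:value pairs into one comma-separated string. A minimal sketch of how such a string could be parsed, assuming commas separate pairs and the first colon separates key from value (inferred from that example, not taken from iagitbetter's source); parse_metadata is a hypothetical name:

```python
def parse_metadata(arg: str) -> dict[str, str]:
    """Parse 'key:value,key2:value2' into a dict (hypothetical helper).

    Each pair is split on its first colon only, so values such as URLs keep theirs.
    """
    result: dict[str, str] = {}
    for pair in arg.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        key, sep, value = pair.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {pair!r}")
        result[key.strip()] = value.strip()
    return result
```

Splitting on the first colon keeps a value like https://example.com intact; a value that itself contains a comma would still break under this scheme.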
Profile Archiving Details
When you provide a user or organization profile URL (e.g., https://github.com/username), iagitbetter will:
- Automatically recognize the URL as a user/org rather than a repository
- Query the git provider's API to get all repositories for that user/org
- Filter repositories based on the options (--skip-forks, --skip-archived, etc.)
- Archive each repository of the user/org individually
- Provide a summary of what was archived and whether there were any failures
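The filtering step can be sketched as follows, assuming repository objects shaped like GitHub's API responses (boolean fork, archived and private fields). filter_repos is a hypothetical helper. It applies the --max-repos limit after the other filters, which is consistent with the "first 5 non-fork repositories" example earlier.

```python
from typing import Any, Optional


def filter_repos(
    repos: list[dict[str, Any]],
    skip_forks: bool = False,
    skip_archived: bool = False,
    skip_private: bool = False,
    max_repos: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Apply profile filters to GitHub-style repository objects (hypothetical helper)."""
    selected = [
        r for r in repos
        if not (skip_forks and r.get("fork"))
        and not (skip_archived and r.get("archived"))
        and not (skip_private and r.get("private"))
    ]
    # The limit comes last, so skip_forks + max_repos=5 yields the first five non-forks.
    if max_repos is not None:
        selected = selected[:max_repos]
    return selected
```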
Profile Archiving Output
The tool provides detailed progress information:
PROFILE ARCHIVING MODE
Username/Organization: torvalds
Git Provider: github
Fetching repositories from profile...
Found 25 repositories for torvalds
Filtered out 5 forked repositories
Will archive 20 repositories
Repository 1/20: torvalds/linux
Repository: torvalds/linux
Git Provider: github
Will archive: Repository files, Default branch
...
Successfully archived: torvalds/linux
URL: https://archive.org/details/torvalds-linux-20241005120000
PROFILE ARCHIVING SUMMARY
Username/Organization: torvalds
Total repositories found: 25
Repositories archived: 20
Successful: 20
Failed: 0
Successfully archived repositories:
torvalds/linux
https://archive.org/details/torvalds-linux-20241005120000
torvalds/subsurface
https://archive.org/details/torvalds-subsurface-20241005120100
...
Repository Structure Preservation
When using the --all-files flag, iagitbetter preserves the complete repository structure when uploading to Internet Archive. For example, if your repository contains:
README.md
.github/
└── workflows/
    └── lint.yml
src/
├── main.py
└── utils/
    └── helper.py
docs/
└── guide.md
tests/
└── test_main.py
The archive will contain all files exactly as shown, including the .github/ directory with workflows
With --all-branches
When using --all-branches, the structure becomes:
README.md
.github/workflows/lint.yml
src/main.py
src/utils/helper.py
docs/guide.md
tests/test_main.py
{repo-name}-{owner}_branches/
├── develop/
│   ├── README.md
│   ├── .github/workflows/ci.yml
│   ├── src/main.py
│   └── ...
└── feature/
    ├── README.md
    ├── src/main.py
    └── ...
{owner}-{repo}.bundle
With --releases
When using --releases, a releases directory is added:
README.md
.github/workflows/ci.yml
src/main.py
docs/guide.md
{owner}-{repo}_releases/
└── v1.0.0/
    ├── v1.0.0.release_info.json
    ├── v1.0.0.source.zip
    └── v1.0.0.source.tar.gz
{owner}-{repo}.bundle
By default, only the git bundle is uploaded to the Internet Archive.
If you use the --all-files flag, all repository files will be uploaded in addition to the bundle, preserving the directory structure as shown above.
How it works
Repository Analysis
- iagitbetter parses the git URL to identify the provider and repository details
- For self-hosted instances, it detects or uses the specified provider type
- It attempts to fetch additional metadata from the provider's API (if supported)
- Repository information is extracted including owner, name, and provider details
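A minimal sketch of this URL analysis, assuming the common https://host/owner/repo layout. The host is looked up in a small table of well-known providers, an explicit provider type (the role of --git-provider-type) overrides it, and a URL with only one path segment is treated as a user/organization profile. parse_git_url, ParsedUrl and KNOWN_HOSTS are hypothetical names. Nested GitLab groups and provider-specific layouts (Gist, SourceForge, Gerrit) are not handled here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical host -> provider table; unknown (self-hosted) hosts map to None
# unless a provider type is passed explicitly.
KNOWN_HOSTS = {
    "github.com": "github",
    "gitlab.com": "gitlab",
    "gitea.com": "gitea",
    "codeberg.org": "gitea",  # Codeberg runs Forgejo, a Gitea fork
    "bitbucket.org": "bitbucket",
    "gitee.com": "gitee",
}


@dataclass
class ParsedUrl:
    provider: Optional[str]
    owner: str
    repo: Optional[str]  # None means a user/organization profile URL


def parse_git_url(url: str, provider_type: Optional[str] = None) -> ParsedUrl:
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        raise ValueError(f"no owner in URL: {url}")
    provider = provider_type or KNOWN_HOSTS.get(host)
    repo = None
    if len(parts) >= 2:
        repo = parts[1]
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
    return ParsedUrl(provider, parts[0], repo)
```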
Profile Analysis (Profile Archiving Mode)
- Detects profile URL format (username/org)
- Queries the git provider's API to fetch all repositories
- Applies filters based on command-line options
- Archives each repository individually
- Generates summary report
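Most provider APIs paginate profile listings. The sketch below collects every page, assuming numbered pages and that a page shorter than per_page is the last one. GitHub also exposes a Link header that a fuller implementation could follow instead. fetch_all_repos and github_page_fetcher are hypothetical. The endpoint shown is GitHub's public GET /users/{user}/repos, which caps per_page at 100.

```python
import json
import urllib.request
from typing import Any, Callable

Page = list[dict[str, Any]]


def fetch_all_repos(fetch_page: Callable[[int], Page], per_page: int = 100) -> Page:
    """Request numbered pages until one comes back shorter than per_page."""
    repos: Page = []
    page = 1
    while True:
        batch = fetch_page(page)
        repos.extend(batch)
        if len(batch) < per_page:  # a short or empty page is the last one
            return repos
        page += 1


def github_page_fetcher(user: str, per_page: int = 100) -> Callable[[int], Page]:
    """Return a function that downloads one page of a GitHub user's public repositories."""
    def fetch(page: int) -> Page:
        url = f"https://api.github.com/users/{user}/repos?per_page={per_page}&page={page}"
        request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return fetch
```

Passing the page fetcher in as a function keeps the paging loop provider-neutral and easy to test offline. A real call would be fetch_all_repos(github_page_fetcher("torvalds")). Unauthenticated requests like this one are rate limited, which is why the Troubleshooting section recommends --api-token.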
Repository Download
- The git repository is cloned to a temporary directory using GitPython
- If --all-branches is specified, all remote branches are fetched and separate directories are created for each non-default branch
- The first commit date is extracted for the creation date
- A git bundle is created with all branches and tags
- If Git LFS is detected, LFS objects are automatically fetched and archived into a tarball
- User/organization avatar is downloaded if available
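You can approximate the date and bundle steps with plain git commands. This sketch calls the git CLI through subprocess instead of using GitPython, a deliberate swap for brevity. It also takes the earliest author date across all refs as the "first commit date". That is an assumption, since rewritten history can carry out-of-order dates. The function names are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def earliest_commit_date(log_output: str) -> datetime:
    """Earliest date in `git log --format=%aI` output (one ISO 8601 date per line)."""
    dates = [datetime.fromisoformat(line) for line in log_output.split()]
    if not dates:
        raise ValueError("repository has no commits")
    return min(dates)


def bundle_command(repo_dir: Path, bundle_path: Path) -> list[str]:
    # --all puts every ref (branches, tags, remote-tracking refs) into the bundle
    return ["git", "-C", str(repo_dir), "bundle", "create", str(bundle_path), "--all"]


def date_and_bundle(repo_dir: Path, bundle_path: Path) -> datetime:
    """Read all author dates, write the bundle, and return the first commit date."""
    log = subprocess.run(
        ["git", "-C", str(repo_dir), "log", "--all", "--format=%aI"],
        check=True, capture_output=True, text=True,
    ).stdout
    # resolve() keeps a relative bundle path from being interpreted inside repo_dir
    subprocess.run(bundle_command(repo_dir, bundle_path.resolve()), check=True)
    return earliest_commit_date(log)
```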
Branch Processing (when --all-branches is used)
- All remote branches are fetched from the repository
- For each non-default branch, a separate directory named {repo-name}-{owner}_branches/{branch-name} is created
- Each branch is checked out and its files are copied to the respective branch directory
- The default branch files remain in the root directory
- This creates a clear separation of branches in the archive
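Here is a sketch of how the branch directories could be derived from git branch -r output, using the {repo-name}-{owner}_branches/{branch-name} naming described above. branch_directories is a hypothetical helper, and origin is assumed to be the remote name. Branch names containing slashes (e.g. feature/login) become nested directories.

```python
def branch_directories(remote_branches: list[str], default_branch: str,
                       repo_name: str, owner: str, remote: str = "origin") -> dict[str, str]:
    """Map each non-default branch to its target directory (hypothetical helper).

    remote_branches holds the lines printed by `git branch -r`.
    """
    base = f"{repo_name}-{owner}_branches"
    prefix = f"{remote}/"
    dirs: dict[str, str] = {}
    for ref in remote_branches:
        ref = ref.strip()
        if "->" in ref:  # skip the symbolic "origin/HEAD -> origin/main" entry
            continue
        if not ref.startswith(prefix):
            continue
        branch = ref[len(prefix):]
        if branch != default_branch:  # the default branch stays in the root
            dirs[branch] = f"{base}/{branch}"
    return dirs
```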
Release Processing (when --releases is used)
- Release information is fetched from the provider's API
- Latest release or all releases are downloaded based on options
- Source code archives (zip/tar.gz) are downloaded
- Release assets and attachments are downloaded
- Release metadata is saved as JSON files
- All content is organized in a {owner}-{repo}_releases/ directory structure
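Here is a sketch of the release selection and folder layout, assuming the provider returns releases newest first. select_releases and release_paths are hypothetical names. The file names follow the {owner}-{repo}_releases/{tag}/ layout shown earlier. How --releases N interacts with --all-releases in the real tool is not documented here, so this sketch simply lets the "all" flag win.

```python
from typing import Any, Optional


def select_releases(releases: list[dict[str, Any]], count: Optional[int] = None,
                    all_releases: bool = False) -> list[dict[str, Any]]:
    """Pick releases from a newest-first list (hypothetical helper)."""
    if all_releases:
        return list(releases)
    if count is not None:
        return releases[:count]  # e.g. --releases 5
    return releases[:1]  # default: latest release only


def release_paths(owner: str, repo: str, tag: str) -> list[str]:
    """Paths for one release's metadata file and source archives."""
    folder = f"{owner}-{repo}_releases/{tag}"
    return [
        f"{folder}/{tag}.release_info.json",
        f"{folder}/{tag}.source.zip",
        f"{folder}/{tag}.source.tar.gz",
    ]
```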
Internet Archive Upload
- Comprehensive metadata is prepared, including:
  - title: {owner} - {repo}
  - identifier: {owner}-{repo}-{timestamp}
  - Original repository URL and git provider information
  - First commit date as the creation date
  - API-fetched metadata (stars, forks, language, etc.)
  - Branch, releases, and wiki information
- All repository files are uploaded preserving directory structure (when --all-files is used)
- Provider directories like .github/, .gitlab/, .gitea/ are preserved
- Branches are included (if archived with --all-branches)
- Release files are included (if requested)
- The git bundle is included
- Wiki bundle is included (if archived with --include-wiki)
- User/organization avatar is included
- README.md is converted to HTML for the item description
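Here is a minimal sketch of the upload using the internetarchive Python library (the package that provides the ia command), assuming credentials were already saved with ia configure. Passing files as a dict of {remote name: local path} keeps the directory structure inside the item. remote_name, build_files_map, build_metadata and upload_item are hypothetical. Only the standard title, mediatype, date and description fields are shown, not iagitbetter's actual metadata keys.

```python
from pathlib import Path
from typing import Any, Optional


def remote_name(root: Path, path: Path) -> Optional[str]:
    """Name a file gets inside the item: its '/'-separated path relative to root."""
    rel = path.relative_to(root)
    if ".git" in rel.parts:  # skip git internals; ".github" is a different name and is kept
        return None
    return rel.as_posix()


def build_files_map(root: Path) -> dict[str, str]:
    """Map remote names to local paths for every file under root."""
    files: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        name = remote_name(root, path)
        if name is not None and path.is_file():
            files[name] = str(path)
    return files


def build_metadata(owner: str, repo: str, first_commit_date: str, description: str) -> dict[str, Any]:
    return {
        "title": f"{owner} - {repo}",
        "mediatype": "software",
        "date": first_commit_date,
        "description": description,
    }


def upload_item(identifier: str, root: Path, metadata: dict[str, Any]) -> None:
    # Imported here so the helpers above work without the package installed.
    import internetarchive

    # A {remote name: local path} dict preserves the directory structure in the item.
    internetarchive.upload(identifier, files=build_files_map(root), metadata=metadata)
```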
Archive Format
- Identifier: {owner}-{repo}-{timestamp}
- Title: {owner} - {repo}
- Date: First commit date
- Files: Complete repository structure, branches (if requested), releases (if requested), wiki bundle (if requested), and git bundle
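Judging from the sample output above (torvalds-linux-20241005120000), the timestamp appears to be the archiving time formatted as YYYYMMDDHHMMSS. Below is a sketch of building such an identifier. Using UTC and replacing disallowed characters with underscores are assumptions, and make_identifier is a hypothetical name. Archive.org identifiers may only contain letters, digits, underscores, hyphens and periods.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def make_identifier(owner: str, repo: str, when: Optional[datetime] = None) -> str:
    """Build '{owner}-{repo}-{timestamp}' with a YYYYMMDDHHMMSS timestamp (hypothetical)."""
    if when is None:
        when = datetime.now(timezone.utc)  # UTC is an assumption
    raw = f"{owner}-{repo}-{when:%Y%m%d%H%M%S}"
    # Replace anything outside the characters archive.org accepts in identifiers
    return re.sub(r"[^A-Za-z0-9._-]", "_", raw)
```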
Repository Restoration
To restore a repository from the archive:
# Download the git bundle
wget https://archive.org/download/{identifier}/{owner}-{repo}.bundle
# Clone from the bundle (includes all branches if archived with --all-branches)
git clone {owner}-{repo}.bundle {repo-name}
cd {repo-name}
# List all available branches (if --all-branches was used)
git branch -a
# Check out a specific branch
git checkout branch-name
Release Information
When releases are archived, they can be found in the {owner}-{repo}_releases/ directory of the archive. Each release includes:
- {version}.release_info.json – complete release metadata
- {version}.source.zip – source code archive
- {version}.source.tar.gz – source code tarball
- Binaries (release assets)
Key Improvements over iagitup
- Works with any git provider (public and self-hosted)
- Archive all repositories from a user or org
- Self-hosted git instance support with authentication
- Uploads the entire repository file structure
- Preserves provider directories (.github/, .gitlab/, .gitea/)
- Can archive all branches of a repository
- Automatically fetches repository information from APIs
- Downloads user/organization avatars
- Uses first commit date for historical accuracy
- Leverages git provider APIs for comprehensive metadata
Requirements
- Python 3.9+
- Git
- Internet Archive account and credentials
- Required dependencies listed in the requirements.txt file
Troubleshooting
Authentication Issues
- Ensure your API token has the correct permissions
- For self-hosted instances, verify the API URL is correct
- Check that the token hasn't expired
API Metadata Fetching
- If metadata isn't fetched, the repository will still be archived
- Use --git-provider-type to help with provider detection
- Some self-hosted instances may have APIs disabled
Private Repositories
- Always use --api-token for private repositories
- Ensure the token has read access to the repository
- For self-hosted instances, you may need both --api-url and --api-token
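Each provider expects the token in a different place, so a token in the wrong header looks like a permissions problem. Below is a sketch of per-provider request headers based on the providers' documented conventions: GitHub uses Authorization: Bearer, GitLab uses PRIVATE-TOKEN, Gitea/Forgejo use Authorization: token, and Bitbucket App Passwords use HTTP basic auth with a username (the role of --api-username). auth_headers is a hypothetical helper, not part of iagitbetter.

```python
import base64
from typing import Optional


def auth_headers(provider: str, token: str, username: Optional[str] = None) -> dict[str, str]:
    """HTTP headers that authenticate an API request (hypothetical helper)."""
    if provider == "github":
        return {"Authorization": f"Bearer {token}"}
    if provider == "gitlab":
        return {"PRIVATE-TOKEN": token}
    if provider == "gitea":  # also accepted by Forgejo
        return {"Authorization": f"token {token}"}
    if provider == "bitbucket":
        if not username:
            raise ValueError("Bitbucket App Passwords need a username")
        creds = base64.b64encode(f"{username}:{token}".encode()).decode()
        return {"Authorization": f"Basic {creds}"}
    raise ValueError(f"no header convention for {provider!r} in this sketch")
```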
Profile Archiving Issues
- Rate Limiting: Public APIs have rate limits (use --api-token to increase limits)
- Large Profiles: Use --max-repos to limit the number of repositories
- Failed Repositories: Individual repository failures won't stop the entire process
- Time Consumption: Archiving many repositories takes significant time
Download files
File details
Details for the file iagitbetter-1.1.5.tar.gz.
File metadata
- Download URL: iagitbetter-1.1.5.tar.gz
- Size: 94.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 699590a44821eab11cb7850d5b9d00f80833311a174278711249971777db261c |
| MD5 | 1d13838d9e7d1dca895070e2d4d7f1fa |
| BLAKE2b-256 | 8b60406bdad914cd4c3a5df4a97a74b91ed5487c14206149aca627c7976ecce2 |
Provenance
The following attestation bundles were made for iagitbetter-1.1.5.tar.gz:
Publisher: release.yml on Andres9890/iagitbetter
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: iagitbetter-1.1.5.tar.gz
- Subject digest: 699590a44821eab11cb7850d5b9d00f80833311a174278711249971777db261c
- Sigstore transparency entry: 1112683478
- Permalink: Andres9890/iagitbetter@53bb7752254eb6f5ab64e8fc5d425d91a86a2cef
- Branch / Tag: refs/tags/v1.1.5
- Owner: https://github.com/Andres9890
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@53bb7752254eb6f5ab64e8fc5d425d91a86a2cef
- Trigger Event: release
File details
Details for the file iagitbetter-1.1.5-py3-none-any.whl.
File metadata
- Download URL: iagitbetter-1.1.5-py3-none-any.whl
- Size: 90.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cbe24ce73e139e3f5cd34f31bdbae44556a7d7c80e12476f5fc464b6a686584 |
| MD5 | 2f0162bb83b6df20f019f9e997e01ba9 |
| BLAKE2b-256 | a6cce59f87a44eaeb83f9042da5116c5c455bb4653480185a0fb9ca19ebd8e93 |
Provenance
The following attestation bundles were made for iagitbetter-1.1.5-py3-none-any.whl:
Publisher: release.yml on Andres9890/iagitbetter
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: iagitbetter-1.1.5-py3-none-any.whl
- Subject digest: 9cbe24ce73e139e3f5cd34f31bdbae44556a7d7c80e12476f5fc464b6a686584
- Sigstore transparency entry: 1112683569
- Permalink: Andres9890/iagitbetter@53bb7752254eb6f5ab64e8fc5d425d91a86a2cef
- Branch / Tag: refs/tags/v1.1.5
- Owner: https://github.com/Andres9890
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@53bb7752254eb6f5ab64e8fc5d425d91a86a2cef
- Trigger Event: release