Save GitHub package dependents data to a SQLite database by scraping the dependency graph with support for specific package selection
github-dependents-to-sqlite
Save GitHub dependents data to a SQLite database by scraping the GitHub dependency graph.
Features
This tool scrapes the GitHub dependency graph to find repositories that depend on a specific repository and saves this data to a SQLite database.
Installation
Requires Python 3.8 or higher.
$ pip install github-dependents-to-sqlite
Authentication
Create a GitHub personal access token: https://github.com/settings/tokens
Run this command to set up authentication:
$ github-dependents-to-sqlite auth
Or for local development:
$ python -m src.cli auth
This will create a file called auth.json in your current directory containing your access token. To save the file under a different path or filename, use the -a/--auth=myauth.json option.
As an alternative to using an auth.json file you can add your access token to an environment variable called GITHUB_TOKEN.
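The two token sources described above can be combined in a small lookup helper. This is a minimal sketch, not the tool's actual code: the key name inside auth.json is not stated in this README (the sketch assumes `github_personal_token`, the name used by the upstream github-to-sqlite project as far as I know), and the precedence of the file over the environment variable is also an assumption. `load_token` is a hypothetical name.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumption: this key name is borrowed from the upstream github-to-sqlite
# project; this README does not say what auth.json contains.
TOKEN_KEY = "github_personal_token"


def load_token(auth_path: str = "auth.json",
               env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a token from auth.json if present, else from GITHUB_TOKEN."""
    env = os.environ if env is None else env
    path = Path(auth_path)
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        token = data.get(TOKEN_KEY)
        if token:
            return token
    # Fall back to the environment variable named in the text above.
    return env.get("GITHUB_TOKEN")
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.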
Basic Usage
The GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example rust-lang/rust.
This data is not yet available through the GitHub API. This tool scrapes those pages and then uses the GitHub API to fetch full details for each dependent repository.
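To make the scraping target concrete, the sketch below builds the address of a repository's dependents page. The `/network/dependents` path and the `package_id` query parameter are what GitHub's web UI uses for this page as far as I know. Whether the tool builds its URLs this way is an assumption, and `dependents_url` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode


def dependents_url(repo: str, package_id: Optional[str] = None) -> str:
    """Build the dependents page URL for an "owner/repo" string."""
    owner, sep, name = repo.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError(f"expected owner/repo, got {repo!r}")
    url = f"https://github.com/{owner}/{name}/network/dependents"
    if package_id:
        # Narrow the page to a single package of the repository.
        url += "?" + urlencode({"package_id": package_id})
    return url
```

For example, `dependents_url("rust-lang/rust")` returns `https://github.com/rust-lang/rust/network/dependents`.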
Commands
# Set up authentication (first time)
$ github-dependents-to-sqlite auth
# Scrape dependents
$ github-dependents-to-sqlite scrape github.db owner/repo
# Multiple repositories
$ github-dependents-to-sqlite scrape github.db owner/repo1 owner/repo2
Local Development (without install)
# Set up auth
$ python -m src.cli auth
# Scrape dependents
$ python -m src.cli scrape github.db owner/repo -v
Package Selection
Many repositories have multiple packages. The tool will automatically detect them and offer choices:
Interactive Mode (default):
$ github-dependents-to-sqlite scrape github.db rust-lang/rust
You'll see a menu like:
📦 Processing repository: rust-lang/rust
Found 12 package(s)
Available packages:
1. proc_macro
2. rustc-std-workspace-core
3. core
...
13. All packages (scrape each one)
14. Skip package selection (may find fewer dependents)
Select a package [13]: 3
Selected: core
Total dependents: 15,420
Scraping dependents: 100%|████████████| 15420/15420 [12:15<00:00, 20.98repo/s]
✅ Found 15,420 new dependent(s)
🎉 Done!
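The numbering in the menu above follows a simple pattern: entries 1 to N are the packages, N+1 means "all packages", and N+2 means "skip package selection". The sketch below shows one way such a selection could be resolved. It is an illustration only, not the tool's implementation. `resolve_choice` is a hypothetical name, and treating empty input as the bracketed default ("all packages") is an assumption based on the `[13]` prompt shown above.

```python
from typing import List, Optional


def resolve_choice(packages: List[str], raw: str) -> Optional[List[str]]:
    """Map a menu entry to the packages to scrape.

    Returns None when the user picks "skip package selection".
    """
    all_idx = len(packages) + 1
    skip_idx = len(packages) + 2
    # Empty input falls back to the default entry ("all packages").
    choice = int(raw) if raw.strip() else all_idx
    if 1 <= choice <= len(packages):
        return [packages[choice - 1]]
    if choice == all_idx:
        return list(packages)
    if choice == skip_idx:
        return None
    raise ValueError(f"choice must be between 1 and {skip_idx}")
```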
Command-line Mode (use -p to specify package):
# By package name
$ github-dependents-to-sqlite scrape github.db rust-lang/rust -p "core"
# By package ID
$ github-dependents-to-sqlite scrape github.db rust-lang/rust -p "UGFja2FnZS0yNzE5MzQwNjQ1"
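Since -p accepts either a name or an ID, it helps to know what an ID looks like. The example ID above is Base64 text that decodes to `Package-2719340645`. The sketch below uses that observation as a heuristic for telling the two apart. How the tool itself distinguishes names from IDs is not documented here, so treat `looks_like_package_id` as a hypothetical helper.

```python
import base64
import binascii


def looks_like_package_id(value: str) -> bool:
    """Return True if value is Base64 that decodes to b"Package-...".

    Heuristic only: it relies on the shape of the example ID in this README.
    """
    try:
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        # Not valid Base64 (or not ASCII), so treat it as a package name.
        return False
    return decoded.startswith(b"Package-")
```

Note that a short name such as "core" happens to be valid Base64, which is why the sketch also checks the decoded prefix rather than relying on decoding alone.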
Options
- -p, --package TEXT: Specify package name or ID (skips interactive selection)
- -v, --verbose: Verbose output with detailed progress information
- -a, --auth PATH: Path to auth.json file (default: auth.json)
Database Schema
The tool creates the following tables:
- repos: Repository information for both the target repo and its dependents
- users: User/organization information for repository owners
- licenses: License information for repositories
- dependents: Junction table linking repositories to their dependents
The tool also creates:
- Full-text search indices on relevant columns
- Foreign key relationships between tables
- A dependent_repos view for easy querying
Example Query
After scraping, you can query the database to find all dependents:
SELECT * FROM dependent_repos ORDER BY dependent_stars DESC;
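The same query can be run from Python with the standard-library sqlite3 module. This is a minimal sketch: it relies only on the `dependent_repos` view and the `dependent_stars` column named in the query above, and `top_dependents` is a hypothetical helper name. The other columns in the view are not listed in this README, so the rows are returned as-is.

```python
import sqlite3
from typing import List


def top_dependents(conn: sqlite3.Connection, limit: int = 10) -> List[sqlite3.Row]:
    """Return the most-starred rows from the dependent_repos view."""
    # sqlite3.Row allows access by column name, e.g. row["dependent_stars"].
    conn.row_factory = sqlite3.Row
    cur = conn.execute(
        "SELECT * FROM dependent_repos ORDER BY dependent_stars DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()
```

Calling `top_dependents(sqlite3.connect("github.db"), 5)` after a scrape would return the five most-starred dependents, and `dict(row)` turns each row into a plain dictionary for printing.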
Development
To contribute to this project:
- Clone the repository
- Install development dependencies: pip install -e ".[test]"
- Run tests: pytest
Acknowledgments
This project is based on github-to-sqlite by Simon Willison. The original project focused on saving GitHub API data to SQLite. This fork extends that concept to specifically handle package dependency graph scraping, allowing you to discover which repositories depend on specific packages.
License
Apache License 2.0