
Save GitHub package dependents data to a SQLite database by scraping the dependency graph, with support for selecting a specific package


github-dependents-to-sqlite


Save GitHub dependents data to a SQLite database by scraping the GitHub dependency graph.

Features

This tool scrapes the GitHub dependency graph to find repositories that depend on a specific repository and saves this data to a SQLite database.

Installation

Requires Python 3.8 or higher.

$ pip install github-dependents-to-sqlite

Authentication

Create a GitHub personal access token: https://github.com/settings/tokens

Run this command to set up authentication:

$ github-dependents-to-sqlite auth

Or for local development:

$ python -m src.cli auth

This creates a file called auth.json in your current directory containing your access token. To save the file under a different path or filename, use the -a/--auth=myauth.json option.

As an alternative to an auth.json file, you can store your access token in an environment variable called GITHUB_TOKEN.
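The two token sources described above can be sketched in Python. This is a minimal illustration, not the tool's actual code. The function name resolve_token, the JSON key github_personal_token (the key used by the upstream github-to-sqlite project), and the order in which the file and the environment variable are consulted are all assumptions.

```python
import json
import os


def resolve_token(auth_path="auth.json", env=None):
    """Return a token from auth.json, else from GITHUB_TOKEN, else None.

    Hypothetical helper: the key name and the lookup order are assumptions.
    """
    env = os.environ if env is None else env
    token = None
    try:
        with open(auth_path) as fp:
            data = json.load(fp)
        if isinstance(data, dict):
            # Assumed key name, borrowed from the upstream project.
            token = data.get("github_personal_token")
    except (FileNotFoundError, json.JSONDecodeError):
        token = None
    # Fall back to the environment variable when the file gave us nothing.
    return token or env.get("GITHUB_TOKEN")
```

Passing env explicitly keeps the helper easy to test. In normal use it reads os.environ.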

Basic Usage

The GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example rust-lang/rust.

This data is not yet available through the GitHub API. This tool scrapes the dependency graph pages and then uses the GitHub API to load the full details of each dependent repository.

Commands

# Set up authentication (first time)
$ github-dependents-to-sqlite auth

# Scrape dependents
$ github-dependents-to-sqlite scrape github.db owner/repo

# Multiple repositories
$ github-dependents-to-sqlite scrape github.db owner/repo1 owner/repo2

Local Development (without install)

# Set up auth
$ python -m src.cli auth

# Scrape dependents
$ python -m src.cli scrape github.db owner/repo -v

Package Selection

Many repositories have multiple packages. The tool will automatically detect them and offer choices:

Interactive Mode (default):

$ github-dependents-to-sqlite scrape github.db rust-lang/rust

You'll see a menu like:

📦 Processing repository: rust-lang/rust
Found 12 package(s)

Available packages:
  1. proc_macro
  2. rustc-std-workspace-core
  3. core
  ...
  13. All packages (scrape each one)
  14. Skip package selection (may find fewer dependents)

Select a package [13]: 3
Selected: core
Total dependents: 15,420
Scraping dependents: 100%|████████████| 15420/15420 [12:15<00:00, 20.98repo/s]
✅ Found 15,420 new dependent(s)
🎉 Done!

Command-line Mode (use -p to specify package):

# By package name
$ github-dependents-to-sqlite scrape github.db rust-lang/rust -p "core"

# By package ID
$ github-dependents-to-sqlite scrape github.db rust-lang/rust -p "UGFja2FnZS0yNzE5MzQwNjQ1"
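The package ID in the example above is a base64 string, and decoding it shows a readable identifier. The sketch below decodes it and builds the matching dependents page URL. The helper names are hypothetical, and the URL pattern is the one the GitHub web interface uses for its dependents page. Whether the tool builds exactly this URL internally is an assumption.

```python
import base64
from urllib.parse import urlencode


def decode_package_id(package_id):
    """Decode a base64 package ID into its readable form."""
    return base64.b64decode(package_id).decode("ascii")


def dependents_url(repo, package_id=None):
    """Build the dependents page URL for owner/repo, optionally for one package."""
    url = f"https://github.com/{repo}/network/dependents"
    if package_id:
        url += "?" + urlencode({"package_id": package_id})
    return url


print(decode_package_id("UGFja2FnZS0yNzE5MzQwNjQ1"))  # Package-2719340645
print(dependents_url("rust-lang/rust", "UGFja2FnZS0yNzE5MzQwNjQ1"))
```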

Options

  • -p, --package TEXT: Specify package name or ID (skips interactive selection)
  • -v, --verbose: Verbose output with detailed progress information
  • -a, --auth PATH: Path to auth.json file (default: auth.json)
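For scripted runs, the options above can be assembled into an argument list from Python. This is a minimal sketch using only the flags documented here. The helper build_scrape_command is hypothetical, and placing the options after the positional arguments is an assumption about how the CLI parses them.

```python
import shlex


def build_scrape_command(db_path, repos, package=None, verbose=False, auth=None):
    """Assemble argv for the scrape command (hypothetical helper)."""
    if not repos:
        raise ValueError("at least one owner/repo is required")
    argv = ["github-dependents-to-sqlite", "scrape", db_path, *repos]
    if package is not None:
        argv += ["-p", package]  # skips the interactive package menu
    if verbose:
        argv.append("-v")
    if auth is not None:
        argv += ["-a", auth]
    return argv


argv = build_scrape_command("github.db", ["rust-lang/rust"], package="core", verbose=True)
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```

Passing -p matters in non-interactive contexts such as cron jobs or CI, where the selection menu could not be answered.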

Database Schema

The tool creates the following tables:

  • repos: Repository information for both the target repo and its dependents
  • users: User/organization information for repository owners
  • licenses: License information for repositories
  • dependents: Junction table linking repositories to their dependents

The tool also creates:

  • Full-text search indices on relevant columns
  • Foreign key relationships between tables
  • A dependent_repos view for easy querying

Example Query

After scraping, you can query the database to list all dependents, ordered by star count:

SELECT * FROM dependent_repos ORDER BY dependent_stars DESC;
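The same query can be run from Python with the standard sqlite3 module. The sketch below relies only on the dependent_repos view and its dependent_stars column, both named above. The helper top_dependents and its limit parameter are illustrative additions.

```python
import sqlite3


def top_dependents(conn, limit=10):
    """Return the most-starred rows of the dependent_repos view as dicts."""
    cur = conn.execute(
        "SELECT * FROM dependent_repos ORDER BY dependent_stars DESC LIMIT ?",
        (limit,),
    )
    # Column names come from the view itself, so none are hard-coded here.
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur]
```

For example, top_dependents(sqlite3.connect("github.db"), limit=5) would return the five most-starred dependents from the database written by the scrape command.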

Development

To contribute to this project:

  1. Clone the repository
  2. Install development dependencies: pip install -e ".[test]"
  3. Run tests: pytest

Acknowledgments

This project is based on github-to-sqlite by Simon Willison. The original project focused on saving GitHub API data to SQLite. This fork extends that concept to specifically handle package dependency graph scraping, allowing you to discover which repositories depend on specific packages.

License

Apache License 2.0

