github-to-sqlite

Save data from GitHub to a SQLite database.

Demo

https://github-to-sqlite.dogsheep.net/ hosts a Datasette demo of a database created by running this tool against all of the repositories in the Dogsheep GitHub organization, plus the datasette and sqlite-utils repositories.

How to install

$ pip install github-to-sqlite

Authentication

Create a GitHub personal access token: https://github.com/settings/tokens

Run this command and paste in your new token:

$ github-to-sqlite auth

This will create a file called auth.json in your current directory containing the required value. To save the file at a different path or filename, use the --auth=myauth.json option.
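A script that wants to reuse the saved token could read it back from the file. The following is a minimal sketch, not the tool's own code: the helper name `load_token` is hypothetical, and the key name `github_personal_token` is an assumption about how auth.json is laid out, so check your own file before relying on it.

```python
import json
from pathlib import Path
from typing import Optional


def load_token(auth_path: str = "auth.json") -> Optional[str]:
    """Return the token stored in auth_path, or None if it is not there."""
    path = Path(auth_path)
    if not path.exists():
        # No auth file: the caller can fall back to unauthenticated requests.
        return None
    data = json.loads(path.read_text())
    # Assumed key name; adjust it if your auth.json uses a different one.
    return data.get("github_personal_token")
```

Calling `load_token("myauth.json")` would read a file saved with the --auth=myauth.json option.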

Fetching issues for a repository

The issues command retrieves all of the issues belonging to a specified repository.

$ github-to-sqlite issues github.db simonw/datasette

If an auth.json file is present, the command uses the token from that file. It also works without authentication for public repositories, but be aware that GitHub applies strict IP-based rate limits to unauthenticated requests.

To use an auth.json file stored somewhere else, pass its path with -a:

$ github-to-sqlite issues github.db simonw/datasette -a /path/to/auth.json

You can use the --issue option to load just one specific issue:

$ github-to-sqlite issues github.db simonw/datasette --issue=1
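Once issues have been imported, the database can be queried with Python's built-in sqlite3 module. This is a rough sketch under stated assumptions: it assumes an `issues` table with `number`, `title` and `state` columns (mirroring fields returned by the GitHub API), and it builds a small in-memory stand-in with made-up rows rather than opening a real github.db. The function name `open_issues` is hypothetical.

```python
import sqlite3


def open_issues(conn: sqlite3.Connection, limit: int = 5):
    """Return (number, title) pairs for open issues, highest number first."""
    return conn.execute(
        "SELECT number, title FROM issues "
        "WHERE state = 'open' ORDER BY number DESC LIMIT ?",
        (limit,),
    ).fetchall()


# In-memory stand-in for github.db; the schema and rows are illustrative only.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE issues "
    "(id INTEGER PRIMARY KEY, number INTEGER, title TEXT, state TEXT)"
)
demo.executemany(
    "INSERT INTO issues (number, title, state) VALUES (?, ?, ?)",
    [
        (1, "First example issue", "closed"),
        (2, "Second example issue", "open"),
        (3, "Third example issue", "open"),
    ],
)
print(open_issues(demo))
```

Against a real database you would replace the stand-in with `sqlite3.connect("github.db")` and check the actual column names first.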

Fetching issue comments for a repository

The issue-comments command retrieves all of the comments on all of the issues in a repository.

It is recommended that you run issues first, so that each imported comment can have a foreign key pointing to its issue.

$ github-to-sqlite issues github.db simonw/datasette
$ github-to-sqlite issue-comments github.db simonw/datasette

You can use the --issue option to only load comments for a specific issue within that repository, for example:

$ github-to-sqlite issue-comments github.db simonw/datasette --issue=1
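The foreign key from comments to issues is what makes joins between the two tables possible. The sketch below counts comments per issue. It assumes an `issue_comments` table whose `issue` column references `issues.id`; those table and column names are assumptions for illustration, and the data is a made-up in-memory stand-in.

```python
import sqlite3


def comment_counts(conn: sqlite3.Connection):
    """Return (issue number, comment count) pairs, including zero counts."""
    return conn.execute(
        "SELECT issues.number, COUNT(issue_comments.id) "
        "FROM issues "
        "LEFT JOIN issue_comments ON issue_comments.issue = issues.id "
        "GROUP BY issues.id ORDER BY issues.number"
    ).fetchall()


# Illustrative schema and rows only; a real github.db may differ.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE issues (id INTEGER PRIMARY KEY, number INTEGER, title TEXT);
    CREATE TABLE issue_comments (
        id INTEGER PRIMARY KEY,
        issue INTEGER REFERENCES issues(id),
        body TEXT
    );
    INSERT INTO issues VALUES (100, 1, 'Example issue'), (101, 2, 'Another example');
    INSERT INTO issue_comments VALUES (1, 100, 'first'), (2, 100, 'second');
    """
)
print(comment_counts(demo))
```

The LEFT JOIN keeps issues that have no comments, so they appear with a count of zero instead of disappearing from the result.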

Fetching commits for a repository

The commits command retrieves details of all of the commits for one or more repositories. It currently fetches the sha, commit message and author and committer details - it does not retrieve the full commit body.

$ github-to-sqlite commits github.db simonw/datasette simonw/sqlite-utils

The command accepts one or more repositories.

By default it will stop as soon as it sees a commit that has previously been retrieved. You can force it to retrieve all commits (including those that have been previously inserted) using --all.
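The stopping rule described above can be sketched in a few lines. This is an illustration of the idea, not the tool's actual implementation: `new_commits` is a hypothetical helper, the commits are assumed to arrive newest first as dictionaries with a `sha` key, and `fetch_all` stands in for the --all option.

```python
from typing import Dict, Iterable, Iterator, Set


def new_commits(
    commits: Iterable[Dict[str, str]],
    known_shas: Set[str],
    fetch_all: bool = False,
) -> Iterator[Dict[str, str]]:
    """Yield commits until one that is already stored is seen."""
    for commit in commits:
        if commit["sha"] in known_shas and not fetch_all:
            # Everything older than this commit was saved by an earlier run.
            break
        yield commit


# Made-up history, newest first.
history = [{"sha": "c3"}, {"sha": "c2"}, {"sha": "c1"}]
print([c["sha"] for c in new_commits(history, known_shas={"c2", "c1"})])
print([c["sha"] for c in new_commits(history, {"c2", "c1"}, fetch_all=True)])
```

Stopping at the first known commit keeps repeat runs cheap, at the cost of missing older commits if an earlier run was interrupted part way through, which is the case --all covers.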

Fetching contributors to a repository

The contributors command retrieves details of all of the contributors for one or more repositories.

$ github-to-sqlite contributors github.db simonw/datasette simonw/sqlite-utils

The command accepts one or more repositories. It populates a contributors table with foreign keys to repos and users, plus a contributions column recording the number of commits each contributor has made to that repository.
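As a sketch of how the contributors data could be used, the query below ranks contributors to one repository. It assumes `contributors` has `repo_id`, `user_id` and `contributions` columns, and that `repos.full_name` and `users.login` exist; all of these names are assumptions for illustration, shown against an in-memory stand-in with invented rows.

```python
import sqlite3


def top_contributors(conn: sqlite3.Connection, repo_full_name: str, limit: int = 3):
    """Return (login, contributions) pairs for one repo, most active first."""
    return conn.execute(
        "SELECT users.login, contributors.contributions "
        "FROM contributors "
        "JOIN users ON users.id = contributors.user_id "
        "JOIN repos ON repos.id = contributors.repo_id "
        "WHERE repos.full_name = ? "
        "ORDER BY contributors.contributions DESC LIMIT ?",
        (repo_full_name, limit),
    ).fetchall()


# Illustrative schema and rows only; check the real tables before reuse.
demo = sqlite3.connect(":memory:")
demo.executescript(
    """
    CREATE TABLE repos (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
    CREATE TABLE contributors (
        repo_id INTEGER REFERENCES repos(id),
        user_id INTEGER REFERENCES users(id),
        contributions INTEGER,
        PRIMARY KEY (repo_id, user_id)
    );
    INSERT INTO repos VALUES (1, 'example/repo');
    INSERT INTO users VALUES (10, 'alice'), (11, 'bob');
    INSERT INTO contributors VALUES (1, 10, 5), (1, 11, 42);
    """
)
print(top_contributors(demo, "example/repo"))
```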

Fetching repos belonging to a user or organization

The repos command fetches repos belonging to a user or organization.

Without any other arguments, this command will fetch all repos that the currently authenticated user owns, collaborates on or can access via one of their organizations:

$ github-to-sqlite repos github.db

To fetch repos belonging to a specific user or organization, provide their username as an argument:

$ github-to-sqlite repos github.db dogsheep # organization
$ github-to-sqlite repos github.db simonw # user

You can pass more than one username to fetch for multiple users or organizations at once:

$ github-to-sqlite repos github.db simonw dogsheep

Fetching repos that have been starred by a user

The starred command fetches the repos that have been starred by a user.

$ github-to-sqlite starred github.db simonw

If you are using an auth.json file you can omit the username to retrieve the starred repos for the authenticated user.

Scraping dependents for a repository

The GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example simonw/datasette/network/dependents.

This data is not yet available through the GitHub API. The scrape-dependents command scrapes those pages and uses the GitHub API to load full versions of the dependent repositories.

$ github-to-sqlite scrape-dependents github.db simonw/datasette

The command accepts one or more repositories.

Add -v for verbose output.
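Several of the commands in this README can be combined into a scripted refresh. The sketch below only builds the argument lists, using the subcommands and argument order shown in the examples above. The helper name `refresh_commands` is hypothetical, and running the commands (for example with `subprocess.run(argv, check=True)`) is left to the caller.

```python
import shlex
from typing import List, Sequence


def refresh_commands(db_path: str, repos: Sequence[str]) -> List[List[str]]:
    """Build github-to-sqlite invocations for a set of repositories."""
    commands: List[List[str]] = []
    for repo in repos:
        # issues runs before issue-comments so comments can point at issues.
        commands.append(["github-to-sqlite", "issues", db_path, repo])
        commands.append(["github-to-sqlite", "issue-comments", db_path, repo])
    # commits and contributors accept several repositories in one call.
    commands.append(["github-to-sqlite", "commits", db_path, *repos])
    commands.append(["github-to-sqlite", "contributors", db_path, *repos])
    return commands


for argv in refresh_commands("github.db", ["simonw/datasette", "simonw/sqlite-utils"]):
    print(shlex.join(argv))
```

Building lists rather than shell strings avoids quoting problems if a path contains spaces; `shlex.join` (Python 3.8 or later) is used here only to print the commands.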
