github-to-sqlite
Save data from GitHub to a SQLite database.
- Demo
- How to install
- Authentication
- Fetching issues for a repository
- Fetching issue comments for a repository
- Fetching commits for a repository
- Fetching tags for a repository
- Fetching contributors to a repository
- Fetching repos belonging to a user or organization
- Fetching specific repositories
- Fetching repos that have been starred by a user
- Fetching users that have starred specific repos
- Scraping dependents for a repository
- Fetching emojis
- Making authenticated API calls
Demo
https://github-to-sqlite.dogsheep.net/ hosts a Datasette demo of a database created by running this tool against all of the repositories in the Dogsheep GitHub organization, plus the datasette and sqlite-utils repositories.
How to install
$ pip install github-to-sqlite
Authentication
Create a GitHub personal access token: https://github.com/settings/tokens
Run this command and paste in your new token:
$ github-to-sqlite auth
This will create a file called auth.json in your current directory containing the required value. To save the file at a different path or filename, use the --auth=myauth.json option.
As an alternative to using an auth.json file you can add your access token to an environment variable called GITHUB_TOKEN.
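The two token sources described above can be sketched in a few lines of Python. This is an illustration only, not the tool's own code: the TOKEN_KEY name and the order of precedence (file first, then environment variable) are assumptions, so check your own auth.json before relying on either.

```python
import json
import os
from pathlib import Path

# Assumption: the key name stored in auth.json is not stated in this README.
TOKEN_KEY = "github_personal_token"


def load_token(auth_path="auth.json", env=None):
    """Return a token from auth.json if present, else from GITHUB_TOKEN, else None."""
    env = os.environ if env is None else env
    path = Path(auth_path)
    if path.exists():
        try:
            token = json.loads(path.read_text()).get(TOKEN_KEY)
        except (json.JSONDecodeError, AttributeError):
            # Unreadable or unexpected file contents: fall back to the environment.
            token = None
        if token:
            return token
    return env.get("GITHUB_TOKEN") or None
```

Calling load_token("myauth.json") would mirror the effect of pointing the command at a different auth file.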
Fetching issues for a repository
The issues command retrieves all of the issues belonging to a specified repository.
$ github-to-sqlite issues github.db simonw/datasette
If an auth.json file is present, the command uses the token from that file. It also works without authentication for public repositories, but be aware that GitHub applies strict IP-based rate limits to unauthenticated requests.
You can point to a different location of auth.json using -a:
$ github-to-sqlite issues github.db simonw/datasette -a /path/to/auth.json
You can use the --issue option to load just one specific issue:
$ github-to-sqlite issues github.db simonw/datasette --issue=1
Fetching issue comments for a repository
The issue-comments command retrieves all of the comments on all of the issues in a repository.
It is recommended you run issues first, so that each imported comment can have a foreign key pointing to its issue.
$ github-to-sqlite issues github.db simonw/datasette
$ github-to-sqlite issue-comments github.db simonw/datasette
You can use the --issue option to only load comments for a specific issue within that repository, for example:
$ github-to-sqlite issue-comments github.db simonw/datasette --issue=1
Fetching commits for a repository
The commits command retrieves details of all of the commits for one or more repositories. It currently fetches the sha, commit message, and author and committer details. It does not retrieve the full commit body.
$ github-to-sqlite commits github.db simonw/datasette simonw/sqlite-utils
The command accepts one or more repositories.
By default it will stop as soon as it sees a commit that has previously been retrieved. You can force it to retrieve all commits (including those that have been previously inserted) using --all.
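The stop-early behaviour can be pictured with a small Python sketch. It is not the tool's implementation: new_commits, its fetch_all parameter and the sample shas are hypothetical names used only to illustrate the idea of halting at the first already-stored commit.

```python
def new_commits(fetched, known_shas, fetch_all=False):
    """Yield commits in the order given, stopping at the first known sha.

    `fetched` is any iterable of dicts with a "sha" key, newest first.
    `fetch_all=True` mimics the effect described for --all.
    """
    for commit in fetched:
        if not fetch_all and commit["sha"] in known_shas:
            break
        yield commit


# Hypothetical data: c1 and c2 are already in the database, c3 is new.
page = [{"sha": "c3"}, {"sha": "c2"}, {"sha": "c1"}]
already_stored = {"c2", "c1"}

incremental = [c["sha"] for c in new_commits(page, already_stored)]
everything = [c["sha"] for c in new_commits(page, already_stored, fetch_all=True)]
print(incremental)  # ['c3']
print(everything)   # ['c3', 'c2', 'c1']
```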
Fetching tags for a repository
The tags command retrieves all of the tags for one or more repositories.
$ github-to-sqlite tags github.db simonw/datasette simonw/sqlite-utils
Fetching contributors to a repository
The contributors command retrieves details of all of the contributors for one or more repositories.
$ github-to-sqlite contributors github.db simonw/datasette simonw/sqlite-utils
The command accepts one or more repositories. It populates a contributors table, with foreign keys to repos and users and a contributions column listing the number of commits to that repository for each contributor.
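Once the data is loaded, a join across the three tables gives a per-repository leaderboard. The sketch below builds a tiny in-memory stand-in so that it runs on its own. The column names (repo_id, user_id, login, full_name) and the idea that contributions is stored as a column on contributors are assumptions, so inspect your own github.db schema before reusing the query.

```python
import sqlite3

# In-memory stand-in for github.db; schema and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);
CREATE TABLE repos (id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE contributors (
    repo_id INTEGER REFERENCES repos(id),
    user_id INTEGER REFERENCES users(id),
    contributions INTEGER,
    PRIMARY KEY (repo_id, user_id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO repos VALUES (10, 'example/project');
INSERT INTO contributors VALUES (10, 1, 42), (10, 2, 7);
""")

# Follow both foreign keys and rank contributors by commit count.
rows = conn.execute("""
    SELECT repos.full_name, users.login, contributors.contributions
    FROM contributors
    JOIN repos ON repos.id = contributors.repo_id
    JOIN users ON users.id = contributors.user_id
    ORDER BY contributors.contributions DESC
""").fetchall()

for full_name, login, contributions in rows:
    print(full_name, login, contributions)
```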
Fetching repos belonging to a user or organization
The repos command fetches repos belonging to a user or organization.
Without any other arguments, this command will fetch all repos that the currently authenticated user owns, collaborates on or can access via one of their organizations:
$ github-to-sqlite repos github.db
To fetch repos belonging to a specific user or organization, provide their username as an argument:
$ github-to-sqlite repos github.db dogsheep # organization
$ github-to-sqlite repos github.db simonw # user
You can pass more than one username to fetch for multiple users or organizations at once:
$ github-to-sqlite repos github.db simonw dogsheep
Fetching specific repositories
You can use -r with the repos command one or more times to fetch just specific repositories.
$ github-to-sqlite repos github.db -r simonw/datasette -r dogsheep/github-to-sqlite
Fetching repos that have been starred by a user
The starred command fetches the repos that have been starred by a user.
$ github-to-sqlite starred github.db simonw
If you are using an auth.json file you can omit the username to retrieve the starred repos for the authenticated user.
Fetching users that have starred specific repos
The stargazers command fetches the users that have starred the specified repos.
$ github-to-sqlite stargazers github.db simonw/datasette dogsheep/github-to-sqlite
You can specify one or more repositories using owner/repo syntax.
Users fetched using this command will be inserted into the users table. Many-to-many records showing which repository they starred will be added to the stars table.
Scraping dependents for a repository
The GitHub dependency graph can show other GitHub projects that depend on a specific repo, for example https://github.com/simonw/datasette/network/dependents.
This data is not yet available through the GitHub API. The scrape-dependents command scrapes those pages and uses the GitHub API to load full versions of the dependent repositories.
$ github-to-sqlite scrape-dependents github.db simonw/datasette
The command accepts one or more repositories.
Add -v for verbose output.
Fetching emojis
You can fetch a list of every emoji supported by GitHub using the emojis command:
$ github-to-sqlite emojis github.db
This will create a table called emojis with a primary key called name and a url column.
If you add the --fetch option the command will also fetch the binary content of the images and place them in an image column:
$ github-to-sqlite emojis emojis.db -f
[########----------------------------] 397/1799 22% 00:03:43
You can then use the datasette-render-images plugin to browse them visually.
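The emojis table can also be inspected directly from Python. The following is a minimal sketch against an in-memory stand-in that uses the name, url and image columns described above. The column types and sample rows are assumptions made for illustration, and emoji_sizes is a hypothetical helper.

```python
import sqlite3

# In-memory stand-in for emojis.db; rows are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emojis (name TEXT PRIMARY KEY, url TEXT, image BLOB)")
fake_image = b"\x89PNG fake bytes"  # placeholder, not a real PNG
conn.execute(
    "INSERT INTO emojis VALUES (?, ?, ?)",
    ("example", "https://example.com/example.png", fake_image),
)
# A row whose image was never fetched keeps NULL in the image column.
conn.execute(
    "INSERT INTO emojis (name, url) VALUES (?, ?)",
    ("no_image", "https://example.com/no_image.png"),
)


def emoji_sizes(conn):
    """Return (name, image size in bytes or None) for every emoji, ordered by name."""
    return conn.execute(
        "SELECT name, length(image) FROM emojis ORDER BY name"
    ).fetchall()


sizes = emoji_sizes(conn)
```

Rows where the size is None are the ones that would still need their images fetched.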
Making authenticated API calls
The github-to-sqlite get command provides a convenient shortcut for making authenticated calls to the API. Once you have created your auth.json file (or set a GITHUB_TOKEN environment variable) you can use it like this:
$ github-to-sqlite get https://api.github.com/gists
This will make an authenticated call to the URL you provide and pretty-print the resulting JSON to the console.
You can omit the https://api.github.com/ prefix, for example:
$ github-to-sqlite get /gists
Many GitHub APIs are paginated using the HTTP Link header. You can follow this pagination and output a list of all of the resulting items using --paginate:
$ github-to-sqlite get /users/simonw/repos --paginate
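For readers unfamiliar with Link-header pagination, here is a rough Python sketch of the general technique: parse the header, then keep requesting the rel="next" URL until there is none. It is not how --paginate is implemented internally. parse_link_header, paginate, fake_fetch and the sample pages are all hypothetical names and data.

```python
import re


def parse_link_header(value):
    """Map each rel name to its URL, e.g. {"next": "...", "last": "..."}."""
    pattern = r'<([^>]+)>\s*;\s*rel="([^"]+)"'
    return {rel: url for url, rel in re.findall(pattern, value)}


def paginate(fetch, url):
    """Yield items from every page, following rel="next" links.

    `fetch(url)` must return (items, link_header_or_None).
    """
    while url:
        items, link_header = fetch(url)
        yield from items
        url = parse_link_header(link_header or "").get("next")


# Stand-in for real HTTP calls: two pages keyed by URL.
pages = {
    "/items?page=1": ([1, 2], '</items?page=2>; rel="next", </items?page=2>; rel="last"'),
    "/items?page=2": ([3], '</items?page=1>; rel="prev", </items?page=1>; rel="first"'),
}


def fake_fetch(url):
    return pages[url]


all_items = list(paginate(fake_fetch, "/items?page=1"))
print(all_items)  # [1, 2, 3]
```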
You can output newline-delimited JSON for each item using --nl. This can be useful for streaming items into another tool.
$ github-to-sqlite get /users/simonw/repos --nl
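A consumer of newline-delimited JSON can process one item at a time without loading the whole response. The sketch below shows one way such a downstream tool might read the stream in Python. read_nl_json is a hypothetical helper, and the sample records are made up rather than real API output.

```python
import io
import json


def read_nl_json(stream):
    """Yield one parsed object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for piped input; a real script would pass sys.stdin instead.
sample = io.StringIO('{"name": "repo-one"}\n\n{"name": "repo-two"}\n')
names = [item["name"] for item in read_nl_json(sample)]
print(names)  # ['repo-one', 'repo-two']
```

In a real pipeline the script would read from sys.stdin, with the output of the --nl command above piped into it.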