A tool to query GraphQL for collecting repository metadata.
radon-repositories-collector
A Python package that queries the GitHub GraphQL API to collect repository metadata.
Note: the tool requires a GitHub personal access token to access the GraphQL API. See GitHub's documentation for how to create one.
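To keep the token out of source files, it can be read from the environment before it is passed to the collector. The following is a minimal sketch; the helper `read_github_token` and the variable name `GITHUB_ACCESS_TOKEN` are assumptions made for illustration and are not part of the package.

```python
import os


def read_github_token(env_var="GITHUB_ACCESS_TOKEN", environ=None):
    """Return the token stored in `env_var`, or raise if it is missing.

    `env_var` is a hypothetical variable name chosen for this sketch;
    `environ` can be replaced by a plain dict for testing.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a GitHub personal access token"
        )
    return token
```

The returned string could then be used in place of the '<GITHUB ACCESS TOKEN>' placeholder in the usage example.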
Install
The package can be installed from PyPI as follows:
pip install repositories-collector
Python usage
import os
from datetime import datetime
from repocollector.github import GithubRepositoriesCollector

github_crawler = GithubRepositoriesCollector('<GITHUB ACCESS TOKEN>')

for repo in github_crawler.collect_repositories(
        since=datetime(2019, 12, 31),
        until=datetime(2020, 12, 31),
        pushed_after=datetime(2020, 6, 1),
        min_issues=0,
        min_releases=0,
        min_stars=0,
        min_watchers=0,
        primary_language='<language>'):
    print('id:', repo['id'])  # e.g., 123456
    print('default_branch:', repo['default_branch'])  # e.g., main
    print('owner:', repo['owner'])  # e.g., radon-h2020
    print('name:', repo['name'])  # e.g., radon-repositories-collector
    print('full name:', repo['full_name'])  # e.g., radon-h2020/radon-repositories-collector
    print('url:', repo['url'])
    print('description:', repo['description'])
    print('issues:', repo['issues'])
    print('releases:', repo['releases'])
    print('stars:', repo['stars'])
    print('watchers:', repo['watchers'])
    print('primary_language:', repo['primary_language'])
    print('created_at:', repo['created_at'])
    print('pushed_at:', repo['pushed_at'])
    print('dirs:', repo['dirs'])  # list of repo's root directories, e.g., [repocollector]
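Since each collected repository is accessed like a dictionary, the results can be post-processed with ordinary Python. The sketch below ranks records by star count. It runs on hypothetical sample records rather than real API output, and it assumes that 'stars' holds an integer; `top_repositories` is a helper introduced here for illustration.

```python
def top_repositories(repos, min_stars=0, limit=10):
    """Return up to `limit` records with at least `min_stars`, most starred first."""
    selected = [r for r in repos if r["stars"] >= min_stars]
    selected.sort(key=lambda r: r["stars"], reverse=True)
    return selected[:limit]


# Hypothetical sample records using a subset of the keys printed above.
sample = [
    {"full_name": "owner-a/repo-a", "stars": 12, "primary_language": "python"},
    {"full_name": "owner-b/repo-b", "stars": 250, "primary_language": "python"},
    {"full_name": "owner-c/repo-c", "stars": 3, "primary_language": "python"},
]

for repo in top_repositories(sample, min_stars=10):
    print(repo["full_name"], repo["stars"])
# prints:
# owner-b/repo-b 250
# owner-a/repo-a 12
```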
Command-line usage
A Python library to collect repositories metadata from GitHub.

positional arguments:
  since                 collect repositories created since this date (default: 2014-01-01 00:00:00)
  until                 collect repositories created up to this date (default: 2014-01-01 00:00:00)
  dest                  destination folder for report

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --pushed-after DATE_PUSH
                        collect only repositories pushed after this date (default: 2014-01-01 00:00:00)
  --min-issues MIN_ISSUES
                        collect repositories with at least <min-issues> issues (default: 0)
  --min-releases MIN_RELEASES
                        collect repositories with at least <min-releases> releases (default: 0)
  --min-stars MIN_STARS
                        collect repositories with at least <min-stars> stars (default: 0)
  --min-watchers MIN_WATCHERS
                        collect repositories with at least <min-watchers> watchers (default: 0)
  --primary-language PRIMARY_LANGUAGE
                        collect repositories written in this language
  --verbose             show log (default: False)
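The date, star, and language filters have counterparts in GitHub's repository search syntax (`created:`, `pushed:`, `stars:`, `language:`), which the GraphQL `search` field accepts as its query string. The sketch below shows one way such filters could be composed. It is an assumption made for illustration, not necessarily how the package builds its query, and `build_search_query` is a hypothetical helper. The issue, release, and watcher thresholds are left out of this sketch.

```python
from datetime import datetime


def build_search_query(since, until, pushed_after=None, min_stars=0, primary_language=None):
    """Compose a GitHub search string from a subset of the filters listed above."""
    parts = [
        f"created:{since:%Y-%m-%d}..{until:%Y-%m-%d}",  # creation date range
        f"stars:>={min_stars}",                          # minimum star count
    ]
    if pushed_after is not None:
        parts.append(f"pushed:>{pushed_after:%Y-%m-%d}")
    if primary_language:
        parts.append(f"language:{primary_language}")
    return " ".join(parts)


query = build_search_query(
    since=datetime(2019, 12, 31),
    until=datetime(2020, 12, 31),
    pushed_after=datetime(2020, 6, 1),
    primary_language="python",
)
print(query)
# prints: created:2019-12-31..2020-12-31 stars:>=0 pushed:>2020-06-01 language:python
```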
Output
Running the tool from the command line generates a JSON and an HTML report, accessible at <dest>/report.html.
Example
The following command searches for repositories written in Python that were created between 31 Dec 2019 and 31 Dec 2020 and have at least one commit after 1 Jun 2020 (i.e., pushed after that date):
repositories-collector 2019-12-31 2020-12-31 /tmp/ --pushed-after 2020-06-01 --min-issues 0 --min-releases 0 --min-stars 0 --min-watchers 0 --primary-language python
The report is saved at /tmp/repositories.html and /tmp/repositories.json.
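When the collector is launched from a script, the argument list can be assembled in Python and handed to `subprocess`. This is a minimal sketch using the option spellings from the help text; `build_command` is a hypothetical helper and covers only some of the options.

```python
import subprocess


def build_command(since, until, dest, pushed_after=None, min_stars=0,
                  primary_language=None, verbose=False):
    """Build the argument list for the repositories-collector command."""
    cmd = ["repositories-collector", since, until, dest, "--min-stars", str(min_stars)]
    if pushed_after:
        cmd += ["--pushed-after", pushed_after]
    if primary_language:
        cmd += ["--primary-language", primary_language]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_command("2019-12-31", "2020-12-31", "/tmp/",
                    pushed_after="2020-06-01", primary_language="python")
print(" ".join(cmd))

# To actually run it (requires the package to be installed and a token
# available to the tool):
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths or language names that contain spaces.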