
A tool that queries the GitHub GraphQL API to collect repository metadata.


radon-repositories-collector

A Python package that queries the GitHub GraphQL API to collect repository metadata.


Note: the tool requires a GitHub personal access token to access the GraphQL API. See how to get one here.
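Since the token is a secret, it is usually better not to hard-code it in scripts. The snippet below is a minimal sketch of reading it from an environment variable. The helper `read_github_token` and the variable name `GITHUB_ACCESS_TOKEN` are assumptions made for illustration; they are not part of this package.

```python
import os


def read_github_token(env_var: str = "GITHUB_ACCESS_TOKEN") -> str:
    """Return the token stored in `env_var`, or raise if it is missing.

    Both this helper and the default variable name are hypothetical;
    the package itself only expects the token as a plain string.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to a GitHub personal access token"
        )
    return token
```

The returned string can then be passed wherever the `'<GITHUB ACCESS TOKEN>'` placeholder appears.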

Install

The package can be installed from PyPI as follows:

pip install repositories-collector

Python usage

from datetime import datetime
from repocollector.github import GithubRepositoriesCollector

github_crawler = GithubRepositoriesCollector('<GITHUB ACCESS TOKEN>')

for repo in github_crawler.collect_repositories(
        since=datetime(2019, 12, 31),
        until=datetime(2020, 12, 31),
        pushed_after=datetime(2020, 6, 1),
        min_issues=0,
        min_releases=0,
        min_stars=0,
        min_watchers=0,
        primary_language='<language>'):

    print('id:', repo['id']) # e.g., 123456
    print('default_branch:', repo['default_branch']) # e.g., main
    print('owner:', repo['owner']) # e.g., radon-h2020
    print('name:', repo['name']) # e.g., radon-repositories-collector
    print('full name:', repo['full_name']) # e.g., radon-h2020/radon-repositories-collector
    print('url:', repo['url'])
    print('description:', repo['description'])
    print('issues:', repo['issues'])
    print('releases:', repo['releases'])
    print('stars:', repo['stars'])
    print('watchers:', repo['watchers'])
    print('primary_language:', repo['primary_language'])
    print('created_at:', repo['created_at'])
    print('pushed_at:', repo['pushed_at'])
    print('dirs:', repo['dirs']) # list of repo's root directories, e.g., [repocollector]

Command-line usage

A Python library to collect repositories metadata from GitHub.

positional arguments:
  since                 collect repositories created since this date (default: 2014-01-01 00:00:00)
  until                 collect repositories created up to this date (default: 2014-01-01 00:00:00)
  dest                  destination folder for report

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --pushed-after DATE_PUSH
                        collect only repositories pushed after this date (default: 2014-01-01 00:00:00)
  --min-issues MIN_ISSUES
                        collect repositories with at least <min-issues> issues (default: 0)
  --min-releases MIN_RELEASES
                        collect repositories with at least <min-releases> releases (default: 0)
  --min-stars MIN_STARS
                        collect repositories with at least <min-stars> stars (default: 0)
  --min-watchers MIN_WATCHERS
                        collect repositories with at least <min-watchers> watchers (default: 0)
  --primary-language PRIMARY_LANGUAGE
                        collect repositories written in this language
  --verbose             show log (default: False)
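To make the option names and types concrete, here is a sketch of an `argparse` parser that mirrors the help text above. It is not the package's actual source: the function names are made up, the `YYYY-MM-DD` date format is inferred from the example command, and the defaults shown for `since` and `until` are omitted, so both are required here.

```python
import argparse
from datetime import datetime


def parse_date(value: str) -> datetime:
    """Parse a YYYY-MM-DD string (format assumed from the example command)."""
    try:
        return datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not a YYYY-MM-DD date")


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented options (an assumption)."""
    parser = argparse.ArgumentParser(prog="repositories-collector")
    parser.add_argument("since", type=parse_date,
                        help="collect repositories created since this date")
    parser.add_argument("until", type=parse_date,
                        help="collect repositories created up to this date")
    parser.add_argument("dest", help="destination folder for report")
    parser.add_argument("--pushed-after", type=parse_date, metavar="DATE_PUSH",
                        default=datetime(2014, 1, 1),
                        help="collect only repositories pushed after this date")
    # The four numeric thresholds share the same shape, so add them in a loop.
    for name in ("issues", "releases", "stars", "watchers"):
        parser.add_argument(f"--min-{name}", type=int, default=0,
                            help=f"collect repositories with at least <min-{name}> {name}")
    parser.add_argument("--primary-language",
                        help="collect repositories written in this language")
    parser.add_argument("--verbose", action="store_true", help="show log")
    return parser
```

Note that `argparse` exposes hyphenated options as attributes with underscores, so `--pushed-after` is read back as `args.pushed_after`.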

Output

Running the tool from the command line generates a JSON report and an HTML report, accessible at <dest>/report.html.

Example

The following command searches for repositories written in Python and created between 31 Dec 2019 and 31 Dec 2020, with at least one commit pushed after 1 Jun 2020:

repositories-collector 2019-12-31 2020-12-31 /tmp/ --pushed-after 2020-06-01 --min-issues 0 --min-releases 0 --min-stars 0 --min-watchers 0 --primary-language python

The reports are saved to /tmp/repositories.html and /tmp/repositories.json.

