
A tool that queries the GitHub GraphQL API to collect repository metadata.


radon-repositories-collector

A Python package that queries the GitHub GraphQL API to collect repository metadata.


Install

The package can be installed from PyPI as follows:

pip install repositories-collector

Python usage

import os
from datetime import datetime
from repocollector.github import GithubRepositoriesCollector

github_crawler = GithubRepositoriesCollector(
    access_token=os.getenv('GITHUB_ACCESS_TOKEN'),  # or paste your token
    since=datetime(2019, 12, 31),
    until=datetime(2020, 12, 31),
    pushed_after=datetime(2020, 6, 1),
    min_issues=0,
    min_releases=0,
    min_stars=0,
    min_watchers=0,
    primary_language='python')  # the language to search for

for repo in github_crawler.collect_repositories():
    print('id:', repo['id']) # e.g., 123456
    print('default_branch:', repo['default_branch']) # e.g., main
    print('owner:', repo['owner']) # e.g., radon-h2020
    print('name:', repo['name']) # e.g., radon-repositories-collector
    print('url:', repo['url'])
    print('description:', repo['description'])
    print('issues:', repo['issues'])
    print('releases:', repo['releases'])
    print('stars:', repo['stars'])
    print('watchers:', repo['watchers'])
    print('primary_language:', repo['primary_language'])
    print('created_at:', repo['created_at'])
    print('pushed_at:', repo['pushed_at'])
    print('dirs:', repo['dirs'])  # list of the repository's root directories, e.g., ['repocollector']
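The dictionaries yielded by `collect_repositories()` can be persisted with the standard library alone. The following is a minimal sketch, assuming each item is a plain dict with the keys printed above and that `dirs` is a list of strings; `save_repositories_csv` is a hypothetical helper, not part of `repocollector`.

```python
import csv
from typing import Iterable, Mapping

# Keys as printed in the usage example above (assumed, not taken from the package source).
FIELDS = ['id', 'default_branch', 'owner', 'name', 'url', 'description',
          'issues', 'releases', 'stars', 'watchers', 'primary_language',
          'created_at', 'pushed_at', 'dirs']


def save_repositories_csv(repos: Iterable[Mapping], path: str) -> int:
    """Write each repository dict as one CSV row and return the number of rows.

    Hypothetical helper: missing keys become empty cells, extra keys are ignored.
    """
    count = 0
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction='ignore')
        writer.writeheader()
        for repo in repos:
            row = dict(repo)
            # 'dirs' is assumed to be a list; join it so it fits in one CSV cell.
            row['dirs'] = ';'.join(row.get('dirs') or [])
            writer.writerow(row)
            count += 1
    return count
```

Because it accepts any iterable, it can consume the generator directly, e.g. `save_repositories_csv(github_crawler.collect_repositories(), 'repos.csv')`, without holding all results in memory.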

Command-line usage

usage: repositories-collector [-h] [-v] [--from DATE_FROM]
                                    [--to DATE_TO] [--pushed-after DATE_PUSH]
                                    [--min-issues MIN_ISSUES]
                                    [--min-releases MIN_RELEASES]
                                    [--min-stars MIN_STARS]
                                    [--min-watchers MIN_WATCHERS]
                                    [--primary-language LANGUAGE] [--verbose]
                                    dest

A Python library to collect repositories metadata from GitHub.

positional arguments:
  dest                  destination folder for report

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --from DATE_FROM      collect repositories created since this date (default:
                        2014-01-01 00:00:00)
  --to DATE_TO          collect repositories created up to this date (default:
                        2014-01-01 00:00:00)
  --pushed-after DATE_PUSH
                        collect only repositories pushed after this date
                        (default: 2019-01-01 00:00:00)
  --min-issues MIN_ISSUES
                        collect repositories with at least <min-issues> issues
                        (default: 0)
  --min-releases MIN_RELEASES
                        collect repositories with at least <min-releases>
                        releases (default: 0)
  --min-stars MIN_STARS
                        collect repositories with at least <min-stars> stars
                        (default: 0)
  --min-watchers MIN_WATCHERS
                        collect repositories with at least <min-watchers>
                        watchers (default: 0)
  --primary-language LANGUAGE
                        collect repositories written in this language
  --verbose             show log (default: False)
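The `--min-*` options keep repositories with *at least* the given count. If you have already collected a broad set through the Python API, a filter with the same "at least" semantics can tighten the thresholds locally without querying GitHub again. This is a minimal sketch, assuming each repository dict exposes `issues`, `releases`, `stars`, and `watchers` as integers; `filter_repositories` is a hypothetical helper, not part of the package.

```python
from typing import Iterable, Iterator, Mapping


def filter_repositories(repos: Iterable[Mapping], min_issues: int = 0,
                        min_releases: int = 0, min_stars: int = 0,
                        min_watchers: int = 0) -> Iterator[Mapping]:
    """Yield only repositories that meet every 'at least N' threshold.

    Hypothetical local helper: it mirrors the documented --min-* semantics
    on already-collected data and does not contact GitHub.
    """
    thresholds = {'issues': min_issues, 'releases': min_releases,
                  'stars': min_stars, 'watchers': min_watchers}
    for repo in repos:
        # A repository passes only if every count is >= its minimum.
        if all(repo[key] >= minimum for key, minimum in thresholds.items()):
            yield repo
```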

Important! The tool requires a GitHub personal access token to access the GraphQL API (see GitHub's documentation on creating a personal access token). Set the environment variable GITHUB_ACCESS_TOKEN=<your token> before running it.
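When using the Python API, `os.getenv('GITHUB_ACCESS_TOKEN')` silently returns `None` if the variable is unset, so the failure only shows up later. A small check can fail fast with a clear message instead. This is a minimal sketch; `get_access_token` is a hypothetical helper, not part of `repocollector`.

```python
import os
from typing import Mapping, Optional


def get_access_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GITHUB_ACCESS_TOKEN, or raise a clear error if it is unset or blank.

    Hypothetical helper; 'env' defaults to the process environment and can be
    overridden (e.g. with a plain dict) for testing.
    """
    env = os.environ if env is None else env
    # Strip whitespace so a token pasted with a trailing newline still works.
    token = env.get('GITHUB_ACCESS_TOKEN', '').strip()
    if not token:
        raise RuntimeError('Set GITHUB_ACCESS_TOKEN to a GitHub personal access token.')
    return token
```

It can then be passed as `access_token=get_access_token()` when constructing `GithubRepositoriesCollector`.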

Output

Running the tool from the command line generates an HTML report at <dest>/report.html.

Example

The following command searches for repositories written in Python that were created between 2014-02-01 and 2014-02-03, and saves the report in the folder /tmp/.

repositories-collector --from 2014-02-01 --to 2014-02-03 --primary-language python /tmp/
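To cover a longer period in smaller pieces, one option is to run the tool once per date window, giving each run its own destination folder so the reports do not overwrite each other. The sketch below only builds the argument lists, using the options listed in the usage text; `build_commands` is a hypothetical helper. Adjacent windows share a boundary date, and the usage text does not say whether `--from`/`--to` are inclusive, so a repository created exactly at a boundary might appear in two reports.

```python
from datetime import date, timedelta
from typing import List


def build_commands(start: date, end: date, step_days: int, dest_root: str,
                   language: str = 'python') -> List[List[str]]:
    """Split the period start..end into consecutive windows of step_days days
    and build one repositories-collector call per window (hypothetical helper)."""
    if step_days <= 0:
        raise ValueError('step_days must be positive')
    commands = []
    current = start
    while current < end:
        window_end = min(current + timedelta(days=step_days), end)
        # One report folder per window, e.g. /tmp/2014-02-01_2014-02-02
        dest = f'{dest_root.rstrip("/")}/{current.isoformat()}_{window_end.isoformat()}'
        commands.append(['repositories-collector',
                         '--from', current.isoformat(),
                         '--to', window_end.isoformat(),
                         '--primary-language', language,
                         dest])
        current = window_end
    return commands
```

Each list can then be executed with `subprocess.run(cmd, check=True)`, which stops at the first run that exits with an error.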

Contributions

To report bugs, visit the issue tracker.

If you want to experiment with the source code or contribute improvements, see CONTRIBUTING.

Version

[0.0.2] Fixed config.json missing from MANIFEST.in



Download files


Source Distribution

repositories_collector-0.0.4.tar.gz (13.6 kB)

Built Distribution

repositories_collector-0.0.4-py3-none-any.whl (13.3 kB, Python 3)
