Python interfaces to GitHub, Bitbucket and GitLab APIs

Python interface for the APIs of code hosting platforms

It is intended to facilitate research on open source projects. At this point, it is basically functional but is missing:

  • tests
  • documentation
  • good architecture

Feel free to contribute any of those.


pip install --user --upgrade strudel.scraper


import stscraper as scraper
import pandas as pd

gh_api = scraper.GitHubAPI()
# so far only GitHub, Bitbucket and GitLab are supported
# bb_api = scraper.BitbucketAPI()
# gl_api = scraper.GitLabAPI()

# repo_issues is a generator that can be used
# to instantiate a pandas dataframe
issues = pd.DataFrame(gh_api.repo_issues('cmustrudel/strudel.scraper'))
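Because `repo_issues` is a generator, records are produced lazily, so it should be possible to take only the first few instead of consuming the whole stream. A minimal sketch, using a stand-in generator so it runs without network access (`fake_repo_issues` and its fields are hypothetical and not part of stscraper):

```python
from itertools import islice


def fake_repo_issues(repo_slug):
    """Hypothetical stand-in for gh_api.repo_issues(); yields one dict per issue."""
    for number in range(1, 1001):
        yield {"repo": repo_slug, "number": number}


# Take only the first five records; the generator is not advanced any further.
first_issues = list(islice(fake_repo_issues("cmustrudel/strudel.scraper"), 5))
print(len(first_issues))
```

The resulting list of dicts can be passed to `pd.DataFrame` in the same way as the generator itself.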


The GitHub and GitLab APIs limit the request rate for unauthenticated requests (although GitLab's limit is much more generous). There are several ways to set your API keys, listed below in order of priority.

Important note: API objects are reused in subsequent calls. The same keys used to instantiate the first API object will be used by ALL other instances.
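The practical consequence of this note can be illustrated with a toy class. This is only a sketch of the documented behaviour, assuming class-level shared state; `ToyAPI` is hypothetical and does not reproduce how stscraper is actually implemented:

```python
class ToyAPI:
    """Toy class (hypothetical, not stscraper code) whose instances share one token list."""

    _shared_tokens = None  # class-level state, set by the first instantiation only

    def __init__(self, tokens=None):
        if ToyAPI._shared_tokens is None:
            ToyAPI._shared_tokens = tokens.split(",") if tokens else []

    @property
    def tokens(self):
        return ToyAPI._shared_tokens


first = ToyAPI(tokens="token_a,token_b")
second = ToyAPI(tokens="token_c")  # these tokens are silently ignored

print(second.tokens)  # ['token_a', 'token_b']
```

If the library behaves like this, all tokens need to be supplied before or at the creation of the first API object.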

Class instantiation:

import stscraper

gh_api = stscraper.GitHubAPI(tokens="comma-separated list of tokens")

At runtime:

import stscraper
import stutils

# IMPORTANT: do this before creation of the first API object!
stutils.CONFIG['GITHUB_API_TOKENS'] = 'comma-separated list of tokens'
stutils.CONFIG['GITLAB_API_TOKENS'] = 'comma-separated list of tokens'

# any API instance created after this will use the provided tokens
gh_api = stscraper.GitHubAPI()

Settings file:

project root
  |- my_module
  |   \-

# in the settings file
GITHUB_API_TOKENS = 'comma-separated list of tokens'
GITLAB_API_TOKENS = 'comma-separated list of tokens'

# somewhere in the code
import stscraper

# keys from the settings file will be reused automatically
gh_api = stscraper.GitHubAPI()

Environment variable:

# somewhere in ~/.bashrc
export GITHUB_API_TOKENS='comma-separated list of tokens'
export GITLAB_API_TOKENS='comma-separated list of tokens'
# somewhere in the code
import stscraper

# keys from environment variables will be reused automatically
gh_api = stscraper.GitHubAPI()
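The variable can also be set from Python for the current process, for example in a notebook. A small sketch that builds the comma-separated string from a list; the token values are placeholders, and it is an assumption (not stated above) that the library reads the environment no earlier than the creation of the first API object:

```python
import os

# Placeholder values; real tokens should not be hard-coded in source files.
tokens = ["token_a", "token_b", "token_c"]

# The methods above expect a single comma-separated string.
token_string = ",".join(tokens)

# Set the variable for the current process and its children only.
os.environ["GITHUB_API_TOKENS"] = token_string

print(os.environ["GITHUB_API_TOKENS"])  # token_a,token_b,token_c
```

Unlike the `~/.bashrc` approach, this does not persist across sessions.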

Hub config:

If you have hub installed and none of the methods above provides tokens, its configuration will be reused for the GitHub API.
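The order of priority described above can be sketched as a simple "first non-empty source wins" lookup. `resolve_tokens` and its parameters are hypothetical names chosen for illustration; this is not the library's own code, and the hub configuration fallback is left out:

```python
import os


def resolve_tokens(explicit=None, runtime_config=None, settings=None, environ=None):
    """Return tokens from the first non-empty source, checked in priority order.

    Hypothetical helper illustrating the documented order; not stscraper code.
    """
    environ = os.environ if environ is None else environ
    key = "GITHUB_API_TOKENS"
    candidates = [
        explicit,                         # 1. class instantiation argument
        (runtime_config or {}).get(key),  # 2. value set at runtime
        (settings or {}).get(key),        # 3. settings file
        environ.get(key),                 # 4. environment variable
    ]
    for value in candidates:
        if value:
            return [token.strip() for token in value.split(",")]
    return []


print(resolve_tokens(explicit="a,b", environ={"GITHUB_API_TOKENS": "c"}))  # ['a', 'b']
print(resolve_tokens(environ={"GITHUB_API_TOKENS": "c, d"}))               # ['c', 'd']
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.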


Files for strudel.scraper, version 0.4.0:

  • strudel.scraper-0.4.0-py2.py3-none-any.whl (35.1 kB), wheel, Python version py2.py3
  • strudel.scraper-0.4.0.tar.gz (30.9 kB), source
