Singer.io tap for extracting data from the GitHub API

pipelinewise-tap-github

License: MIT

Singer tap that produces JSON-formatted data from the GitHub API following the Singer spec.

This is a PipelineWise compatible tap connector.

This tap:

Quick start

  1. Install

    We recommend using a virtualenv:

    python3 -m venv venv
    . venv/bin/activate
    pip install --upgrade pip
    pip install .
    
  2. Create a GitHub access token

    Log in to your GitHub account, go to the Personal Access Tokens settings page, and generate a new token with at least the repo scope. Save this access token; you'll need it for the next step.

  3. Create the config file

    Create a JSON file containing the required fields and any optional ones you need. You can choose between an allow-list and a deny-list strategy by combining organization with repos_include and repos_exclude, both of which support wildcards.

| Config | Required? | Description |
| --- | --- | --- |
| access_token | yes | The access token used to authenticate against the GitHub API |
| start_date | yes | The date (inclusive) from which to start extracting data |
| organization | no | The organization to extract data from |
| repos_include | no | Allow list: extract data only from the listed repos in the organization. Supports wildcard matching |
| repos_exclude | no | Deny list: extract data from all repos in the organization except the listed ones. Supports wildcard matching |
| include_archived | no | true/false to include archived repos. Defaults to false |
| include_disabled | no | true/false to include disabled repos. Defaults to false |
| repository | no | (DEPRECATED) Allow list of repos to extract from the organization (has priority over repos_exclude) |
| max_rate_limit_wait_seconds | no | Maximum time in seconds to wait when the GitHub API rate limit is hit. Defaults to 600 |

Example:

{
  "access_token": "ghp_16C7e42F292c6912E7710c838347Ae178B4a",
  "organization": "singer-io", 
  "repos_exclude": "*tests* api-docs",
  "repos_include": "tap* getting-started pipelinewise-github",
  "start_date": "2021-01-01T00:00:00Z",
  "include_archived": false,
  "include_disabled": false,
  "max_rate_limit_wait_seconds": 800
}
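
Before handing the file to the tap, it can help to sanity-check it. The following is a minimal sketch and not part of the tap: the helper name validate_config and the strict timestamp format are assumptions made for illustration, and the tap performs its own validation.

```python
from datetime import datetime

# The two fields marked as required in the config table.
REQUIRED_KEYS = ("access_token", "start_date")


def validate_config(config):
    """Hypothetical helper: return config unchanged if the required fields look usable."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("missing required config fields: " + ", ".join(missing))
    # Assumption: start_date uses the UTC format shown in the example config.
    # strptime raises ValueError if the value does not match.
    datetime.strptime(config["start_date"], "%Y-%m-%dT%H:%M:%SZ")
    return config
```

A typical use would be to load config.json with json.load and pass the resulting dict to validate_config before running the tap.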

You can also pass organization-qualified repository names in repos_include, for example "singer-io/tap-github another-org/tap-octopus".

For backward compatibility, you can still pass the deprecated repository field: repository: "singer-io/tap-github singer-io/getting-started"
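
To make the allow-list and deny-list behaviour more concrete, here is a rough sketch using shell-style wildcards from Python's fnmatch module. It is an illustration only: select_repos is a hypothetical helper, and the precedence shown (a repo must match an include pattern, if any are given, and must not match any exclude pattern) is an assumption rather than a description of the tap's internals.

```python
from fnmatch import fnmatchcase


def select_repos(repo_names, repos_include="", repos_exclude=""):
    """Hypothetical helper: filter repo names with space-separated wildcard patterns."""
    include = repos_include.split()
    exclude = repos_exclude.split()
    selected = []
    for name in repo_names:
        # Allow list: when patterns are given, the name must match at least one.
        if include and not any(fnmatchcase(name, pattern) for pattern in include):
            continue
        # Deny list: drop the name if it matches any exclude pattern (assumed precedence).
        if any(fnmatchcase(name, pattern) for pattern in exclude):
            continue
        selected.append(name)
    return selected


# Made-up repository names, filtered with the patterns from the example config.
repos = ["tap-github", "tap-github-tests", "api-docs", "getting-started", "target-csv"]
print(select_repos(repos, "tap* getting-started", "*tests* api-docs"))
# prints ['tap-github', 'getting-started']
```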

Warning: Repositories with a total size of less than 0.5 KB are currently excluded. The GitHub repositories API returns size: 0 for them, and tap_github/__init__.py currently uses size <= 0 to filter out repos with no commits.
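
The effect of that filter can be pictured with a small sketch. This is not the code from tap_github/__init__.py: the function name and the sample repositories are made up, and only the size <= 0 condition comes from the description above.

```python
def repos_with_commits(repos):
    """Hypothetical helper: keep repos whose reported size is above zero."""
    # Repos reported with size 0 are dropped, whether they are truly empty
    # or just too small for the API to report a non-zero size.
    return [repo for repo in repos if repo.get("size", 0) > 0]


# Made-up sample data; "size" is the value reported by the repositories API.
sample = [
    {"name": "tap-github", "size": 120},
    {"name": "tiny-readme-only", "size": 0},
]
print([repo["name"] for repo in repos_with_commits(sample)])
# prints ['tap-github']
```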

  4. Run the tap in discovery mode to generate the properties.json file

    tap-github --config config.json --discover > properties.json
    
  5. In the properties.json file, select the streams to sync

    Each stream in the properties.json file has a "schema" entry. To select a stream to sync, add "selected": true to that stream's "schema" entry. For example, to sync the pull_requests stream:

    ...
    "tap_stream_id": "pull_requests",
    "schema": {
      "selected": true,
      "properties": {
        "updated_at": {
          "format": "date-time",
          "type": [
            "null",
            "string"
          ]
        }
    ...
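
    Editing the file by hand works, but the same change can be scripted. The sketch below is one possible approach, not something shipped with the tap: select_streams is a hypothetical helper, and it assumes the usual Singer catalog layout in which properties.json holds a top-level "streams" list.

```python
def select_streams(catalog, stream_ids):
    """Hypothetical helper: mark the given streams as selected in a catalog dict."""
    wanted = set(stream_ids)
    # Assumption: the catalog has a top-level "streams" list of stream entries.
    for stream in catalog.get("streams", []):
        if stream.get("tap_stream_id") in wanted:
            # Add "selected": true to the stream's "schema" entry.
            stream.setdefault("schema", {})["selected"] = True
    return catalog


# In-memory example with two made-up stream entries.
catalog = {
    "streams": [
        {"tap_stream_id": "pull_requests", "schema": {"properties": {}}},
        {"tap_stream_id": "commits", "schema": {"properties": {}}},
    ]
}
select_streams(catalog, ["pull_requests"])
print(catalog["streams"][0]["schema"]["selected"])
# prints True
```

    To apply this to a real file, load properties.json with json.load, call select_streams, and write the result back with json.dump.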
    
  6. Run the application

    tap-github can be run with:

    tap-github --config config.json --properties properties.json
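
    The tap writes Singer messages to standard output, one JSON object per line. As a quick way to inspect a run, the sketch below counts RECORD messages per stream. The helper name count_records and the sample lines are illustrative assumptions; the message shape ("type", "stream", "record") follows the Singer spec.

```python
import json
from collections import Counter


def count_records(lines):
    """Hypothetical helper: count Singer RECORD messages per stream."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        # SCHEMA and STATE messages are ignored; only RECORD messages are counted.
        if message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return counts


# Made-up sample output resembling what a tap emits.
sample_output = [
    '{"type": "SCHEMA", "stream": "pull_requests", "schema": {}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "pull_requests", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "pull_requests", "record": {"id": 2}}',
    '{"type": "STATE", "value": {}}',
]
print(dict(count_records(sample_output)))
# prints {'pull_requests': 2}
```

    In practice the lines could come from the tap's standard output, for example by iterating over sys.stdin in a script that the tap is piped into.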
    

To run tests

  1. Install the Python test dependencies in a virtual environment:

    python3 -m venv venv
    . venv/bin/activate
    pip install --upgrade pip
    pip install -e ".[test]"

  2. Run the unit tests:

    pytest tests/unittests
