tap-github

tap-github is a Singer tap for GitHub.

Built with the Singer SDK.

Installation

# use uv (https://docs.astral.sh/uv/)
uv tool install meltanolabs-tap-github

# or pipx (https://pipx.pypa.io/stable/)
pipx install meltanolabs-tap-github

# or Meltano
meltano add extractor tap-github

A list of release versions is available at https://github.com/MeltanoLabs/tap-github/releases

Configuration

Accepted Config Options

This tap accepts the following configuration options:

  • Required: One and only one of the following modes:
    1. repositories: An array of strings specifying the GitHub repositories to be included. Each element of the array should be of the form <org>/<repository>, e.g. MeltanoLabs/tap-github.
    2. organizations: An array of strings containing the GitHub organizations to be included.
    3. searches: An array of search descriptor objects with the following properties:
      • name: A human readable name for the search query
      • query: A GitHub search string (generally the same as what follows ?q= in the URL)
    4. user_usernames: A list of GitHub usernames
    5. user_ids: A list of GitHub user IDs [int]
  • Highly recommended:
    • Personal access tokens (PATs) for authentication can be provided in 3 ways:
      • auth_token - Takes a single token.
      • additional_auth_tokens - Takes a list of tokens. Can be used together with auth_token or as the sole source of PATs.
      • Any environment variables beginning with GITHUB_TOKEN will be assumed to be PATs. These tokens will be used in addition to auth_token (if provided), but will not be used if additional_auth_tokens is provided.
    • GitHub App keys are another option for authentication, and can be used in combination with PATs if desired. App IDs and keys should be assembled into the format :app_id:;;-----BEGIN RSA PRIVATE KEY-----\n_YOUR_P_KEY_\n-----END RSA PRIVATE KEY----- (replace :app_id: with your actual GitHub App ID and _YOUR_P_KEY_ with your private key content). The key can be generated from the Private keys section on https://github.com/organizations/:organization_name/settings/apps/:app_name. GitHub's documentation has more detail on GitHub App quotas. Formatted app keys can be provided in 3 ways:
      • auth_app_keys - List of GitHub App keys in the prescribed format. These keys are organization-agnostic and will be used as fallback for all organizations.
      • org_auth_app_keys - Object/dictionary mapping organization names to lists of GitHub App keys. This allows you to specify different app credentials for different organizations, enabling better rate limit management across multiple organizations. Example:
        auth_app_keys:
          - "fallback_app_id;;-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
        org_auth_app_keys:
          my-org:
            - "app_id_1;;-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
            - "app_id_2;;-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
          another-org:
            - "app_id_3;;-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
        
      • If auth_app_keys is not provided but there is an environment variable with the name GITHUB_APP_PRIVATE_KEY, it will be assumed to be an App key in the prescribed format (organization-agnostic).
  • Optional:
    • user_agent
    • start_date
    • metrics_log_level
    • stream_maps
    • stream_maps_config
    • stream_options: Options which can change the behaviour of a specific stream are nested within.
      • milestones: Valid options for the milestones stream are nested within.
        • state: Determines which milestones will be extracted. One of open (default), closed, all.
    • rate_limit_buffer: A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
    • expiry_time_buffer: A buffer used when determining when to refresh GitHub app tokens. Only relevant when authenticating as a GitHub app. Defaults to 10 minutes. Tokens generated by GitHub apps expire 1 hour after creation, and will be refreshed once fewer than expiry_time_buffer minutes remain until the anticipated expiry time.
    • backoff_max_tries: The maximum number of backoff retry attempts for failed API requests that are retriable. Defaults to 5.
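
The app key format described in the list above (app ID, then ;;, then the PEM private key) is easy to get wrong by hand. The following is a minimal sketch of assembling and checking such a string. It is not the tap's own code: format_app_key and split_app_key are hypothetical helper names, and the only thing taken from this README is the <app_id>;;<private key> layout.

```python
def format_app_key(app_id: str, private_key_pem: str) -> str:
    """Join an app ID and a PEM private key with the ';;' separator."""
    pem = private_key_pem.strip()
    # Assumption: the key is PEM-encoded, as in the README's example.
    if not pem.startswith("-----BEGIN"):
        raise ValueError("expected a PEM-encoded private key")
    return f"{app_id};;{pem}"


def split_app_key(app_key: str) -> tuple[str, str]:
    """Split '<app_id>;;<private key>' back into its two parts."""
    app_id, separator, pem = app_key.partition(";;")
    if not separator or not app_id or not pem:
        raise ValueError("expected '<app_id>;;<private key>'")
    return app_id, pem


if __name__ == "__main__":
    placeholder_pem = (
        "-----BEGIN RSA PRIVATE KEY-----\n"
        "...\n"
        "-----END RSA PRIVATE KEY-----\n"
    )
    print(format_app_key("12345", placeholder_pem))
```

In a double-quoted YAML string, \n is an escape for a newline, so a value written as in the example above carries the PEM line breaks.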

Note that modes 1-3 are repository modes and modes 4-5 are user modes; the two groups do not run the same set of streams.
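
Because exactly one mode must be set, it can help to check a config before handing it to the tap. The sketch below illustrates that rule only. It is not how the tap validates its config: selected_mode and is_user_mode are hypothetical names, and treating an empty list as "not set" is an assumption.

```python
MODE_KEYS = (
    "repositories",
    "organizations",
    "searches",
    "user_usernames",
    "user_ids",
)
USER_MODES = ("user_usernames", "user_ids")


def selected_mode(config: dict) -> str:
    """Return the single mode key set in `config`, or raise ValueError."""
    # Assumption: a missing key and an empty list both count as "not set".
    present = [key for key in MODE_KEYS if config.get(key)]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {MODE_KEYS}, got {present or 'none'}"
        )
    return present[0]


def is_user_mode(mode: str) -> bool:
    """Modes 4-5 are user modes; modes 1-3 are repository modes."""
    return mode in USER_MODES


if __name__ == "__main__":
    config = {"repositories": ["MeltanoLabs/tap-github"]}
    mode = selected_mode(config)
    print(mode, "user mode" if is_user_mode(mode) else "repository mode")
```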

A full list of supported settings and capabilities for this tap is available by running:

tap-github --about

Source Authentication and Authorization

A small number of records may be pulled without an auth token. However, a GitHub auth token should generally be considered "required" since it gives more realistic rate limits. (See GitHub API docs for more info.)
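
The precedence between the three PAT sources described under Configuration can be sketched as follows. This is an illustration of the documented rules under stated assumptions, not the tap's implementation: collect_tokens is a hypothetical name, a non-empty additional_auth_tokens list is taken to mean "provided", and the ordering and de-duplication of tokens are choices made for this example.

```python
import os
from typing import Mapping, Optional


def collect_tokens(
    config: Mapping, environ: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Gather PATs from config and environment, per the documented rules."""
    env = os.environ if environ is None else environ
    tokens: list[str] = []

    if config.get("auth_token"):
        tokens.append(config["auth_token"])

    if config.get("additional_auth_tokens"):
        # When additional_auth_tokens is provided, GITHUB_TOKEN* variables
        # are ignored.
        tokens.extend(config["additional_auth_tokens"])
    else:
        # Otherwise every variable whose name begins with GITHUB_TOKEN
        # is treated as a PAT.
        tokens.extend(
            value
            for name, value in sorted(env.items())
            if name.startswith("GITHUB_TOKEN") and value
        )

    # Drop duplicates while keeping the first occurrence of each token.
    return list(dict.fromkeys(tokens))


if __name__ == "__main__":
    print(len(collect_tokens({"auth_token": "placeholder-token"})))
```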

Multi-Organization Authentication

When using org_auth_app_keys, the tap will automatically switch authentication contexts based on the organization being processed. This enables:

  • Organization-specific rate limits: Each organization can have its own set of GitHub App credentials, preventing rate limit exhaustion when processing multiple organizations.
  • Automatic token selection: When processing repositories from a specific organization, the tap will prefer tokens configured for that organization.
  • Fallback behavior: If no organization-specific tokens are available, the tap will fall back to:
    1. Organization-agnostic tokens (personal tokens or auth_app_keys)
    2. Tokens from other organizations (for accessing public data)
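
The preference order above can be pictured as a simple ordering function. This is a sketch under assumptions rather than the tap's actual selection logic: tokens_in_preference_order is a hypothetical name, organization names are matched exactly, and the strings stand in for whatever token objects are really used.

```python
def tokens_in_preference_order(
    org: str,
    org_tokens: dict[str, list[str]],
    agnostic_tokens: list[str],
) -> list[str]:
    """Order candidate tokens for `org` following the documented fallback."""
    # 0. Tokens configured for the organization being processed.
    own = list(org_tokens.get(org, []))
    # 1. Organization-agnostic tokens (personal tokens or auth_app_keys).
    agnostic = list(agnostic_tokens)
    # 2. Tokens from other organizations (for accessing public data).
    others = [
        token
        for name, tokens in org_tokens.items()
        if name != org
        for token in tokens
    ]
    return own + agnostic + others


if __name__ == "__main__":
    org_tokens = {"my-org": ["app_1", "app_2"], "another-org": ["app_3"]}
    print(tokens_in_preference_order("my-org", org_tokens, ["fallback"]))
```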

Usage

API Limitation - Pagination

The GitHub API limits pagination for some resources, such as /events. For these resources, users might encounter the following error:

In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse.

To avoid this, the GitHub streams exit early, i.e. as soon as no next page is available. If you are fetching /events at the repository level, avoid leaving the tap disabled for more than a few days, or you will have gaps in your data.
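
The idea of stopping when no next page is advertised can be sketched with the Link response header that the error message refers to. This is an illustrative sketch, not the tap's paginator: next_page_url and iter_pages are hypothetical names, fetch is a caller-supplied function assumed to return a (records, link_header) pair, and the regular expression only handles the simple <url>; rel="name" form.

```python
import re
from typing import Callable, Iterator, Optional, Tuple

# Matches entries such as: <https://api.github.com/...?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None if absent."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def iter_pages(
    first_url: str,
    fetch: Callable[[str], Tuple[list, Optional[str]]],
) -> Iterator[list]:
    """Yield pages of records until the API stops advertising a next page."""
    url: Optional[str] = first_url
    while url:
        records, link_header = fetch(url)
        yield records
        # Exit early: no rel="next" link means there is nothing more to read.
        url = next_page_url(link_header)
```

Stopping on the absence of a next link, rather than requesting page numbers until an error comes back, is what keeps the paginator from running into the limit quoted above.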

You can easily run tap-github by itself or in a pipeline using Meltano.

Notes regarding permissions

Executing the Tap Directly

tap-github --version
tap-github --help
tap-github --config CONFIG --discover > ./catalog.json

Contributing

This project uses parent-child streams. Learn more about them here.

Initialize your Development Environment

curl -LsSf https://astral.sh/uv/install.sh | sh  # https://docs.astral.sh/uv/getting-started/installation/
uv sync

Create and Run Tests

Create tests within the tap_github/tests subfolder and then run:

uv run pytest

You can also test the tap-github CLI directly using uv run:

uv run tap-github --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Your project comes with a custom meltano.yml project file already created. Open the meltano.yml and follow any "TODO" items listed in the file.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
uv tool install meltano
# Initialize meltano within this directory
cd tap-github
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-github --version
# OR run a pipeline:
meltano run tap-github target-jsonl

One-liner to recreate the output directory, run ELT, and write out the state file:

# Update this when you want a fresh state file:
TESTJOB=testjob1

# Run everything in one line
mkdir -p .output && meltano elt tap-github target-jsonl --job_id $TESTJOB && meltano elt tap-github target-jsonl --job_id $TESTJOB --dump=state > .output/state.json

Singer SDK Dev Guide

See the dev guide for more instructions on how to use the Singer SDK to develop your own taps and targets.
