Singer.io tap for extracting data from BigQuery

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Operating System
- MacOS :: MacOS X
- POSIX :: Linux
Programming Language

Project description

tap-bigquery

Extract data from BigQuery tables.

This is a Singer tap that produces JSON-formatted data following the Singer spec.

This tap:

Pulls data from Google BigQuery tables/views with datetime field.
Infers the schema for each resource and produce catalog file.
Incrementally pulls data based on the input state.

Installation

Step 0: Acknowledge LICENSE and TERMS

Please especially note that the author(s) of tap-bigquery is not responsible for the cost, including but not limited to BigQuery cost) incurred by running this program.

Step 1: Activate the Google BigQuery API

(originally found in the Google API docs)

Use this wizard to create or select a project in the Google Developers Console and activate the BigQuery API. Click Continue, then Go to credentials.
On the Add credentials to your project page, click the Cancel button.
At the top of the page, select the OAuth consent screen tab. Select an Email address, enter a Product name if not already set, and click the Save button.
Select the Credentials tab, click the Create credentials button and select OAuth client ID.
Select the application type Other, enter the name "Singer BigQuery Tap", and click the Create button.
Click OK to dismiss the resulting dialog.
Click the Download button to the right of the client ID.
Move this file to your working directory and rename it client_secrets.json.

Export the location of the secret file:

export GOOGLE_APPLICATION_CREDENTIALS="./client_secret.json"

For other authentication method, please see Authentication section.

Step 2: Install

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

pip install -U tap-bigquery

Or you can install the lastest development version from GitHub:

pip install --no-cache-dir https://github.com/anelendata/tap-bigquery/archive/master.tar.gz#egg=tap-bigquery

Run

Step 1: Configure

Create a file called tap_config.json in your working directory, following config.sample.json:

{
  "streams": [
      {"name": "<some_schema_name>",
       "table": "`<project>.<dataset>.<table>`",
       "columns": ["<col_name_0>", "<col_name_1>", "<col_name_2>"],
       "datetime_key": "<your_key>",
       "filters": ["country='us'", "state='CA'",
                   "registered_on>=DATE_ADD(current_date, INTERVAL -7 day)"
                  ] // also optional: these are parsed in 'WHERE' clause
      }
    ],
  "start_datetime": "2017-01-01T00:00:00Z", // This can be set at the command line argument
  "end_datetime": "2017-02-01T00:00:00Z", // end_datetime is optional
  "limit": 100,
  "start_always_inclusive": false // default is false, optional
}

The required parameters is at least one stream (one bigquery table/view) to copy.
- It is not a recommended BigQuery practice to use * to specify the columns as it may blow up the cost for a table with a large number of columns.
- filters are optional but we strongly recommend using this over a large partitioned table to control the cost. LIMIT (The authors of tap-bigquery is not responsible for the cost incurred by running this program. Always test thoroughly with small data set first.)
start_datetime must also be set in the config file or as the command line argument (See the next step).
limit will limit the number of results, but it does not result in reduce the query cost.

The table/view is expected to have a column to indicate the creation or update date and time so the tap sends the query with ORDER BY and use the column to record the bookmark (See State section).

Step 2: Create catalog

Run tap-bigquery in discovery mode to let it create json schema file and then run them together, piping the output of tap-bigquery to target-csv:

tap-bigquery -c tap_config.json -d > catalog.json

Step 3: Run

tap-bigquery can be run with any Singer Target. As example, let use target-csv.

pip install target-csv

Run:

tap-bigquery -c tap_config.json \
    --catalog catalog.json --start_datetime '2020-08-01T00:00:00Z' \
    --end_datetime '2020-08-02T01:00:00Z' | target-csv --config target_config.json \
    > state.json

This example should create a csv file in the same directory. state.json should contain a state (bookmark) after the run. (See State section).

Notes:

start and end datetimes accept ISO 8601 format, can be date only. start datetime is inclusive, end datetime is not.
It is recommended to inspect the catalog file and fix the auto-type assignment if necessary.
target-csv's target_config.json is optinal.
tap-bigquery can produce nested records but it's up to target if the data writing will be successful. In this example with target-csv, the table is expected to be flat.

Authentication

It is recommended to use tap-bigquery with a service account.

Download the client_secrets.json file for your service account, and place it on the machine where tap-bigquery will be executed.
Set a GOOGLE_APPLICATION_CREDENTIALS environment variable on the machine, where the value is the fully qualified path to client_secrets.json

In environments where it is preferable to supply secrets or other configuration via environment variables or where it is not feasible to make a client_secrets.json file available, you may instead provide credential as a JSON string in the GOOGLE_APPLICATION_CREDENTIALS_STRING environment variable. This JSON string should contain the same contents and be formatted the same way as the contents of a client_secrets.json file.

In the testing environment, you can also manually authenticate before runnig the tap. In this case you do not need GOOGLE_APPLICATION_CREDENTIALS defined:

gcloud auth application-default login

You may also have to set the project:

gcloud config set project <project-id>

Though not tested, it should also be possible to use the OAuth flow to authenticate to GCP as well:

tap-bigquery will attempt to open a new window or tab in your default browser. If this fails, copy the URL from the console and manually open it in your browser.
If you are not already logged into your Google account, you will be prompted to log in.
If you are logged into multiple Google accounts, you will be asked to select one account to use for the authorization.
Click the Accept button to allow tap-bigquery to access your Google BigQuery table.
You can close the tab after the signup flow is complete.

State

This tap emits state. The command also takes a state file input with --state <file-name> option. If the state is set, start_datetime config and command line argument are ignored and the datetime value from last_update key is used as the resuming point.

To avoid the data duplication, start datetime is exclusive start_datetime < datetime_column when the tap runs with state option. If you fear a data loss because of this, just use the --start_datetime option instead of state. Or set start_always_inclusive: true in configuration.

The tap itself does not output a state file. It anticipate the target program or a downstream process to fianlize the state safetly and produce a state file.

Original repo

https://github.com/anelendata/tap-bigquery

About this project

This project is developed by ANELEN and friends. Please check out the ANELEN's open innovation philosophy and other projects

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Operating System
- MacOS :: MacOS X
- POSIX :: Linux
Programming Language

Release history Release notifications | RSS feed

This version

0.3.10

Dec 26, 2025

0.3.9

Jul 18, 2024

0.3.8

Jul 2, 2024

0.3.7

Apr 16, 2023

0.3.6

Jul 15, 2022

0.3.5

Dec 23, 2020

0.3.4

Dec 3, 2020

0.3.3

Oct 26, 2020

0.3.2

Sep 13, 2020

0.3.1

Sep 2, 2020

0.3.0

Aug 21, 2020

0.3.0b0 pre-release

Aug 21, 2020

0.2.0

May 26, 2020

0.1.5

May 21, 2020

0.1.5rc1 pre-release

May 14, 2020

0.1.4

May 13, 2020

0.1.3

May 13, 2020

0.1.3rc1 pre-release

May 13, 2020

0.1.2

May 13, 2020

0.1.1

May 13, 2020

0.1.0

May 12, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tap_bigquery-0.3.10.tar.gz (17.7 kB view details)

Uploaded Dec 26, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

tap_bigquery-0.3.10-py3-none-any.whl (15.0 kB view details)

Uploaded Dec 26, 2025 Python 3

File details

Details for the file tap_bigquery-0.3.10.tar.gz.

File metadata

Download URL: tap_bigquery-0.3.10.tar.gz
Upload date: Dec 26, 2025
Size: 17.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.12

File hashes

Hashes for tap_bigquery-0.3.10.tar.gz
Algorithm	Hash digest
SHA256	`a734049cccf20d30049c550fb9ba263dcc714b498328b769abc9da4c78067ad2`
MD5	`8a342620b0e872e93befbcc3e2cc7adb`
BLAKE2b-256	`1b660dd65b43b8db1bb277e83f15b2788a66582e4bbcd69532316b383aac7379`

See more details on using hashes here.

File details

Details for the file tap_bigquery-0.3.10-py3-none-any.whl.

File metadata

Download URL: tap_bigquery-0.3.10-py3-none-any.whl
Upload date: Dec 26, 2025
Size: 15.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.12

File hashes

Hashes for tap_bigquery-0.3.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e80ba7e9c09f4a9cb03dbd757cc30d8fb98b11bad1bb5ea88d8829fb48e0e198`
MD5	`c9fe444e0419db5a90506d04abd653ee`
BLAKE2b-256	`e9660a17a59050c6bf5b1a62eb511c729bb2b60d5569f40b5a77d92981e868e8`

See more details on using hashes here.

tap-bigquery 0.3.10

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

tap-bigquery

Installation

Step 0: Acknowledge LICENSE and TERMS

Step 1: Activate the Google BigQuery API

Step 2: Install

Run

Step 1: Configure

Step 2: Create catalog

Step 3: Run

Authentication

State

Original repo

About this project

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes