Project description

A Python library that facilitates processing JSON files stored in Google Cloud Storage, transforming them, and loading them into Google BigQuery. This README includes an overview, installation instructions, dependencies, example usage, and additional details to help users get started.

Features:

  1. Batch process JSON files from GCS.
  2. Optionally add record entry timestamps and original file names to the dataset.
  3. Move processed files to a new folder within the same GCS bucket.
  4. Load transformed data into Google BigQuery in manageable chunks (see the sketch after this list).
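
The features above correspond to a fairly standard GCS-to-BigQuery flow. As a rough sketch only (this is not the library's actual implementation; the function name, column names, and overall structure here are assumptions for illustration), the pipeline could look like this using the official google-cloud-storage, google-cloud-bigquery, and pandas clients:

from datetime import datetime, timezone
import json

import pandas as pd
from google.cloud import bigquery, storage


def sketch_process_folder(project_id, bucket_name, source_folder_name,
                          destination_folder_name, dataset_id, table_name,
                          chunk_size=10000):
    # Hypothetical sketch; names and structure are illustrative only.
    storage_client = storage.Client(project=project_id)
    bq_client = bigquery.Client(project=project_id)
    bucket = storage_client.bucket(bucket_name)
    table_id = f"{project_id}.{dataset_id}.{table_name}"

    for blob in storage_client.list_blobs(bucket_name,
                                          prefix=source_folder_name + "/"):
        if not blob.name.endswith(".json"):
            continue

        # Parse the JSON file and flatten it into a DataFrame (feature 1).
        records = json.loads(blob.download_as_text())
        df = pd.json_normalize(records)

        # Optional metadata columns (feature 2).
        df["record_entry_time"] = datetime.now(timezone.utc)
        df["file_name"] = blob.name

        # Load into BigQuery in manageable chunks (feature 4).
        # (load_table_from_dataframe requires the pyarrow package.)
        for start in range(0, len(df), chunk_size):
            job = bq_client.load_table_from_dataframe(
                df.iloc[start:start + chunk_size], table_id)
            job.result()  # Block until the load job completes.

        # Move the processed file (feature 3): copy, then delete the original.
        new_name = blob.name.replace(source_folder_name,
                                     destination_folder_name, 1)
        bucket.copy_blob(blob, bucket, new_name)
        blob.delete()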

Installation

Install the package via pip:

pip install JSON_file_streaming_GCS_BigQuery

Dependencies

  1. google-cloud-storage: to interact with Google Cloud Storage.
  2. google-cloud-bigquery: for operations related to Google BigQuery.
  3. pandas: for data manipulation and transformation.
  4. json (standard library): to parse JSON files.
  5. os (standard library): for operating-system-dependent functionality.

The json and os modules ship with Python; install the third-party dependencies with:

pip install google-cloud-storage google-cloud-bigquery pandas

Usage

Example: Processing JSON Files from GCS and Loading into BigQuery

from your_library import process_json_file_streaming

process_json_file_streaming(
    dataset_id='your_dataset_id',
    table_name='your_table_name',
    project_id='your_project_id',
    bucket_name='your_bucket_name',
    source_folder_name='source_folder',
    destination_folder_name='destination_folder',
    chunk_size=10000,
    add_record_entry_time=True,
    add_file_name=True
)

Parameters:

  1. dataset_id (str): The BigQuery dataset ID.
  2. table_name (str): The BigQuery table name where data will be loaded.
  3. project_id (str): The Google Cloud project ID.
  4. bucket_name (str): The GCS bucket containing the source JSON files.
  5. source_folder_name (str): Folder in GCS bucket where source JSON files are stored.
  6. destination_folder_name (str): Folder to which processed JSON files are moved.
  7. chunk_size (int, optional): Number of records per batch to be loaded into BigQuery (see the chunking example after this list).
  8. add_record_entry_time (bool, optional): If True, adds a timestamp column to the dataset.
  9. add_file_name (bool, optional): If True, adds the original file name as a column in the dataset.
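
To illustrate how chunk_size partitions the data: with 25,000 records and chunk_size=10000, the data would be loaded in three batches. A quick way to see the batch sizes (the record count here is an arbitrary example):

chunk_size = 10000
n_records = 25000

# Batch sizes produced by splitting n_records into chunks of chunk_size.
batches = [min(chunk_size, n_records - start)
           for start in range(0, n_records, chunk_size)]
print(batches)  # [10000, 10000, 5000]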

Configuration

To interact with Google Cloud services, ensure your environment is configured with appropriate credentials: either authenticate with the Google Cloud SDK (for example, gcloud auth application-default login) or set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of your service account key file.
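
For example, in a script or notebook you can set the environment variable before creating any clients; the key path below is a placeholder:

import os

# Placeholder path; point this at your own service account key file.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service-account-key.json"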
