Parquet-Schema-Expansion-Migrator-for-BigQuery

Project description

This Python library migrates column data from Parquet files to BigQuery tables. When the Parquet file contains columns that the target BigQuery table lacks, the library expands the table schema by adding those missing columns before transferring the data.

Functionality

Seamless transfer of column data from Parquet files to BigQuery tables.
Automatic schema expansion in BigQuery by adding missing columns detected in the Parquet file.
Leverages pandas DataFrames for efficient data manipulation.
Supports interaction with Google Cloud Storage (GCS) for retrieving Parquet files.
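
The schema-expansion step listed above can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: the helper names find_missing_columns and build_add_column_statements are hypothetical, and the sketch assumes that missing columns are added as STRING through BigQuery's ALTER TABLE ... ADD COLUMN IF NOT EXISTS statement.

```python
from typing import Iterable, List


def find_missing_columns(parquet_columns: Iterable[str],
                         table_columns: Iterable[str]) -> List[str]:
    """Return the Parquet columns that the table lacks, in Parquet order.

    Hypothetical helper. BigQuery column names are case-insensitive,
    so the comparison is done on lowercased names.
    """
    existing = {name.lower() for name in table_columns}
    return [col for col in parquet_columns if col.lower() not in existing]


def build_add_column_statements(project_id: str,
                                dataset_id: str,
                                table_id: str,
                                missing_columns: Iterable[str],
                                column_type: str = "STRING") -> List[str]:
    """Build one ALTER TABLE statement per missing column.

    Hypothetical helper. The STRING default is an assumption that
    mirrors the note in this README about missing columns.
    """
    table_ref = f"`{project_id}.{dataset_id}.{table_id}`"
    return [
        f"ALTER TABLE {table_ref} ADD COLUMN IF NOT EXISTS `{col}` {column_type}"
        for col in missing_columns
    ]


if __name__ == "__main__":
    missing = find_missing_columns(["id", "name", "Email"], ["ID", "name"])
    for statement in build_add_column_statements(
            "your_project_id", "your_dataset_id", "your_table_id", missing):
        print(statement)
```

The generated statements would then be run against BigQuery before loading the data, so that every DataFrame column has a matching table column.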

Installation

Install the library using pip:

pip install Parquet_Schema_Expansion_Migrator_for_BigQuery

Usage

The library provides a function, column_transfer_to_bigquery, that takes the following arguments:

bucket_name (str): Name of the GCS bucket containing the Parquet file.
parquet_file_path (str): Path to the Parquet file within the bucket.
project_id (str): GCP project ID where the BigQuery dataset resides.
dataset_id (str): ID of the BigQuery dataset containing the target table.
table_id (str): ID of the BigQuery table to which data will be transferred.

Example

from Parquet_Schema_Expansion_Migrator_for_BigQuery import Parquet_Schema_Expansion_Migrator_for_BigQuery

bucket_name = "your_bucket_name"
parquet_file_path = "path/to/your/file.parquet"
project_id = "your_project_id"
dataset_id = "your_dataset_id"
table_id = "your_table_id"

Parquet_Schema_Expansion_Migrator_for_BigQuery(
    bucket_name, parquet_file_path, project_id, dataset_id, table_id
)

Dependencies

The library relies on the following external libraries:

pandas
pyarrow
google-cloud-storage
google-cloud-bigquery

Ensure these dependencies are installed before using the library.

Additional Notes

The alter_schema and ExecuteBqQuery functions are assumed to exist; you need to implement them for a complete solution.
Missing columns are currently assumed to be of type STRING. Consider replacing this assumption with more robust data type conversion logic.
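
One way to replace the STRING assumption is to derive the BigQuery type from each column's pandas dtype. The sketch below is an illustration under stated assumptions, not part of the library: bigquery_type_for is a hypothetical helper, it takes the dtype name as produced by str(df[column].dtype), and its mapping covers only common dtypes, falling back to STRING for everything else.

```python
# Assumed mapping from pandas dtype names to BigQuery column types.
# Only signed integer, float, and boolean dtypes are covered here.
_INTEGER_DTYPES = {"int8", "int16", "int32", "int64"}
_FLOAT_DTYPES = {"float32", "float64"}
_BOOLEAN_DTYPES = {"bool", "boolean"}


def bigquery_type_for(dtype_name: str) -> str:
    """Map a pandas dtype name to a BigQuery type name (hypothetical helper).

    Nullable extension dtypes such as "Int64" or "Float64" are handled
    by lowercasing the name before the lookup.
    """
    name = dtype_name.lower()
    if name in _INTEGER_DTYPES:
        return "INT64"
    if name in _FLOAT_DTYPES:
        return "FLOAT64"
    if name in _BOOLEAN_DTYPES:
        return "BOOL"
    if name.startswith("datetime64["):
        # Assumption: timezone-aware values ("datetime64[ns, UTC]") become
        # TIMESTAMP, and naive values ("datetime64[ns]") become DATETIME.
        return "TIMESTAMP" if "," in name else "DATETIME"
    # Fallback for object columns and any dtype not listed above.
    return "STRING"


if __name__ == "__main__":
    for example in ("int64", "Float64", "datetime64[ns, UTC]", "object"):
        print(example, "->", bigquery_type_for(example))
```

The STRING fallback keeps the current behavior for unrecognized dtypes, so a mapping like this could be adopted incrementally.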

Project details

Release history

0.1 (this version): May 15, 2024
0.0: May 15, 2024

Download files

Source Distribution

Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz (2.8 kB)

Uploaded May 15, 2024 Source

Built Distribution

Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl (3.8 kB)

Uploaded May 15, 2024 Python 3

File details

Details for the file Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz.

File metadata

Download URL: Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz
Upload date: May 15, 2024
Size: 2.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz
Algorithm	Hash digest
SHA256	`879c83e96940100f9867c2bab3ad9324728186b80b28f5e2482e3a5f06ed2f40`
MD5	`8b157a76a5db57c2045cba0aef2d5527`
BLAKE2b-256	`764c5a13d6240d4dfe7ab6c1f0d373ac1c1adf2e70cda17f14fe8802891ee99c`

File details

Details for the file Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl.

File metadata

Download URL: Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl
Upload date: May 15, 2024
Size: 3.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`66e4ef376816f4cfc9ed215ddb4c72916d51877d8b00623f55a86275a8ae4a06`
MD5	`e812bffb26cde56484cdbf7f8684386d`
BLAKE2b-256	`04068527eed3058550da926f3a989af1c8c8c6c7ec6356bf188fcb5043d6c02e`