Project description
This Python library migrates column data from Parquet files to BigQuery tables and can expand the BigQuery table schema by adding columns that exist in the Parquet file but not in the table. This covers cases where the table schema does not initially contain every column present in the Parquet data.
Functionality
- Seamless transfer of column data from Parquet files to BigQuery tables.
- Automatic schema expansion in BigQuery by adding missing columns detected in the Parquet file.
- Leverages pandas DataFrames for efficient data manipulation.
- Supports interaction with Google Cloud Storage (GCS) for retrieving Parquet files.
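The schema-expansion step listed above comes down to comparing two sets of column names. Here is a minimal sketch, assuming the Parquet columns and the existing BigQuery columns are already available as lists of names. The helper name `find_missing_columns` is hypothetical and not part of the library's documented API.

```python
from typing import Iterable, List


def find_missing_columns(parquet_columns: Iterable[str],
                         table_columns: Iterable[str]) -> List[str]:
    """Return Parquet columns absent from the table, in Parquet order."""
    existing = set(table_columns)
    missing = []
    for name in parquet_columns:
        if name not in existing:
            missing.append(name)
            existing.add(name)  # avoid reporting a repeated name twice
    return missing


# Hypothetical column lists, for illustration only.
parquet_cols = ["id", "name", "signup_date", "country"]
table_cols = ["id", "name"]
print(find_missing_columns(parquet_cols, table_cols))
```

The comparison here is case-sensitive. BigQuery treats column names case-insensitively, so a fuller implementation would probably normalize case before comparing.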
Installation
Install the library using pip:
pip install Parquet_Schema_Expansion_Migrator_for_BigQuery
Usage
The library provides a function column_transfer_to_bigquery that takes the following arguments:
- bucket_name (str): Name of the GCS bucket containing the Parquet file.
- parquet_file_path (str): Path to the Parquet file within the bucket.
- project_id (str): GCP project ID where the BigQuery dataset resides.
- dataset_id (str): ID of the BigQuery dataset containing the target table.
- table_id (str): ID of the BigQuery table to which data will be transferred.
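The five arguments above identify one GCS object and one BigQuery table. How the library combines them internally is not documented, so the sketch below only shows the conventional forms: a `gs://` URI and a `project.dataset.table` identifier. The helper names `gcs_uri` and `full_table_id` are hypothetical.

```python
def gcs_uri(bucket_name: str, parquet_file_path: str) -> str:
    """Build a gs:// URI from a bucket name and an object path."""
    return f"gs://{bucket_name}/{parquet_file_path.lstrip('/')}"


def full_table_id(project_id: str, dataset_id: str, table_id: str) -> str:
    """Build the project.dataset.table identifier used by BigQuery."""
    return f"{project_id}.{dataset_id}.{table_id}"


print(gcs_uri("your_bucket_name", "path/to/your/file.parquet"))
print(full_table_id("your_project_id", "your_dataset_id", "your_table_id"))
```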
Example
from Parquet_Schema_Expansion_Migrator_for_BigQuery import Parquet_Schema_Expansion_Migrator_for_BigQuery
bucket_name = "your_bucket_name"
parquet_file_path = "path/to/your/file.parquet"
project_id = "your_project_id"
dataset_id = "your_dataset_id"
table_id = "your_table_id"
Parquet_Schema_Expansion_Migrator_for_BigQuery(bucket_name, parquet_file_path,
                                               project_id, dataset_id, table_id)
Dependencies
The library relies on the following external libraries:
- pandas
- pyarrow
- google-cloud-storage
- google-cloud-bigquery
Ensure these dependencies are installed before using the library.
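One way to confirm the dependencies are present is to look up their import names before calling the library. This is a minimal sketch using only the standard library. The import names in `REQUIRED` are assumed to correspond to the four distributions listed above, and `missing_modules` is a hypothetical helper.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names assumed for the four listed distributions.
REQUIRED = ["pandas", "pyarrow", "google.cloud.storage", "google.cloud.bigquery"]

absent = missing_modules(REQUIRED)
print("Missing modules:", absent or "none")
```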
Additional Notes
- The alter_schema and ExecuteBqQuery functions are assumed to exist and need to be implemented for a complete solution.
- Missing columns are currently assumed to be of STRING type; consider replacing this assumption with more robust data type conversion logic.
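Both notes can be illustrated together. This is a sketch under stated assumptions, not the library's implementation: the dtype-to-type mapping is an assumption, and `bq_type_for`, `alter_schema_statements`, and `execute_bq_query` are hypothetical stand-ins for the type logic and for the alter_schema and ExecuteBqQuery functions mentioned above. The `client` argument is expected to behave like `google.cloud.bigquery.Client`, whose `query(sql)` returns a job with a `result()` method.

```python
from typing import Dict, List

# Assumed mapping from pandas dtype names to BigQuery standard SQL types.
DTYPE_TO_BQ: Dict[str, str] = {
    "int64": "INT64",
    "float64": "FLOAT64",
    "bool": "BOOL",
    "datetime64[ns]": "DATETIME",
    "object": "STRING",
}


def bq_type_for(dtype_name: str) -> str:
    """Map a pandas dtype name to a BigQuery type, defaulting to STRING."""
    if dtype_name.startswith("datetime64[ns, "):
        return "TIMESTAMP"  # timezone-aware datetimes
    return DTYPE_TO_BQ.get(dtype_name, "STRING")


def alter_schema_statements(full_table_id: str,
                            missing: Dict[str, str]) -> List[str]:
    """Build one ALTER TABLE statement per missing column.

    `missing` maps column name -> pandas dtype name.
    """
    return [
        f"ALTER TABLE `{full_table_id}` "
        f"ADD COLUMN IF NOT EXISTS `{column}` {bq_type_for(dtype)}"
        for column, dtype in missing.items()
    ]


def execute_bq_query(client, sql: str):
    """Run one statement and wait for it to finish."""
    return client.query(sql).result()


# Hypothetical table and columns, for illustration only.
statements = alter_schema_statements(
    "your_project_id.your_dataset_id.your_table_id",
    {"signup_date": "datetime64[ns]", "score": "float64"},
)
for sql in statements:
    print(sql)
```

Unknown dtype names still fall back to STRING, so the current behavior is preserved for anything the mapping does not cover.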
File details
Details for the file Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz.
File metadata
- Download URL: Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 879c83e96940100f9867c2bab3ad9324728186b80b28f5e2482e3a5f06ed2f40
MD5 | 8b157a76a5db57c2045cba0aef2d5527
BLAKE2b-256 | 764c5a13d6240d4dfe7ab6c1f0d373ac1c1adf2e70cda17f14fe8802891ee99c
File details
Details for the file Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl.
File metadata
- Download URL: Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66e4ef376816f4cfc9ed215ddb4c72916d51877d8b00623f55a86275a8ae4a06
MD5 | e812bffb26cde56484cdbf7f8684386d
BLAKE2b-256 | 04068527eed3058550da926f3a989af1c8c8c6c7ec6356bf188fcb5043d6c02e