Project description

This Python library migrates column data from Parquet files to BigQuery tables and can expand the BigQuery table schema by adding columns that exist in the Parquet file but are missing from the table. It handles the common scenario where the BigQuery table schema does not initially contain every column present in the Parquet data.

Functionality

● Seamless transfer of column data from Parquet files to BigQuery tables.
● Automatic schema expansion in BigQuery by adding missing columns detected in the Parquet file.
● Leverages pandas DataFrames for efficient data manipulation.
● Supports interaction with Google Cloud Storage (GCS) for retrieving Parquet files (a download sketch follows this list).
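Retrieving the file from GCS and loading it into a DataFrame might look like the following minimal sketch, which uses only the listed dependencies. The client setup and all names are illustrative assumptions, not the library's internal code:

# Sketch: fetch a Parquet file from GCS and load it into a pandas DataFrame.
# Assumes Application Default Credentials are configured; all names below
# are placeholders, not the library's actual internals.
import io
from google.cloud import storage
import pyarrow.parquet as pq

client = storage.Client(project="your_project_id")
blob = client.bucket("your_bucket_name").blob("path/to/your/file.parquet")
buffer = io.BytesIO(blob.download_as_bytes())
df = pq.read_table(buffer).to_pandas()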

Installation

Install the library using pip:

pip install Parquet_Schema_Expansion_Migrator_for_BigQuery

Usage

The library provides a function column_transfer_to_bigquery that takes the following arguments:

● bucket_name (str): Name of the GCS bucket containing the Parquet file.
● parquet_file_path (str): Path to the Parquet file within the bucket.
● project_id (str): GCP project ID where the BigQuery dataset resides.
● dataset_id (str): ID of the BigQuery dataset containing the target table.
● table_id (str): ID of the BigQuery table to which data will be transferred.

Example

from Parquet_Schema_Expansion_Migrator_for_BigQuery import column_transfer_to_bigquery

bucket_name = "your_bucket_name"
parquet_file_path = "path/to/your/file.parquet"
project_id = "your_project_id"
dataset_id = "your_dataset_id"
table_id = "your_table_id"

# Transfer the Parquet data, expanding the BigQuery schema if columns are missing.
column_transfer_to_bigquery(bucket_name, parquet_file_path,
                            project_id, dataset_id, table_id)

Dependencies

The library relies on the following external libraries:

● pandas
● pyarrow
● google-cloud-storage
● google-cloud-bigquery

Ensure these dependencies are installed before using the library; a pip command for doing so follows below.
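If they were not pulled in automatically during installation, the dependencies can be installed directly:

pip install pandas pyarrow google-cloud-storage google-cloud-bigquery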

Additional Notes

● The alter_schema and ExecuteBqQuery helper functions are assumed to exist and must be implemented for a complete solution; a sketch of one possible alter_schema follows below.
● Consider replacing the assumption that missing columns are of the STRING data type with more robust data-type mapping logic.
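A minimal sketch of what such a helper could look like, using the google-cloud-bigquery client and a pyarrow-based type mapping rather than defaulting every missing column to STRING. This is not the library's actual implementation; the function signature and the type map are illustrative assumptions:

# Illustrative sketch only: add columns present in the Parquet schema but
# missing from the BigQuery table, mapping pyarrow types to BigQuery types
# instead of assuming STRING for everything.
import pyarrow as pa
from google.cloud import bigquery

def alter_schema(project_id, dataset_id, table_id, parquet_schema):
    client = bigquery.Client(project=project_id)
    table = client.get_table(f"{project_id}.{dataset_id}.{table_id}")
    existing = {field.name for field in table.schema}

    def to_bq_type(arrow_type):
        # Coarse mapping; extend as needed (decimals, nested types, etc.).
        if pa.types.is_integer(arrow_type):
            return "INT64"
        if pa.types.is_floating(arrow_type):
            return "FLOAT64"
        if pa.types.is_boolean(arrow_type):
            return "BOOL"
        if pa.types.is_timestamp(arrow_type):
            return "TIMESTAMP"
        return "STRING"  # fallback, mirroring the library's current assumption

    new_schema = list(table.schema)
    for field in parquet_schema:
        if field.name not in existing:
            new_schema.append(bigquery.SchemaField(field.name, to_bq_type(field.type)))

    table.schema = new_schema
    client.update_table(table, ["schema"])  # apply the expanded schema

The parquet_schema argument here would be a pyarrow.Schema, e.g. obtained from pq.read_table(...).schema.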

