
Project description

This Python library facilitates the migration of column data from Parquet files to BigQuery tables, with the capability to expand the BigQuery table schema by adding missing columns from the Parquet file. It effectively handles scenarios where the BigQuery table schema might not initially contain all columns present in the Parquet data.

Functionality

  1. Seamless transfer of column data from Parquet files to BigQuery tables.
  2. Automatic schema expansion in BigQuery by adding missing columns detected in the Parquet file (a minimal sketch of the overall flow follows this list).
  3. Leverages pandas DataFrames for efficient data manipulation.
  4. Supports interaction with Google Cloud Storage (GCS) for retrieving Parquet files.
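
To make the flow concrete, here is a minimal sketch of how such a transfer can be put together with the libraries this package depends on. It assumes application-default GCP credentials and a Parquet file that fits in memory; the helper names (add_missing_columns, transfer) are illustrative only and are not the library's internal API:

import io

import pandas as pd
from google.cloud import bigquery, storage

def add_missing_columns(client, table_ref, df):
    # Compare the DataFrame's columns with the table's current schema and
    # append any column the table does not yet have. New columns default
    # to STRING here, mirroring the assumption noted under Additional Notes.
    table = client.get_table(table_ref)
    existing = {field.name for field in table.schema}
    new_schema = list(table.schema)
    for column in df.columns:
        if column not in existing:
            new_schema.append(bigquery.SchemaField(column, "STRING"))
    if len(new_schema) > len(table.schema):
        table.schema = new_schema
        client.update_table(table, ["schema"])  # adding columns is an allowed schema change

def transfer(bucket_name, parquet_file_path, project_id, dataset_id, table_id):
    # Fetch the Parquet file from GCS and read it into a pandas DataFrame.
    blob = storage.Client(project=project_id).bucket(bucket_name).blob(parquet_file_path)
    df = pd.read_parquet(io.BytesIO(blob.download_as_bytes()))

    client = bigquery.Client(project=project_id)
    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    add_missing_columns(client, table_ref, df)

    # Append the DataFrame rows to the (now widened) table.
    client.load_table_from_dataframe(df, table_ref).result()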

Installation

Install the library using pip:

pip install Parquet_Schema_Expansion_Migrator_for_BigQuery

Usage

The library provides a function, column_transfer_to_bigquery, which takes the following arguments:

  1. bucket_name (str): Name of the GCS bucket containing the Parquet file.
  2. parquet_file_path (str): Path to the Parquet file within the bucket.
  3. project_id (str): GCP project ID where the BigQuery dataset resides.
  4. dataset_id (str): ID of the BigQuery dataset containing the target table.
  5. table_id (str): ID of the BigQuery table to which data will be transferred.

Example

from Parquet_Schema_Expansion_Migrator_for_BigQuery import column_transfer_to_bigquery

bucket_name = "your_bucket_name"
parquet_file_path = "path/to/your/file.parquet"
project_id = "your_project_id"
dataset_id = "your_dataset_id"
table_id = "your_table_id"

column_transfer_to_bigquery(bucket_name, parquet_file_path,
                            project_id, dataset_id, table_id)

Dependencies

The library relies on the following external libraries:

  1. pandas
  2. pyarrow
  3. google-cloud-storage
  4. google-cloud-bigquery

Ensure these dependencies are installed before using the library.
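
If they are not pulled in automatically by pip, they can be installed the same way as the library itself:

pip install pandas pyarrow google-cloud-storage google-cloud-bigquery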

Additional Notes

  1. The alter_schema and ExecuteBqQuery functions are assumed to exist and need to be implemented for a complete solution.
  2. Consider replacing the assumed STRING data type for missing columns with more robust data type conversion logic; one possible mapping is sketched below.
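
As one possible refinement of note 2, a dtype-driven mapping could pick BigQuery column types from the pandas data instead of hard-coding STRING. This is a sketch under stated assumptions; bigquery_type_for is a hypothetical helper, not part of the library:

import pandas as pd
from pandas.api import types as ptypes

def bigquery_type_for(series: pd.Series) -> str:
    # Choose a BigQuery type from the pandas dtype; anything unrecognized
    # falls back to STRING, matching the library's current default.
    if ptypes.is_bool_dtype(series):
        return "BOOL"
    if ptypes.is_integer_dtype(series):
        return "INT64"
    if ptypes.is_float_dtype(series):
        return "FLOAT64"
    if ptypes.is_datetime64_any_dtype(series):
        return "TIMESTAMP"
    return "STRING"

A missing column could then be added as bigquery.SchemaField(column, bigquery_type_for(df[column])) rather than always as a STRING field.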


File details

Details for the file Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1.tar.gz (source distribution).

File hashes:

SHA256: 879c83e96940100f9867c2bab3ad9324728186b80b28f5e2482e3a5f06ed2f40
MD5: 8b157a76a5db57c2045cba0aef2d5527
BLAKE2b-256: 764c5a13d6240d4dfe7ab6c1f0d373ac1c1adf2e70cda17f14fe8802891ee99c

Details for the file Parquet_Schema_Expansion_Migrator_for_BigQuery-0.1-py3-none-any.whl (built distribution).

File hashes:

SHA256: 66e4ef376816f4cfc9ed215ddb4c72916d51877d8b00623f55a86275a8ae4a06
MD5: e812bffb26cde56484cdbf7f8684386d
BLAKE2b-256: 04068527eed3058550da926f3a989af1c8c8c6c7ec6356bf188fcb5043d6c02e
