
Project description

Python Library for Compressing Files in Google Cloud Storage

This Python library provides a function to convert a CSV file stored in a Google Cloud Storage (GCS) bucket to a Parquet file and upload it back to the same bucket.

Functionality

  1. Downloads a CSV file from a specified GCS bucket.
  2. Reads the CSV data into a pandas DataFrame.
  3. Converts the DataFrame to an Arrow Table for efficient Parquet storage.
  4. Uploads the Arrow Table as a Parquet file to the GCS bucket.
  5. Optionally allows specifying an output folder within the bucket to store the Parquet file.

Installation

You'll need the following:

  1. Python 3.6 or later
  2. google-cloud-storage
  3. pandas
  4. pyarrow

Install them using pip:

pip install google-cloud-storage pandas pyarrow

Example Usage:

from gcs_convert_csv_to_parquet import gcs_convert_csv_to_parquet

# Define your parameters
bucket_name = 'your-bucket-name'
csv_file = 'your-csv-file.csv'
parquet_file = 'your-output-file.parquet'
output_folder = 'optional-output-folder'  # Optional

# Convert CSV to Parquet and upload to GCS
gcs_convert_csv_to_parquet(bucket_name, csv_file, parquet_file, output_folder)

The gcs_convert_csv_to_parquet function takes the following parameters:

  1. bucket_name (str): The name of the bucket containing the CSV file.
  2. csv_file (str): The name of the CSV file to convert.
  3. parquet_file (str): The desired name for the output Parquet file.
  4. output_folder (str, optional): The folder within the bucket to store the Parquet file. If None, the file will be stored at the root level of the bucket.
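
The package's internal implementation isn't shown on this page, but based on the steps listed under Functionality and the parameters above, an equivalent function might look roughly like the following sketch (not the package's actual source; it assumes the google-cloud-storage, pandas, and pyarrow dependencies from the Installation section):

import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from google.cloud import storage


def convert_csv_to_parquet_sketch(bucket_name, csv_file, parquet_file, output_folder=None):
    """Hypothetical re-implementation of the documented behaviour."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)

    # Steps 1-2: download the CSV and read it into a pandas DataFrame.
    csv_bytes = bucket.blob(csv_file).download_as_bytes()
    df = pd.read_csv(io.BytesIO(csv_bytes))

    # Step 3: convert the DataFrame to an Arrow Table and serialize it as Parquet.
    table = pa.Table.from_pandas(df)
    buffer = io.BytesIO()
    pq.write_table(table, buffer)
    buffer.seek(0)

    # Steps 4-5: upload the Parquet bytes, optionally under an output folder.
    destination = f"{output_folder}/{parquet_file}" if output_folder else parquet_file
    bucket.blob(destination).upload_from_file(buffer)
    print(f"Uploaded gs://{bucket_name}/{destination}")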

Notes

  1. This library assumes you have proper authentication set up to access Google Cloud Storage.
  2. The CSV file must be valid and well-formatted.
  3. Error handling is included to catch potential exceptions during conversion or upload.
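
Regarding note 1: google-cloud-storage picks up Application Default Credentials automatically. A quick way to check that credentials resolve before calling the converter (a minimal sketch, assuming ADC has been configured, for example via the GOOGLE_APPLICATION_CREDENTIALS environment variable or gcloud auth application-default login):

from google.cloud import storage

# Raises DefaultCredentialsError if no Application Default Credentials are found.
client = storage.Client()
print("Authenticated against project:", client.project)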

Example Usage

This example converts a CSV file named data.csv to a Parquet file named data.parquet and stores it in the output_folder within the specified bucket:

gcs_convert_csv_to_parquet("my-bucket", "data.csv", "data.parque

This will print a success message upon successful conversion and upload.
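
To verify the result, the uploaded Parquet file can be read straight back with pandas (this assumes the optional gcsfs package is installed so pandas can resolve gs:// paths):

import pandas as pd

# Requires gcsfs for gs:// support; the path mirrors the example above.
df = pd.read_parquet("gs://my-bucket/output_folder/data.parquet")
print(df.head())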

Release history

This version: 0.0

Download files

Download the file for your platform.

Source Distribution

gcs_convert_csv_to_parquet-0.0.tar.gz (2.7 kB)


Built Distribution

gcs_convert_csv_to_parquet-0.0-py3-none-any.whl (3.4 kB)


File details

Details for the file gcs_convert_csv_to_parquet-0.0.tar.gz.


File hashes

Hashes for gcs_convert_csv_to_parquet-0.0.tar.gz
Algorithm Hash digest
SHA256 0aa2d41fdf416cd625307b21a704ec1e9c70f287a7fd8912b0a1197e2102181c
MD5 cf1a3df9e87b2b941a0ad066f09c512f
BLAKE2b-256 fb0a4bed7dd56459348639f84b874bf94b588ed234d685ddb90d37c311f3eab3


File details

Details for the file gcs_convert_csv_to_parquet-0.0-py3-none-any.whl.


File hashes

Hashes for gcs_convert_csv_to_parquet-0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a61e19607d0f434e1bea25b54b36647e8aae6918c458aa4627c2c115fbc49347
MD5 b822fa97c3aaafe38d96967b2fbca264
BLAKE2b-256 7de7353738bf97300afcdca185c3aaa1da404dcb071e5bcd670fa109056044d2

