Project description
Python Library for Compressing Files in Google Cloud Storage
This Python library provides a function to convert a CSV file stored in a Google Cloud Storage (GCS) bucket to a Parquet file and upload it back to the same bucket.
Functionality
- Downloads a CSV file from a specified GCS bucket.
- Reads the CSV data into a pandas DataFrame.
- Converts the DataFrame to an Arrow Table for efficient Parquet storage.
- Uploads the Arrow Table as a Parquet file to the GCS bucket.
- Optionally allows specifying an output folder within the bucket to store the Parquet file.
Installation
You'll need the following:
- Python 3.6 or later
- google-cloud-storage
- pandas
- pyarrow
Install them using pip:
pip install google-cloud-storage pandas pyarrow
Example Usage:
from gcs_convert_csv_to_parquet import gcs_convert_csv_to_parquet
# Define your parameters
bucket_name = 'your-bucket-name'
csv_file = 'your-csv-file.csv'
parquet_file = 'your-output-file.parquet'
output_folder = 'optional-output-folder' # Optional
# Convert CSV to Parquet and upload to GCS
gcs_convert_csv_to_parquet(bucket_name, csv_file, parquet_file, output_folder)
The gcs_convert_csv_to_parquet function takes the following arguments:
- bucket_name (str): The name of the bucket containing the CSV file.
- csv_file (str): The name of the CSV file to convert.
- parquet_file (str): The desired name for the output Parquet file.
- output_folder (str, optional): The folder within the bucket to store the Parquet file. If None, the file will be stored at the root level of the bucket.
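GCS has a flat namespace, so a "folder" is simply a prefix in the object name. One way the output_folder argument might be combined with parquet_file is sketched below; `build_object_name` is a hypothetical helper, and the library's own handling (for example of trailing slashes) may differ.

```python
from typing import Optional


def build_object_name(parquet_file: str, output_folder: Optional[str] = None) -> str:
    """Return the destination object name for the Parquet file (hypothetical helper)."""
    # Strip surrounding slashes so "out", "out/", and "/out/" behave the same way.
    folder = output_folder.strip("/") if output_folder else ""
    # With no folder, the file is stored at the root level of the bucket.
    return f"{folder}/{parquet_file}" if folder else parquet_file


print(build_object_name("data.parquet"))          # root of the bucket
print(build_object_name("data.parquet", "out/"))  # inside the "out" prefix
```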
Notes
- This library assumes you have proper authentication set up to access Google Cloud Storage.
- The CSV file must be valid and well-formatted.
- Error handling is included to catch potential exceptions during conversion or upload.
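Since the CSV file is expected to be well-formatted, a quick local check before uploading can catch obvious problems early. The sketch below uses only the standard library and flags records whose field count differs from the header; `find_ragged_rows` is a hypothetical helper, not part of this package, and it does not validate data types or encodings.

```python
import csv
import io
from typing import List


def find_ragged_rows(csv_text: str) -> List[int]:
    """Return 1-based record numbers whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV is empty")
    expected = len(header)
    bad_records = []
    # The header is record 1, so data records start at 2.
    for record_no, row in enumerate(reader, start=2):
        # Blank lines are ignored; any other length mismatch is reported.
        if row and len(row) != expected:
            bad_records.append(record_no)
    return bad_records


print(find_ragged_rows("a,b\n1,2\n3\n4,5\n"))
```

An empty result means every record has the same number of fields as the header.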
Example Usage
This example converts a CSV file named data.csv to a Parquet file named data.parquet and stores it in the output_folder within the specified bucket:
gcs_convert_csv_to_parquet("my-bucket", "data.csv", "data.parquet", "output_folder")
On success, the function prints a confirmation message.
File details
Details for the file gcs_convert_csv_to_parquet-0.0.tar.gz.
File metadata
- Download URL: gcs_convert_csv_to_parquet-0.0.tar.gz
- Upload date:
- Size: 2.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0aa2d41fdf416cd625307b21a704ec1e9c70f287a7fd8912b0a1197e2102181c
MD5 | cf1a3df9e87b2b941a0ad066f09c512f
BLAKE2b-256 | fb0a4bed7dd56459348639f84b874bf94b588ed234d685ddb90d37c311f3eab3
File details
Details for the file gcs_convert_csv_to_parquet-0.0-py3-none-any.whl.
File metadata
- Download URL: gcs_convert_csv_to_parquet-0.0-py3-none-any.whl
- Upload date:
- Size: 3.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | a61e19607d0f434e1bea25b54b36647e8aae6918c458aa4627c2c115fbc49347
MD5 | b822fa97c3aaafe38d96967b2fbca264
BLAKE2b-256 | 7de7353738bf97300afcdca185c3aaa1da404dcb071e5bcd670fa109056044d2