Project description
Python Library for Compressing Files in Google Cloud Storage
This library provides a function compress_gcs_bucket_files that simplifies the process of compressing multiple CSV files from a Google Cloud Storage (GCS) bucket into a single ZIP archive. The archive is then uploaded back to the same bucket.
1. compress_gcs_bucket_files(bucket_name, prefix, output_zip_name, output_zip_prefix):
Parameters:
- bucket_name: (str) The name of the GCS bucket containing the CSV files.
- prefix: (str) The prefix to filter files within the bucket (e.g., "path/to/your/csv/files/"). Include a trailing slash.
- output_zip_name: (str) The desired name for the output ZIP archive.
- output_zip_prefix: (str, optional) A prefix (folder path) prepended to output_zip_name to form the archive's object name in the same bucket (e.g., "archived/data/"). If omitted, the ZIP file is saved at the top level of the bucket.
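To illustrate how output_zip_prefix and output_zip_name combine, here is a minimal sketch of one plausible way the final object name could be formed. The helper output_blob_name is hypothetical and not part of the package; it assumes the prefix is prepended verbatim, which is why the prefix should end with a slash. The library may join the parts differently.

```python
from typing import Optional


def output_blob_name(output_zip_name: str, output_zip_prefix: Optional[str] = None) -> str:
    """Hypothetical helper: build the GCS object name for the output archive."""
    # Assumption: the prefix is prepended as-is, so "archived/data/" needs its trailing slash.
    if output_zip_prefix:
        return f"{output_zip_prefix}{output_zip_name}"
    # No prefix: the archive is written at the top level of the bucket.
    return output_zip_name


print(output_blob_name("compressed_data.zip", "archived/data/"))  # archived/data/compressed_data.zip
print(output_blob_name("compressed_data.zip"))  # compressed_data.zip
```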
Example Usage:
from compress_gcs_bucket_files import compress_gcs_bucket_files
# Replace with your information
bucket_name = "your-bucket-name"
prefix = "path/to/your/csv/files/" # Include trailing slash
output_zip_name = "compressed_data.zip"
output_zip_prefix = "archived/data/"  # Optional; omit to save at the top level of the same bucket
compress_gcs_bucket_files(bucket_name, prefix, output_zip_name, output_zip_prefix)
Function Behavior
- Connects to GCS using storage.Client.
- Retrieves the bucket specified by bucket_name.
- Lists all files whose names start with the provided prefix.
- Creates an in-memory buffer (BytesIO) to hold the compressed data.
- Creates a ZipFile object for writing to the buffer using DEFLATED compression.
- Iterates through each listed file:
  a) Skips any file whose name does not end in .csv.
  b) Downloads the file content as a string.
  c) Constructs the filename for the ZIP archive:
     1. If prefix is provided and the filename starts with it, the prefix is removed from the filename before the file is added to the archive.
     2. Otherwise, the original filename is used.
  d) Adds the file content to the ZIP archive under the constructed filename.
- Rewinds the in-memory buffer to the beginning for upload.
- Constructs the final output filename within the GCS bucket, including the output_zip_prefix if provided.
- Creates a new blob object in the GCS bucket with the final filename.
- Uploads the in-memory buffer content (containing the ZIP archive) to the GCS bucket as a ZIP file (content_type='application/zip').
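The steps above can be sketched as follows. This is an illustrative reconstruction from the description, not the package's actual source: build_zip_archive and compress_gcs_bucket_files_sketch are hypothetical names, and the sketch assumes a recent google-cloud-storage release (for Blob.download_as_text). The ZIP-building logic is kept in a pure helper so it can be exercised without GCS credentials.

```python
import io
import zipfile
from typing import Iterable, Optional, Tuple


def build_zip_archive(files: Iterable[Tuple[str, str]], prefix: str = "") -> io.BytesIO:
    """Write (object_name, text) pairs into a DEFLATE-compressed ZIP held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files:
            # Only CSV objects are archived; everything else is skipped.
            if not name.endswith(".csv"):
                continue
            # Strip the source prefix so the archive holds relative names.
            if prefix and name.startswith(prefix):
                arcname = name[len(prefix):]
            else:
                arcname = name
            zf.writestr(arcname, content)  # str content is stored as UTF-8
    buffer.seek(0)  # rewind so the upload reads from the first byte
    return buffer


def compress_gcs_bucket_files_sketch(
    bucket_name: str,
    prefix: str,
    output_zip_name: str,
    output_zip_prefix: Optional[str] = None,
) -> str:
    """Hypothetical re-implementation of the documented steps; returns the new object name."""
    # Imported here so build_zip_archive works without google-cloud-storage installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    # Filter on the name before downloading so non-CSV objects are never fetched.
    files = (
        (blob.name, blob.download_as_text())
        for blob in bucket.list_blobs(prefix=prefix)
        if blob.name.endswith(".csv")
    )
    buffer = build_zip_archive(files, prefix)

    destination = f"{output_zip_prefix}{output_zip_name}" if output_zip_prefix else output_zip_name
    bucket.blob(destination).upload_from_file(buffer, content_type="application/zip")
    return destination


if __name__ == "__main__":
    # Offline demo of the archive-building step only.
    demo = build_zip_archive(
        [("data/a.csv", "x,y\n1,2\n"), ("data/readme.txt", "ignored")],
        prefix="data/",
    )
    with zipfile.ZipFile(demo) as zf:
        print(zf.namelist())  # ['a.csv']
```

Separating the pure ZIP step from the GCS calls is a design choice of this sketch, not necessarily of the package; it keeps the filename-stripping and .csv filtering testable with plain (name, text) pairs.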
Additional Notes:
- This library depends on google-cloud-storage, which must be installed (pip install google-cloud-storage). It also uses zipfile, which is part of the Python standard library and needs no installation.
- The library only compresses files with the extension .csv.
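- The library builds the archive in an in-memory buffer, so no temporary files are written to disk. The trade-off is that the whole archive is held in memory, so memory use grows with the total size of the input files.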
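Because the archive only ever exists as bytes, it can also be inspected without touching disk. The sketch below is an illustration, not part of the package: list_archive_members is a hypothetical helper, and in practice zip_bytes could come from something like blob.download_as_bytes() on the uploaded archive.

```python
import io
import zipfile


def list_archive_members(zip_bytes: bytes) -> dict:
    """Map each member name to its uncompressed size, reading only from memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


# Build a tiny archive in memory to demonstrate; no temporary files are created.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.csv", "x,y\n1,2\n")
print(list_archive_members(buffer.getvalue()))  # {'a.csv': 8}
```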
Download files
Source Distribution
Built Distribution
File details
Details for the file Compress_csv_files_gcs_bucket-0.0.tar.gz.
File metadata
- Download URL: Compress_csv_files_gcs_bucket-0.0.tar.gz
- Upload date:
- Size: 2.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3f9775fc6b203ca9a07d97b969d226c19faae97edad6f5757dfc18da6f34c3d6
MD5 | d8d37d65c3de887923add610ecc2a7bf
BLAKE2b-256 | 609a637156f6ee3ddeb7e9d63a29e007ff023a845b63dccd9404c40f82d64a98
File details
Details for the file Compress_csv_files_gcs_bucket-0.0-py3-none-any.whl.
File metadata
- Download URL: Compress_csv_files_gcs_bucket-0.0-py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7a542cc2ad79fa89f1769db81edd351237220b73959abd1ff68b110ff6ce40ff
MD5 | b6d5de2e4d0434c9d20ffad132eaf623
BLAKE2b-256 | 1fa51f6cdb2931bf21b4f6b9170b0da2fb6dc34eaefd5d416a24a9efe12e30ac