Python Library for Compressing Files in Google Cloud Storage

This library provides a single function, compress_csv_files_gcs_bucket, that compresses the CSV files under a given prefix in a Google Cloud Storage (GCS) bucket into one ZIP archive and uploads the archive back to the same bucket.

1. compress_csv_files_gcs_bucket(bucket_name, prefix, output_zip_name, output_zip_prefix):

The compress_csv_files_gcs_bucket function takes the following arguments:

  1. bucket_name: (str) The name of the GCS bucket containing the CSV files.
  2. prefix: (str) The prefix to filter files within the bucket (e.g., "path/to/your/csv/files/"). Include a trailing slash.
  3. output_zip_name: (str) The desired name for the output ZIP archive.
  4. output_zip_prefix: (str, optional) A prefix to prepend to output_zip_name when the archive is uploaded (e.g., "archived/data/"). The archive is always written to the same bucket as the source files; if this argument is omitted, no prefix is added.
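To make the interaction between output_zip_name and output_zip_prefix concrete, here is a minimal sketch of how the uploaded object name could be derived. The helper build_output_blob_name is hypothetical and not part of the package, and joining with exactly one "/" is an assumption; the library itself may simply concatenate the two strings.

```python
from typing import Optional


def build_output_blob_name(output_zip_name: str,
                           output_zip_prefix: Optional[str] = None) -> str:
    """Return the object name the ZIP would be uploaded under.

    Hypothetical helper for illustration; not part of the package.
    """
    if not output_zip_prefix:
        # No prefix given: the archive name is used as-is.
        return output_zip_name
    # Assumption: join prefix and name with exactly one "/" separator.
    return output_zip_prefix.rstrip("/") + "/" + output_zip_name


print(build_output_blob_name("compressed_data.zip"))
# compressed_data.zip
print(build_output_blob_name("compressed_data.zip", "archived/data/"))
# archived/data/compressed_data.zip
```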

Example Usage:

from Compress_csv_files_gcs_bucket import compress_csv_files_gcs_bucket

# Replace with your information
bucket_name = "your-bucket-name"
prefix = "path/to/your/csv/files/"  # Include trailing slash
output_zip_name = "compressed_data.zip"
output_zip_prefix = "archived/data/"  # Optional; if omitted, no prefix is
                                       # added to the ZIP's object name

compress_csv_files_gcs_bucket(bucket_name, prefix, output_zip_name, output_zip_prefix)

Function Behavior

  1. Connects to GCS using storage.Client.
  2. Retrieves the bucket specified by bucket_name.
  3. Lists all files with the provided prefix and extension .csv.
  4. Creates an in-memory buffer (BytesIO) to hold the compressed data.
  5. Creates a ZipFile object for writing to the buffer using DEFLATED compression.
  6. Iterates through each listed file:
     a. Filters for files with the extension .csv.
     b. Downloads the file content as a string.
     c. Constructs the filename for the ZIP archive:
        1. If prefix is provided and the filename starts with the prefix, the prefix is removed from the filename before it is added to the archive.
        2. Otherwise, the original filename is used.
     d. Adds the file content to the ZIP archive under the constructed filename.
  7. Rewinds the in-memory buffer to the beginning for upload.
  8. Constructs the final output filename within the GCS bucket, including the output_zip_prefix if provided.
  9. Creates a new blob object in the GCS bucket with the final filename.
  10. Uploads the in-memory buffer content (containing the ZIP archive) to the GCS bucket as a ZIP file (content_type='application/zip').
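The steps above can be sketched roughly as follows. This is an illustrative reconstruction from the description, not the package's actual source: the name compress_csv_sketch and the extra client argument are assumptions added so the flow can be tried with a stand-in client, and download_as_bytes is used here where the description mentions downloading the content as a string.

```python
import io
import zipfile


def compress_csv_sketch(bucket_name, prefix, output_zip_name,
                        output_zip_prefix=None, client=None):
    """Illustrative re-creation of the listed steps (hypothetical name)."""
    if client is None:
        # Imported lazily so the sketch also works with a stand-in client.
        from google.cloud import storage
        client = storage.Client()

    bucket = client.bucket(bucket_name)
    buffer = io.BytesIO()  # holds the whole archive in memory

    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for blob in client.list_blobs(bucket_name, prefix=prefix):
            if not blob.name.endswith(".csv"):
                continue  # only CSV files are archived
            # Strip the listing prefix so archive paths are relative to it.
            if prefix and blob.name.startswith(prefix):
                arcname = blob.name[len(prefix):]
            else:
                arcname = blob.name
            archive.writestr(arcname, blob.download_as_bytes())

    buffer.seek(0)  # rewind before uploading
    # Assumption: the prefix is prepended by plain string concatenation.
    destination = f"{output_zip_prefix or ''}{output_zip_name}"
    out_blob = bucket.blob(destination)
    out_blob.upload_from_file(buffer, content_type="application/zip")
    return destination
```

Closing the ZipFile (here via the with block) before rewinding matters, because the ZIP central directory is only written on close; uploading an unclosed archive would produce a file that cannot be opened.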

Additional Notes:

  1. This library uses the google-cloud-storage package and Python's built-in zipfile module. Only google-cloud-storage needs to be installed (pip install google-cloud-storage); zipfile ships with the standard library.
  2. The library only compresses files with the extension .csv.
  3. The library builds the archive in an in-memory buffer, so no temporary files are created on disk. The complete ZIP archive must therefore fit in memory.
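As a rough illustration of the in-memory approach in note 3, the standard-library sketch below zips a small mapping of file names to bytes and reports how many bytes the buffer holds. The function zip_csvs_in_memory and the sample data are hypothetical and involve no GCS calls; the point is only that the whole archive lives in the BytesIO object until it is read or uploaded.

```python
import io
import zipfile


def zip_csvs_in_memory(files):
    """Zip the .csv entries of a {name: bytes} mapping into a BytesIO buffer.

    Hypothetical helper for illustration; not part of the package.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            if name.endswith(".csv"):  # non-CSV entries are skipped
                archive.writestr(name, data)
    buffer.seek(0)  # rewind so a later read or upload starts at byte 0
    return buffer


# Made-up sample data: one repetitive CSV and one non-CSV file.
sample = {
    "a.csv": b"id,value\n" + b"1,hello\n" * 1000,
    "readme.txt": b"not a CSV, so it is skipped",
}
buffer = zip_csvs_in_memory(sample)
raw_size = len(sample["a.csv"])
zip_size = buffer.getbuffer().nbytes  # bytes currently held in memory
print(f"raw CSV bytes: {raw_size}, ZIP bytes held in memory: {zip_size}")
```

For inputs that compress well, the buffer stays much smaller than the raw data, but each file's content is also held in memory while it is being written, so very large CSV files may call for a disk-backed approach instead.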
