Use cloud-stored zipfiles with full ZipFile functionality, including partial downloads.

Project description

cloudzipfile

This module provides access to zip files in cloud storage without downloading the entire archive. It is inspired by remotezip, but uses the respective cloud APIs rather than requiring support for the HTTP Range header. It currently supports only Azure; porting it to other systems should be fairly simple, and pull requests are very welcome!

Installation

pip install cloudzipfile

Usage

CloudZipFile is a subclass of Python's standard library zipfile.ZipFile and thus supports all its read methods.

Instead of providing ZipFile with a path, you provide a blob client of your cloud provider, for example:

# Import
from azure.storage.blob import BlobClient
from cloudzipfile.cloudzipfile import CloudZipFile
import os, tempfile, uuid

# Define blob client
BLOB_URL = 'https://cloudzipfileexamples.blob.core.windows.net/test/files.zip'
blobClient = BlobClient.from_blob_url(BLOB_URL)

# Choose an output directory and the archive members to extract
PATH_OUTPUT = os.path.join(tempfile.gettempdir(), str(uuid.uuid4()))
FILES_DESIRED = ['file1.txt', 'file3.txt']

# Open the remote zip file
# This downloads only the central directory (which lists where each file is stored)
cloudZipFile = CloudZipFile(blobClient)

# Extract specific files
cloudZipFile.extractall(path=PATH_OUTPUT, members=FILES_DESIRED)

# Verify success: should show file1.txt and file3.txt
print(f'{PATH_OUTPUT}: {os.listdir(PATH_OUTPUT)}')
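
Because CloudZipFile subclasses zipfile.ZipFile, the other read methods should be usable in the same way as extractall. The sketch below illustrates those calls on an in-memory archive, used here as a stand-in so that it runs without Azure access. The file names and contents are made up for illustration; with this package, the second ZipFile(...) call would be CloudZipFile(blobClient) instead.

```python
import io
import zipfile

# Build a small stand-in archive in memory (contents are illustrative only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("file1.txt", "file2.txt", "file3.txt"):
        archive.writestr(name, f"contents of {name}\n")

# Assumption: CloudZipFile(blobClient) would take the place of ZipFile(buffer).
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()             # member names from the central directory
    info = zf.getinfo("file3.txt")    # metadata for one member
    whole = zf.read("file1.txt")      # full contents of one member as bytes
    with zf.open("file3.txt") as member:
        first_line = member.readline()  # stream a member like a file

print(names, info.file_size, whole, first_line)
```

Listing names and metadata needs only the central directory, so on a remote archive those calls should not require fetching any member data.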

Future Development

Supporting other systems is fairly straightforward, as only two methods are required: one that determines the size of the cloud file and one that performs a partial download. These should be supported by all major providers (I simply don't have experience with them).
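
To make the two required operations concrete, here is a minimal sketch of how a backend exposing only "size" and "partial download" can be wrapped in a seekable file object that the standard zipfile module accepts. All names (RangeSource, InMemorySource, RangeReader, size, read_range) are hypothetical and are not this package's actual classes or methods. The in-memory backend is a stand-in for a cloud client, and it counts the bytes it serves.

```python
import io
import zipfile
from typing import Protocol


class RangeSource(Protocol):
    """Hypothetical interface: the two operations a provider must offer."""

    def size(self) -> int: ...
    def read_range(self, offset: int, length: int) -> bytes: ...


class InMemorySource:
    """Stand-in backend serving ranges from bytes and counting what it serves."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.bytes_served = 0

    def size(self) -> int:
        return len(self._data)

    def read_range(self, offset: int, length: int) -> bytes:
        chunk = self._data[offset:offset + length]
        self.bytes_served += len(chunk)
        return chunk


class RangeReader(io.RawIOBase):
    """Seekable, read-only file object backed by a RangeSource."""

    def __init__(self, source: RangeSource) -> None:
        super().__init__()
        self._source = source
        self._size = source.size()
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}
        if whence not in base:
            raise ValueError(f"unsupported whence: {whence}")
        pos = base[whence] + offset
        if pos < 0:
            # Mirror real files, which raise OSError for negative positions.
            raise OSError("negative seek position")
        self._pos = pos
        return self._pos

    def readinto(self, buffer) -> int:
        # Fetch only the requested range, capped at the end of the object.
        length = min(len(buffer), max(self._size - self._pos, 0))
        if length == 0:
            return 0
        chunk = self._source.read_range(self._pos, length)
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


# Demo archive: one large member (stored uncompressed) and one small member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("big.bin", bytes(200_000))
    archive.writestr("file1.txt", "hello from file1\n")
source = InMemorySource(buffer.getvalue())

with zipfile.ZipFile(RangeReader(source)) as remote:
    text = remote.read("file1.txt").decode()

print(text, source.bytes_served, source.size())
```

In this sketch, reading the small member fetches only the end of the archive, the central directory, and that member's bytes. The large member is never requested. A real provider would implement the two methods with its own SDK calls.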

How It Works

Zip files have a fixed structure, which can be leveraged for partial reading. They end with an end of central directory (EOCD) record, which states where to find the central directory. The central directory lists all files in the archive and where to find them. Python's zipfile uses these two pieces to determine which part of the file to load into memory when the user requests a particular file. This package overrides that loading process to work with cloud APIs directly rather than only with local filesystems. All credit goes to remotezip for figuring out how to override the process; I only edited it to use APIs rather than HTTP requests.
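
The role of the EOCD can be illustrated with a short sketch that parses it by hand. It assumes a plain (non-ZIP64) archive and uses the record layout from the zip format: a 4-byte signature, four 16-bit fields, the 32-bit central directory size and offset, and a 16-bit comment length. The helper name locate_central_directory is hypothetical. In practice Python's zipfile does this parsing itself.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_MIN_SIZE = 22      # size of the fixed part of the EOCD record
MAX_COMMENT = 0xFFFF    # the archive comment after the EOCD is at most 65535 bytes


def locate_central_directory(tail: bytes) -> tuple[int, int, int]:
    """Return (entry_count, cd_size, cd_offset) parsed from the archive tail."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("EOCD record not found")
    # Fields: signature, disk number, disk with central directory,
    # entries on this disk, total entries, directory size, directory offset,
    # comment length.
    fields = struct.unpack("<4s4H2LH", tail[pos:pos + EOCD_MIN_SIZE])
    _sig, _disk, _cd_disk, _n_disk, n_total, cd_size, cd_offset, _comment = fields
    return n_total, cd_size, cd_offset


# Build a small archive in memory (contents are illustrative only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("file1.txt", "file2.txt", "file3.txt"):
        archive.writestr(name, f"contents of {name}\n")
data = buffer.getvalue()

# Only the tail of the archive is needed to find the central directory.
tail = data[-(EOCD_MIN_SIZE + MAX_COMMENT):]
entry_count, cd_size, cd_offset = locate_central_directory(tail)

print(entry_count, cd_size, cd_offset, len(data))
```

With the offset and size known, a second partial read of cd_size bytes starting at cd_offset retrieves the central directory, which in turn gives the location of each member.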

Download files

Source Distribution

cloudzipfile-1.0.5.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

cloudzipfile-1.0.5-py3-none-any.whl (6.0 kB)

Uploaded Python 3

File details

Details for the file cloudzipfile-1.0.5.tar.gz.

File metadata

  • Download URL: cloudzipfile-1.0.5.tar.gz
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.1

File hashes

Hashes for cloudzipfile-1.0.5.tar.gz
Algorithm Hash digest
SHA256 e508856b509b51fbbdd3f502a04671d8fd15f7e536aad4429d34b45d59f21f6b
MD5 dc78d8c60551431c4ac1f79abd002e5f
BLAKE2b-256 e23848e2e854474c32eb290d51b6258f4a85837c1dc88e9da33246b750acdf08

File details

Details for the file cloudzipfile-1.0.5-py3-none-any.whl.

File hashes

Hashes for cloudzipfile-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 cb415a51d9abf65410fbfc0f103a7d9c1ffbfbb9f723ac0198ab26809259ef88
MD5 ea35dabaa9d15dabbef29c990a73404f
BLAKE2b-256 a9281144a19e4bf510be3f8dc6079ec498e33a459cadb80b20422862f82f6f02
