Use cloud-stored zipfiles with full ZipFile functionality, including partial downloads.
Project description
cloudzipfile
This module provides access to zip files in cloud storage without downloading the entire archive. It is inspired by remotezip, but uses the cloud provider's own API rather than requiring support for the HTTP Range header. It currently supports only Azure. Porting it to other providers should be fairly simple, and pull requests are very welcome!
Installation
pip install cloudzipfile
Usage
CloudZipFile is a subclass of Python's standard library zipfile.ZipFile and thus supports all its read methods.
Instead of providing ZipFile with a path, you provide a blob client from your cloud provider, for example:
# Import
from azure.storage.blob import BlobClient
from cloudzipfile.cloudzipfile import CloudZipFile
import os, tempfile, uuid
# Define blob client
BLOB_URL = 'https://cloudzipfileexamples.blob.core.windows.net/test/files.zip'
blobClient = BlobClient.from_blob_url(BLOB_URL)
# Define the output directory and the files to extract
PATH_OUTPUT = os.path.join(tempfile.gettempdir(), str(uuid.uuid4()))
FILES_DESIRED = ['file1.txt', 'file3.txt']
# Open the zipfile
# Will download central directory (where to find specific files)
cloudZipFile = CloudZipFile(blobClient)
# Extract specific files
cloudZipFile.extractall(path=PATH_OUTPUT, members=FILES_DESIRED)
# Verify success: should show file1.txt and file3.txt
print(f'{PATH_OUTPUT}: {os.listdir(PATH_OUTPUT)}')
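Because CloudZipFile is described above as a zipfile.ZipFile subclass, the other standard read methods should be usable in the same way. The sketch below illustrates those methods on a plain zipfile.ZipFile backed by an in-memory archive, used here as a stand-in so that it runs without cloud credentials. The archive contents are made up for illustration, and the behaviour against a real blob client has not been verified here.

```python
import io
import zipfile

# Build a small archive in memory to stand in for a cloud-hosted zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("file1.txt", "file2.txt", "file3.txt"):
        archive.writestr(name, f"contents of {name}")

# The standard read API: list members, inspect metadata, read selectively.
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()                        # member names from the central directory
    size = zf.getinfo("file3.txt").file_size     # uncompressed size, no member data read
    text = zf.read("file1.txt").decode("utf-8")  # read one member fully into memory
    with zf.open("file3.txt") as member:         # stream a member instead
        first_bytes = member.read(8)
```

With CloudZipFile, the constructor would presumably take the blob client in place of `buffer`, as in the example above.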
Future Development
Supporting other providers is fairly straightforward, as only two methods are required: one that determines the size of the cloud file and one that performs a partial download. All major providers should support these (I simply don't have experience with them).
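To make the two required operations concrete, here is a minimal sketch of how a size lookup and a ranged download can be enough to drive the standard zipfile module. It is not this package's actual implementation: `InMemoryBackend`, `RangeFile`, `size` and `download_range` are hypothetical names chosen for illustration, and the in-memory backend merely stands in for a real provider client.

```python
import io
import zipfile


class InMemoryBackend:
    """Hypothetical stand-in for a cloud client; counts the bytes it serves."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self.bytes_served = 0

    def size(self) -> int:
        """Operation 1: total size of the remote object."""
        return len(self._data)

    def download_range(self, offset: int, length: int) -> bytes:
        """Operation 2: partial download of `length` bytes starting at `offset`."""
        chunk = self._data[offset:offset + length]
        self.bytes_served += len(chunk)
        return chunk


class RangeFile(io.RawIOBase):
    """Seekable, read-only file object built on the two operations above."""

    def __init__(self, backend) -> None:
        super().__init__()
        self._backend = backend
        self._size = backend.size()
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def readinto(self, buffer) -> int:
        # Fetch only the requested span; returning 0 signals end of file.
        length = min(len(buffer), self._size - self._pos)
        if length <= 0:
            return 0
        chunk = self._backend.download_range(self._pos, length)
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


# Demo: three stored (uncompressed) members, of which only one is read.
payloads = {name: name.encode() * 20000 for name in ("a.bin", "b.bin", "c.bin")}
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w", zipfile.ZIP_STORED) as archive:
    for name, payload in payloads.items():
        archive.writestr(name, payload)

backend = InMemoryBackend(raw.getvalue())
with zipfile.ZipFile(RangeFile(backend)) as zf:
    extracted = zf.read("b.bin")
```

After the run, `backend.bytes_served` can be compared with `backend.size()` to see how much of the archive was actually fetched. A real port would replace `InMemoryBackend` with calls to the provider's SDK.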
How It Works
Zip files have a fixed structure, which can be leveraged for partial reading. They end with an end of central directory record (EOCD), which states where to find the central directory. The central directory lists all files in the archive and where to find them. Python's zipfile uses these two pieces to determine which part of the file to load into memory when the user requests a particular file. This package overrides that loading process to work with cloud APIs directly rather than only with local filesystems. All credit goes to remotezip for figuring out how to override the process; I only edited it to use APIs rather than HTTP requests.
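The first step described above, reading the EOCD to find the central directory, can be sketched with the standard library alone. This is an illustration rather than the package's code: `locate_central_directory` is a hypothetical helper, it ignores ZIP64 archives, and it assumes the archive comment does not itself contain the EOCD signature.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"
# signature, disk number, disk with central directory, entries on this disk,
# total entries, central directory size, central directory offset, comment length
EOCD_STRUCT = struct.Struct("<4s4H2LH")  # 22 bytes in total
MAX_COMMENT = 65535  # the comment length field is 16 bits wide


def locate_central_directory(tail: bytes):
    """Return (entry_count, cd_size, cd_offset) parsed from the end of an archive."""
    start = tail.rfind(EOCD_SIGNATURE)
    if start < 0 or len(tail) - start < EOCD_STRUCT.size:
        raise ValueError("no end of central directory record found")
    fields = EOCD_STRUCT.unpack_from(tail, start)
    entry_count, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return entry_count, cd_size, cd_offset


# Demo on an in-memory archive; a remote reader would fetch only this tail.
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as archive:
    for name in ("file1.txt", "file2.txt", "file3.txt"):
        archive.writestr(name, f"contents of {name}")
data = raw.getvalue()

tail = data[-(EOCD_STRUCT.size + MAX_COMMENT):]
entry_count, cd_size, cd_offset = locate_central_directory(tail)
central_directory = data[cd_offset:cd_offset + cd_size]
```

Only the tail and the `central_directory` span need to be fetched before any member is requested. Each central directory entry then gives the offset of that member's local header, which determines the next range to download.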
Project details
File details
Details for the file cloudzipfile-1.0.5.tar.gz.
File metadata
- Download URL: cloudzipfile-1.0.5.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.28.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | e508856b509b51fbbdd3f502a04671d8fd15f7e536aad4429d34b45d59f21f6b
MD5 | dc78d8c60551431c4ac1f79abd002e5f
BLAKE2b-256 | e23848e2e854474c32eb290d51b6258f4a85837c1dc88e9da33246b750acdf08
File details
Details for the file cloudzipfile-1.0.5-py3-none-any.whl.
File metadata
- Download URL: cloudzipfile-1.0.5-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.28.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | cb415a51d9abf65410fbfc0f103a7d9c1ffbfbb9f723ac0198ab26809259ef88
MD5 | ea35dabaa9d15dabbef29c990a73404f
BLAKE2b-256 | a9281144a19e4bf510be3f8dc6079ec498e33a459cadb80b20422862f82f6f02