ZipFly

Stream ZIP64 archives on the fly.

Python library to construct a ZIP64 archive on the fly without having to store the entire ZIP in memory or disk. This is useful in memory-constrained environments, or when you would like to start returning compressed data before you've even retrieved all the uncompressed data. Generating ZIPs on-demand in a web server is a typical use case for zipFly.

  • No temporary files, data is streamed directly
  • Supports an async interface
  • Calculates the archive size before streaming even begins
  • Supports the deflate compression method
  • Small memory usage, streaming is done with the yield statement
  • Archive structure is created on the fly, and all data can be created during the stream
  • Files included in the archive can be generated on the fly using Python generators
  • Independent of Python's goofy 🤮🤮 standard ZipFile implementation
  • Only 1 dependency
  • Automatic detection and renaming of duplicate file names
  • ZIP64-compatible output
  • 21.37% test coverage

This library is based upon another library (that one was a piece of work...)
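The web-server use case mentioned above maps naturally onto WSGI, since `stream()` returns a byte-chunk generator and a WSGI response body is just an iterable of bytes. A minimal stdlib-only sketch; `fake_zip_stream` and `download_zip` are hypothetical stand-ins for `ZipFly(files).stream()` wired into a real framework:

```python
def fake_zip_stream():
    # stand-in for zipFly's stream() generator; real chunks start with
    # the local-file-header signature b"PK\x03\x04"
    yield b"PK\x03\x04 first chunk"
    yield b"more archive bytes"


def download_zip(environ, start_response):
    headers = [
        ("Content-Type", "application/zip"),
        ("Content-Disposition", 'attachment; filename="archive.zip"'),
    ]
    start_response("200 OK", headers)
    return fake_zip_stream()  # WSGI iterates this lazily, chunk by chunk
```

Because the server pulls chunks from the iterable one at a time, compressed data goes out before the whole archive exists anywhere.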

How to install

```shell
pip install zipfly64
```

https://pypi.org/project/zipFly64

Usage

```python
from zipFly import ZipFly, LocalFile, consts

# compression_method is optional, defaults to consts.NO_COMPRESSION
file1 = LocalFile(file_path='files/lqbfa61deebf1.mp4', compression_method=consts.NO_COMPRESSION)  # or consts.COMPRESSION_DEFLATE
file2 = LocalFile(file_path='public/2ae9dcd01a3aa.mp4', name="files/my_file2.mp4")  # override the file name
file3 = LocalFile(file_path='files/4shaw1dax4da.mp4', name="my_file3.mp4")  # you control the directory path by specifying it in name

files = [file1, file2, file3]

zipFly = ZipFly(files)

# save to a file, or do something else with the stream() generator
with open("out/file.zip", 'wb') as f_out:
    for chunk in zipFly.stream():
        f_out.write(chunk)
```
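The streaming pattern LocalFile relies on is plain chunked reading: yield a file's bytes piece by piece instead of loading it whole. A stdlib-only sketch of that idea (`chunked_read` is an illustrative helper, not zipFly's API):

```python
def chunked_read(path, chunk_size=64 * 1024):
    # yield the file's bytes chunk by chunk; memory use stays bounded
    # by chunk_size no matter how large the file is
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```

Consuming such a generator with `b"".join(...)` reproduces the original file contents exactly.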

Supports dynamically created files

```python
import time

from zipFly import ZipFly, GenFile, LocalFile, consts


def file_generator():
    yield b"uga buga"
    yield b"a29jaGFtIGFsdGVybmF0eXdraQ=="
    yield b"2137"


# size is optional; providing it allows calculating the total archive size before any data is generated
# modification_time is in epoch time and defaults to time.time()
size = 40  # total uncompressed size of file_generator's output, in bytes
file1 = GenFile(name="file.txt", generator=file_generator(), modification_time=time.time(), size=size, compression_method=consts.COMPRESSION_DEFLATE)
file2 = LocalFile(file_path='files/as61aade2ebfd.mp4', compression_method=consts.NO_COMPRESSION)  # or consts.COMPRESSION_DEFLATE

files = [file1, file2]

zipFly = ZipFly(files)
archive_size = zipFly.calculate_archive_size()  # raises ValueError if it can't calculate the size

# for example, you can set it as the Content-Length of an HTTP response
response['Content-Length'] = archive_size

for chunk in zipFly.stream():
    ...  # do something with each chunk
```
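When the generated content is produced up front, the `size` to pass to the archive-size calculation can simply be the sum of the chunk lengths; the buffered chunks can then be wrapped back into a generator. A stdlib-only sketch (the `GenFile` call itself is omitted here):

```python
# Buffer the chunks once to learn the total uncompressed size, then
# hand a fresh generator over the same data to GenFile's generator=...
chunks = [b"uga buga", b"a29jaGFtIGFsdGVybmF0eXdraQ==", b"2137"]
size = sum(len(c) for c in chunks)  # 8 + 28 + 4 = 40 bytes
generator = (c for c in chunks)
```

This only works when buffering is acceptable; for truly unbounded streams you either know the size from elsewhere or skip the size calculation.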

Async interface

```python
import asyncio

from zipFly import ZipFly, LocalFile, consts, GenFile


async def file_generator():  # generators passed to GenFile must be async when using async_stream()
    yield b"uga buga"
    yield b"2137"


file1 = GenFile(name="file.txt", generator=file_generator())
file2 = LocalFile(file_path='public/2ae9dcd01a3aa.mp4', name="files/my_file2.mp4")

files = [file1, file2]

zipFly = ZipFly(files)


async def save_zip_async():
    with open("out/file.zip", 'wb') as f_out:
        async for chunk in zipFly.async_stream():
            f_out.write(chunk)


asyncio.run(save_zip_async())
```

[!NOTE]
file_generator must be an async generator. Local file async streaming is done with the aiofiles library.
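An async generator is consumed with `async for`, awaiting between chunks. A self-contained stdlib sketch of the shape such a generator takes (`asyncio.sleep(0)` stands in for real async I/O such as aiofiles reads):

```python
import asyncio


async def async_file_generator():
    # an async generator: each chunk can be awaited, so the event loop
    # stays responsive between yields
    for chunk in (b"uga buga", b"2137"):
        await asyncio.sleep(0)  # stand-in for real async I/O
        yield chunk


async def collect():
    return [chunk async for chunk in async_file_generator()]


chunks = asyncio.run(collect())
```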

Byte offset mode

[!TIP] Use this with the HTTP Range header to allow for resumable zip streaming

This mode allows generating the archive starting from a byte offset. It finds the file that contains that offset and starts streaming from it. Sadly, it must still fetch that file's data from the beginning, as otherwise a correct CRC cannot be calculated. With LocalFile this is not a problem, since it can scan the entire local file and compute the CRC very quickly. With GenFile, however, the entire file has to be fetched again, which may take a while depending on the file's size.
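The offset lookup described above is essentially a cumulative-size search. A stdlib-only sketch of the idea (`locate` is a hypothetical helper, not zipFly's API, and real archives also interleave headers between file data):

```python
from bisect import bisect_right
from itertools import accumulate


def locate(sizes, offset):
    # sizes: per-file byte lengths in stream order
    # returns (file_index, offset_within_that_file)
    ends = list(accumulate(sizes))  # cumulative end offset of each file
    i = bisect_right(ends, offset)  # first file whose end is past the offset
    start = ends[i - 1] if i else 0
    return i, offset - start
```

For example, with files of 100, 200 and 50 bytes, offset 150 falls 50 bytes into the second file.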

```python
# crc is the precomputed CRC-32 of the generator's full content
file1 = GenFile(name="file.txt", generator=file_generator(), crc=crc)
file2 = LocalFile(file_path='public/2ae9dcd01a3aa.mp4', name="files/my_file2.mp4")
files1 = [file1, file2]
zipFly1 = ZipFly(files1)

# Simulating pause/resume
STOP_BYTE = 300


async def async_save_pause():
    byte_offset = 0
    with open("out/file.zip", 'wb') as f_out:
        async for chunk in zipFly1.async_stream():
            remaining_bytes = STOP_BYTE - byte_offset
            if len(chunk) > remaining_bytes:
                chunk = chunk[:remaining_bytes]
            f_out.write(chunk)
            byte_offset += len(chunk)
            if byte_offset >= STOP_BYTE:
                break


# Later... (fresh ZipFly and GenFile instances; they must not be reused)
file3 = GenFile(name="file.txt", generator=file_generator(), crc=crc)
file4 = LocalFile(file_path='public/2ae9dcd01a3aa.mp4', name="files/my_file2.mp4")
files2 = [file3, file4]
zipFly2 = ZipFly(files2)


async def async_save_resume():
    with open("out/file.zip", 'ab') as f_out:  # append mode
        async for chunk in zipFly2.async_stream(byte_offset=STOP_BYTE):
            f_out.write(chunk)


async def pause_resume_save():
    await async_save_pause()
    await async_save_resume()


asyncio.run(pause_resume_save())
```

If the resume ZipFly instance contains different files than the pause ZipFly instance, the generated ZIP file will be corrupted.

[!NOTE]
For byte offset mode to work, you must use consts.NO_COMPRESSION and specify crc for every GenFile
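The crc a GenFile needs can be precomputed incrementally over the generator's chunks with the stdlib's `zlib.crc32`, feeding the running value back in so the chunks never have to be concatenated:

```python
import zlib


def file_generator():
    yield b"uga buga"
    yield b"a29jaGFtIGFsdGVybmF0eXdraQ=="
    yield b"2137"


crc = 0
for chunk in file_generator():
    crc = zlib.crc32(chunk, crc)  # running CRC-32 over all chunks so far
# crc now equals the CRC-32 of the concatenated content
```

Note this consumes the generator, so pass GenFile a fresh one afterwards.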

[!CAUTION] You mustn't reuse ZipFly instances. They should be re-created every time you call stream() or async_stream()

[!CAUTION] You mustn't reuse GenFile instances.

Other

Python is not optimized for async I/O operations, so to speed up async streaming the chunk size is increased to 4 MB. You can override this by passing chunk_size as an argument to LocalFile.

I created this library for my iDrive project.

If you have a different use case scenario, and LocalFile and GenFile are not enough, you can extend BaseFile and everything else should work out of the box.

If you extend BaseFile, keep in mind that zipFly attempts to deep-copy files. It will successfully deep-copy LocalFile, so LocalFile instances can be reused. However, it will completely skip deep-copying any file instance that has a generator.

Testing

With pytest and pytest-asyncio installed, call pytest from the top-level directory (the one containing this README.md) to run the tests. The 4 GB tests are slow. If your machine has enough memory (~4 GB free) and a fast disk/SSD, pytest-xdist can speed things up by running tests in parallel; use it by calling pytest -n auto.

PS

I wholeheartedly hope everyone responsible for creating the ZIP documentation gets slaughtered in the most gory and painful way 😊 (in a game)

(pls redo ur docs)
