Create a streamable and (somewhat) seekable .ZIP file

Similar systems/projects:

• The Nginx zip module. It only works with Nginx, so it can’t be used with other web servers.

• python-zipstream. Does not support calculating the file size beforehand or seeking through the file.

Usage:

import zipseeker

# Create an index
fp = zipseeker.ZipSeeker()

# Calculate the total file size, e.g. for the Content-Length HTTP header.
contentLength = fp.size()

# Calculate the last-modified date, e.g. for the Last-Modified HTTP header.
lastModified = fp.lastModified()

# Send the ZIP file to the client
# Optionally add the start and end parameters for range requests.
# Note that the ZIP format doesn't support actually skipping parts of the file,
# as it needs to calculate the CRC-32 of every file at the end of the file.
fp.writeStream(outputFile)

Why?

The size of a ZIP file usually can’t be calculated beforehand because of compression, but compression is actually optional. The headers themselves also have a predictable size. That means that the whole file can have a predetermined file size (and modification time).
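To make the size argument concrete, here is a minimal sketch of how the total size of an uncompressed archive could be predicted from the file names and sizes alone. It is not zipseeker's actual implementation. The helper names (predicted_zip_size, build_stored_zip) are made up for this example, and it assumes stored (uncompressed) entries with no extra fields, no comments, no data descriptors and no ZIP64 records. The second helper builds a real archive with the standard zipfile module so the prediction can be checked.

```python
import io
import zipfile

LOCAL_HEADER = 30    # fixed part of a local file header, in bytes
CENTRAL_HEADER = 46  # fixed part of a central directory entry, in bytes
END_RECORD = 22      # end-of-central-directory record without a comment


def predicted_zip_size(entries):
    """Predict the archive size from (archive_name, file_size) pairs.

    Assumes stored entries without extra fields, comments,
    data descriptors or ZIP64 records.
    """
    total = END_RECORD
    for name, size in entries:
        name_len = len(name.encode("utf-8"))
        # The name appears twice: in the local header and in the central directory.
        total += LOCAL_HEADER + name_len + size + CENTRAL_HEADER + name_len
    return total


def build_stored_zip(files):
    """Build an uncompressed archive in memory to cross-check the prediction."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


files = {"a.txt": b"hello", "photos/img.jpg": b"\x00" * 1000}
predicted = predicted_zip_size((name, len(data)) for name, data in files.items())
actual = len(build_stored_zip(files))
```

Under these assumptions, predicted and actual should be equal. A writer that uses data descriptors or ZIP64 would need the corresponding extra bytes added per entry.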

This is useful when you want to provide ZIP downloads of large directories with incompressible files (e.g. images). The specific use case I created this module for was to provide downloads of whole photo albums without such inconveniences as requesting a download link by e-mail, using a lot of system resources for the creation of temporary files, and having to delete these files afterwards.

Of course, it’s possible to just stream a ZIP file, but that won’t provide any progress indication for file downloads and certainly doesn’t support Range requests.
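Once the total size is known, a Range request can in principle be served by generating the stream as usual and discarding everything outside the requested range. The sketch below shows that idea on a generic stream of byte chunks. The function slice_stream is hypothetical and is not how zipseeker implements its start and end parameters. It treats end as exclusive, whereas the HTTP Range header uses an inclusive last byte.

```python
from typing import Iterable, Iterator


def slice_stream(chunks: Iterable[bytes], start: int, end: int) -> Iterator[bytes]:
    """Yield only the bytes at offsets [start, end) of the concatenated chunks."""
    offset = 0  # absolute offset of the first byte of the current chunk
    for chunk in chunks:
        chunk_end = offset + len(chunk)
        if chunk_end > start and offset < end:
            # Trim the parts of this chunk that fall outside the range.
            lo = max(start - offset, 0)
            hi = min(end - offset, len(chunk))
            yield chunk[lo:hi]
        offset = chunk_end
        if offset >= end:
            break  # nothing after this point is needed


# Bytes 2 up to (not including) 7 of b"abcdefghi".
part = b"".join(slice_stream([b"abc", b"defg", b"hi"], 2, 7))
```

Note that this still produces every byte before start, so it saves bandwidth but not work on the server.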

TODO

• Implement actual seeking in the file - this should be doable.

• Use a CRC-32 cache that can be shared by the calling module.
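As a rough idea of what the CRC-32 cache from the last item could look like, here is a minimal sketch. The class Crc32Cache is purely illustrative and does not exist in zipseeker. It assumes that a file whose path, size and modification time are unchanged still has the same contents, and it keeps the results in a plain in-memory dictionary that the calling module could share between requests.

```python
import os
import zlib


class Crc32Cache:
    """Hypothetical in-memory CRC-32 cache; not part of zipseeker."""

    def __init__(self):
        self._cache = {}

    def crc32(self, path, chunk_size=64 * 1024):
        st = os.stat(path)
        # Assumption: same path, size and mtime means same contents.
        key = (os.path.abspath(path), st.st_size, st.st_mtime_ns)
        if key not in self._cache:
            crc = 0
            with open(path, "rb") as f:
                # Read in chunks so large files don't have to fit in memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    crc = zlib.crc32(chunk, crc)
            self._cache[key] = crc & 0xFFFFFFFF
        return self._cache[key]
```

The first call for a file reads it completely. Later calls for an unchanged file only cost a stat() and a dictionary lookup.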
