Creating zip files on the fly

AioZipStream

This is a fork of ZipStream: a simple Python library for streaming ZIP files that are created dynamically, without using any temporary files.

  • No temporary files, data is streamed directly
  • Supports the deflate compression method
  • Small memory usage; streaming is implemented with the yield statement
  • Archive structure is created on the fly, and all data can be created during streaming
  • Files included in the archive can be generated on the fly using Python generators
  • Asynchronous AioZipStream and classic ZipStream are available
  • Produces Zip32-compatible files
  • Independent of Python's standard ZipFile implementation
  • Almost no dependencies: only aiofiles in some circumstances (see AioZipStream section for details)
  • Zip64 support is planned for the future (the far future, because I have never hit the 4 GB file size limit ;-) )

Required Python version:

ZipStream is compatible with Python 2.7.

AioZipStream requires Python 3.6 or later. On earlier versions AioZipStream is not available for import.

Usage:

The list of files to archive is stored as a list of dicts. Why dicts? Because each file can take additional parameters, and more parameters are planned for the future.

Sample list of files to archive:

files = [
         # file /tmp/file.dat will be added to the archive under the name `file.dat`
         {'file':'/tmp/file.dat'},

         # the same file again, stored under its own name `completly_different.foo`
         # and compressed using the `deflate` compression method
         {'file':'/tmp/file.dat',
          'name':'completly_different.foo',
          'compression':'deflate'}
        ]
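
The entry format above can be wrapped in a small helper. This is only a sketch: `make_entry` is a hypothetical name, not part of the library, and it relies only on the keys shown in this README (`file`, `name`, `compression`).

```python
def make_entry(path, name=None, compression=None):
    """Build one file entry dict using only the keys shown in this README.

    `make_entry` is a hypothetical helper, not part of zipstream.
    """
    entry = {'file': path}
    # Optional keys are added only when a value was given
    if name is not None:
        entry['name'] = name
    if compression is not None:
        entry['compression'] = compression
    return entry


files = [
    make_entry('/tmp/file.dat'),
    make_entry('/tmp/file.dat', name='renamed.foo', compression='deflate'),
]
```

Leaving out the optional keys keeps each dict identical to what you would write by hand.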

It's time to stream / archive:

from zipstream import ZipStream

zs = ZipStream(files)
with open("example.zip", "wb") as fout:
    for data in zs.stream():
        fout.write(data)

Any iterable source of binary data can be used in place of a regular file. To use a generator as the input for a file, pass it in the stream field instead of file; the name parameter is then also required.

def source_of_bytes():
    yield b"123456789"
    yield b"abcdefgh"
    yield b"I am a binary data"

files = [
         # ... other entries ...
         # this file will be generated dynamically under the name my_data.bin
         {'stream': source_of_bytes(), 'name': 'my_data.bin'},
        ]
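
An already open binary file-like object can be fed in the same way by wrapping it in a generator. The sketch below assumes nothing beyond the `stream` and `name` keys described above; `read_in_chunks` is a hypothetical helper and the chunk size of 32768 bytes is an arbitrary choice.

```python
import io


def read_in_chunks(fileobj, chunk_size=32768):
    """Yield successive bytes chunks read from a binary file-like object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty read means the end of the data was reached
            break
        yield chunk


# An in-memory buffer stands in for any binary file-like object
buffer = io.BytesIO(b"x" * 100000)
files = [
    {'stream': read_in_chunks(buffer), 'name': 'from_memory.bin'},
]
```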

Keep in mind that data should be served in chunks of reasonable size, because when a stream is used, the ZipStream class cannot split the data by itself.
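
If a source may produce very large chunks, it can be wrapped in a generator that splits them before they reach the library. This is a minimal sketch: `limit_chunk_size` is a hypothetical helper, and 32768 bytes is only an example limit, not a value required by the library.

```python
def limit_chunk_size(source, max_size=32768):
    """Re-yield data from `source`, splitting any chunk larger than max_size."""
    for chunk in source:
        for start in range(0, len(chunk), max_size):
            yield chunk[start:start + max_size]


def one_big_chunk():
    # A source that hands over all of its data at once
    yield b"a" * 70000


pieces = list(limit_chunk_size(one_big_chunk(), max_size=32768))
```

The wrapped generator can then be passed in the stream field, for example `{'stream': limit_chunk_size(one_big_chunk()), 'name': 'big.bin'}`.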

The list of files to stream can also be generated on the fly, during streaming:

import os
from zipstream import ZipStream

def files_to_stream_with_foo_in_name(dirname):
    # all files from the selected directory
    for f in os.listdir(dirname):
        fp = os.path.join(dirname, f)
        if os.path.isfile(fp):
            yield {'file': fp,
                   'name': "foo_" + os.path.basename(fp)}
    # and our generator too
    yield {'stream': source_of_bytes(),
           'name': 'my_data.bin',
           'compression': 'deflate'}

zs = ZipStream(files_to_stream_with_foo_in_name('/tmp/some-files'))
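
The same idea extends to whole directory trees. The sketch below walks a directory recursively and yields one entry per file. `files_under` is a hypothetical helper, and it assumes the library accepts archive names containing forward slashes for subdirectories, which this README does not state.

```python
import os


def files_under(root):
    """Yield one entry dict per regular file below `root`, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            # Name each entry by its path relative to `root`, using
            # forward slashes (assumed to be accepted by the library)
            relative = os.path.relpath(full_path, root)
            yield {'file': full_path,
                   'name': relative.replace(os.sep, '/')}
```

Because this is a generator, no directory is scanned until the entries are actually consumed.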

Asynchronous AioZipStream

Warning: the asynchronous AioZipStream requires at least Python 3.6. AioZipStream uses asynchronous generator syntax, which is available from version 3.6 onward.

To work with local files, the additional aiofiles library is required. If you plan to stream only dynamically generated content, aiofiles is not required.

See the aiofiles GitHub repo for details about aiofiles.

Sample of asynchronous zip streaming

Any generator used to create data on the fly must be defined as async:

import asyncio

async def content_generator():
    yield b'foo baz'
    await asyncio.sleep(0.1) # simulate a slightly slow source of data
    # remote_data_source() stands for your own coroutine that returns a str
    data = await remote_data_source()
    yield bytes(data, 'utf-8') # always remember to yield binary data
    await asyncio.sleep(0.5)
    yield b"the end"
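
An existing synchronous iterable of bytes can be adapted to this requirement with a thin async wrapper. The following is a sketch only: `as_async_stream` and `collect` are hypothetical helpers, and `asyncio.run` requires Python 3.7 or later.

```python
import asyncio


async def as_async_stream(chunks):
    """Expose a regular iterable of bytes as an async generator."""
    for chunk in chunks:
        yield chunk
        # Give other tasks a chance to run between chunks
        await asyncio.sleep(0)


async def collect(async_chunks):
    # Drain an async generator into a list, for demonstration only
    return [chunk async for chunk in async_chunks]


result = asyncio.run(collect(as_async_stream([b"foo", b"bar"])))
```

Note that the wrapper does not make blocking work asynchronous: if producing a chunk blocks, the event loop is still blocked for that time.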

Zip streaming must also happen inside an async function. Note the use of aiofiles.open instead of open: it is asynchronous and will not block the event loop during disk access.

import aiofiles
from zipstream import AioZipStream

async def zip_async(zipname, files):
    aiozip = AioZipStream(files, chunksize=32768)
    async with aiofiles.open(zipname, mode='wb') as z:
        async for chunk in aiozip.stream():
            await z.write(chunk)
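
When only dynamically generated content is streamed and aiofiles is not installed, the chunks can be collected in memory instead of written to disk. In this sketch `fake_chunks` is a stand-in for `aiozip.stream()` so that the example runs without the library; `collect_to_buffer` is a hypothetical helper, and `asyncio.run` requires Python 3.7 or later.

```python
import asyncio
import io


async def fake_chunks():
    """Stand-in for aiozip.stream(); yields bytes asynchronously."""
    for part in (b"PK", b"...", b"end"):
        yield part


async def collect_to_buffer(async_chunks):
    """Write every chunk from an async iterator into an in-memory buffer."""
    buffer = io.BytesIO()
    async for chunk in async_chunks:
        buffer.write(chunk)
    return buffer.getvalue()


data = asyncio.run(collect_to_buffer(fake_chunks()))
```

Collecting in memory gives up the small memory footprint of streaming, so it only suits small archives.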

Here is the list of files to send:

files = [
    {'file': '/tmp/car.jpeg'},
    {'file': '/tmp/aaa.mp3', 'name': 'music.mp3'},
    {'stream': content_generator(),
     'name': 'random_stuff.txt'}
]

Start the asyncio loop and stream the result to a file:

import asyncio

loop = asyncio.get_event_loop()
loop.run_until_complete(zip_async('example.zip', files))
loop.close()

Examples

See the examples directory for complete, working examples of ZipStream and AioZipStream.

Project details

Version: 0.4
