Asynchronous gzip file reader/writer with aiocsv support.

aiogzip ⚡️

An asynchronous library for reading and writing gzip-compressed files.

License: MIT. Python 3.8-3.14.

aiogzip provides a fast, simple, and asyncio-native interface for handling .gz files, making it a useful complement to Python's built-in gzip module for asynchronous applications.

🚀 Read the Documentation

Features

  • Truly Asynchronous: Built with asyncio and aiofiles.
  • High-Performance: Optimized buffer handling for fast I/O.
  • Drop-in Replacement: Mimics gzip.open() with async seek, tell, peek, and readinto support; verified against tarfile-style access patterns and aiocsv workflows.
  • Reproducible Archives: Control gzip mtime and embedded filenames.
  • Type-Safe: Distinct AsyncGzipBinaryFile and AsyncGzipTextFile.
  • aiocsv Ready: Seamless integration for CSV pipelines.
  • Predictable Performance: Backward seeks rewind the stream and re-decompress data (same as gzip.GzipFile), so treat random access as O(n) and prefer forward-only patterns when possible.
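
The seek behavior described in the last bullet can be illustrated with the standard library's gzip.GzipFile, which the text names as behaving the same way. This is a stdlib sketch rather than aiogzip code, and the payload and offsets are invented for illustration.

```python
import gzip
import io

# 10,000 rows of 11 bytes each, compressed into an in-memory gzip stream.
payload = b"".join(b"row %06d\n" % i for i in range(10_000))
buf = io.BytesIO(gzip.compress(payload))

with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    # Forward seek: the reader decompresses and discards data up to the target.
    f.seek(50_000)
    tail = f.read(10)

    # Backward seek: the reader rewinds to the start of the stream and
    # decompresses again until it reaches the target offset.
    f.seek(10)
    head = f.read(10)
```

Both seeks return correct data; the difference is only in how much decompression work each one repeats, which is why forward-only access is the cheaper pattern.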

Append mode and large files

  • Append mode ("ab", "at") writes a new gzip member rather than extending the existing deflate stream, so the file ends up as two or more concatenated gzip members. Every standards-compliant reader, including aiogzip, gzip.open(), and command-line gunzip, decompresses the members in sequence and returns their concatenated output.
  • Backward seeks restart decompression from the beginning of the file, so forward-only access is much faster than mixed-direction access.
  • Non-seekable input streams use a bounded rewind cache. By default, up to 128 MiB of compressed input is retained so backward seeks can replay the stream; pass max_rewind_cache_size=<bytes> to tune this, or None to allow an unbounded cache.
  • Writes past 4 GiB of uncompressed data produce a gzip trailer whose ISIZE field wraps to size & 0xFFFFFFFF (this matches the gzip format spec and gzip.open()). Pass strict_size=True to refuse writes that would exceed the limit instead.
  • Guard against decompression bombs by passing max_decompressed_size=<bytes> when reading untrusted files; the decompressor aborts with OSError once the cap is exceeded.
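
The format-level points above (concatenated members, the wrapping ISIZE field, and a size cap on decompression) can be sketched with the standard library alone. The helpers append_twice, count_members, and bounded_decompress are hypothetical names introduced here; they illustrate the gzip format and are not how aiogzip implements these features.

```python
import gzip
import os
import struct
import tempfile
import zlib


def append_twice(path):
    """Write once, then reopen in append mode: each open adds one member."""
    for mode, chunk in (("wb", b"first\n"), ("ab", b"second\n")):
        with gzip.open(path, mode) as f:
            f.write(chunk)


def count_members(raw):
    """Count gzip members by decoding them one at a time."""
    members = 0
    while raw:
        d = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
        d.decompress(raw)
        members += 1
        raw = d.unused_data  # bytes after this member's trailer
    return members


def bounded_decompress(raw, limit):
    """Decompress a single member, failing once output exceeds `limit` bytes."""
    d = zlib.decompressobj(wbits=31)
    out = d.decompress(raw, limit + 1)  # never produce more than limit + 1 bytes
    if len(out) > limit:
        raise OSError("decompressed size exceeds limit")
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "log.gz")
    append_twice(path)
    with open(path, "rb") as fh:
        raw = fh.read()
    with gzip.open(path, "rb") as f:
        combined = f.read()  # readers join the members transparently

member_count = count_members(raw)

# ISIZE is the last 4 bytes of a member: uncompressed size modulo 2**32.
isize = struct.unpack("<I", raw[-4:])[0]

# For a 5 GiB member the stored value wraps around to 1 GiB.
wrapped = (5 * 2**30) & 0xFFFFFFFF
```

Capping the output with the max_length argument means an oversized stream is rejected before the full payload is ever materialized in memory.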

Quickstart

Install from PyPI:

pip install aiogzip

Write a file, then read it back:

import asyncio
from aiogzip import AsyncGzipFile

async def main():
    # Write
    async with AsyncGzipFile("file.gz", "wb") as f:
        await f.write(b"Hello, async world!")

    # Read
    async with AsyncGzipFile("file.gz", "rb") as f:
        print(await f.read())

asyncio.run(main())

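# Deterministic metadata: a fixed mtime and embedded filename keep the
# gzip header identical across runs.
async def write_reproducible():
    async with AsyncGzipFile(
        "dataset.gz", "wb", mtime=0, original_filename="dataset.csv"
    ) as f:
        await f.write(b"stable bytes")

asyncio.run(write_reproducible())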
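
To see why pinning mtime and the embedded filename makes output reproducible, it helps to look at where those fields sit in the gzip header. The sketch below uses the standard library's gzip.GzipFile rather than aiogzip, and gzip_bytes is a hypothetical helper introduced for illustration.

```python
import gzip
import io
import struct


def gzip_bytes(data, mtime, name):
    """Compress `data` in memory with an explicit header mtime and filename."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, fileobj=buf, mode="wb", mtime=mtime) as f:
        f.write(data)
    return buf.getvalue()


a = gzip_bytes(b"stable bytes", 0, "dataset.csv")
b = gzip_bytes(b"stable bytes", 0, "dataset.csv")
c = gzip_bytes(b"stable bytes", 1_700_000_000, "dataset.csv")

# Bytes 4..7 of the header hold MTIME as a little-endian 32-bit integer.
header_mtime = struct.unpack("<I", a[4:8])[0]

# With the FNAME flag set, a NUL-terminated filename follows the 10-byte header.
header_name = a[10:a.index(b"\x00", 10)]

same_inputs_match = a == b      # identical metadata gives identical bytes
mtime_changes_output = a != c   # a different mtime changes the archive
```

Because the timestamp is part of the file itself, two archives of the same data written at different times differ unless mtime is fixed.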

Performance

  • Text I/O: Often ~2-3x faster than standard gzip in bulk text workflows.
  • Binary I/O: Typically near parity for bulk reads/writes, and can be slower for very small chunk sizes.
  • Concurrency: CPU-heavy zlib compress/decompress calls run in the default executor above a 256 KiB threshold, so multiple gzip streams on the same event loop compress and decompress in parallel instead of serializing on the loop thread. The repo's concurrent-I/O benchmark runs ~4x faster on 1.4.0 than on 1.3.x as a result; single-stream throughput stays at parity.
  • Memory: Optimized buffer management for stable memory usage.
  • JSONL: For large gzipped JSONL files, prefer AsyncGzipTextFile(..., newline="\n", chunk_size=512 * 1024) to reduce line-iteration overhead.
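
The concurrency point can be sketched with plain asyncio and zlib: large buffers are compressed in the default executor so the event loop stays free, while small ones are handled inline. This is a minimal illustration under stated assumptions, not aiogzip's actual code; compress_chunk and compress_many are hypothetical helpers, and only the 256 KiB figure comes from the text.

```python
import asyncio
import zlib

OFFLOAD_THRESHOLD = 256 * 1024  # figure taken from the text; helpers are hypothetical


async def compress_chunk(data, level=6):
    """Compress inline for small inputs, in a worker thread for large ones."""
    if len(data) <= OFFLOAD_THRESHOLD:
        return zlib.compress(data, level)
    loop = asyncio.get_running_loop()
    # None selects the loop's default executor (a thread pool).
    return await loop.run_in_executor(None, zlib.compress, data, level)


async def compress_many(chunks):
    """Compress several buffers concurrently on one event loop."""
    return await asyncio.gather(*(compress_chunk(c) for c in chunks))


# Four 512 KiB buffers go to the executor; the tiny one is compressed inline.
chunks = [bytes([i]) * (512 * 1024) for i in range(4)] + [b"tiny"]
results = asyncio.run(compress_many(chunks))
```

A size threshold is a common trade-off in this design: handing work to a thread has a fixed cost, so it only pays off when the buffer is large enough for compression time to dominate.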

See the Performance Guide for detailed benchmarks.

Contributing

See CONTRIBUTING.md for development instructions.
