Archive handling for VCollab applications — extract zip/tar.gz archives and stream directories as zip

Archive Utilities

Purpose

Archive handling for VCollab applications -- extract ZIP and TAR.GZ archives from binary streams and stream directories as ZIP archives (in-memory or via tempfile).

This package provides two categories of functionality:

Extraction -- Extract ZIP and TAR.GZ archives from any seekable binary stream (BinaryIO) to a target directory, using either an in-memory BytesIO strategy (fast, for small files) or a temporary-file strategy (memory-efficient, for large files). Includes path traversal protection and configurable size/count limits.
Streaming -- Generate ZIP archives from directory contents on-the-fly for download responses, with memory-based streaming for small directories and tempfile-based streaming for large directories. Supports file filtering/exclusion.

When to use this package

Use vcti-archive when your application needs to:

Accept uploaded ZIP or TAR.GZ files and extract them to disk
Serve directory contents as downloadable ZIP archives
Stream large directory archives without loading everything into memory
Choose between memory-efficient and fast extraction strategies
Protect against malicious archives (path traversal, zip bombs)
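
To make "path traversal" concrete, here is a minimal standard-library sketch of the kind of check an extractor can apply to each archive member name before writing it. It is an illustration only: the helper name `is_within_directory` is hypothetical, and the package's own check may differ.

```python
from pathlib import Path


def is_within_directory(target_dir: Path, member_name: str) -> bool:
    """Return True if member_name, joined onto target_dir, stays inside it."""
    base = target_dir.resolve()
    # Joining an absolute member name replaces the base entirely, and ".."
    # segments are collapsed by resolve(), so both escapes are caught below.
    candidate = (base / member_name).resolve()
    return candidate == base or base in candidate.parents


safe = is_within_directory(Path("/tmp/out"), "docs/readme.txt")
escapes_relative = is_within_directory(Path("/tmp/out"), "../../etc/passwd")
escapes_absolute = is_within_directory(Path("/tmp/out"), "/etc/passwd")
```

A member that fails this check would be rejected rather than written to disk.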

Installation

The package has zero required dependencies. FastAPI integration is available as an optional extra.

Without FastAPI (CLI tools, background workers, any framework)

Use this when you only need extraction and streaming — no FastAPI-specific helpers. Works with Django, Flask, plain scripts, or anything that accepts BinaryIO and bytes iterators.

pip install "vcti-archive>=1.0.2"

What you get:

ZipExtractor, TarGzExtractor — extract from any BinaryIO
DirectoryZipMemoryStreamer — stream directory as ZIP (in-memory)
LargeDirectoryZipStreamer — stream directory as ZIP (tempfile)
Async wrappers, bomb protection, path traversal safety, logging

With FastAPI

Use this when you need streaming_zip_response() — a helper that wraps LargeDirectoryZipStreamer in a StreamingResponse with correct headers and BackgroundTasks cleanup.

pip install "vcti-archive[fastapi]>=1.0.2"

Everything above, plus:

streaming_zip_response(streamer, background_tasks) from vcti.archive.fastapi

In `requirements.txt`

# Without FastAPI
vcti-archive>=1.0.2

# With FastAPI
vcti-archive[fastapi]>=1.0.2

In `pyproject.toml` dependencies

# Without FastAPI
dependencies = [
    "vcti-archive>=1.0.2",
]

# With FastAPI
dependencies = [
    "vcti-archive[fastapi]>=1.0.2",
]

Quick Start

Usage without FastAPI

All core functionality works with any framework or no framework at all. Extractors accept any seekable BinaryIO (open files, io.BytesIO, UploadFile.file, etc.) and streamers yield plain bytes iterators.

Extract a ZIP archive:

from pathlib import Path
from vcti.archive import ZipExtractor

with open("archive.zip", "rb") as f:
    extractor = ZipExtractor(f, Path("/target/dir"))
    # Pick one strategy:
    extractor.extract_using_bytesio()     # Fast, for small files
    # extractor.extract_using_tempfile()  # Memory-efficient, for large files

# With bomb protection (stream is any seekable binary stream, e.g. an open file)
extractor = ZipExtractor(
    stream, Path("/target"),
    max_total_size=500_000_000,  # 500 MB limit
    max_file_count=10_000,       # 10K files limit
)
extractor.extract_using_bytesio()
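
For intuition about what the two limits guard against, the following sketch performs a comparable pre-check with the standard library's `zipfile` alone. It is an assumption-laden illustration, not the package's implementation: the function name `check_zip_limits` is hypothetical, and it only inspects the sizes declared in the archive's central directory, which a hostile archive can misreport. A robust extractor would also count bytes while writing.

```python
import io
import zipfile


def check_zip_limits(stream, max_total_size: int, max_file_count: int) -> None:
    """Raise ValueError if the entry count or declared sizes exceed the limits."""
    with zipfile.ZipFile(stream) as zf:
        infos = zf.infolist()
        if len(infos) > max_file_count:
            raise ValueError(f"too many entries: {len(infos)} > {max_file_count}")
        total = sum(info.file_size for info in infos)  # declared uncompressed sizes
        if total > max_total_size:
            raise ValueError(f"uncompressed size {total} exceeds {max_total_size}")
    stream.seek(0)  # rewind so the caller can still extract from the stream


# Build a small archive in memory: three files of 1000 bytes each.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(3):
        zf.writestr(f"file{i}.txt", "x" * 1000)
buf.seek(0)

check_zip_limits(buf, max_total_size=10_000, max_file_count=10)  # within limits
```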

Extract a TAR.GZ archive:

from pathlib import Path
from vcti.archive import TarGzExtractor

with open("archive.tar.gz", "rb") as f:
    TarGzExtractor(f, Path("/target/dir")).extract_using_bytesio()

Async extraction (non-blocking for async frameworks):

# Inside an async function:
extractor = ZipExtractor(stream, Path("/target/dir"))
await extractor.async_extract_using_bytesio()
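
The page does not say how the async wrappers are implemented. One common way to make a blocking call awaitable, shown here purely as an assumption, is to run it in a worker thread with `asyncio.to_thread` (Python 3.9+). The function `blocking_extract` below is a stand-in, not part of the package.

```python
import asyncio
import time


def blocking_extract(seconds: float) -> str:
    """Stand-in for a synchronous, blocking extraction call."""
    time.sleep(seconds)
    return "extracted"


async def async_extract(seconds: float) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_extract, seconds)


async def main() -> list:
    # Two "extractions" overlap instead of running back to back.
    return await asyncio.gather(async_extract(0.05), async_extract(0.05))


results = asyncio.run(main())
```

The practical point is the same whatever the implementation: awaiting the async variant keeps other requests moving while the archive is unpacked.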

Stream a directory as ZIP (in-memory):

from pathlib import Path
from vcti.archive import DirectoryZipMemoryStreamer

streamer = DirectoryZipMemoryStreamer(Path("/data/project"))

# Write to a file
with open("output.zip", "wb") as out:
    for chunk in streamer:
        out.write(chunk)

# Or pass to any framework's streaming response
# Django: StreamingHttpResponse(streamer, content_type="application/zip")
# Flask:  Response(streamer, mimetype="application/zip")

Stream a large directory as ZIP (tempfile):

from pathlib import Path
from vcti.archive import LargeDirectoryZipStreamer

streamer = LargeDirectoryZipStreamer(
    folder_path=Path("/data/project"),
    archive_name="project.zip",
)
for chunk in streamer.stream():
    response.write(chunk)  # response: your framework's writable response object

File filtering (works with both streamers):

streamer = DirectoryZipMemoryStreamer(
    Path("/data/project"),
    exclude=lambda p: p.name.startswith(".") or p.suffix == ".log",
)
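
When the filter grows beyond a one-line lambda, a named predicate is easier to test. The sketch below assumes, as the lambda above suggests, that `exclude` receives a `Path` and returns `True` for entries to leave out. Whether the streamer passes absolute or relative paths is not stated here, and the directory and suffix sets are hypothetical examples.

```python
from pathlib import Path

# Hypothetical exclusion rules; adjust to your project.
EXCLUDED_DIRS = {"__pycache__", "node_modules"}
EXCLUDED_SUFFIXES = {".log", ".tmp"}


def exclude_path(p: Path) -> bool:
    """Return True for paths that should be left out of the archive."""
    if p.name.startswith("."):
        return True  # hidden files such as .env
    if p.suffix in EXCLUDED_SUFFIXES:
        return True
    # Exclude anything that lives under an excluded directory.
    return any(part in EXCLUDED_DIRS for part in p.parts)
```

It can then be passed as `exclude=exclude_path` in place of the lambda.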

Usage with FastAPI

Install with pip install vcti-archive[fastapi]. Everything above still works, plus you get streaming_zip_response() — a helper that wraps LargeDirectoryZipStreamer in a StreamingResponse with correct headers and deferred temp-file cleanup via BackgroundTasks.

Extract an uploaded file:

from pathlib import Path
from fastapi import FastAPI, UploadFile
from vcti.archive import ZipExtractor

app = FastAPI()

@app.post("/upload")
async def upload(file: UploadFile):
    extractor = ZipExtractor(file.file, Path("/data/uploads"))
    await extractor.async_extract_using_bytesio()
    return {"status": "extracted"}

Stream a directory as a download (in-memory, small dirs):

from pathlib import Path
from fastapi.responses import StreamingResponse
from vcti.archive import DirectoryZipMemoryStreamer

@app.get("/download")
def download():
    streamer = DirectoryZipMemoryStreamer(Path("/data/project"))
    return StreamingResponse(streamer, media_type="application/zip")

Stream a large directory as a download (tempfile, large dirs):

from pathlib import Path
from fastapi import BackgroundTasks
from vcti.archive import LargeDirectoryZipStreamer
from vcti.archive.fastapi import streaming_zip_response

@app.get("/download/large")
def download_large(background_tasks: BackgroundTasks):
    streamer = LargeDirectoryZipStreamer(
        folder_path=Path("/data/dataset"),
        archive_name="dataset.zip",
    )
    return streaming_zip_response(streamer, background_tasks)

Choosing a streamer

Both streamers produce identical ZIP output. The difference is where the ZIP is assembled:

DirectoryZipMemoryStreamer -- builds the ZIP in a BytesIO buffer, yielding chunks as it goes. Simple (no temp files, no cleanup), but the buffer stays in memory for the duration of the request.
LargeDirectoryZipStreamer -- writes the complete ZIP to a temp file first, then streams from disk. Needs cleanup (via on_cleanup or the streaming_zip_response helper) but memory usage stays flat regardless of archive size.

The right choice depends on your deployment, not a universal size threshold. Consider:

Process memory budget -- in a 512 MB container, a 200 MB in-memory ZIP may be too large; on a 32 GB server it's trivial.
Concurrent requests -- one 300 MB buffer is fine; fifty concurrent ones may not be.
Disk I/O -- the tempfile streamer writes then reads the full archive, so slow disks add latency that the memory streamer avoids.

When in doubt, start with DirectoryZipMemoryStreamer (fewer moving parts) and switch to LargeDirectoryZipStreamer if you observe memory pressure under production load.
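
If you want a number to reason with before choosing, a rough approach is to total the uncompressed file sizes under the directory and compare that with a per-request memory budget you pick for your deployment. The helper names and the budget value below are hypothetical, and the total is only a proxy for the final ZIP size.

```python
import tempfile
from pathlib import Path


def directory_size(root: Path) -> int:
    """Total size in bytes of all regular files under root (uncompressed)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def fits_memory_budget(root: Path, budget_bytes: int) -> bool:
    """Rough check: would buffering this directory stay within the budget?"""
    return directory_size(root) <= budget_bytes


# Demo on a throwaway directory with 150 bytes of content.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"\0" * 100)
    (root / "sub").mkdir()
    (root / "sub" / "b.bin").write_bytes(b"\0" * 50)
    total = directory_size(root)
    small_enough = fits_memory_budget(root, budget_bytes=1_000)
```

Remember to divide the budget by the number of concurrent downloads you expect, since each in-memory stream holds its own buffer.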

Public API

Class / Function	Purpose
`ArchiveExtractor`	ABC base class for archive extractors (BytesIO and tempfile strategies)
`ZipExtractor`	Extract ZIP archives with path traversal and bomb protection
`TarGzExtractor`	Extract TAR.GZ archives with `filter="data"` security
`DirectoryZipMemoryStreamer`	Stream directory as ZIP using in-memory buffer (reusable)
`LargeDirectoryZipStreamer`	Stream directory as ZIP using temporary file
`UnsupportedArchiveFormat`	Exception for unsupported archive formats
`streaming_zip_response()`	FastAPI helper (optional, requires `vcti-archive[fastapi]`)

Dependencies

Zero required dependencies -- Core functionality uses Python stdlib only (zipfile, tarfile, shutil, tempfile, asyncio).
Optional: fastapi -- Install with vcti-archive[fastapi] for streaming_zip_response() and FastAPI-specific integration.

Documentation

Design -- Architecture decisions and extraction strategies
Source Guide -- File descriptions and execution flow traces
API Reference -- Autodoc for all modules

Project details

This version

1.0.2

Mar 22, 2026

Download files

Source Distribution

vcti_archive-1.0.2.tar.gz (23.8 kB view details)

Uploaded Mar 22, 2026 Source

Built Distribution

vcti_archive-1.0.2-py3-none-any.whl (17.2 kB view details)

Uploaded Mar 22, 2026 Python 3

File details

Details for the file vcti_archive-1.0.2.tar.gz.

File metadata

Download URL: vcti_archive-1.0.2.tar.gz
Upload date: Mar 22, 2026
Size: 23.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vcti_archive-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`ba4c0265480b9f6629bd3559cd33bbd610cd61ba8e7cfbf7a3abedde570b87ba`
MD5	`f0dd0dc50ca69d906a4d59e19a0bba32`
BLAKE2b-256	`a70b0565227bb390626801e54478bdc0d5316160b7ecae6419ab95c2e58dcc24`

Provenance

The following attestation bundles were made for vcti_archive-1.0.2.tar.gz:

Publisher: publish.yml on vcollab/vcti-python-archive

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vcti_archive-1.0.2.tar.gz
- Subject digest: ba4c0265480b9f6629bd3559cd33bbd610cd61ba8e7cfbf7a3abedde570b87ba
- Sigstore transparency entry: 1155582278
- Sigstore integration time: Mar 22, 2026
Source repository:
- Permalink: vcollab/vcti-python-archive@ebeaccaf8714e4b7bd811378a1a9fb1f9c41955e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vcollab
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ebeaccaf8714e4b7bd811378a1a9fb1f9c41955e
- Trigger Event: workflow_dispatch

File details

Details for the file vcti_archive-1.0.2-py3-none-any.whl.

File metadata

Download URL: vcti_archive-1.0.2-py3-none-any.whl
Upload date: Mar 22, 2026
Size: 17.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vcti_archive-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c5632662a8695eb5b1150fe2f471a7971ffe4b9387ddfa3efe60fb8bd9f0ca7b`
MD5	`42588659fa2dd9b034c0e89d6e811121`
BLAKE2b-256	`5f034482a71f70d806ef6c3294e7ffa9ae22ac1077b573fe7882238eacf0c1b3`

Provenance

The following attestation bundles were made for vcti_archive-1.0.2-py3-none-any.whl:

Publisher: publish.yml on vcollab/vcti-python-archive

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vcti_archive-1.0.2-py3-none-any.whl
- Subject digest: c5632662a8695eb5b1150fe2f471a7971ffe4b9387ddfa3efe60fb8bd9f0ca7b
- Sigstore transparency entry: 1155582285
- Sigstore integration time: Mar 22, 2026
Source repository:
- Permalink: vcollab/vcti-python-archive@ebeaccaf8714e4b7bd811378a1a9fb1f9c41955e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/vcollab
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ebeaccaf8714e4b7bd811378a1a9fb1f9c41955e
- Trigger Event: workflow_dispatch
