Archive handling for VCollab applications — extract zip/tar.gz archives and stream directories as zip
Archive Utilities
Purpose
Archive handling for VCollab applications -- extract ZIP and TAR.GZ archives from binary streams and stream directories as ZIP archives (in-memory or via tempfile).
This package provides two categories of functionality:
- Extraction -- Extract ZIP and TAR.GZ archives from any seekable binary stream (BinaryIO) to a target directory, using either in-memory BytesIO (fast, for small files) or temporary file (memory-efficient, for large files) strategies. Includes path traversal protection and configurable size/count limits.
- Streaming -- Generate ZIP archives from directory contents on-the-fly for download responses, with memory-based streaming for small directories and tempfile-based streaming for large directories. Supports file filtering/exclusion.
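To make the path traversal protection mentioned above concrete, here is a minimal stdlib-only sketch of one common way to detect archive members that would land outside the target directory. It is not the package's implementation; the helper names `is_within_directory` and `find_unsafe_members` are hypothetical and exist only for this illustration.

```python
import io
import zipfile
from pathlib import Path


def is_within_directory(target_dir: Path, member_name: str) -> bool:
    """Return True if member_name resolves to a path inside target_dir."""
    root = target_dir.resolve()
    candidate = (root / member_name).resolve()
    return candidate == root or root in candidate.parents


def find_unsafe_members(stream, target_dir: Path) -> list[str]:
    """List ZIP member names that would escape target_dir (hypothetical helper)."""
    with zipfile.ZipFile(stream) as zf:
        return [
            name
            for name in zf.namelist()
            if not is_within_directory(target_dir, name)
        ]
```

A caller could refuse to extract whenever `find_unsafe_members` returns a non-empty list. How vcti-archive itself reacts to such members is not described here.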
When to use this package
Use vcti-archive when your application needs to:
- Accept uploaded ZIP or TAR.GZ files and extract them to disk
- Serve directory contents as downloadable ZIP archives
- Stream large directory archives without loading everything into memory
- Choose between memory-efficient and fast extraction strategies
- Protect against malicious archives (path traversal, zip bombs)
Installation
The package has zero required dependencies. FastAPI integration is available as an optional extra.
Without FastAPI (CLI tools, background workers, any framework)
Use this when you only need extraction and streaming — no
FastAPI-specific helpers. Works with Django, Flask, plain scripts,
or anything that accepts BinaryIO and bytes iterators.
pip install "vcti-archive>=1.0.2"
What you get:
- ZipExtractor, TarGzExtractor -- extract from any BinaryIO
- DirectoryZipMemoryStreamer -- stream directory as ZIP (in-memory)
- LargeDirectoryZipStreamer -- stream directory as ZIP (tempfile)
- Async wrappers, bomb protection, path traversal safety, logging
With FastAPI
Use this when you need streaming_zip_response() — a helper that
wraps LargeDirectoryZipStreamer in a StreamingResponse with
correct headers and BackgroundTasks cleanup.
pip install "vcti-archive[fastapi]>=1.0.2"
Everything above, plus:
- streaming_zip_response(streamer, background_tasks) from vcti.archive.fastapi
In requirements.txt
# Without FastAPI
vcti-archive>=1.0.2
# With FastAPI
vcti-archive[fastapi]>=1.0.2
In pyproject.toml dependencies
# Without FastAPI
dependencies = [
"vcti-archive>=1.0.2",
]
# With FastAPI
dependencies = [
"vcti-archive[fastapi]>=1.0.2",
]
Quick Start
Usage without FastAPI
All core functionality works with any framework or no framework at
all. Extractors accept any seekable BinaryIO (open files,
io.BytesIO, UploadFile.file, etc.) and streamers yield plain
bytes iterators.
Extract a ZIP archive:
from pathlib import Path
from vcti.archive import ZipExtractor
with open("archive.zip", "rb") as f:
    extractor = ZipExtractor(f, Path("/target/dir"))
    extractor.extract_using_bytesio()  # Fast, for small files
    # or
    extractor.extract_using_tempfile()  # Memory-efficient, for large files
# With bomb protection (stream is any seekable BinaryIO, such as f above)
extractor = ZipExtractor(
    stream, Path("/target"),
    max_total_size=500_000_000,  # 500 MB limit
    max_file_count=10_000,  # 10K files limit
)
extractor.extract_using_bytesio()
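The size and count limits above can be pictured with a short stdlib-only sketch. It assumes the limits are checked against the sizes declared in the ZIP central directory before anything is written; whether vcti-archive checks them this way is an assumption, and `ArchiveLimitExceeded` and `check_zip_limits` are hypothetical names used only here.

```python
import zipfile


class ArchiveLimitExceeded(Exception):
    """Hypothetical error type for this sketch."""


def check_zip_limits(stream, max_total_size: int, max_file_count: int) -> None:
    """Raise if the archive declares too many entries or too many bytes."""
    with zipfile.ZipFile(stream) as zf:
        infos = zf.infolist()
        if len(infos) > max_file_count:
            raise ArchiveLimitExceeded(
                f"{len(infos)} entries exceeds limit of {max_file_count}"
            )
        # file_size is the declared uncompressed size of each entry.
        total = sum(info.file_size for info in infos)
        if total > max_total_size:
            raise ArchiveLimitExceeded(
                f"{total} bytes exceeds limit of {max_total_size}"
            )
```

Declared sizes come from the archive itself, so a stricter check would also count bytes while they are being written during extraction.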
Extract a TAR.GZ archive:
from pathlib import Path
from vcti.archive import TarGzExtractor
with open("archive.tar.gz", "rb") as f:
    TarGzExtractor(f, Path("/target/dir")).extract_using_bytesio()
Async extraction (non-blocking for async frameworks):
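from pathlib import Path
from vcti.archive import ZipExtractor
extractor = ZipExtractor(stream, Path("/target/dir"))  # stream: any seekable BinaryIO
await extractor.async_extract_using_bytesio()  # call from inside a coroutine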
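One common way to make blocking extraction non-blocking is to run it in a worker thread. The sketch below shows that pattern with the stdlib only; it is an assumption that the package's async wrappers work along these lines, and both function names are hypothetical.

```python
import asyncio
import zipfile
from pathlib import Path


def extract_zip_blocking(stream, target_dir: Path) -> int:
    """Blocking extraction; returns the number of archive entries."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(stream) as zf:
        zf.extractall(target_dir)
        return len(zf.namelist())


async def extract_zip_async(stream, target_dir: Path) -> int:
    """Run the blocking extraction in a thread so the event loop stays free."""
    return await asyncio.to_thread(extract_zip_blocking, stream, target_dir)
```

Other coroutines keep running while the worker thread does the disk I/O, which is what matters inside an async request handler.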
Stream a directory as ZIP (in-memory):
from pathlib import Path
from vcti.archive import DirectoryZipMemoryStreamer
streamer = DirectoryZipMemoryStreamer(Path("/data/project"))
# Write to a file
with open("output.zip", "wb") as out:
    for chunk in streamer:
        out.write(chunk)
# Or pass to any framework's streaming response
# Django: StreamingHttpResponse(streamer, content_type="application/zip")
# Flask: Response(streamer, mimetype="application/zip")
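To show the general shape of an in-memory ZIP stream, here is a simplified stdlib-only sketch. It builds the whole archive in a BytesIO buffer and then yields it in fixed-size chunks; the package may interleave writing and yielding differently, and `iter_directory_zip` is a hypothetical name.

```python
import io
import zipfile
from pathlib import Path
from typing import Iterator


def iter_directory_zip(folder: Path, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the bytes of a ZIP archive of folder, assembled in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the folder, with forward slashes.
                zf.write(path, arcname=path.relative_to(folder).as_posix())
    buffer.seek(0)
    while chunk := buffer.read(chunk_size):
        yield chunk
```

Because the result is a plain iterator of bytes, it can be written to a file or handed to a framework's streaming response in the same way as the streamer above.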
Stream a large directory as ZIP (tempfile):
from pathlib import Path
from vcti.archive import LargeDirectoryZipStreamer
streamer = LargeDirectoryZipStreamer(
    folder_path=Path("/data/project"),
    archive_name="project.zip",
)
for chunk in streamer.stream():
    response.write(chunk)  # response: your framework's writable response object
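The tempfile strategy can be sketched with the stdlib as follows: write the complete ZIP to a temporary file, stream it back in chunks, and delete the file afterwards. This is an illustration under assumptions rather than the package's implementation, and `iter_directory_zip_via_tempfile` is a hypothetical name.

```python
import os
import tempfile
import zipfile
from pathlib import Path
from typing import Iterator


def iter_directory_zip_via_tempfile(
    folder: Path, chunk_size: int = 64 * 1024
) -> Iterator[bytes]:
    """Write a ZIP of folder to a temp file, then yield it chunk by chunk."""
    fd, tmp_name = tempfile.mkstemp(suffix=".zip")
    os.close(fd)  # ZipFile reopens the file by name
    try:
        with zipfile.ZipFile(tmp_name, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    zf.write(path, arcname=path.relative_to(folder).as_posix())
        with open(tmp_name, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk
    finally:
        # Runs when the generator is exhausted or closed.
        os.unlink(tmp_name)
```

Only one chunk is held in memory at a time, at the cost of writing and then re-reading the archive on disk. If a consumer abandons the generator without closing it, cleanup is delayed until garbage collection, which is one reason an explicit cleanup hook is useful in a web framework.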
File filtering (works with both streamers):
streamer = DirectoryZipMemoryStreamer(
    Path("/data/project"),
    exclude=lambda p: p.name.startswith(".") or p.suffix == ".log",
)
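An exclusion predicate like the one above can be tried out on its own before handing it to a streamer. The sketch below assumes the predicate receives each file's Path and returns True to skip it; whether the package also applies it to directories is not stated here, and `iter_included_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import Callable, Iterator, Optional


def iter_included_files(
    folder: Path, exclude: Optional[Callable[[Path], bool]] = None
) -> Iterator[Path]:
    """Yield files under folder, skipping those for which exclude(path) is True."""
    for path in sorted(folder.rglob("*")):
        if not path.is_file():
            continue
        if exclude is not None and exclude(path):
            continue
        yield path
```

Listing the surviving files this way is a quick check that a predicate drops dotfiles and logs but keeps the data you expect.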
Usage with FastAPI
Install with pip install vcti-archive[fastapi]. Everything above
still works, plus you get streaming_zip_response() — a helper that
wraps LargeDirectoryZipStreamer in a StreamingResponse with
correct headers and deferred temp-file cleanup via BackgroundTasks.
Extract an uploaded file:
from pathlib import Path
from fastapi import FastAPI, UploadFile
from vcti.archive import ZipExtractor
app = FastAPI()
@app.post("/upload")
async def upload(file: UploadFile):
    extractor = ZipExtractor(file.file, Path("/data/uploads"))
    await extractor.async_extract_using_bytesio()
    return {"status": "extracted"}
Stream a directory as a download (in-memory, small dirs):
from pathlib import Path
from fastapi.responses import StreamingResponse
from vcti.archive import DirectoryZipMemoryStreamer
@app.get("/download")
def download():
    streamer = DirectoryZipMemoryStreamer(Path("/data/project"))
    return StreamingResponse(streamer, media_type="application/zip")
Stream a large directory as a download (tempfile, large dirs):
from pathlib import Path
from fastapi import BackgroundTasks
from vcti.archive import LargeDirectoryZipStreamer
from vcti.archive.fastapi import streaming_zip_response
@app.get("/download/large")
def download_large(background_tasks: BackgroundTasks):
    streamer = LargeDirectoryZipStreamer(
        folder_path=Path("/data/dataset"),
        archive_name="dataset.zip",
    )
    return streaming_zip_response(streamer, background_tasks)
Choosing a streamer
Both streamers produce identical ZIP output. The difference is where the ZIP is assembled:
- DirectoryZipMemoryStreamer -- builds the ZIP in a BytesIO buffer, yielding chunks as it goes. Simple (no temp files, no cleanup), but the buffer stays in memory for the duration of the request.
- LargeDirectoryZipStreamer -- writes the complete ZIP to a temp file first, then streams from disk. Needs cleanup (via on_cleanup or the streaming_zip_response helper) but memory usage stays flat regardless of archive size.
The right choice depends on your deployment, not a universal size threshold. Consider:
- Process memory budget -- in a 512 MB container, a 200 MB in-memory ZIP may be too large; on a 32 GB server it's trivial.
- Concurrent requests -- one 300 MB buffer is fine; fifty concurrent ones may not be.
- Disk I/O -- the tempfile streamer writes then reads the full archive, so slow disks add latency that the memory streamer avoids.
When in doubt, start with DirectoryZipMemoryStreamer (fewer moving
parts) and switch to LargeDirectoryZipStreamer if you observe
memory pressure under production load.
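If you want a rough starting point before measuring in production, the considerations above can be turned into a back-of-the-envelope check. The sketch below is only one possible heuristic: it treats the uncompressed directory size as an approximate upper bound for the in-memory ZIP and divides a memory budget evenly across expected concurrent requests. Both helper names and the even-split policy are assumptions made for this illustration.

```python
from pathlib import Path


def directory_size(folder: Path) -> int:
    """Total size in bytes of all regular files under folder."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file())


def fits_memory_budget(
    folder: Path, memory_budget_bytes: int, expected_concurrency: int
) -> bool:
    """Rough check: does one archive fit in this request's share of the budget?"""
    per_request = memory_budget_bytes // max(expected_concurrency, 1)
    return directory_size(folder) <= per_request
```

A result of False suggests trying the tempfile streamer first. Real memory behaviour depends on compression ratio and framework buffering, so treat the result as a hint rather than a guarantee.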
Public API
| Class / Function | Purpose |
|---|---|
| ArchiveExtractor | ABC base class for archive extractors (BytesIO and tempfile strategies) |
| ZipExtractor | Extract ZIP archives with path traversal and bomb protection |
| TarGzExtractor | Extract TAR.GZ archives with filter="data" security |
| DirectoryZipMemoryStreamer | Stream directory as ZIP using in-memory buffer (reusable) |
| LargeDirectoryZipStreamer | Stream directory as ZIP using temporary file |
| UnsupportedArchiveFormat | Exception for unsupported archive formats |
| streaming_zip_response() | FastAPI helper (optional, requires vcti-archive[fastapi]) |
Dependencies
- Zero required dependencies -- Core functionality uses Python stdlib only (zipfile, tarfile, shutil, tempfile, asyncio).
- Optional: fastapi -- Install with vcti-archive[fastapi] for streaming_zip_response() and FastAPI-specific integration.
Documentation
- Design -- Architecture decisions and extraction strategies
- Source Guide -- File descriptions and execution flow traces
- API Reference -- Autodoc for all modules