Memory-safe archive extraction library with built-in security validation

exarch

Memory-safe archive extraction and creation library for Python.

Important: exarch is designed as a secure replacement for vulnerable archive libraries like Python's tarfile, which has known CVEs with CVSS scores up to 9.4.

This package provides Python bindings for exarch-core, a Rust library with built-in protection against common archive vulnerabilities.

Installation

pip install exarch

Tip: Use uv pip install exarch for faster installation.

Alternative Package Managers

# Poetry
poetry add exarch

# Pipenv
pipenv install exarch

Requirements

  • Python >= 3.10

Quick Start

Extraction

import exarch

result = exarch.extract_archive("archive.tar.gz", "/output/path")
print(f"Extracted {result.files_extracted} files")

Creation

import exarch

result = exarch.create_archive("backup.tar.gz", ["src/", "Cargo.toml"])
print(f"Created archive with {result.files_added} files")

Usage

Basic Extraction

import exarch

result = exarch.extract_archive("archive.tar.gz", "/output/path")

print(f"Files extracted: {result.files_extracted}")
print(f"Bytes written: {result.bytes_written}")
print(f"Duration: {result.duration_ms}ms")

With pathlib.Path

from pathlib import Path
import exarch

archive = Path("archive.tar.gz")
output = Path("/output/path")

result = exarch.extract_archive(archive, output)

Custom Security Configuration

import exarch

config = exarch.SecurityConfig()
config = config.max_file_size(100 * 1024 * 1024)  # 100 MB

result = exarch.extract_archive("archive.tar.gz", "/output", config)

Error Handling

import exarch

try:
    result = exarch.extract_archive("archive.tar.gz", "/output")
    print(f"Extracted {result.files_extracted} files")
except exarch.PathTraversalError as e:
    print(f"Blocked path traversal: {e}")
except exarch.ZipBombError as e:
    print(f"Zip bomb detected: {e}")
except exarch.SecurityViolationError as e:
    print(f"Security violation: {e}")
except exarch.ExtractionError as e:
    print(f"Extraction failed: {e}")

API Reference

extract_archive(archive_path, output_dir, config=None)

Extract an archive to the specified directory with security validation.

Parameters:

Name          Type            Description
archive_path  str | Path      Path to the archive file
output_dir    str | Path      Directory where files will be extracted
config        SecurityConfig  Optional security configuration

Returns: ExtractionReport

Attribute            Type       Description
files_extracted      int        Number of files extracted
directories_created  int        Number of directories created
symlinks_created     int        Number of symlinks created
bytes_written        int        Total bytes written
duration_ms          int        Extraction duration in milliseconds
files_skipped        int        Number of files skipped (e.g. duplicates)
warnings             list[str]  Warning messages generated during extraction
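
The attributes above can be combined into a one-line summary for logs. The following is a minimal sketch: summarize_report is a hypothetical helper, not part of exarch, and the SimpleNamespace object only stands in for a real ExtractionReport so the snippet runs without an archive on disk.

```python
from types import SimpleNamespace


def summarize_report(report) -> str:
    """Format the documented ExtractionReport attributes as one line."""
    seconds = report.duration_ms / 1000
    line = (
        f"{report.files_extracted} files, "
        f"{report.directories_created} dirs, "
        f"{report.symlinks_created} symlinks, "
        f"{report.bytes_written} bytes in {seconds:.2f}s"
    )
    # Only mention skipped files and warnings when there are any.
    if report.files_skipped:
        line += f", {report.files_skipped} skipped"
    if report.warnings:
        line += f", {len(report.warnings)} warning(s)"
    return line


# Stand-in for the object returned by exarch.extract_archive(...).
fake_report = SimpleNamespace(
    files_extracted=3,
    directories_created=1,
    symlinks_created=0,
    bytes_written=2048,
    duration_ms=250,
    files_skipped=0,
    warnings=["example warning"],
)
print(summarize_report(fake_report))
# 3 files, 1 dirs, 0 symlinks, 2048 bytes in 0.25s, 1 warning(s)
```

With a real extraction, the result of exarch.extract_archive can be passed to the helper in place of fake_report.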

Raises:

Exception               Description
PathTraversalError      Path traversal attempt detected
SymlinkEscapeError      Symlink points outside extraction directory
HardlinkEscapeError     Hardlink target outside extraction directory
ZipBombError            Potential zip bomb detected
QuotaExceededError      Resource quota exceeded
SecurityViolationError  Security policy violation
UnsupportedFormatError  Archive format not supported
InvalidArchiveError     Archive is corrupted
IOError                 I/O operation failed

Note: Since v0.4.0, create_archive raises FileNotFoundError for missing sources, FileExistsError when the output already exists without overwrite, and ValueError for invalid compression levels — matching standard Python conventions.
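
Because create_archive raises standard Python exceptions, it can be handled like any other file operation. The sketch below assumes only the two-argument call shown in Quick Start. try_create and its create parameter are hypothetical names introduced here so the error handling can be exercised with a stub; when create is omitted, the real exarch.create_archive is used.

```python
from types import SimpleNamespace
from typing import Callable, Optional, Sequence


def try_create(
    output: str,
    sources: Sequence[str],
    create: Optional[Callable] = None,
) -> str:
    """Return a status string instead of raising for the documented errors."""
    if create is None:
        import exarch  # imported lazily so the sketch also runs with a stub

        create = exarch.create_archive
    try:
        result = create(output, list(sources))
    except FileNotFoundError as exc:
        return f"missing source: {exc}"
    except FileExistsError as exc:
        return f"output already exists: {exc}"
    except ValueError as exc:
        return f"invalid argument: {exc}"
    return f"created with {result.files_added} files"


# Stubs standing in for exarch.create_archive, for illustration only.
def missing_source_stub(output, sources):
    raise FileNotFoundError("src/")


def success_stub(output, sources):
    return SimpleNamespace(files_added=len(sources))


print(try_create("backup.tar.gz", ["src/"], create=missing_source_stub))
print(try_create("backup.tar.gz", ["src/", "Cargo.toml"], create=success_stub))
```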

SecurityConfig

Builder-style security configuration.

config = exarch.SecurityConfig()
config = config.max_file_size(100 * 1024 * 1024)        # 100 MB per file
config = config.max_total_size(1024 * 1024 * 1024)      # 1 GB total
config = config.max_file_count(10_000)                   # Max 10k files
config = config.max_compression_ratio(50.0)              # Zip bomb threshold
config = config.allowed_extensions([".txt", ".md"])      # Extension allowlist
config = config.banned_path_components(["__MACOSX"])     # Skip components
config = config.allow_solid_archives(True)               # Allow solid 7z archives

Security Features

The library provides built-in protection against:

Protection               Description
Path traversal           Blocks ../ and absolute paths
Symlink attacks          Prevents symlinks escaping extraction directory
Hardlink attacks         Validates hardlink targets
Zip bombs                Detects high compression ratios
Permission sanitization  Strips setuid/setgid bits
Size limits              Enforces file and total size limits
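
Two of the protections above are easy to illustrate in plain Python. This is a simplified sketch of the ideas only, not exarch's implementation (which lives in the Rust core): is_safe_member and sanitize_mode are hypothetical names, and the path check handles POSIX-style member names only.

```python
import stat
from pathlib import PurePosixPath


def is_safe_member(name: str) -> bool:
    """Reject absolute paths and any path containing a '..' component."""
    path = PurePosixPath(name)
    if path.is_absolute():
        return False
    return ".." not in path.parts


def sanitize_mode(mode: int) -> int:
    """Clear the setuid and setgid bits, keeping the other permission bits."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID)


for candidate in ["docs/readme.txt", "../etc/passwd", "/etc/passwd", "a/../../b"]:
    print(candidate, is_safe_member(candidate))

print(oct(sanitize_mode(0o4755)))  # 0o755
```

A real validator also has to account for symlinks, hardlinks, and platform-specific separators, which is why these checks are better left to the library.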

Caution: Unlike Python's standard tarfile module, exarch applies security validation by default.

Supported Formats

Format     Extensions        Extract  Create  List  Verify
TAR        .tar
TAR+GZIP   .tar.gz, .tgz
TAR+BZIP2  .tar.bz2, .tbz2
TAR+XZ     .tar.xz, .txz
TAR+ZSTD   .tar.zst, .tzst
ZIP        .zip
7z         .7z

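Note: 7z creation is not yet supported. Encrypted 7z archives are rejected for security reasons, and solid 7z archives are rejected unless explicitly enabled with SecurityConfig.allow_solid_archives(True). Unix symlinks inside 7z archives are reported as regular files (a limitation of the sevenz-rust2 API).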
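
When filtering uploads or directory listings before extraction, the extension list above can be turned into a small lookup. This is a hypothetical helper, not part of exarch; the library may detect formats differently (for example by file contents), so treat it as a pre-filter only.

```python
from typing import Optional

# Extensions copied from the Supported Formats list.
FORMATS = {
    "TAR": (".tar",),
    "TAR+GZIP": (".tar.gz", ".tgz"),
    "TAR+BZIP2": (".tar.bz2", ".tbz2"),
    "TAR+XZ": (".tar.xz", ".txz"),
    "TAR+ZSTD": (".tar.zst", ".tzst"),
    "ZIP": (".zip",),
    "7z": (".7z",),
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for a known extension, or None otherwise."""
    lowered = filename.lower()
    for name, extensions in FORMATS.items():
        # str.endswith accepts a tuple of candidate suffixes.
        if lowered.endswith(extensions):
            return name
    return None


print(detect_format("backup.tar.gz"))  # TAR+GZIP
print(detect_format("photos.7z"))      # 7z
print(detect_format("notes.rar"))      # None
```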

Comparison with tarfile

# UNSAFE - tarfile has known vulnerabilities (CVE-2007-4559)
import tarfile
with tarfile.open("archive.tar.gz") as tar:
    tar.extractall("/output")  # May extract outside target directory!

# SAFE - exarch validates all paths
import exarch
exarch.extract_archive("archive.tar.gz", "/output")  # Protected by default
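
To see what such an attack looks like, the sketch below builds a tiny TAR archive in memory whose only member is named ../escape.txt, then reads the name back. It uses only the standard library and extracts nothing; build_tar and member_names are hypothetical helper names. The TAR format stores the name exactly as given, which is why an extractor has to validate it.

```python
import io
import tarfile


def build_tar(member_name: str, payload: bytes) -> bytes:
    """Create an uncompressed in-memory TAR with a single member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def member_names(tar_bytes: bytes) -> list[str]:
    """List member names without extracting anything."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return tar.getnames()


names = member_names(build_tar("../escape.txt", b"hello"))
print(names)  # ['../escape.txt']
```

An extractor that joins this name onto the output directory without checking it would write one level above the target, which is the class of bug behind CVE-2007-4559.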

Development

This package is built using PyO3 and maturin.

# Clone repository
git clone https://github.com/bug-ops/exarch
cd exarch/crates/exarch-python

# Build with maturin
pip install maturin
maturin develop

# Run tests
pytest tests/

Related Packages

License

Licensed under either of:

at your option.
