Memory-safe archive extraction library with built-in security validation

exarch

Memory-safe archive extraction and creation library for Python.

Important: exarch is designed as a secure replacement for vulnerable archive libraries like Python's tarfile, which has known CVEs with CVSS scores up to 9.4.

This package provides Python bindings for exarch-core, a Rust library with built-in protection against common archive vulnerabilities.

Installation

pip install exarch

Tip: Use uv pip install exarch for faster installation.

Alternative Package Managers

# Poetry
poetry add exarch

# Pipenv
pipenv install exarch

Requirements

  • Python >= 3.9

Quick Start

Extraction

import exarch

result = exarch.extract_archive("archive.tar.gz", "/output/path")
print(f"Extracted {result.files_extracted} files")

Creation

import exarch

result = exarch.create_archive("backup.tar.gz", ["src/", "Cargo.toml"])
print(f"Created archive with {result.files_added} files")

Usage

Basic Extraction

import exarch

result = exarch.extract_archive("archive.tar.gz", "/output/path")

print(f"Files extracted: {result.files_extracted}")
print(f"Bytes written: {result.bytes_written}")
print(f"Duration: {result.duration_ms}ms")

With pathlib.Path

from pathlib import Path
import exarch

archive = Path("archive.tar.gz")
output = Path("/output/path")

result = exarch.extract_archive(archive, output)

Custom Security Configuration

import exarch

config = exarch.SecurityConfig()
config = config.max_file_size(100 * 1024 * 1024)  # 100 MB

result = exarch.extract_archive("archive.tar.gz", "/output", config)

Error Handling

import exarch

try:
    result = exarch.extract_archive("archive.tar.gz", "/output")
    print(f"Extracted {result.files_extracted} files")
except exarch.PathTraversalError as e:
    print(f"Blocked path traversal: {e}")
except exarch.ZipBombError as e:
    print(f"Zip bomb detected: {e}")
except exarch.SecurityViolationError as e:
    print(f"Security violation: {e}")
except exarch.ExtractionError as e:
    print(f"Extraction failed: {e}")

API Reference

extract_archive(archive_path, output_dir, config=None)

Extract an archive to the specified directory with security validation.

Parameters:

Name          Type            Description
archive_path  str | Path      Path to the archive file
output_dir    str | Path      Directory where files will be extracted
config        SecurityConfig  Optional security configuration

Returns: ExtractionReport

Attribute            Type       Description
files_extracted      int        Number of files extracted
directories_created  int        Number of directories created
symlinks_created     int        Number of symlinks created
bytes_written        int        Total bytes written
duration_ms          int        Extraction duration in milliseconds
files_skipped        int        Number of files skipped (e.g. duplicates)
warnings             list[str]  Warning messages generated during extraction
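
The attributes above are enough to build a compact log line. The following is a minimal sketch, not part of exarch: summarize is a hypothetical helper, and the SimpleNamespace object is a stand-in used only so the example runs without an archive. A real report would come from extract_archive().

```python
from types import SimpleNamespace


def summarize(report) -> str:
    """Format an extraction report as a one-line summary.

    Hypothetical helper: works with any object exposing the
    attributes listed in the table above.
    """
    seconds = report.duration_ms / 1000
    mib = report.bytes_written / (1024 * 1024)
    # Avoid dividing by zero when the reported duration is 0 ms.
    rate = f"{mib / seconds:.1f} MiB/s" if seconds > 0 else "n/a"
    line = (
        f"{report.files_extracted} files, "
        f"{report.directories_created} dirs, "
        f"{report.symlinks_created} symlinks, "
        f"{mib:.1f} MiB in {report.duration_ms} ms ({rate})"
    )
    if report.files_skipped:
        line += f", {report.files_skipped} skipped"
    if report.warnings:
        line += f", {len(report.warnings)} warning(s)"
    return line


# Stand-in values for illustration only.
example = SimpleNamespace(
    files_extracted=12,
    directories_created=3,
    symlinks_created=0,
    bytes_written=5 * 1024 * 1024,
    duration_ms=250,
    files_skipped=1,
    warnings=["skipped duplicate entry"],
)
print(summarize(example))
```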

Raises:

Exception               Description
PathTraversalError      Path traversal attempt detected
SymlinkEscapeError      Symlink points outside extraction directory
HardlinkEscapeError     Hardlink target outside extraction directory
ZipBombError            Potential zip bomb detected
QuotaExceededError      Resource quota exceeded
SecurityViolationError  Security policy violation
UnsupportedFormatError  Archive format not supported
InvalidArchiveError     Archive is corrupted
IOError                 I/O operation failed

SecurityConfig

Builder-style security configuration. Each setter returns the updated configuration, so reassign the result as shown:

config = exarch.SecurityConfig()
config = config.max_file_size(100 * 1024 * 1024)    # 100 MB per file
config = config.max_total_size(1024 * 1024 * 1024)  # 1 GB total
config = config.max_file_count(10_000)               # Max 10k files
config = config.allow_solid_archives(True)           # Allow solid 7z archives
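
To make the three size and count limits concrete, here is a minimal pure-Python sketch of what such quotas mean for a list of archive entries. It is an illustration only and does not reflect how exarch-core enforces them: Limits and check_entries are hypothetical names, and ValueError stands in for the library's own exceptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical stand-in mirroring the three limits configured above."""
    max_file_size: int = 100 * 1024 * 1024      # 100 MB per file
    max_total_size: int = 1024 * 1024 * 1024    # 1 GB total
    max_file_count: int = 10_000                # at most 10k files


def check_entries(entries, limits=Limits()):
    """Return the total size, or raise ValueError at the first exceeded limit.

    entries is an iterable of (name, size_in_bytes) pairs.
    """
    total = 0
    for count, (name, size) in enumerate(entries, start=1):
        if count > limits.max_file_count:
            raise ValueError(f"more than {limits.max_file_count} files")
        if size > limits.max_file_size:
            raise ValueError(f"{name}: {size} bytes exceeds the per-file limit")
        total += size
        if total > limits.max_total_size:
            raise ValueError(f"total size exceeds {limits.max_total_size} bytes")
    return total


print(check_entries([("a.txt", 10), ("b.txt", 20)]))
```

The limits are checked entry by entry so that processing can stop as soon as one is exceeded, rather than after the whole archive has been read.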

Security Features

The library provides built-in protection against:

Protection               Description
Path traversal           Blocks ../ and absolute paths
Symlink attacks          Prevents symlinks escaping extraction directory
Hardlink attacks         Validates hardlink targets
Zip bombs                Detects high compression ratios
Permission sanitization  Strips setuid/setgid bits
Size limits              Enforces file and total size limits

Caution: Unlike Python's standard tarfile module, exarch applies security validation by default.
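
As a rough illustration of the "high compression ratio" signal in the table above, the sketch below measures the ratio of uncompressed to compressed size for each ZIP entry using only the standard library. It is not exarch's detection logic: the helper names and the threshold of 100 are assumptions. Sizes declared in ZIP headers can also be forged, so a robust detector would count bytes while decompressing.

```python
import io
import zipfile


def max_compression_ratio(data: bytes) -> float:
    """Return the largest uncompressed/compressed ratio among ZIP entries."""
    worst = 0.0
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.compress_size > 0:
                worst = max(worst, info.file_size / info.compress_size)
    return worst


def build_zip(payload: bytes) -> bytes:
    """Build an in-memory ZIP holding one deflate-compressed entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.bin", payload)
    return buffer.getvalue()


RATIO_LIMIT = 100  # hypothetical threshold, chosen for this example only

# One MiB of zero bytes compresses to a tiny fraction of its size.
ratio = max_compression_ratio(build_zip(b"\0" * (1024 * 1024)))
print(ratio > RATIO_LIMIT)
```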

Supported Formats

Format     Extensions        Extract  Create  List  Verify
TAR        .tar
TAR+GZIP   .tar.gz, .tgz
TAR+BZIP2  .tar.bz2, .tbz2
TAR+XZ     .tar.xz, .txz
TAR+ZSTD   .tar.zst, .tzst
ZIP        .zip
7z         .7z

Note: 7z creation is not yet supported. Solid 7z archives are rejected by default for security reasons (see allow_solid_archives under SecurityConfig), and encrypted 7z archives are rejected. Unix symlinks inside 7z archives are reported as regular files, a limitation of the sevenz-rust2 API.
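
If a caller wants to pre-filter files before handing them to the library, the extension table above translates directly into a lookup. This is a sketch under assumptions: detect_format is a hypothetical helper, and whether exarch itself identifies formats by extension or by file contents is not stated here.

```python
from typing import Optional

# Extension-to-format mapping copied from the table above.
FORMATS = {
    ".tar": "TAR",
    ".tar.gz": "TAR+GZIP",
    ".tgz": "TAR+GZIP",
    ".tar.bz2": "TAR+BZIP2",
    ".tbz2": "TAR+BZIP2",
    ".tar.xz": "TAR+XZ",
    ".txz": "TAR+XZ",
    ".tar.zst": "TAR+ZSTD",
    ".tzst": "TAR+ZSTD",
    ".zip": "ZIP",
    ".7z": "7z",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format name for a filename, or None if unrecognized."""
    lower = filename.lower()
    for extension, name in FORMATS.items():
        if lower.endswith(extension):
            return name
    return None


print(detect_format("backup.tar.gz"))
print(detect_format("notes.txt"))
```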

Comparison with tarfile

# UNSAFE - tarfile has known vulnerabilities (CVE-2007-4559)
import tarfile
with tarfile.open("archive.tar.gz") as tar:
    tar.extractall("/output")  # May extract outside target directory!

# SAFE - exarch validates all paths
import exarch
exarch.extract_archive("archive.tar.gz", "/output")  # Protected by default
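
To see the kind of entry behind this class of vulnerability, the sketch below builds a small tar archive in memory whose only entry is named ../evil.txt, then lists suspicious names without extracting anything. It uses only the standard library; build_traversal_tar and suspicious_members are hypothetical helpers for illustration and are not part of exarch.

```python
import io
import tarfile


def build_traversal_tar() -> bytes:
    """Create an in-memory tar whose single entry is named '../evil.txt'."""
    payload = b"escaped"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        info = tarfile.TarInfo(name="../evil.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def suspicious_members(data: bytes) -> list[str]:
    """List entry names that are absolute or contain a '..' component."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.name.startswith("/") or ".." in member.name.split("/")
        ]


# Inspect the archive only; nothing is written to disk.
print(suspicious_members(build_traversal_tar()))
```

A name check like this covers only the simplest case. Symlink and hardlink entries can escape the target directory even when every name looks harmless, which is why the protections listed under Security Features go beyond path inspection.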

Development

This package is built using PyO3 and maturin.

# Clone repository
git clone https://github.com/bug-ops/exarch
cd exarch/crates/exarch-python

# Build with maturin
pip install maturin
maturin develop

# Run tests
pytest tests/

License

Licensed under either of:

at your option.
