Universal archive extractor supporting 30+ formats including ZIP, RAR, 7Z, TAR, and many more

UnzipAll - Universal Archive Extractor

A comprehensive Python library for extracting archive files in 30+ formats with a simple, unified API. No more juggling multiple extraction libraries or dealing with format-specific quirks.

✨ Features

  • 🗃️ Universal Format Support: ZIP, RAR, 7Z, TAR (all variants), ISO, MSI, and 25+ more formats
  • 🛡️ Security First: Built-in protection against path traversal attacks (Zip Slip)
  • 🔐 Password Support: Handle encrypted archives seamlessly
  • ⚡ Simple API: One function call to extract any supported archive
  • 🔧 CLI Tool: Extract archives from the command line with the unzipall command
  • 🌍 Cross-Platform: Works on Windows, macOS, and Linux
  • 🏗️ Type Safe: Full type hints for better IDE support and development experience
  • 📊 Graceful Degradation: Optional dependencies - missing libraries don't break functionality

🚀 Quick Start

Installation

pip install unzipall

Basic Usage

import unzipall

# Extract any archive format - it just works!
unzipall.extract('archive.zip')
unzipall.extract('data.tar.gz', 'output_folder')
unzipall.extract('encrypted.7z', password='secret')

# Check if format is supported
if unzipall.is_supported('mystery_file.xyz'):
    unzipall.extract('mystery_file.xyz')

# List all supported formats  
formats = unzipall.list_supported_formats()
print(f"Supports {len(formats)} formats!")

Command Line Usage

# Extract to current directory
unzipall archive.zip

# Extract to specific directory
unzipall archive.tar.gz /path/to/output

# Extract password-protected archive
unzipall -p mypassword encrypted.7z

# List supported formats
unzipall --list-formats

# Verbose output
unzipall -v archive.rar output_dir

📁 Supported Formats

| Category | Formats | Status |
|---|---|---|
| ZIP Family | .zip, .jar, .war, .ear, .apk, .epub, .cbz | ✅ Built-in |
| RAR Family | .rar, .cbr | ✅ Full Support |
| 7-Zip | .7z, .cb7 | ✅ Full Support |
| TAR Archives | .tar, .tar.gz, .tgz, .tar.bz2, .tbz2, .tar.xz, .txz, .tar.z, .tar.lzma | ✅ Built-in |
| Compression | .gz, .bz2, .xz, .lzma, .z | ✅ Built-in |
| Other Archives | .arj, .cab, .chm, .cpio, .deb, .rpm, .lzh, .lha | ✅ Via patool |
| Disk Images | .iso, .vhd, .udf | ✅ Full Support |
| Microsoft | .msi, .exe (self-extracting), .wim | ✅ Platform-aware |
| Specialized | .xar, .zpaq, .cso, .pkg, .cbt | ✅ Via patool |

30+ formats supported! If a format isn't working, it may require additional system tools (see System Dependencies).

🛠 Advanced Usage

Programmatic API

from unzipall import ArchiveExtractor, ArchiveExtractionError

# Create extractor with custom settings
extractor = ArchiveExtractor(verbose=True)

# Check available features
features = extractor.get_available_features()
for feature, available in features.items():
    status = "✅" if available else "❌"
    print(f"{status} {feature}")

# Extract with error handling
try:
    success = extractor.extract(
        archive_path='large_archive.rar',
        extract_to='output_directory',
        password='optional_password'
    )
    if success:
        print("Extraction completed successfully!")
        
except ArchiveExtractionError as e:
    print(f"Extraction failed: {e}")
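Feature detection of this kind is commonly built on checking whether an optional module can be imported. The sketch below is not UnzipAll's implementation: `OPTIONAL_BACKENDS` and `detect_backends` are hypothetical names, and the module list is an assumption based on the libraries named in the Acknowledgments (patool imports as `patoolib`, libarchive-c as `libarchive`).

```python
import importlib.util

# Hypothetical mapping: feature name -> importable module name.
OPTIONAL_BACKENDS = {
    "7z": "py7zr",
    "rar": "rarfile",
    "patool": "patoolib",
    "libarchive": "libarchive",
}


def detect_backends(backends=OPTIONAL_BACKENDS):
    """Return {feature: bool} by looking modules up without importing them."""
    return {
        feature: importlib.util.find_spec(module) is not None
        for feature, module in backends.items()
    }


for feature, available in detect_backends().items():
    print(f"{feature}: {'available' if available else 'missing'}")
```

Looking up the module spec instead of importing keeps the check cheap and avoids side effects from heavy optional libraries.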

Error Handling

UnzipAll provides specific exceptions for different failure scenarios:

import unzipall
from unzipall import (
    ArchiveExtractionError, UnsupportedFormatError,
    CorruptedArchiveError, PasswordRequiredError,
    InvalidPasswordError, ExtractionPermissionError,
    DiskSpaceError
)

try:
    unzipall.extract('archive.zip')
except UnsupportedFormatError:
    print("This archive format is not supported")
except PasswordRequiredError:
    password = input("Enter password: ")
    unzipall.extract('archive.zip', password=password)
except CorruptedArchiveError:
    print("Archive file is corrupted")
except DiskSpaceError:
    print("Not enough disk space")
except ArchiveExtractionError as e:
    print(f"Extraction failed: {e}")

Security Features

UnzipAll automatically protects against common archive-based attacks:

# Path traversal protection (Zip Slip)
# Malicious archives with paths like "../../etc/passwd" are safely handled
unzipall.extract('potentially_malicious.zip', 'safe_output_dir')

# Files are extracted only within the target directory
# Dangerous paths are logged and skipped
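To make the idea concrete, here is a minimal sketch of a path traversal check using only the standard library. It is an illustration of the general technique, not UnzipAll's actual code; `safe_members` and `safe_extract` are hypothetical names, and it covers ZIP files only.

```python
import zipfile
from pathlib import Path


def safe_members(archive, target_dir):
    """Yield only members whose resolved path stays inside target_dir."""
    root = Path(target_dir).resolve()
    for info in archive.infolist():
        destination = (root / info.filename).resolve()
        if destination == root or root in destination.parents:
            yield info
        else:
            # Mirrors the "logged and skipped" behaviour described above.
            print(f"Skipping unsafe path: {info.filename}")


def safe_extract(zip_path, target_dir):
    """Extract a ZIP, ignoring entries that would escape target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for info in safe_members(archive, target):
            archive.extract(info, target)
```

Resolving both the target directory and each candidate destination before comparing them means entries such as "../../etc/passwd" or absolute paths are rejected regardless of how they are spelled.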

🔧 System Dependencies

While UnzipAll works out of the box for common formats (ZIP, TAR, GZIP, etc.), some formats require additional system tools:

Windows

# Install via Windows Package Manager
winget install 7zip.7zip
winget install RARLab.WinRAR

# Or install via Chocolatey
choco install 7zip winrar

macOS

# Using Homebrew
brew install p7zip unrar

# For additional formats
brew install cabextract unshield

Linux (Ubuntu/Debian)

sudo apt update
sudo apt install p7zip-full unrar-free

# For additional formats
sudo apt install cabextract unshield cpio

Linux (RHEL/CentOS/Fedora)

sudo dnf install p7zip p7zip-plugins unrar

# For additional formats  
sudo dnf install cabextract unshield cpio

Docker Usage

FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    p7zip-full \
    unrar-free \
    cabextract \
    && rm -rf /var/lib/apt/lists/*

# Install unzipall
RUN pip install unzipall

# Your application code
COPY . /app
WORKDIR /app

📊 Performance

UnzipAll is designed for reliability and format support over raw speed. Benchmarks on typical archives:

  • ZIP files: ~80 extractions/second
  • TAR.GZ files: ~60 extractions/second
  • 7Z files: ~40 extractions/second
  • RAR files: ~35 extractions/second

Performance varies based on archive size, compression ratio, and available system resources.
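Because these figures depend so heavily on the archive and the machine, it can be worth measuring on your own data. The sketch below shows one way to compute an "extractions/second" figure. It times the standard library's zipfile rather than UnzipAll, and `extractions_per_second` is a hypothetical helper, so the numbers it prints are not comparable to the ones listed above.

```python
import tempfile
import time
import zipfile
from pathlib import Path


def extractions_per_second(archive_path, repeats=20):
    """Time repeated extractions of one ZIP into fresh temp directories."""
    start = time.perf_counter()
    for _ in range(repeats):
        with tempfile.TemporaryDirectory() as scratch:
            with zipfile.ZipFile(archive_path) as archive:
                archive.extractall(scratch)
    elapsed = time.perf_counter() - start
    return repeats / elapsed


with tempfile.TemporaryDirectory() as tmp:
    # Build a small sample archive so the sketch runs on its own.
    sample = Path(tmp) / "sample.zip"
    with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("hello.txt", "hello world\n" * 1000)
    print(f"{extractions_per_second(sample):.1f} extractions/second")
```

To benchmark UnzipAll itself, the inner zipfile calls could be swapped for unzipall.extract(archive_path, scratch).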

🧪 Development & Testing

# Clone the repository
git clone https://github.com/mricardo/unzipall.git
cd unzipall

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=src/unzipall --cov-report=html

# Format code
black src/ tests/

# Type checking
mypy src/

# Lint code
flake8 src/ tests/

Running Specific Tests

# Test basic functionality
pytest tests/test_smoke.py -v

# Test specific archive format
pytest tests/test_core.py::test_extract_valid_zip -v

# Performance benchmarks
pytest tests/test_performance.py --benchmark-only

# Skip slow tests
pytest -m "not slow"

🔗 API Reference

Main Functions

extract(archive_path, extract_to=None, password=None, verbose=False)

Extract an archive to the specified directory.

Parameters:

  • archive_path (str|Path): Path to the archive file
  • extract_to (str|Path, optional): Output directory (defaults to archive stem name)
  • password (str, optional): Password for encrypted archives
  • verbose (bool): Enable detailed logging

Returns: bool - True if successful

Example:

# Extract to default location (archive stem name)
unzipall.extract('myfiles.zip')  # Creates ./myfiles/

# Extract to specific directory
unzipall.extract('myfiles.zip', 'custom_output')

# Extract encrypted archive
unzipall.extract('secret.7z', password='mypassword')
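The README does not say how the default directory name is derived for compound extensions such as .tar.gz (Python's own Path.stem would give "data.tar" for "data.tar.gz"). If you want a predictable location, one option is to compute it yourself and pass it as extract_to. The helper below is a hypothetical sketch, not part of UnzipAll; it places the directory next to the archive, which is an assumption.

```python
from pathlib import Path

# Compound extensions taken from the Supported Formats table.
COMPOUND_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".tar.z", ".tar.lzma")


def output_dir_for(archive_path):
    """Return a sibling path named after the archive, minus its extension."""
    path = Path(archive_path)
    name = path.name
    lowered = name.lower()
    for suffix in COMPOUND_SUFFIXES:
        # Strip the whole compound suffix, but never the entire file name.
        if lowered.endswith(suffix) and len(name) > len(suffix):
            return path.with_name(name[: -len(suffix)])
    return path.with_suffix("")


print(output_dir_for("myfiles.zip"))
print(output_dir_for("backups/data.tar.gz"))
```

A call such as unzipall.extract(archive, output_dir_for(archive)) then makes the destination explicit.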

is_supported(file_path)

Check if a file format is supported.

Parameters:

  • file_path (str|Path): Path to file to check

Returns: bool - True if format is supported

Example:

if unzipall.is_supported('data.xyz'):
    print("This format is supported!")
else:
    print("Unsupported format")

list_supported_formats()

Get list of all supported file extensions.

Returns: List[str] - Sorted list of supported extensions

Example:

formats = unzipall.list_supported_formats()
print(f"Supported: {', '.join(formats)}")
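One use for this list is scanning a folder for files worth passing to extract. The sketch below assumes the returned entries look like ".zip" or ".tar.gz" (a leading dot is added if it is missing); `find_archives` is a hypothetical helper and takes the list as a parameter so it does not depend on UnzipAll being installed.

```python
from pathlib import Path


def find_archives(folder, formats):
    """Return files under folder whose name ends with a listed extension."""
    # Normalise to lowercase with a leading dot (assumed form: ".zip").
    suffixes = tuple(
        ext.lower() if ext.startswith(".") else f".{ext.lower()}"
        for ext in formats
    )
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.name.lower().endswith(suffixes)
    )
```

For example, find_archives('downloads', unzipall.list_supported_formats()) would return the candidate archives in a downloads folder. Matching on the end of the file name rather than Path.suffix lets compound extensions like .tar.gz match as a whole.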

ArchiveExtractor Class

For advanced usage with custom configuration:

from unzipall import ArchiveExtractor

extractor = ArchiveExtractor(verbose=True)

# Check what features are available
features = extractor.get_available_features()

# Extract with custom settings
success = extractor.extract('archive.zip', 'output_dir')

Exception Hierarchy

ArchiveExtractionError (base)
├── UnsupportedFormatError
├── CorruptedArchiveError
├── PasswordRequiredError
├── InvalidPasswordError
├── ExtractionPermissionError
└── DiskSpaceError
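Because every exception derives from ArchiveExtractionError, the order of except clauses matters: specific handlers must come before the base class, or the base handler will catch everything first. The sketch below uses stand-in classes that mirror the hierarchy above so it runs without UnzipAll installed; in real code you would import them from unzipall. `describe` is a hypothetical helper.

```python
# Stand-in classes mirroring the documented hierarchy (assumption:
# the real classes subclass the base exactly as the tree shows).
class ArchiveExtractionError(Exception): ...
class UnsupportedFormatError(ArchiveExtractionError): ...
class CorruptedArchiveError(ArchiveExtractionError): ...
class PasswordRequiredError(ArchiveExtractionError): ...
class InvalidPasswordError(ArchiveExtractionError): ...
class ExtractionPermissionError(ArchiveExtractionError): ...
class DiskSpaceError(ArchiveExtractionError): ...


def describe(error):
    """Map an exception to a message, most specific handler first."""
    try:
        raise error
    except (PasswordRequiredError, InvalidPasswordError):
        return "password problem"
    except UnsupportedFormatError:
        return "unsupported format"
    except ArchiveExtractionError:
        # Catch-all for the remaining subclasses and the base class.
        return "generic extraction failure"


print(describe(InvalidPasswordError()))
print(describe(DiskSpaceError()))
```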

🤝 Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository on GitHub
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Install development dependencies: pip install -e ".[dev]"
  4. Make your changes and add tests
  5. Run the test suite: pytest
  6. Commit your changes: git commit -m "Add amazing feature"
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Development Guidelines

  • Write tests for new features
  • Follow PEP 8 style guidelines (use black for formatting)
  • Add type hints for new functions
  • Update documentation for API changes
  • Ensure all tests pass before submitting

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built on top of excellent libraries: py7zr, rarfile, patool, libarchive-c, and others
  • Inspired by the need for a simple, unified archive extraction interface
  • Thanks to all contributors and users who help improve this library

🔗 Related Projects

  • patool - Command-line archive tool
  • py7zr - Pure Python 7-zip library
  • rarfile - RAR archive reader
  • zipfile - Python standard library ZIP support

📞 Support

Star this repo if you find it useful! ⭐

Made with ❤️ by mricardo
