Hardened TAR extraction for Python - secure by default.

safetar is a zero-dependency, production-grade wrapper around Python’s tarfile module that defends against the most common TAR-based attacks: TarSlip path traversal, decompression bombs, symlink/hardlink attacks, device file injection, and crafted archives.

Features

  • TarSlip protection - relative traversal, absolute paths, Unicode NFC normalisation attacks, PAX path overrides, GNU long-name reassembly, and null bytes in filenames are all blocked.

  • Decompression bomb protection - archive-level compression ratio monitoring across GZ, BZ2, and XZ streams aborts extraction before runaway decompression can exhaust disk or memory.

  • File size limits - per-member and total extraction size limits enforced at stream time (not based on untrusted header values).

  • Symlink policy - configurable: REJECT (default), IGNORE, or RESOLVE_INTERNAL (full chain verification with TOCTOU defence via deferred batch creation).

  • Hardlink policy - configurable: REJECT (default) or INTERNAL (target must exist on disk; forward references rejected).

  • Forbidden entry types - character devices, block devices, FIFOs, and unknown type codes are always rejected.

  • setuid/setgid/sticky bit stripping - dangerous permission bits are removed by default.

  • UID/GID ownership clamping - archived ownership is clamped to the current user by default.

  • Timestamp sanitisation - mtime values are clamped to [0, 2**32 - 1].

  • Sparse file policy - REJECT (default) or MATERIALISE (extract as dense).

  • Atomic writes - every member is written to a temporary file first; the destination is only created after all checks pass. No partial files are left on disk after a security abort.

  • Secure by default - all limits are active without any configuration.

  • Zero dependencies - standard library only.

  • Python 3.12 data_filter - applied as an additional defensive layer when available.
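To make a few of the checks above concrete, here is a minimal standard-library sketch of three of them: destination containment, special-bit stripping, and mtime clamping. It is an illustration only, not safetar's actual implementation; the helper names is_within, strip_special_bits, and clamp_mtime are hypothetical.

```python
import os
import stat


def is_within(base: str, member_name: str) -> bool:
    """Return True if member_name, joined onto base, stays inside base."""
    # Null bytes and absolute paths are rejected outright.
    if "\x00" in member_name or os.path.isabs(member_name):
        return False
    base_real = os.path.realpath(base)
    target = os.path.realpath(os.path.join(base_real, member_name))
    # The resolved target must share base_real as its common ancestor.
    return os.path.commonpath([base_real, target]) == base_real


def strip_special_bits(mode: int) -> int:
    """Drop setuid, setgid, and sticky bits; keep the rwx permission bits."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX) & 0o7777


def clamp_mtime(mtime: float) -> int:
    """Clamp a timestamp into [0, 2**32 - 1]."""
    return max(0, min(int(mtime), 2**32 - 1))
```

A real extractor has to do considerably more than this (link targets, Unicode normalisation, PAX and GNU long-name handling), which is the gap the features listed above are meant to cover.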

Prerequisites

Python 3.10 or later. No additional packages required.

Installation

With uv:

uv pip install safetar

Or with pip:

pip install safetar

Quick start

Drop-in replacement for the common tarfile extraction pattern:

from safetar import safe_extract

safe_extract("path/to/upload.tar.gz", "/var/files/extracted/")

Or use the SafeTarFile context manager for more control:

from safetar import SafeTarFile

with SafeTarFile("path/to/upload.tar.gz") as stf:
    print(stf.getnames())
    stf.extractall("/var/files/extracted/")
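For comparison, the plain standard-library pattern that these calls stand in for can be sketched as follows. The wrapper name stdlib_extract is hypothetical. It relies on tarfile's own filter="data" where the running Python provides it, and refuses to extract otherwise; it shows the baseline only, not safetar's behaviour.

```python
import tarfile


def stdlib_extract(archive_path: str, dest: str) -> None:
    """Plain tarfile extraction, guarded by the stdlib 'data' filter."""
    if not hasattr(tarfile, "data_filter"):
        # Refuse rather than fall back to unfiltered extraction.
        raise RuntimeError("tarfile.data_filter is not available on this Python")
    with tarfile.open(archive_path) as tf:
        # With the default errorlevel, a member rejected by the filter
        # raises a tarfile.FilterError subclass.
        tf.extractall(dest, filter="data")
```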

Custom limits

See the Default limits section for the built-in values.

from safetar import SafeTarFile, SymlinkPolicy, HardlinkPolicy

with SafeTarFile(
    "path/to/upload.tar.gz",
    max_file_size=100 * 1024 * 1024,          # 100 MiB per member (default: 1 GiB)
    max_total_size=500 * 1024 * 1024,         # 500 MiB total (default: 5 GiB)
    max_files=1_000,                          # (default: 10 000)
    max_ratio=50.0,                           # (default: 200)
    symlink_policy=SymlinkPolicy.IGNORE,      # (default: SymlinkPolicy.REJECT)
    hardlink_policy=HardlinkPolicy.INTERNAL,  # (default: HardlinkPolicy.REJECT)
) as stf:
    stf.extractall("/var/files/extracted/")
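The Features section describes the size limits as enforced at stream time and writes as atomic. A minimal sketch of how those two ideas can combine, using only the standard library, is shown here. It is not safetar's code; copy_with_limit and LimitExceeded are hypothetical names.

```python
import os
import tempfile
from typing import BinaryIO


class LimitExceeded(Exception):
    """Raised when a stream yields more bytes than allowed (hypothetical)."""


def copy_with_limit(src: BinaryIO, dest_path: str, max_bytes: int,
                    chunk_size: int = 64 * 1024) -> int:
    """Stream src into dest_path, counting real bytes, not header claims.

    Data goes to a temporary file in the destination directory; it is moved
    into place with os.replace only if the limit was never exceeded.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    written = 0
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                written += len(chunk)
                if written > max_bytes:
                    raise LimitExceeded(f"more than {max_bytes} bytes")
                tmp.write(chunk)
        os.replace(tmp_path, dest_path)  # atomic on the same filesystem
    except BaseException:
        # Remove the partial temp file so nothing is left behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return written
```

Counting bytes as they are written, rather than trusting the size field in the header, is what keeps a lying header from bypassing the limit.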

Recursive extraction

When an archive contains nested .tar files, set recursive=True to descend into them automatically. All safety limits apply at every level. Each nested archive is extracted into a directory named after it (without the extension). The nested .tar file is removed from disk after recursive extraction (see _extract_nested_archive in _core.py).

from safetar import SafeTarFile

# archive.tar.gz
#   readme.txt
#   inner.tar          ← will be descended into, not extracted as a blob
#     inner_file.txt

with SafeTarFile("path/to/archive.tar.gz", recursive=True, max_nesting_depth=3) as stf:
    stf.extractall("/var/files/extracted/")

# Result on disk:
#   /var/files/extracted/readme.txt
#   /var/files/extracted/inner/inner_file.txt

By default, recursive=False and nested tar archives are extracted as regular files. When recursive=True, safetar detects and extracts nested tar archives automatically using content-based detection (tarfile.is_tarfile()), avoiding extension-spoofing attacks.

All security protections are applied to nested archives:

  • Nesting depth is enforced (max_nesting_depth)

  • File size limits apply across all nested extractions (max_file_size, max_total_size)

  • Symlink, hardlink, and sparse policies are enforced

  • Permission, ownership, and timestamp sanitisation is applied

  • All other security checks (path traversal, decompression bombs, etc.) still apply
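The content-based detection mentioned above can be tried directly with the standard library. The sketch below only demonstrates tarfile.is_tarfile() ignoring file extensions; the wrapper looks_like_tar and the file names are made up for the example, and how safetar wires the check into extraction is not shown.

```python
import io
import os
import tarfile
import tempfile


def looks_like_tar(path: str) -> bool:
    """Content-based check: the file name and extension are ignored."""
    return tarfile.is_tarfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # A real tar archive hiding behind a .txt extension.
    disguised = os.path.join(tmp, "notes.txt")
    with tarfile.open(disguised, "w") as tf:
        payload = b"hello"
        info = tarfile.TarInfo("inner_file.txt")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

    # Plain text that merely claims to be a tar archive.
    fake = os.path.join(tmp, "fake.tar")
    with open(fake, "w") as fh:
        fh.write("this is not a tar archive\n")

    disguised_is_tar = looks_like_tar(disguised)  # True
    fake_is_tar = looks_like_tar(fake)            # False
```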

Security event monitoring

from safetar import SafeTarFile, SecurityEvent

def my_monitor(event: SecurityEvent) -> None:
    print(f"[safetar] {event.event_type} archive={event.archive_hash}")

with SafeTarFile(
    "path/to/upload.tar.gz", on_security_event=my_monitor
) as stf:
    stf.extractall("/var/files/extracted/")
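A monitor does not have to be a plain function. As a sketch, the callable below logs each event and keeps a tally per event type. It touches only the two attributes used in the example above (event_type and archive_hash); the class name EventCounter and the logger name are hypothetical.

```python
import logging
from collections import Counter
from typing import Any

logger = logging.getLogger("myapp.safetar_monitor")  # hypothetical logger name


class EventCounter:
    """Callable monitor that logs each event and tallies event types."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, event: Any) -> None:
        # Only event_type and archive_hash are read, as in the example above.
        self.counts[str(event.event_type)] += 1
        logger.warning(
            "safetar event=%s archive=%s", event.event_type, event.archive_hash
        )
```

An instance would be passed in the same way as my_monitor, that is as on_security_event=EventCounter(), and its counts attribute inspected after extraction.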

Default limits

Parameter            Default
max_file_size        1 GiB
max_total_size       5 GiB
max_files            10 000
max_ratio            200
max_nesting_depth    3
recursive            False
symlink_policy       REJECT
hardlink_policy      REJECT
sparse_policy        REJECT
strip_special_bits   True
preserve_ownership   False
clamp_timestamps     True

Environment variable configuration

See the Default limits section for the built-in values.

Every default can be overridden at process start via environment variables, without modifying call sites. Explicit constructor arguments always take precedence over environment variables.

Environment variable          Parameter
SAFETAR_MAX_FILE_SIZE         max_file_size
SAFETAR_MAX_TOTAL_SIZE        max_total_size
SAFETAR_MAX_FILES             max_files
SAFETAR_MAX_RATIO             max_ratio
SAFETAR_MAX_NESTING_DEPTH     max_nesting_depth
SAFETAR_RECURSIVE             recursive
SAFETAR_SYMLINK_POLICY        symlink_policy
SAFETAR_HARDLINK_POLICY       hardlink_policy
SAFETAR_SPARSE_POLICY         sparse_policy
SAFETAR_STRIP_SPECIAL_BITS    strip_special_bits
SAFETAR_PRESERVE_OWNERSHIP    preserve_ownership
SAFETAR_CLAMP_TIMESTAMPS      clamp_timestamps

Integer and float variables accept standard numeric strings. Boolean variables accept 1 / true / yes / on (truthy) or 0 / false / no / off (falsy), case-insensitively. Policy variables accept the lower-case enum value names (e.g. SAFETAR_SYMLINK_POLICY=resolve_internal). Unrecognised or unparseable values are silently ignored and the built-in default is used instead.
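The parsing and precedence rules above can be sketched as follows. This is an illustration of the documented behaviour under stated assumptions, not safetar's own parser; parse_bool, parse_int, and resolve are hypothetical helpers.

```python
import os
from typing import Any, Callable, Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def parse_bool(raw: Optional[str], default: bool) -> bool:
    """Case-insensitive boolean parsing; unknown values fall back silently."""
    if raw is None:
        return default
    value = raw.lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    return default


def parse_int(raw: Optional[str], default: int) -> int:
    """Integer parsing; missing or unparseable values fall back silently."""
    try:
        return int(raw)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return default


def resolve(explicit: Any, env_name: str,
            parser: Callable[[Optional[str], Any], Any], default: Any,
            environ: Mapping[str, str] = os.environ) -> Any:
    """Explicit argument first, then environment variable, then default."""
    if explicit is not None:
        return explicit
    return parser(environ.get(env_name), default)


env = {"SAFETAR_RECURSIVE": "YES", "SAFETAR_MAX_FILES": "oops"}
recursive = resolve(None, "SAFETAR_RECURSIVE", parse_bool, False, env)  # True
max_files = resolve(None, "SAFETAR_MAX_FILES", parse_int, 10_000, env)  # 10000
```

Because bad values are ignored rather than reported, a typo in an environment variable leaves the built-in default in force, so it is worth checking the effective configuration when tightening limits this way.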

CLI

safetar ships with a CLI for quick extraction:

# Extract an archive
safetar extract path/to/archive.tar.gz /var/files/extracted/

# List archive contents
safetar list path/to/archive.tar.gz

# Extract with custom limits
safetar extract archive.tar /output/ \
    --max-file-size 104857600 \
    --max-total-size 524288000 \
    --max-files 1000

# Enable recursive extraction
safetar extract archive.tar /output/ --recursive

# Show help
safetar --help

The CLI supports all the same security options as the Python API.
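The size flags in the example above appear to be plain byte counts. A quick check of the arithmetic shows that they match the 100 MiB and 500 MiB limits used in the Custom limits section:

```python
MIB = 1024 * 1024

max_file_size = 100 * MIB    # 104857600
max_total_size = 500 * MIB   # 524288000

print(f"--max-file-size {max_file_size} --max-total-size {max_total_size}")
```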

Testing

All tests run inside Docker to prevent accidental pollution of the host system:

make test

To test a specific Python version:

make test-env ENV=py312

Writing documentation

Keep the following hierarchy:

=====
title
=====

header
======

sub-header
----------

sub-sub-header
~~~~~~~~~~~~~~

sub-sub-sub-header
^^^^^^^^^^^^^^^^^^

sub-sub-sub-sub-header
++++++++++++++++++++++

sub-sub-sub-sub-sub-header
**************************

License

MIT

Support

For security issues, contact me at the e-mail address given in the Author section.

For all other issues, use the GitHub issue tracker.

Author

Artur Barseghyan <artur.barseghyan@gmail.com>
