Hardened TAR extraction for Python - secure by default.
safetar is a zero-dependency, production-grade wrapper around Python’s tarfile module that defends against the most common TAR-based attacks: TarSlip path traversal, decompression bombs, symlink/hardlink attacks, device file injection, and crafted archives.
Features
TarSlip protection - relative traversal, absolute paths, Unicode NFC normalisation attacks, PAX path overrides, GNU long-name reassembly, and null bytes in filenames are all blocked.
Decompression bomb protection - archive-level compression ratio monitoring across GZ, BZ2, and XZ streams aborts extraction before runaway decompression can exhaust disk or memory.
File size limits - per-member and total extraction size limits enforced at stream time (not based on untrusted header values).
Symlink policy - configurable: REJECT (default), IGNORE, or RESOLVE_INTERNAL (full chain verification with TOCTOU defence via deferred batch creation).
Hardlink policy - configurable: REJECT (default) or INTERNAL (target must exist on disk; forward references rejected).
Forbidden entry types - character devices, block devices, FIFOs, and unknown type codes are always rejected.
setuid/setgid/sticky bit stripping - dangerous permission bits are removed by default.
UID/GID ownership clamping - archived ownership is clamped to the current user by default.
Timestamp sanitisation - mtime values are clamped to [0, 2**32 - 1].
Sparse file policy - REJECT (default) or MATERIALISE (extract as dense).
Atomic writes - every member is written to a temporary file first; the destination is only created after all checks pass. No partial files are left on disk after a security abort.
Secure by default - all limits are active without any configuration.
Zero dependencies - standard library only.
Python 3.12 data_filter - applied as an additional defensive layer when available.
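A few of the per-member checks listed above can be sketched with the standard library alone. This is a minimal illustration, not safetar's actual implementation: the function names `is_within`, `strip_special_bits`, and `clamp_mtime` are hypothetical and exist only for this example.

```python
import os
import stat
import unicodedata

MAX_MTIME = 2**32 - 1  # upper bound quoted in the feature list


def is_within(dest: str, member_name: str) -> bool:
    """Return True if member_name would land inside dest (hypothetical helper)."""
    if "\x00" in member_name:
        return False  # null bytes in filenames are rejected outright
    name = unicodedata.normalize("NFC", member_name)
    if os.path.isabs(name):
        return False  # absolute paths never belong inside dest
    root = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(root, name))
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:
        return False  # e.g. paths on different drives on Windows


def strip_special_bits(mode: int) -> int:
    """Clear setuid, setgid, and sticky bits, keeping ordinary permissions."""
    return mode & ~(stat.S_ISUID | stat.S_ISGID | stat.S_ISVTX)


def clamp_mtime(mtime: float) -> int:
    """Clamp a timestamp into the range [0, 2**32 - 1]."""
    return max(0, min(int(mtime), MAX_MTIME))
```

The containment check compares resolved paths rather than raw strings, so a name such as `a/../../x` is rejected even though it does not start with `..`.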
Prerequisites
Python 3.10 or later. No additional packages required.
Installation
With uv:
uv pip install safetar
Or with pip:
pip install safetar
Quick start
Drop-in replacement for the common tarfile extraction pattern:
from safetar import safe_extract
safe_extract("path/to/upload.tar.gz", "/var/files/extracted/")
Or use the SafeTarFile context manager for more control:
from safetar import SafeTarFile
with SafeTarFile("path/to/upload.tar.gz") as stf:
    print(stf.getnames())
    stf.extractall("/var/files/extracted/")
Custom limits
See the Default limits for reference.
from safetar import SafeTarFile, SymlinkPolicy, HardlinkPolicy
with SafeTarFile(
    "path/to/upload.tar.gz",
    max_file_size=100 * 1024 * 1024,          # 100 MiB per member (default: 1 GiB)
    max_total_size=500 * 1024 * 1024,         # 500 MiB total (default: 5 GiB)
    max_files=1_000,                          # (default: 10 000)
    max_ratio=50.0,                           # (default: 200)
    symlink_policy=SymlinkPolicy.IGNORE,      # (default: SymlinkPolicy.REJECT)
    hardlink_policy=HardlinkPolicy.INTERNAL,  # (default: HardlinkPolicy.REJECT)
) as stf:
    stf.extractall("/var/files/extracted/")
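The feature list says size limits are enforced at stream time rather than from header values. One way to picture that is a copy loop that counts bytes as they are actually read. The sketch below is an assumption about the general technique, not safetar's code; `copy_with_limit` and `SizeLimitExceeded` are hypothetical names.

```python
import io
from typing import BinaryIO


class SizeLimitExceeded(Exception):
    """Hypothetical error type used only in this sketch."""


def copy_with_limit(
    src: BinaryIO, dst: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024
) -> int:
    """Copy src to dst, aborting as soon as more than max_bytes have been read.

    The count is based on bytes actually read, never on a declared size.
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return written
        written += len(chunk)
        if written > max_bytes:
            raise SizeLimitExceeded(f"member exceeds {max_bytes} bytes")
        dst.write(chunk)


# Usage: a 1000-byte payload passes a 1000-byte limit.
copied = copy_with_limit(io.BytesIO(b"x" * 1000), io.BytesIO(), max_bytes=1000)
```

Because the check runs before each write, the destination never receives more than `max_bytes`, whatever size the archive header claims.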
Recursive extraction
When an archive contains nested .tar files, set recursive=True to descend into them automatically. All safety limits apply at every level. Each nested archive is extracted into a directory named after it (without the extension). The nested .tar file is removed from disk after recursive extraction (see _extract_nested_archive in _core.py).
from safetar import SafeTarFile
# archive.tar
#   readme.txt
#   inner.tar          ← will be descended into, not extracted as a blob
#     inner_file.txt
with SafeTarFile("path/to/archive.tar.gz", recursive=True, max_nesting_depth=3) as stf:
    stf.extractall("/var/files/extracted/")
# Result on disk:
#   /var/files/extracted/readme.txt
#   /var/files/extracted/inner/inner_file.txt
By default, recursive=False and nested tar archives are extracted as regular files. When recursive=True, safetar detects and extracts nested tar archives automatically using content-based detection (tarfile.is_tarfile()), avoiding extension-spoofing attacks.
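To see why content-based detection matters, the standard library's `tarfile.is_tarfile()` can be tried on files whose extensions lie about their contents. The helpers `make_tar_bytes`, `classify`, and `demo` below are hypothetical and exist only for this illustration; how safetar calls `is_tarfile()` internally is not shown here.

```python
import io
import os
import tarfile
import tempfile


def make_tar_bytes() -> bytes:
    """Build a tiny uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        data = b"hello"
        info = tarfile.TarInfo(name="inner_file.txt")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def classify(directory: str) -> dict[str, bool]:
    """Map each file name in directory to whether its content is a tar archive."""
    return {
        name: tarfile.is_tarfile(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
    }


def demo() -> dict[str, bool]:
    """Write a tar named photo.jpg and a text file named fake.tar, then classify."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "photo.jpg"), "wb") as fh:
            fh.write(make_tar_bytes())
        with open(os.path.join(tmp, "fake.tar"), "wb") as fh:
            fh.write(b"just some text, not an archive")
        return classify(tmp)
```

Here `demo()` reports `photo.jpg` as a tar archive and `fake.tar` as not one, which is the opposite of what the extensions suggest.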
All security protections are applied to nested archives:
Nesting depth is enforced (max_nesting_depth)
File size limits apply across all nested extractions (max_file_size, max_total_size)
Symlink, hardlink, and sparse policies are enforced
Permission, ownership, and timestamp sanitisation is applied
All other security checks (path traversal, decompression bombs, etc.) remain in force
Security event monitoring
from safetar import SafeTarFile, SecurityEvent
def my_monitor(event: SecurityEvent) -> None:
    print(f"[safetar] {event.event_type} archive={event.archive_hash}")

with SafeTarFile(
    "path/to/upload.tar.gz", on_security_event=my_monitor
) as stf:
    stf.extractall("/var/files/extracted/")
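A callback does not have to print. Since `on_security_event` takes any callable, a small object that tallies events by type may be more useful in a service. The sketch below runs without safetar installed: `StandInEvent` is a hypothetical stand-in carrying only the two attributes used in the example above (`event_type` and `archive_hash`), and the event type strings are invented placeholders, not safetar's real values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class StandInEvent:
    """Hypothetical stand-in for safetar.SecurityEvent (two fields only)."""
    event_type: str
    archive_hash: str


class EventTally:
    """Callable monitor that counts security events per event type."""

    def __init__(self) -> None:
        self.counts = Counter()
        self.archives = set()

    def __call__(self, event) -> None:
        # Only the attributes shown in the README example are accessed.
        self.counts[event.event_type] += 1
        self.archives.add(event.archive_hash)


# Usage with placeholder events; with safetar this object would be passed
# as on_security_event=tally.
tally = EventTally()
for ev in (
    StandInEvent("example_type_a", "hash-1"),
    StandInEvent("example_type_a", "hash-1"),
    StandInEvent("example_type_b", "hash-2"),
):
    tally(ev)
```

Keeping the monitor free of I/O makes it cheap to call during extraction; the tallies can be exported to logs or metrics afterwards.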
Default limits
| Parameter | Default |
|---|---|
| max_file_size | 1 GiB |
| max_total_size | 5 GiB |
| max_files | 10 000 |
| max_ratio | 200 |
| max_nesting_depth | 3 |
| recursive | False |
| symlink_policy | REJECT |
| hardlink_policy | REJECT |
| sparse_policy | REJECT |
| strip_special_bits | True |
| preserve_ownership | False |
| clamp_timestamps | True |
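To give a feel for what a `max_ratio` of 200 means, the sketch below decompresses a gzip stream incrementally and aborts once the output exceeds `max_ratio` times the compressed bytes consumed so far. It uses only `zlib` and `gzip` from the standard library. It is an assumption about the general technique, not a copy of safetar's monitoring code, and `gunzip_with_ratio_limit`, `RatioExceeded`, and `make_zero_bomb` are hypothetical names.

```python
import gzip
import zlib


class RatioExceeded(Exception):
    """Hypothetical error type used only in this sketch."""


def gunzip_with_ratio_limit(
    compressed: bytes, max_ratio: float, chunk_size: int = 4096
) -> bytes:
    """Decompress gzip data, aborting when output/input exceeds max_ratio."""
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)  # | 16 selects the gzip container
    out = bytearray()
    consumed = 0
    for start in range(0, len(compressed), chunk_size):
        data = compressed[start:start + chunk_size]
        consumed += len(data)
        while data:
            # Bound each step to 64 KiB of output so a bomb cannot expand
            # fully before the ratio is checked.
            out += decomp.decompress(data, 64 * 1024)
            data = decomp.unconsumed_tail
            if len(out) > max_ratio * consumed:
                raise RatioExceeded(f"ratio above {max_ratio}")
    out += decomp.flush()
    return bytes(out)


def make_zero_bomb(size: int) -> bytes:
    """Compress `size` zero bytes; deflate shrinks this by roughly 1000x."""
    return gzip.compress(bytes(size))
```

With this sketch, `gunzip_with_ratio_limit(make_zero_bomb(4 * 1024 * 1024), 200)` raises long before the full 4 MiB is produced, while ordinary data with a modest ratio decompresses normally.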
Environment variable configuration
See the Default limits for reference.
Every default can be overridden at process start via environment variables, without modifying call sites. Explicit constructor arguments always take precedence over environment variables.
| Environment variable | Parameter |
|---|---|
| SAFETAR_MAX_FILE_SIZE | max_file_size |
| SAFETAR_MAX_TOTAL_SIZE | max_total_size |
| SAFETAR_MAX_FILES | max_files |
| SAFETAR_MAX_RATIO | max_ratio |
| SAFETAR_MAX_NESTING_DEPTH | max_nesting_depth |
| SAFETAR_RECURSIVE | recursive |
| SAFETAR_SYMLINK_POLICY | symlink_policy |
| SAFETAR_HARDLINK_POLICY | hardlink_policy |
| SAFETAR_SPARSE_POLICY | sparse_policy |
| SAFETAR_STRIP_SPECIAL_BITS | strip_special_bits |
| SAFETAR_PRESERVE_OWNERSHIP | preserve_ownership |
| SAFETAR_CLAMP_TIMESTAMPS | clamp_timestamps |
Integer and float variables accept standard numeric strings. Boolean variables accept 1 / true / yes / on (truthy) or 0 / false / no / off (falsy), case-insensitively. Policy variables accept the lower-case enum value names (e.g. SAFETAR_SYMLINK_POLICY=resolve_internal). Unrecognised or unparseable values are silently ignored and the built-in default is used instead.
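The parsing and precedence rules above can be modelled in a few lines. This is a sketch of the described behaviour, not safetar's source: `env_bool`, `env_int`, and `resolve_bool` are hypothetical helpers, and using `None` to mean "no explicit argument given" is an assumption made for the example.

```python
import os
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def env_bool(name: str, default: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Parse a boolean variable case-insensitively; fall back to default."""
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    return default  # unrecognised values are silently ignored


def env_int(name: str, default: int, environ: Mapping[str, str] = os.environ) -> int:
    """Parse an integer variable; unparseable values yield the default."""
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def resolve_bool(
    explicit: Optional[bool],
    name: str,
    default: bool,
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Explicit argument wins, then the environment, then the built-in default."""
    if explicit is not None:
        return explicit
    return env_bool(name, default, environ)
```

Passing the environment as a mapping keeps the helpers easy to exercise without touching the real process environment.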
CLI
safetar ships with a CLI for quick extraction:
# Extract an archive
safetar extract path/to/archive.tar.gz /var/files/extracted/
# List archive contents
safetar list path/to/archive.tar.gz
# Extract with custom limits
safetar extract archive.tar /output/ \
    --max-file-size 104857600 \
    --max-total-size 524288000 \
    --max-files 1000
# Enable recursive extraction
safetar extract archive.tar /output/ --recursive
# Show help
safetar --help
The CLI supports all the same security options as the Python API.
Testing
All tests run inside Docker to prevent accidental pollution of the host system:
make test
To test a specific Python version:
make test-env ENV=py312
Writing documentation
Keep the following hierarchy:
=====
title
=====
header
======
sub-header
----------
sub-sub-header
~~~~~~~~~~~~~~
sub-sub-sub-header
^^^^^^^^^^^^^^^^^^
sub-sub-sub-sub-header
++++++++++++++++++++++
sub-sub-sub-sub-sub-header
**************************
License
MIT
Support
For security issues contact me at the e-mail given in the Author section.
For all other issues, go to GitHub.