An efficient, secure, and deterministic TAR streaming engine.

TarTape

TarTape is a streaming engine designed to turn massive folders into TAR archives with absolute predictability.

Standard archiving tools are dynamic: the headers they emit vary with file names and sizes (a long path, for example, can pull in extra extension headers). As a result, the exact layout of the archive is generally not known until it has been fully created.

TarTape changes the rules. It organizes your data into a "Master Tape" where every file entry follows a strict, fixed-size layout. This creates a predictable stream that solves the biggest challenges of large-scale data transfers:

  • Resume interrupted uploads: If a 10 TB transfer fails at 4.2 TB, TarTape can resume from the exact byte where it stopped, without re-scanning the source.
  • Verify Integrity: It ensures the files you are streaming haven't changed since you first "recorded" the tape.
  • Navigate the Stream: Jump to any file or offset within the archive without having to process the preceding data.
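
To make the idea of a fixed layout concrete, here is a minimal sketch of how offsets can be computed ahead of time. It is not TarTape's internal code: it assumes exactly one 512-byte header per entry, data padded to 512-byte blocks, and the standard 1024-byte end-of-archive marker. The names `padded` and `layout` are hypothetical.

```python
BLOCK = 512  # TAR block size in bytes


def padded(size: int) -> int:
    """Round a payload size up to the next 512-byte boundary."""
    return (size + BLOCK - 1) // BLOCK * BLOCK


def layout(entries):
    """Return ([(name, start_offset, end_offset)], total_archive_size).

    Assumption: every entry occupies one 512-byte header plus its padded data.
    """
    offset = 0
    table = []
    for name, size in entries:
        end = offset + BLOCK + padded(size)
        table.append((name, offset, end))
        offset = end
    total = offset + 2 * BLOCK  # end-of-archive marker: two zero-filled blocks
    return table, total


table, total = layout([("a.txt", 10), ("b.bin", 1024), ("empty", 0)])
for name, start, end in table:
    print(f"{name}: bytes {start}..{end}")
print(f"archive size: {total}")
```

Because every size is known before any byte is read, the full offset table exists before streaming begins, which is what makes resuming and seeking possible.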

Installation

pip install tartape

Usage Examples

1. Recording the Tape

Before streaming, you must "record" the folder. This creates a snapshot (stored in .tartape/index.db) that records the exact position of every file in the stream.

from tartape import TapeRecorder

# This creates the .tartape metadata folder inside your dataset
recorder = TapeRecorder("./massive_dataset")
fingerprint = recorder.commit()

print(f"Tape ready. Fingerprint: {fingerprint}")

2. Basic Streaming

Once recorded, you can stream the folder to any destination (Cloud, Disk, Network). TarTape ensures the stream is exactly as described in the metadata.

from tartape import Tape, TapePlayer

# Discover the tape and start the player
tape = Tape.discover("./massive_dataset")
player = TapePlayer(tape, directory="./massive_dataset")

with open("backup.tar", "wb") as f:
    # Every event contains data chunks or metadata
    for event in player.play():
        if event.type == "file_data":
            f.write(event.data)
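
One way to sanity-check the resulting archive is to list it with Python's standard tarfile module. The helper below is a small sketch and not part of TarTape; `list_archive` is a hypothetical name.

```python
import tarfile


def list_archive(fileobj):
    """Return (name, size) for every member of an uncompressed TAR, in archive order."""
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        return [(member.name, member.size) for member in tar]
```

For example, opening backup.tar in "rb" mode and passing the file object to list_archive should return the entries in the same order they were streamed.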

3. Resuming an Interrupted Stream

Because TarTape uses fixed positions, you can resume a multi-terabyte upload if it fails. You only need to know the last byte successfully sent.

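# Reuses the `player` created in the previous example.
# If the previous upload failed at exactly 5 GiB...
start_offset = 5 * 1024 * 1024 * 1024

for event in player.play(start_offset=start_offset):
    # Playback skips all previous files and starts streaming
    # from the exact byte where the failure occurred.
    # Only file_data events carry bytes.
    if event.type == "file_data":
        upload_to_cloud(event.data)  # placeholder for your own upload function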
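
The reason a byte offset is enough to resume can be illustrated with a sorted table of entry start offsets and a binary search. This is a sketch of the general technique, not TarTape's implementation; `locate`, `names`, and `starts` are hypothetical.

```python
from bisect import bisect_right


def locate(starts, names, offset):
    """Map an absolute stream offset to (entry name, bytes of that entry already sent)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index = bisect_right(starts, offset) - 1
    return names[index], offset - starts[index]


# Hypothetical start offsets for three entries, as a recorded index might hold them.
names = ["a.txt", "b.bin", "c.log"]
starts = [0, 1024, 2560]

entry, already_sent = locate(starts, names, 1500)
print(f"resume inside {entry}, skipping its first {already_sent} bytes")
```

The lookup costs O(log n) in the number of entries, so no earlier file has to be read to find the resume point.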

4. Professional Monitoring

TarTape acts as a "White Box," letting you see exactly which file is being processed and its calculated integrity.

for event in player.play():
    if event.type == "file_start":
        print(f"Archiving: {event.entry.arc_path} at offset {event.metadata.start_offset}")

    elif event.type == "file_end":
        # Each file reports its MD5 hash calculated on-the-fly
        print(f"Done: {event.entry.arc_path} | Hash: {event.metadata.md5sum}")
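
Hashing "on-the-fly" generally means feeding each chunk into an incremental digest as it passes through, so the file is read only once. The sketch below shows that pattern with the standard hashlib module. It illustrates the technique only and makes no claim about which bytes TarTape includes in its hash. `md5_of_chunks` is a hypothetical helper.

```python
import hashlib


def md5_of_chunks(chunks):
    """Compute an MD5 hex digest incrementally from an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


# Hashing chunk by chunk gives the same result as hashing the whole payload at once.
streamed = md5_of_chunks([b"hello ", b"world"])
whole = hashlib.md5(b"hello world").hexdigest()
print(streamed == whole)
```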

Observable Events

TarTape provides full visibility into the streaming process. Every chunk of data and every file transition is emitted as a structured event.

| Event Type | Description | Key Metadata Available |
| --- | --- | --- |
| file_start | Emitted before a file or directory enters the stream. | entry (metadata), start_offset, resumed (boolean) |
| file_data | Raw bytes belonging to the current file (header, body, or padding). | data (bytes) |
| file_end | Emitted after a file is fully processed and closed. | entry, end_offset, md5sum (if not resumed) |
| tape_completed | Emitted after the 1024-byte TAR footer is sent. | - |
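
As an illustration of consuming these events, the sketch below dispatches on the event type, writes data chunks to a sink, and reports whether tape_completed was seen. It uses stand-in objects built with SimpleNamespace instead of real TarTape events, and `consume` is a hypothetical helper.

```python
import io
from types import SimpleNamespace


def consume(events, sink):
    """Write every file_data chunk to sink; return (bytes_written, completed)."""
    written = 0
    completed = False
    for event in events:
        if event.type == "file_data":
            sink.write(event.data)
            written += len(event.data)
        elif event.type == "tape_completed":
            completed = True
    return written, completed


# Stand-in events that only mimic the `type` and `data` attributes.
fake_events = [
    SimpleNamespace(type="file_start"),
    SimpleNamespace(type="file_data", data=b"\0" * 512),
    SimpleNamespace(type="file_data", data=b"payload"),
    SimpleNamespace(type="file_end"),
    SimpleNamespace(type="tape_completed"),
]

sink = io.BytesIO()
written, completed = consume(fake_events, sink)
print(written, completed)
```

Tracking the completed flag gives the caller a simple way to tell a finished stream from one that stopped early.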

Constraints & Considerations

  • Path & Component Limits: To ensure a 100% predictable archive layout, TarTape enforces strict length limits:
    • Full path: Maximum of 255 bytes.
    • Individual component: Maximum of 100 bytes for any single file or directory name. This way each entry, directories (type '5') included, gets its own fixed-size header without relying on variable-sized metadata blocks.
  • Anonymization: User/Group IDs and names are scrubbed by default. This ensures privacy and consistent fingerprints across different environments.
  • Standard Compatibility: Generated archives are fully compatible with modern TAR tools (tar, 7-zip, etc.).
  • Supported Types: Handles Files, Directories, and Symlinks. Sockets, Pipes, and Devices are ignored.
  • Strict Integrity: Any modification, addition, or deletion of files after the recording phase will invalidate the tape and abort the stream.
  • Data Portability: Designed for data movement and cloud streaming. Not intended for forensic OS backups where local ownership must be preserved.
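
The path limits above can be checked before recording. The following is a minimal sketch under two assumptions that the source does not confirm: lengths are measured on the UTF-8 encoding of the path, and components are separated by "/". `check_path` and the two constants are hypothetical names.

```python
MAX_PATH_BYTES = 255       # full path limit stated above
MAX_COMPONENT_BYTES = 100  # single file or directory name limit stated above


def check_path(arc_path: str) -> list[str]:
    """Return a list of limit violations for a '/'-separated archive path."""
    problems = []
    total = len(arc_path.encode("utf-8"))  # assumption: limits apply to UTF-8 bytes
    if total > MAX_PATH_BYTES:
        problems.append(f"full path is {total} bytes (limit {MAX_PATH_BYTES})")
    for part in arc_path.split("/"):
        size = len(part.encode("utf-8"))
        if size > MAX_COMPONENT_BYTES:
            problems.append(
                f"component starting {part[:20]!r} is {size} bytes "
                f"(limit {MAX_COMPONENT_BYTES})"
            )
    return problems


print(check_path("data/2024/report.csv"))
print(check_path("data/" + "x" * 120 + ".bin"))
```

Counting bytes, not characters, matters for non-ASCII names: a 60-character name can exceed 100 bytes once encoded.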
