itar

itar indexes one or more tar file shards so that any member can be looked up in constant time and read directly, without extracting the archives. It ships a lightweight CLI (itar) and a Python API.
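The underlying idea can be illustrated with the standard library alone. The following is a minimal sketch, not itar's implementation: it scans an archive once with `tarfile`, records each member's data offset and size, and then serves a read with a single seek. The helper names `build_offsets`, `read_member`, and `make_demo_tar` are hypothetical.

```python
import io
import tarfile
from typing import BinaryIO


def make_demo_tar() -> bytes:
    """Create a small uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("a.txt", b"alpha"), ("b.txt", b"bravo!")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_offsets(tar_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Scan the archive once and map member name -> (data offset, size)."""
    offsets = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for info in tar:
            if info.isfile():
                offsets[info.name] = (info.offset_data, info.size)
    return offsets


def read_member(fileobj: BinaryIO, offsets: dict[str, tuple[int, int]], name: str) -> bytes:
    """Read one member with a dictionary lookup and a single seek."""
    offset, size = offsets[name]
    fileobj.seek(offset)
    return fileobj.read(size)


archive = make_demo_tar()
offsets = build_offsets(archive)
payload = read_member(io.BytesIO(archive), offsets, "b.txt")
```

After the one-time scan, reading a member no longer depends on how many members precede it in the archive.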

Designed for large datasets and deep‑learning pipelines, it supports single or sharded tar archives with thread‑safe access for concurrent reads.
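One common way to make concurrent reads safe is to avoid the shared file position altogether. The sketch below is an assumption about how this can be done, not a description of itar's internals: it uses `os.pread`, which reads at an explicit offset without moving the descriptor's position, so several threads can share one descriptor. It requires a POSIX platform, and all function names are hypothetical.

```python
import io
import os
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_demo_tar(path: str, count: int) -> None:
    """Write `count` small text members into an uncompressed tar file."""
    with tarfile.open(path, "w") as tar:
        for i in range(count):
            payload = f"sample {i}".encode()
            info = tarfile.TarInfo(f"item-{i}.txt")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def scan_offsets(path: str) -> dict[str, tuple[int, int]]:
    """Map member name -> (data offset, size) with one sequential pass."""
    with tarfile.open(path, "r:") as tar:
        return {m.name: (m.offset_data, m.size) for m in tar if m.isfile()}


def read_all_concurrently(path: str, workers: int = 4) -> dict[str, bytes]:
    """Read every member from worker threads that share one descriptor."""
    offsets = scan_offsets(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        def fetch(name: str) -> tuple[str, bytes]:
            offset, size = offsets[name]
            # pread takes the offset explicitly, so threads never race on seek().
            # A production reader would loop in case of a short read.
            return name, os.pread(fd, size, offset)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(pool.map(fetch, offsets))
    finally:
        os.close(fd)


def demo() -> dict[str, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.tar")
        write_demo_tar(path, 8)
        return read_all_concurrently(path)
```

The alternative of giving each thread its own file handle also works; positional reads simply keep the number of open descriptors small.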

Quickstart

pip install itar[cli]

Single tarball

echo "Hello world!" > hello.txt
tar cf hello.tar hello.txt       # regular tarball

itar index create hello.itar     # indexes hello.tar
itar index list hello.itar       # list indexed members

import itar

with itar.open("hello.itar") as archive:
    print(archive["hello.txt"].read())

Sharded tarballs

Give each shard a zero-padded suffix before building the index:

tar cf photos-0.tar wedding/    # shard 0
tar cf photos-1.tar vacation/   # shard 1

itar index create photos.itar   # discovers photos-0.tar, photos-1.tar, ...
itar index list -l photos.itar  # shard index, offsets, byte sizes

import itar

with itar.open("photos.itar") as photos:
    assert "wedding/cake.jpg" in photos
    img_bytes = photos["vacation/sunrise.jpg"].read()
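The exact discovery rule is not spelled out here, so the following is only a sketch under an assumed convention: shards are named `<stem>-<number>.tar` and sit next to `<stem>.itar`. It shows why consistent numbering matters: sorting by the parsed integer keeps shard 10 after shard 2, whereas a plain lexicographic sort would only do so with zero-padded names. `discover_shards` is a hypothetical helper, not part of itar.

```python
import re
import tempfile
from pathlib import Path


def discover_shards(index_path: str) -> list[Path]:
    """Find '<stem>-<number>.tar' files next to the index, in numeric order."""
    index = Path(index_path)
    pattern = re.compile(re.escape(index.stem) + r"-(\d+)\.tar")
    found = []
    for candidate in index.parent.iterdir():
        match = pattern.fullmatch(candidate.name)
        if match:
            found.append((int(match.group(1)), candidate))
    # Sort on the parsed shard number rather than on the file name.
    return [path for _, path in sorted(found, key=lambda pair: pair[0])]


def demo() -> list[str]:
    names = ["photos-10.tar", "photos-2.tar", "photos-0.tar", "other-1.tar", "photos.itar"]
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            (Path(tmp) / name).touch()
        index_path = str(Path(tmp) / "photos.itar")
        return [p.name for p in discover_shards(index_path)]
```

When the naming convention differs, passing the shard list explicitly (as the `--shards` flag and the `shards` argument allow) avoids relying on discovery at all.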

CLI reference

| Command | Purpose |
| --- | --- |
| `itar index create <archive>.itar [--single TAR \| --shards shard0.tar shard1.tar ...]` | Indexes a single archive or an explicit set of shards. With no flags, shards are auto-discovered next to `<archive>.itar`. |
| `itar index list <archive>.itar` | Lists members. Use `-l` for shard/offset info and `-H` for human-readable sizes. |
| `itar index check <archive>.itar` | Validates recorded entries; add `--member NAME` to focus on specific files. |
| `itar cat <archive>.itar <member>` | Streams a member’s bytes to stdout. |
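For large members, streaming in fixed-size chunks avoids holding the whole file in memory. The sketch below illustrates what a `cat`-style command might do once it knows a member's data offset and length. It is an assumption rather than itar's code; `stream_member` and the default chunk size are hypothetical.

```python
import io
from typing import BinaryIO


def stream_member(
    src: BinaryIO,
    offset: int,
    size: int,
    dst: BinaryIO,
    chunk_size: int = 64 * 1024,
) -> int:
    """Copy `size` bytes starting at `offset` from src to dst in chunks."""
    src.seek(offset)
    remaining = size
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("archive ended before the member was fully read")
        dst.write(chunk)
        remaining -= len(chunk)
    return size


# Demo on in-memory data: the member occupies bytes 10..20 of the source.
source = io.BytesIO(b"x" * 10 + b"hello world" + b"y" * 10)
sink = io.BytesIO()
copied = stream_member(source, offset=10, size=11, dst=sink, chunk_size=4)
```

To write to the terminal instead, pass `sys.stdout.buffer` as `dst`.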

Python helpers

  • itar.index.build(shards, progress_bar=False) -> dict: construct an index mapping from shards given as paths, file objects, or buffers.
  • itar.index.create("archive.itar", shards): convenience wrapper that builds and saves an index file.
  • itar.index.dump(index, path): serialize an index you built elsewhere.
  • itar.index.load(path) -> dict: load the msgpack index without opening shards.
  • itar.open(path, *, shards=None, open_fn=None) -> IndexedTarFile: attach shard handles using an existing index file.

itar File Format

An itar index file is a simple MessagePack dictionary mapping member paths to metadata:

{
    "path/to/member1.jpg": [  # file name
        null,                 # either null or shard index (0-based)
        [
            2048,             # metadata byte offset
            2560,             # data byte offset
            1048576,          # file length in bytes
        ],
    ],
    ...
}
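To make the layout concrete, the sketch below builds entries of the same shape, `[shard, [metadata offset, data offset, length]]`, from two in-memory shards using only the standard library, then checks one entry by re-reading the tar header at its metadata offset. This is an illustration under assumptions, not itar's implementation: the helper names are hypothetical, and the use of `None` for a single-archive index is inferred from the "either null or shard index" comment above. A dictionary of this shape is what a MessagePack serializer would then write out.

```python
import io
import tarfile


def make_shard(files: dict[str, bytes]) -> bytes:
    """Create an in-memory ustar archive holding the given files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_entries(shards: list[bytes]) -> dict[str, list]:
    """Map member path -> [shard or None, [header offset, data offset, size]]."""
    entries = {}
    for shard_idx, raw in enumerate(shards):
        with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
            for member in tar:
                if member.isfile():
                    # Assumption: a lone archive is recorded with a null shard.
                    shard = shard_idx if len(shards) > 1 else None
                    entries[member.name] = [
                        shard,
                        [member.offset, member.offset_data, member.size],
                    ]
    return entries


def check_entry(shards: list[bytes], entries: dict[str, list], name: str) -> bytes:
    """Re-parse the 512-byte header at the recorded offset, then return the data."""
    shard, (header_offset, data_offset, size) = entries[name]
    raw = shards[shard or 0]
    block = raw[header_offset:header_offset + 512]
    header = tarfile.TarInfo.frombuf(block, "utf-8", "surrogateescape")
    if header.name != name or header.size != size:
        raise ValueError(f"index entry for {name!r} does not match the archive")
    return raw[data_offset:data_offset + size]


shards = [
    make_shard({"wedding/cake.jpg": b"cake"}),
    make_shard({"vacation/sunrise.jpg": b"sunrise"}),
]
entries = build_entries(shards)
data = check_entry(shards, entries, "vacation/sunrise.jpg")
```

In the sample entry above, the data offset (2560) is exactly 512 bytes past the metadata offset (2048), which matches a member described by a single tar header block.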

Download files

Source distribution: itar-0.4.1.tar.gz (8.3 kB)

Built distribution: itar-0.4.1-py3-none-any.whl (10.6 kB, Python 3)
