itar
itar builds constant‑time indexes over one or more tar file shards, enabling direct, random access to members without extracting the archives. It ships a lightweight CLI (itar) and a Python API.
Designed for large datasets and deep‑learning pipelines, it supports single or sharded tar archives with thread‑safe access for concurrent reads.
Quickstart
pip install itar[cli]
Single tarball
echo "Hello world!" > hello.txt
tar cf hello.tar hello.txt # regular tarball
itar index create hello.itar # indexes hello.tar
itar index list hello.itar # list indexed members
import itar

with itar.open("hello.itar") as archive:
    print(archive["hello.txt"].read())
Sharded tarballs
Give each shard a zero-padded suffix before building the index:
tar cf photos-0.tar wedding/ # shard 0
tar cf photos-1.tar vacation/ # shard 1
itar index create photos.itar # discovers photos-0.tar, photos-1.tar, ...
itar index list -l photos.itar # shard index, offsets, byte sizes
import itar

with itar.open("photos.itar") as photos:
    assert "wedding/cake.jpg" in photos
    img_bytes = photos["vacation/sunrise.jpg"].read()
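To make the shard naming convention concrete, here is a minimal sketch of how sibling shards could be discovered from an index path. `discover_shards` is a hypothetical helper written for illustration; it is not part of the itar API, and itar's actual discovery logic may differ. It assumes shards are named `<stem>-<N>.tar` and sit next to the index file.

```python
import re
from pathlib import Path


def discover_shards(index_path):
    """Return sibling files named <stem>-<N>.tar, ordered by the integer N.

    Hypothetical helper: illustrates the naming convention only.
    """
    index_path = Path(index_path)
    pattern = re.compile(re.escape(index_path.stem) + r"-(\d+)\.tar")
    found = []
    for candidate in index_path.parent.iterdir():
        match = pattern.fullmatch(candidate.name)
        if match:
            # Sort key is the numeric suffix, so photos-10.tar follows photos-2.tar.
            found.append((int(match.group(1)), candidate))
    return [path for _, path in sorted(found)]
```

The sketch sorts numerically, so padding does not matter to it. Tools that sort file names as plain strings would place `photos-10.tar` before `photos-2.tar`, which is one reason zero-padded suffixes are a safe habit.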
CLI reference
| Command | Purpose |
|---|---|
| `itar index create <archive>.itar [--single TAR \| --shards shard0.tar shard1.tar ...]` | Indexes a single archive or an explicit set of shards. With no flags, shards are auto-discovered next to `<archive>.itar`. |
| `itar index list <archive>.itar` | Lists members. Use `-l` for shard/offset info and `-H` for human-readable sizes. |
| `itar index check <archive>.itar` | Validates recorded entries; add `--member NAME` to focus on specific files. |
| `itar cat <archive>.itar <member>` | Streams a member's bytes to stdout. |
Python helpers
- `itar.index.build(shards, progress_bar=False) -> dict`: construct an index mapping for paths, file objects, or buffers.
- `itar.index.create("archive.itar", shards)`: convenience wrapper that builds and saves an index file.
- `itar.index.dump(index, path)`: serialize an index you built elsewhere.
- `itar.index.load(path) -> dict`: load the msgpack index without opening shards.
- `itar.open(path, *, shards=None, open_fn=None) -> IndexedTarFile`: attach shard handles using an existing index file.
itar File Format
An itar index file is a simple MessagePack dictionary mapping member paths to metadata:
{
"path/to/member1.jpg": [ # file name
null, # either null or shard index (0-based)
[
2048, # metadata byte offset
2560, # data byte offset
1048576, # file length in bytes
],
],
...
}
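As a rough illustration of where these three numbers come from, the sketch below builds entries of the same shape using only the standard library's `tarfile` module, then reads a member back with a single seek. This is not itar's implementation; `build_index` and `read_member` are hypothetical names, and the sketch assumes an uncompressed tar file.

```python
import tarfile


def build_index(tar_path, shard_idx=None):
    """Map each regular file's name to [shard_idx, [header_offset, data_offset, size]]."""
    index = {}
    # Mode "r:" accepts uncompressed archives only; byte offsets would be
    # meaningless for random access into a compressed stream.
    with tarfile.open(tar_path, "r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = [
                    shard_idx,
                    [member.offset, member.offset_data, member.size],
                ]
    return index


def read_member(tar_path, entry):
    """Read one member's bytes by seeking straight to its data offset."""
    _shard_idx, (_header_offset, data_offset, size) = entry
    with open(tar_path, "rb") as f:
        f.seek(data_offset)
        return f.read(size)
```

For example, `read_member("hello.tar", build_index("hello.tar")["hello.txt"])` would return the file's bytes without scanning the other members. Lookup cost is a dictionary access plus one seek, regardless of where the member sits in the archive.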