

ptar-index

Python interface for parallel-tar index files (.etr and .idx).

Load the binary index files generated by parallel-idx and work with them using native Python types: dataclasses, iterators, dicts, and, optionally, pandas DataFrames.

Installation

From a checkout of the source repository:

uv sync

With the optional pandas extra (needed for DataFrame export):

uv sync --extra pandas

Quick start

from ptar_index import load_index

idx = load_index("example.idx")
print(idx)
# <PtarIndex [IDX] '/global/projects/data'  4821995 files, 417285 dirs, 1.65 TB>

# Browse the tree
idx.root.print_tree(max_depth=2)

# Navigate like a dict
lcls = idx.root["LCLS"]
psdm = idx.resolve("LCLS/sit_psdm_data/psdm")

# Iterate all files
for f in idx.walk_files():
    print(f.path, f.human_size, f.hash_hex)

# Glob search
for f in idx.glob("**/*.tar"):
    print(f.path, f.size)

# Export to a pandas DataFrame (requires the pandas extra)
df = idx.to_dataframe()

# Compare two indexes
old = load_index("before.idx")
new = load_index("after.idx")
diff = old.diff(new)
print(diff.summary())  # "42 added, 3 removed, 17 changed"
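How `diff` works internally isn't documented here. As a rough illustration of what an "added / removed / changed" summary involves, the sketch below compares two plain dicts mapping each path to `(size, hash_hex)`. The `IndexDiff` class, the `diff_entries` function, and that entry shape are all hypothetical simplifications, not the ptar_index API.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified entry shape: path -> (size_in_bytes, hash_hex)
Entries = dict[str, tuple[int, str]]


@dataclass
class IndexDiff:
    """Hypothetical result container; not the library's diff object."""
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)

    def summary(self) -> str:
        return (f"{len(self.added)} added, {len(self.removed)} removed, "
                f"{len(self.changed)} changed")


def diff_entries(old: Entries, new: Entries) -> IndexDiff:
    """Set arithmetic on the path keys, then compare entries present in both."""
    old_paths, new_paths = old.keys(), new.keys()
    return IndexDiff(
        added=sorted(new_paths - old_paths),
        removed=sorted(old_paths - new_paths),
        # A path counts as changed when its size or hash differs between indexes.
        changed=sorted(p for p in old_paths & new_paths if old[p] != new[p]),
    )


before = {"a.tar": (10, "aa"), "b.tar": (20, "bb"), "c.txt": (5, "cc")}
after = {"a.tar": (10, "aa"), "b.tar": (21, "b2"), "d.txt": (7, "dd")}
print(diff_entries(before, after).summary())  # 1 added, 1 removed, 1 changed
```

Keying on the path and comparing the (size, hash) pair means a file whose content changed but kept the same size is still reported, as long as the index stores a content hash.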

CLI

The package installs a ptar-index command:

# Summary + tree view
ptar-index example.idx

# Inspect raw msgpack structure (for debugging)
ptar-index example.idx --raw

# List all file entries
ptar-index example.idx --files

# Filter by glob pattern
ptar-index example.idx --glob "**/*.tar"

# Full JSON export
ptar-index example.idx --json
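The exact matching rules behind `--glob` (and `idx.glob`) aren't spelled out above. One common convention, assumed here, is that `**/` matches zero or more whole directory levels while `*` and `?` stay within a single path component. The hypothetical `glob_to_regex` and `glob_filter` helpers below implement that convention with the standard `re` module; the library's actual behavior may differ.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a compiled regex (assumed semantics):
    '**/' -> zero or more directories, '**' -> anything,
    '*' -> any run of non-'/' characters, '?' -> one non-'/' character.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def glob_filter(paths: list[str], pattern: str) -> list[str]:
    """Keep the paths that match the whole pattern, not just a prefix."""
    regex = glob_to_regex(pattern)
    return [p for p in paths if regex.fullmatch(p)]


paths = ["a.tar", "x/b.tar", "x/y/c.tar", "x/d.txt", "e.tar.gz"]
print(glob_filter(paths, "**/*.tar"))  # ['a.tar', 'x/b.tar', 'x/y/c.tar']
print(glob_filter(paths, "*.tar"))     # ['a.tar']
```

Using `fullmatch` rather than `search` is what keeps `*.tar` from matching `e.tar.gz` or files in subdirectories.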

Debugging format issues

If the parser's automatic field detection maps fields incorrectly, first dump the raw MessagePack structure:

from ptar_index import describe_raw
print(describe_raw("example.idx", max_depth=3))

This shows the exact field names and types as stored in the binary file, making it straightforward to adjust the parser.
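The output format of `describe_raw` isn't shown here. For orientation, the hypothetical `describe` helper below produces a similar kind of outline from an already-decoded MessagePack value (nested dicts, lists, and scalars), collapsing anything deeper than `max_depth`. Unlike `describe_raw`, it takes a decoded object rather than a filename; decoding could be done with the msgpack package's `msgpack.unpackb(data, raw=False)`, assuming the file holds a single MessagePack document. The sample data is made up and is not the real .idx schema.

```python
from typing import Any


def describe(obj: Any, max_depth: int = 3, _level: int = 0) -> str:
    """Outline the shape of a decoded MessagePack value (hypothetical helper).

    Maps list their keys, arrays show only their first element (assumed to be
    representative), scalars show their Python type name, and containers
    nested at or beyond max_depth are collapsed to "...".
    """
    pad = "  " * _level
    if isinstance(obj, dict):
        header = f"{pad}map[{len(obj)}]"
        if _level >= max_depth:
            return header + " ..."
        lines = [header]
        for key, value in obj.items():
            child = describe(value, max_depth, _level + 1)
            lines.append(f"{pad}  {key!r}: {child.lstrip()}")
        return "\n".join(lines)
    if isinstance(obj, (list, tuple)):
        header = f"{pad}array[{len(obj)}]"
        if not obj:
            return header
        if _level >= max_depth:
            return header + " ..."
        first = describe(obj[0], max_depth, _level + 1)
        return f"{header}\n{pad}  [0]: {first.lstrip()}"
    return f"{pad}{type(obj).__name__}"


sample = {"version": 2, "root": {"name": "data", "children": [{"name": "a.tar", "size": 10}]}}
print(describe(sample, max_depth=3))
# map[2]
#   'version': int
#   'root': map[2]
#     'name': str
#     'children': array[1]
#       [0]: map[2] ...
```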

License

MIT

Download files


Source Distribution

ptar_index-1.0.0.tar.gz (56.4 kB)

Built Distribution


ptar_index-1.0.0-py3-none-any.whl (13.3 kB)

File details

Details for the file ptar_index-1.0.0.tar.gz.

File metadata

  • Download URL: ptar_index-1.0.0.tar.gz
  • Upload date:
  • Size: 56.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ptar_index-1.0.0.tar.gz:

  • SHA256: 543bb7259b7ebfc074abb6c7d3647475a5fb29573e43383031d1691d60be80d0
  • MD5: 39ae9ab978e04e9edd2029bb64e772e6
  • BLAKE2b-256: 273583826fa2a9fadb33b3c820e4cd900293c6857b37ea6f97e222cee67a73ca
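To check a downloaded file against the published digests, a minimal sketch using only the standard `hashlib` module is below. The `file_digest_hex` and `verify_download` names are hypothetical helpers, not part of ptar_index. Passing the algorithm by name works for "sha256" and "md5"; the BLAKE2b-256 value above is a 32-byte BLAKE2b digest, which would need `hashlib.blake2b(digest_size=32)` instead of `hashlib.new("blake2b")`.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex value (case-insensitive)."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify_download("ptar_index-1.0.0.tar.gz", "543bb7259b7ebfc074abb6c7d3647475a5fb29573e43383031d1691d60be80d0")` should return True for an intact copy of the source distribution.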


Provenance

The following attestation bundles were made for ptar_index-1.0.0.tar.gz:

Publisher: python-publish.yml on JBlaschke/parallel-tar


File details

Details for the file ptar_index-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: ptar_index-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 13.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ptar_index-1.0.0-py3-none-any.whl:

  • SHA256: f4aea2fa603c9a8fcdf177c0c2b43543ccd4b4236de742d9436c81b7f997713d
  • MD5: 3e92dfb87219ab260e5b3ae47ffac3f0
  • BLAKE2b-256: b003d821c8600dcc2e0f997c46831e9e3245867a0fbf8b2e433acc8126f0595b


Provenance

The following attestation bundles were made for ptar_index-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on JBlaschke/parallel-tar

