Python PANOSETI File Format (PFF) I/O Library

pypff - Python PANOSETI File Format I/O Library

A high-performance Python package for reading and analyzing data files generated by PANOSETI (PANOSETI File Format - PFF).

Features

  • Streaming by default: iter_batches(size=256) and __iter__ yield zero-copy views without materializing the full sequence into RAM — safe for Jupyter and HPC pipelines alike.
  • Distributed chunked reads: iter_byte_range(file_idx, byte_start, byte_end) lets Dask/Nextflow workers parse frame-aligned byte ranges in parallel without coordination.
  • Zero-copy random access: seq[i] returns a strided mmap view; slicing and read_images(indices) sort the requested indices for disk locality, then restore the requested order with an inverse permutation.
  • Single-pass metadata: get_metadata_arrays(keys) extracts any number of header fields in one np.frombuffer pass per file via a composite NumPy structured dtype. Supports virtual unix_t_ns key.
  • Nanosecond-precise timestamps: timestamps() returns int64 ns (no float precision loss); timestamps(as_datetime=True) returns a zero-copy datetime64[ns] view for matplotlib and pandas.
  • PFF → Zarr v3 conversion: pypff[zarr] optional extra converts any .pffd run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, and Julia — lossless, compressed, HPC/ML-ready. See Zarr v3 spec.
  • Bounded resources: LRU mmap handle cache (default 16 files); PFFSequence is a context manager.
  • Multiprocessing-safe: pickle-compatible — file handles are dropped on serialisation and lazily reopened in workers.
  • Pydantic validation: strict schema validation for all PFF headers and PANOSETI config files.
  • Run discovery: PanosetiRun lazily scans a .pffd directory and exposes typed configs, housekeeping, and data products.
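
The bounded handle cache can be pictured as an LRU map from file path to an open mmap. The sketch below is a minimal, standard-library-only illustration; MmapLRU is a hypothetical name, not pypff's internal class, and the eviction policy shown (close the least recently used handle once more than maxsize files are open) is an assumption based on the description above.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MmapLRU:
    """Hypothetical LRU cache of read-only mmap handles (illustration, not pypff's class)."""

    def __init__(self, maxsize: int = 16):
        self.maxsize = maxsize
        self._cache = OrderedDict()  # path -> (file object, mmap), oldest first

    def get(self, path: str) -> mmap.mmap:
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path][1]
        f = open(path, "rb")
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._cache[path] = (f, mm)
        if len(self._cache) > self.maxsize:
            # Evict the least recently used handle. Note: mmap.close() raises
            # BufferError if a NumPy view still references the mapping.
            _, (old_f, old_mm) = self._cache.popitem(last=False)
            old_mm.close()
            old_f.close()
        return mm

    def close(self) -> None:
        while self._cache:
            _, (f, mm) = self._cache.popitem()
            mm.close()
            f.close()


# Usage: three small files, cache bounded to two open handles
paths = []
for k in range(3):
    fd, p = tempfile.mkstemp()
    os.write(fd, bytes([k]) * 8)
    os.close(fd)
    paths.append(p)

cache = MmapLRU(maxsize=2)
first_byte = cache.get(paths[0])[0]  # indexing an mmap returns an int
cache.get(paths[1])
cache.get(paths[2])                  # opening a third file evicts paths[0]
still_open = list(cache._cache)
cache.close()
for p in paths:
    os.remove(p)
```

Bounding the number of live mappings keeps file-descriptor usage predictable when a run spans many .pff files.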

Installation

The package uses uv for dependency management.

# User install with pip
pip install pypff
pip install "pypff[zarr]"   # with the optional Zarr v3 conversion extra

# User install: add to an existing uv project
uv add pypff

# Development install (run from a checkout of the repository)
cd pypff
uv sync                   # core library
uv sync --extra zarr      # + Zarr v3 conversion (zarr-python, xarray, dask)

Quick Start

Run discovery

from pypff import PanosetiRun
import numpy as np                  # used by the index-array examples below

run = PanosetiRun("path/to/run_directory.pffd")
run.show()                          # pretty-print structure

print(run.list_products())          # ['dp_img16.bpp_2.module_1', ...]
seq = run.get_product("dp_img16.bpp_2.module_1")

obs_cfg  = run.get_config("obs_config")   # returns Pydantic model
data_cfg = run.get_config("data_config")
hk       = run.get_hk()                   # dict[device, dict[field, np.ndarray]]

Streaming (memory-bounded)

# Iterate frame-by-frame (zero-copy views into mmap)
for img in seq:
    process(img)

# Batch iteration: at most one batch is held in RAM at a time
for batch in seq.iter_batches(size=256):
    batch  # shape (256, H, W), dtype matches file; the final batch may be shorter

# With per-frame timestamps in the same pass
for batch, ts in seq.iter_batches(size=256, with_timestamps=True):
    ts  # int64 nanoseconds, one per frame in batch

# Distributed byte-range reads (Dask / Nextflow workers)
# size: byte length of the file at file_idx (e.g. from os.path.getsize)
for batch in seq.iter_byte_range(file_idx=0, byte_start=0, byte_end=size):
    ...
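
One way a scheduler could hand out work for iter_byte_range is to split each file into contiguous, frame-aligned byte ranges. The helper below is a hypothetical sketch: it assumes every frame (header plus image) occupies the same number of bytes, that ranges are half-open [byte_start, byte_end), and that the file has no preamble before the first frame. None of these details are confirmed above.

```python
def frame_aligned_ranges(file_size: int, frame_size: int, n_workers: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into at most n_workers contiguous, frame-aligned byte ranges.

    Hypothetical helper: assumes fixed-size frames and no file preamble.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    if file_size % frame_size:
        raise ValueError("file size is not a whole number of frames")
    n_frames = file_size // frame_size
    per_worker, extra = divmod(n_frames, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        count = per_worker + (1 if w < extra else 0)  # spread leftover frames over the first workers
        if count == 0:
            break  # fewer frames than workers
        end = start + count * frame_size
        ranges.append((start, end))
        start = end
    return ranges


ranges = frame_aligned_ranges(file_size=10 * 4096, frame_size=4096, n_workers=3)
```

Each (start, end) pair would then go to one worker as seq.iter_byte_range(file_idx=0, byte_start=start, byte_end=end); because every boundary falls between frames, no worker needs to coordinate with its neighbours.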

Random access and slicing

img   = seq[42]            # zero-copy view, shape (H, W)
imgs  = seq[0:100:2]       # every other frame, shape (50, H, W)
imgs  = seq.read_images(np.array([5, 0, 3]))   # unsorted — sorted internally for locality
imgs  = seq.read_images_range(start=0, count=500)
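
The sort-then-inverse-permutation strategy behind read_images can be illustrated with plain NumPy: sort the requested indices so the source is visited front to back, gather, then apply the inverse permutation to restore the caller's order. read_in_disk_order is a hypothetical helper, and an in-memory array stands in for the memory-mapped file.

```python
import numpy as np


def read_in_disk_order(frames: np.ndarray, indices) -> np.ndarray:
    """Return frames[indices], but touch the source in ascending index order."""
    indices = np.asarray(indices)
    order = np.argsort(indices, kind="stable")  # positions of the requests, sorted by frame number
    # Ascending fancy index: an mmap-backed array is read front to back
    out_sorted = frames[indices[order]]
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)      # inverse permutation: inverse[order[k]] = k
    return out_sorted[inverse]                  # back to the requested order


frames = np.arange(10 * 2 * 2, dtype=np.uint16).reshape(10, 2, 2)  # 10 fake 2x2 frames
imgs = read_in_disk_order(frames, [5, 0, 3])
```

For an ordinary in-memory array the extra step buys nothing; the benefit appears when frames is a large memory-mapped file and ascending access turns random seeks into mostly sequential reads.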

Timestamps — always int64 nanoseconds

# Full cached array — int64 ns, no float precision loss
ts = seq.timestamps()                      # np.ndarray[int64]

# Zero-copy datetime64[ns] view for matplotlib / pandas
ts_dt = seq.timestamps(as_datetime=True)   # np.ndarray[datetime64[ns]]

# Indexed subset
ts_sub = seq.timestamps(indices=np.arange(0, len(seq), 100))

# Single frame
t_ns = seq.timestamp_at(42)               # int, nanoseconds

# Time-based navigation
idx = seq.seek_time(t_ns + 1_000_000_000)  # 1 s later

# Arithmetic rule: subtract the epoch in integer space before dividing
rel_s = (ts - ts[0]) / 1e9               # CORRECT: small offsets, no precision loss
# ts / 1e9 - ts[0] / 1e9                 # WRONG: float64 cannot resolve 1 ns at ~1.7e9 s
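
Both the precision rule and the datetime64[ns] view can be checked with plain NumPy. The two timestamps below are illustrative values 1 ns apart, not data from a PFF file. ndarray.view reinterprets the same int64 buffer, which is presumably how timestamps(as_datetime=True) stays zero-copy.

```python
import numpy as np

# Two illustrative frame times 1 ns apart, about 1.7e9 s after the Unix epoch
ts = np.array([1_700_000_000_123_456_789, 1_700_000_000_123_456_790], dtype=np.int64)

# CORRECT: subtract in int64 first, then convert the small offsets to seconds
rel_s = (ts - ts[0]) / 1e9

# WRONG: dividing first converts ~1.7e18 ns to float64, which rounds to multiples
# of 256 ns at that magnitude, so the two frames collapse onto the same value
naive = ts / 1e9 - ts[0] / 1e9

# Zero-copy datetime64[ns] view over the same buffer (no conversion, no copy)
ts_dt = ts.view("datetime64[ns]")
```

Here rel_s[1] is exactly 1e-9 while naive[1] is 0.0: the 1 ns difference is lost entirely on the float path.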

Metadata extraction (single pass)

# One np.frombuffer call per file regardless of number of keys
meta = seq.get_metadata_arrays(["pkt_num", "tv_sec", "unix_t_ns"])
meta["unix_t_ns"]   # int64 ns, same as timestamps()
meta["pkt_num"]     # int64

# All fields at once
all_meta = seq.get_all_metadata()
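
Single-pass extraction rests on a NumPy structured dtype that describes an entire fixed-size frame record, so one np.frombuffer call exposes every header field as a strided view. The layout below (three little-endian uint32 header fields followed by a 4x4 uint16 image) is hypothetical and is not the real PFF on-disk format. Deriving a virtual unix_t_ns from tv_sec and pkt_nsec is likewise only an assumption about how such a key could be computed.

```python
import numpy as np

H, W = 4, 4
# Hypothetical fixed-size frame record (NOT the real PFF layout): header fields, then image
frame_dtype = np.dtype([
    ("pkt_num", "<u4"),
    ("tv_sec", "<u4"),
    ("pkt_nsec", "<u4"),
    ("image", "<u2", (H, W)),
])

# Build a fake 3-frame file in memory
records = np.zeros(3, dtype=frame_dtype)
records["pkt_num"] = [10, 11, 12]
records["tv_sec"] = 1_700_000_000
records["pkt_nsec"] = [0, 500, 1_000]
raw = records.tobytes()

# One frombuffer call covers every field of every frame: no per-frame parsing loop
frames = np.frombuffer(raw, dtype=frame_dtype)
meta = {key: frames[key] for key in ("pkt_num", "tv_sec")}  # strided views, no copies

# A "virtual" key derived from stored fields, computed in int64 to keep ns precision
unix_t_ns = frames["tv_sec"].astype(np.int64) * 1_000_000_000 + frames["pkt_nsec"]
```

Adding more keys only adds more field lookups on the same parsed buffer, which is why the cost stays at one pass per file regardless of how many fields are requested.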

Headers and Pydantic validation

# Fast path: raw dict, no Pydantic overhead
header, img = seq.get_frame(0)
header["pkt_num"]                 # quabo-level file: flat header dict
header["quabo_0"]["pkt_tai"]      # module-level file: one sub-dict per quabo

# Validated path: full Pydantic model
from pypff.models import ModuleHeader
hdr, img = seq.get_frame_validated(0)
assert isinstance(hdr, ModuleHeader)   # for a module-level file

Context manager and multiprocessing

# Deterministic handle cleanup
with run.get_product("dp_ph256.bpp_2.module_254") as seq:
    data = seq.read_images_range(0, 1000)

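# PFFSequence is pickle-safe: handles are dropped on serialisation and reopened in each worker
import concurrent.futures

def frame_sum(seq, i):
    # Must be a module-level function: lambdas cannot be pickled for worker processes
    return seq[i].sum()

# On spawn-based platforms (macOS, Windows), run this under `if __name__ == "__main__":`
with concurrent.futures.ProcessPoolExecutor() as ex:
    futures = [ex.submit(frame_sum, seq, i) for i in range(len(seq))]
    sums = [f.result() for f in futures]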
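
This kind of pickle safety is commonly achieved by excluding OS handles from the pickled state and reopening them on first use in the receiving process. LazyFile below is a hypothetical, standard-library-only sketch of that pattern, not pypff's implementation.

```python
import mmap
import os
import pickle
import tempfile


class LazyFile:
    """Hypothetical picklable reader: the mmap is dropped on pickling and reopened lazily."""

    def __init__(self, path: str):
        self.path = path
        self._mm = None  # opened on first use

    def _handle(self) -> mmap.mmap:
        if self._mm is None:
            with open(self.path, "rb") as f:
                # The mapping stays valid after the file object is closed
                self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return self._mm

    def read(self, start: int, stop: int) -> bytes:
        return self._handle()[start:stop]

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_mm"] = None  # mmap objects cannot be pickled; the receiver reopens the file
        return state


fd, path = tempfile.mkstemp()
os.write(fd, b"PFFDATA")
os.close(fd)

reader = LazyFile(path)
reader.read(0, 3)                            # opens the mmap in this process
clone = pickle.loads(pickle.dumps(reader))   # similar to how ProcessPoolExecutor ships arguments
dropped = clone._mm is None                  # the handle was not carried across
head = clone.read(0, 3)                      # reopened lazily on first use

reader._mm.close()
clone._mm.close()
os.remove(path)
```

Only the path crosses the process boundary, so each worker pays one open() on first access rather than failing to pickle a live handle.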

PFF → Zarr v3 conversion (pypff[zarr])

Convert a .pffd observation run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, Julia, and Rust. See docs/zarr_v3_spec.md for the full layout specification.

Write

from pypff.io2 import PanosetiRun
from pypff.zarr import convert_run

run = PanosetiRun("path/to/obs.pffd")
stores = convert_run(run, "output/L0_zarr")
# → one .zarr per (data_product, module)
# → one .panoseti-meta/ sidecar bundle (configs, logs, hk.pff)

Or via the CLI:

uv run pypff zarr path/to/obs.pffd output/L0_zarr

Read

from pypff.zarr import PanosetiZarrRun
import xarray as xr

# High-level wrapper (mirrors PanosetiRun)
zrun = PanosetiZarrRun("output/L0_zarr")
store = zrun.get_product("dp_ph256.bpp_2.module_254")
ts    = store.timestamps()           # int64 ns array
ds    = store.to_dataset()           # xarray.Dataset
cfgs  = zrun.configs                 # parsed run configs

# Or open directly with xarray — all arrays visible as variables
ds = xr.open_zarr(str(stores[0]), consolidated=False)
# ds.images      (T, H, W)   int16 / uint16
# ds.unix_t_ns   (T,)        int64  — nanosecond timestamps
# ds.pkt_num     (T,)        uint32 ─┐ per-frame header fields
# ds.pkt_nsec    (T,)        uint32  │ (single-level: ph256)
# ds.quabo_num   (T,)        uint8  ─┘
# ds.quabo_0_pkt_num  (T,)   uint32 ─┐ module-level headers
# …                                  ─┘ (img16, ph1024)

Zarr store layout

All arrays live at the root of each store (no sub-groups), so xr.open_zarr(store) surfaces every variable — images, timestamps, and all header fields — as a single Dataset aligned on the shared time dimension. Logical grouping of headers is expressed via header_fields and quabo_fields root attributes. Full specification: docs/zarr_v3_spec.md.

Testing

Run the test suite via the built-in CLI:

uv run pypff test all

The test suite includes:

  • Tier 1 (Unit): Basic logic and timing tests.
  • Tier 2 (Logic): Higher-level I/O, slicing, and concurrency tests.
  • Legacy Integration: The original pypff test suite using provided sample data.

Dockerized CI

Build the CI environment:

docker build -t pypff-ci -f src/ci/Dockerfile.ci .

Download files

Source Distribution

pypff-1.0.2.tar.gz (4.8 MB)

Built Distribution

pypff-1.0.2-py3-none-any.whl (48.9 kB)

File details

Details for the file pypff-1.0.2.tar.gz.

File metadata

  • Download URL: pypff-1.0.2.tar.gz
  • Size: 4.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14

File hashes

Hashes for pypff-1.0.2.tar.gz
Algorithm Hash digest
SHA256 ec0d1b542424a054b3a2af5372adba87cd5d43045ac76a8b3f9ab2ad1b933162
MD5 5bc166cadb97e340026010da1be2ab06
BLAKE2b-256 5ec753c94c01d1f51013b28e93494df219a05070da42448d44d26ce590878647

File details

Details for the file pypff-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: pypff-1.0.2-py3-none-any.whl
  • Size: 48.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.14

File hashes

Hashes for pypff-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 533beaf8591eaee6a0423db6e9ab7532857544615bad561891c567d195376e69
MD5 23b2fa0ad43b6da94cae68e4d3505c72
BLAKE2b-256 f215c77f559e3e493d862502e64082d5602a244d5f40c19de7493f583580967e
