
Python PANOSETI File Format (PFF) I/O Library


pypff - Python PANOSETI File Format I/O Library


A high-performance Python package for reading and analyzing data files written by PANOSETI in the PANOSETI File Format (PFF).

Features

  • Streaming by default: iter_batches(size=256) and __iter__ yield zero-copy views without materializing the full sequence into RAM — safe for Jupyter and HPC pipelines alike.
  • Distributed chunked reads: iter_byte_range(file_idx, byte_start, byte_end) lets Dask/Nextflow workers parse frame-aligned byte ranges in parallel without coordination.
  • Zero-copy random access: seq[i] returns a strided mmap view; slicing and read_images(indices) use a sort + inverse-permutation for disk locality.
  • Single-pass metadata: get_metadata_arrays(keys) extracts any number of header fields in one np.frombuffer pass per file via a composite NumPy structured dtype. Also supports a virtual unix_t_ns key.
  • Nanosecond-precise timestamps: timestamps() returns int64 ns (no float precision loss); timestamps(as_datetime=True) returns a zero-copy datetime64[ns] view for matplotlib and pandas.
  • PFF → Zarr v3 conversion: the optional pypff[zarr] extra converts any .pffd run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, and Julia. The conversion is lossless, compressed, and HPC/ML-ready. See docs/zarr_v3_spec.md for the layout.
  • Bounded resources: LRU mmap handle cache (default 16 files); PFFSequence is a context manager.
  • Multiprocessing-safe: pickle-compatible — file handles are dropped on serialisation and lazily reopened in workers.
  • Pydantic validation: strict schema validation for all PFF headers and PANOSETI config files.
  • Run discovery: PanosetiRun lazily scans a .pffd directory and exposes typed configs, housekeeping, and data products.
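
The bounded-resources idea from the list above (an LRU cache of mmap handles) can be sketched with the standard library alone. This is a minimal illustration, not pypff's implementation: the class name MmapCache, its max_open parameter, and the temporary files are all hypothetical.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MmapCache:
    """Hypothetical LRU cache that keeps at most `max_open` files mapped."""

    def __init__(self, max_open=16):
        self.max_open = max_open
        self._maps = OrderedDict()  # path -> (file object, mmap), oldest first

    def get(self, path):
        if path in self._maps:
            self._maps.move_to_end(path)  # mark as most recently used
            return self._maps[path][1]
        fh = open(path, "rb")
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        self._maps[path] = (fh, mm)
        if len(self._maps) > self.max_open:
            # Evict the least recently used mapping and release its handle.
            _, (old_fh, old_mm) = self._maps.popitem(last=False)
            old_mm.close()
            old_fh.close()
        return mm

    def close(self):
        while self._maps:
            _, (fh, mm) = self._maps.popitem()
            mm.close()
            fh.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = os.path.join(tmp, f"file_{i}.bin")
        with open(p, "wb") as f:
            f.write(bytes([i]) * 8)  # mmap cannot map an empty file
        paths.append(p)
    with MmapCache(max_open=2) as cache:
        first_bytes = [cache.get(p)[0] for p in paths]
        open_after = list(cache._maps)  # only the two most recent paths remain
```

One caveat for any design like this: Python refuses to close an mmap while a buffer exported from it (for example a NumPy view) is still alive, so a real cache has to account for outstanding views before evicting.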

Installation

The package uses uv for dependency management.

cd pypff
uv sync                   # core library
uv sync --extra zarr      # + Zarr v3 conversion (zarr-python, xarray, dask)

Quick Start

Run discovery

from pypff import PanosetiRun

run = PanosetiRun("path/to/run_directory.pffd")
run.show()                          # pretty-print structure

print(run.list_products())          # ['dp_img16.bpp_2.module_1', ...]
seq = run.get_product("dp_img16.bpp_2.module_1")

obs_cfg  = run.get_config("obs_config")   # returns Pydantic model
data_cfg = run.get_config("data_config")
hk       = run.get_hk()                   # dict[device, dict[field, np.ndarray]]

Streaming (memory-bounded)

# Iterate frame-by-frame (zero-copy views into mmap)
for img in seq:
    process(img)

# Batch iteration — never holds more than one batch in RAM
for batch in seq.iter_batches(size=256):
    batch  # shape (256, H, W), dtype matches file

# With timestamps in the same pass
for batch, ts in seq.iter_batches(size=256, with_timestamps=True):
    ts  # int64 nanoseconds, shape (256,)

# Distributed byte-range reads (Dask / Nextflow workers)
# `size` is a placeholder for the end offset of the range (e.g. the byte length of file 0)
for batch in seq.iter_byte_range(file_idx=0, byte_start=0, byte_end=size):
    ...
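
To hand out byte ranges to workers, the ranges need to start and end on frame boundaries. The sketch below shows one way to compute them, assuming every frame record in a file occupies the same number of bytes. The helper frame_aligned_ranges and the 2048-byte frame size are hypothetical and not part of pypff.

```python
def frame_aligned_ranges(file_size, frame_bytes, n_workers):
    """Split [0, file_size) into at most n_workers ranges on frame boundaries."""
    n_frames = file_size // frame_bytes  # a trailing partial frame is ignored
    per_worker, extra = divmod(n_frames, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        # The first `extra` workers take one additional frame each.
        count = per_worker + (1 if w < extra else 0)
        if count == 0:
            break  # more workers than frames
        end = start + count * frame_bytes
        ranges.append((start, end))
        start = end
    return ranges


# Ten hypothetical 2048-byte frames shared between four workers.
ranges = frame_aligned_ranges(file_size=10 * 2048, frame_bytes=2048, n_workers=4)
```

Each (start, end) pair could then be passed as byte_start and byte_end to iter_byte_range on a separate worker.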

Random access and slicing

import numpy as np

img   = seq[42]            # zero-copy view, shape (H, W)
imgs  = seq[0:100:2]       # every other frame, shape (50, H, W)
imgs  = seq.read_images(np.array([5, 0, 3]))   # unsorted input; sorted internally for locality
imgs  = seq.read_images_range(start=0, count=500)
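
The sort plus inverse-permutation technique mentioned for read_images can be illustrated in plain NumPy. This is a sketch of the general idea only: read_in_disk_order is a hypothetical name, and an in-memory array stands in for the file.

```python
import numpy as np


def read_in_disk_order(frames, indices):
    """Fetch frames in ascending index order, then restore the caller's order."""
    indices = np.asarray(indices)
    order = np.argsort(indices, kind="stable")  # permutation that sorts the request
    sorted_idx = indices[order]                 # ascending offsets, so reads are sequential
    sorted_data = frames[sorted_idx]            # stand-in for the actual disk read
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)      # inverse of the sorting permutation
    return sorted_data[inverse]                 # rows back in the requested order


frames = np.arange(6 * 2 * 2).reshape(6, 2, 2)  # six fake 2x2 frames
out = read_in_disk_order(frames, [5, 0, 3])
```

The caller still receives frames 5, 0, 3 in that order, while the underlying reads happen at offsets 0, 3, 5.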

Timestamps — always int64 nanoseconds

# Full cached array — int64 ns, no float precision loss
ts = seq.timestamps()                      # np.ndarray[int64]

# Zero-copy datetime64[ns] view for matplotlib / pandas
ts_dt = seq.timestamps(as_datetime=True)   # np.ndarray[datetime64[ns]]

# Indexed subset
ts_sub = seq.timestamps(indices=np.arange(0, len(seq), 100))

# Single frame
t_ns = seq.timestamp_at(42)               # int, nanoseconds

# Time-based navigation
idx = seq.seek_time(t_ns + 1_000_000_000)  # 1 s later

# Arithmetic rule: subtract epoch in integer space before dividing
rel_s = (ts - ts[0]) / 1e9               # CORRECT — small values, no precision loss
# ts / 1e9 - t0_s                        # WRONG  — loses ns precision at ~1.7e9 s
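
The arithmetic rule above can be checked with synthetic values in plain NumPy, without pypff. The timestamps below are made up: five samples 1 ns apart near 1.7e18 ns since the Unix epoch.

```python
import numpy as np

# Hypothetical timestamps: about 1.7e18 ns since the epoch, 1 ns apart.
ts = np.int64(1_700_000_000_000_000_000) + np.arange(5, dtype=np.int64)

# float64 has a 53-bit significand, so near 1.7e18 adjacent representable
# values are 256 apart and neighbouring nanoseconds collapse together.
as_float = ts.astype(np.float64)
collapsed = np.unique(as_float).size  # number of distinct values that survive

# Subtracting in integer space first keeps every nanosecond.
rel_ns = ts - ts[0]   # exact int64 differences
rel_s = rel_ns / 1e9  # small values, safe to convert to float

# Reinterpreting int64 ns as datetime64[ns] reuses the same buffer (no copy).
ts_dt = ts.view("datetime64[ns]")
shares = np.shares_memory(ts, ts_dt)
```

Here all five absolute timestamps map to a single float64 value, while the integer differences stay exact.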

Metadata extraction (single pass)

# One np.frombuffer call per file regardless of number of keys
meta = seq.get_metadata_arrays(["pkt_num", "tv_sec", "unix_t_ns"])
meta["unix_t_ns"]   # int64 ns, same as timestamps()
meta["pkt_num"]     # int64

# All fields at once
all_meta = seq.get_all_metadata()
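
The single-pass idea (one np.frombuffer call with a structured dtype, then field access by name) can be sketched on a synthetic buffer. The record layout below is invented for illustration and is not the real PFF layout: the field widths and the 2x2 image are assumptions.

```python
import numpy as np

# Hypothetical fixed-size record: three header fields followed by a tiny image.
rec = np.dtype([
    ("pkt_num", "<u4"),
    ("tv_sec", "<i8"),
    ("tv_nsec", "<u4"),
    ("image", "<u2", (2, 2)),
])

# Build three fake frames and serialise them, standing in for a file's bytes.
frames = np.zeros(3, dtype=rec)
frames["pkt_num"] = [10, 11, 12]
frames["tv_sec"] = 1_700_000_000
frames["tv_nsec"] = [5, 6, 7]
raw = frames.tobytes()

# One pass over the buffer; every field becomes a strided view, not a copy.
parsed = np.frombuffer(raw, dtype=rec)
meta = {key: parsed[key] for key in ("pkt_num", "tv_sec")}

# A derived ("virtual") key computed entirely in int64 nanoseconds.
unix_t_ns = parsed["tv_sec"] * 1_000_000_000 + parsed["tv_nsec"]
```

Requesting more keys adds more dictionary entries but no further passes over the raw bytes.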

Headers and Pydantic validation

# Fast path — raw dict, no Pydantic overhead
header, img = seq.get_frame(0)
header["pkt_num"]                 # quabo file
header["quabo_0"]["pkt_tai"]      # module file

# Validated path — full Pydantic model
from pypff.models import ModuleHeader
hdr, img = seq.get_frame_validated(0)
assert isinstance(hdr, ModuleHeader)

Context manager and multiprocessing

# Deterministic handle cleanup
with run.get_product("dp_ph256.bpp_2.module_254") as seq:
    data = seq.read_images_range(0, 1000)

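# PFFSequence is pickle-safe: handles are dropped on serialisation.
# The worker must be a module-level function; a lambda cannot be pickled
# and would fail when submitted to a ProcessPoolExecutor.
import concurrent.futures

def frame_sum(s, i):
    return s[i].sum()

with concurrent.futures.ProcessPoolExecutor() as ex:
    futures = [ex.submit(frame_sum, seq, i) for i in range(len(seq))]
    sums = [f.result() for f in futures]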
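
Dropping handles on serialisation and reopening them lazily is a common pattern that can be sketched with __getstate__ and __setstate__. The LazyReader class below is hypothetical and only illustrates the mechanism; it is not how PFFSequence is written.

```python
import os
import pickle
import tempfile


class LazyReader:
    """Hypothetical reader that pickles without its open file handle."""

    def __init__(self, path):
        self.path = path
        self._fh = None  # opened on first use

    def _handle(self):
        if self._fh is None:
            self._fh = open(self.path, "rb")
        return self._fh

    def read(self, offset, n):
        fh = self._handle()
        fh.seek(offset)
        return fh.read(n)

    def close(self):
        if self._fh is not None:
            self._fh.close()
            self._fh = None

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_fh"] = None  # file objects cannot be pickled, so drop the handle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)  # the handle is reopened lazily by _handle()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"PANOSETI")
    reader = LazyReader(path)
    first = reader.read(0, 4)                   # opens the handle in this process
    clone = pickle.loads(pickle.dumps(reader))  # what a worker process would receive
    handle_dropped = clone._fh is None
    second = clone.read(4, 4)                   # reopens the file on demand
    reader.close()
    clone.close()
```

Only the path crosses the process boundary, so each worker ends up with its own independent handle.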

PFF → Zarr v3 conversion (pypff[zarr])

Convert a .pffd observation run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, Julia, and Rust. See docs/zarr_v3_spec.md for the full layout specification.

Write

from pypff.io2 import PanosetiRun
from pypff.zarr import convert_run

run = PanosetiRun("path/to/obs.pffd")
stores = convert_run(run, "output/L0_zarr")
# → one .zarr per (data_product, module)
# → one .panoseti-meta/ sidecar bundle (configs, logs, hk.pff)

Or via the CLI:

uv run pypff zarr path/to/obs.pffd output/L0_zarr

Read

from pypff.zarr import PanosetiZarrRun
import xarray as xr

# High-level wrapper (mirrors PanosetiRun)
zrun = PanosetiZarrRun("output/L0_zarr")
store = zrun.get_product("dp_ph256.bpp_2.module_254")
ts    = store.timestamps()           # int64 ns array
ds    = store.to_dataset()           # xarray.Dataset
cfgs  = zrun.configs                 # parsed run configs

# Or open directly with xarray: all arrays are visible as variables
# (`stores` is the value returned by convert_run() in the Write example)
ds = xr.open_zarr(str(stores[0]), consolidated=False)
# ds.images      (T, H, W)   int16 / uint16
# ds.unix_t_ns   (T,)        int64  — nanosecond timestamps
# ds.pkt_num     (T,)        uint32 ─┐ per-frame header fields
# ds.pkt_nsec    (T,)        uint32  │ (single-level: ph256)
# ds.quabo_num   (T,)        uint8  ─┘
# ds.quabo_0_pkt_num  (T,)   uint32 ─┐ module-level headers
# …                                  ─┘ (img16, ph1024)

Zarr store layout

All arrays live at the root of each store (no sub-groups), so xr.open_zarr(store) surfaces every variable — images, timestamps, and all header fields — as a single Dataset aligned on the shared time dimension. Logical grouping of headers is expressed via header_fields and quabo_fields root attributes. Full specification: docs/zarr_v3_spec.md.

Testing

Run the test suite via the built-in CLI:

uv run pypff test all

The test suite includes:

  • Tier 1 (Unit): Basic logic and timing tests.
  • Tier 2 (Logic): Higher-level I/O, slicing, and concurrency tests.
  • Legacy Integration: The original pypff test suite using provided sample data.

Dockerized CI

Build the CI environment:

docker build -t pypff-ci -f src/ci/Dockerfile.ci .

