
High-performance PanoSETI File Format (PFF) I/O library


pypff - High-performance PanoSETI I/O Library


A high-performance Python package for reading and analyzing data files in the PanoSETI File Format (PFF).

Features

  • Streaming by default: iter_batches(size=256) and __iter__ yield zero-copy views without materializing the full sequence into RAM — safe for Jupyter and HPC pipelines alike.
  • Distributed chunked reads: iter_byte_range(file_idx, byte_start, byte_end) lets Dask/Nextflow workers parse frame-aligned byte ranges in parallel without coordination.
  • Zero-copy random access: seq[i] returns a strided mmap view; slicing and read_images(indices) use a sort + inverse-permutation for disk locality.
  • Single-pass metadata: get_metadata_arrays(keys) extracts any number of header fields in one np.frombuffer pass per file via a composite NumPy structured dtype. Supports a virtual unix_t_ns key.
  • Nanosecond-precise timestamps: timestamps() returns int64 ns (no float precision loss); timestamps(as_datetime=True) returns a zero-copy datetime64[ns] view for matplotlib and pandas.
  • PFF → Zarr v3 conversion: the pypff[zarr] optional extra converts any .pffd run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, and Julia: lossless, compressed, and HPC/ML-ready. See docs/zarr_v3_spec.md.
  • Bounded resources: LRU mmap handle cache (default 16 files); PFFSequence is a context manager.
  • Multiprocessing-safe: pickle-compatible — file handles are dropped on serialisation and lazily reopened in workers.
  • Pydantic validation: strict schema validation for all PFF headers and PanoSETI config files.
  • Run discovery: PanosetiRun lazily scans a .pffd directory and exposes typed configs, housekeeping, and data products.
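
The "Bounded resources" bullet above describes an LRU cache of mmap handles. The following is a minimal sketch of that idea using only the standard library, not pypff's actual implementation; the class name MmapCache and its methods are hypothetical and chosen for illustration.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MmapCache:
    """Hypothetical LRU cache: keeps at most `capacity` memory maps open."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self._maps = OrderedDict()  # path -> (file object, mmap)

    def get(self, path):
        if path in self._maps:
            self._maps.move_to_end(path)  # mark as most recently used
            return self._maps[path][1]
        fh = open(path, "rb")
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        self._maps[path] = (fh, mm)
        if len(self._maps) > self.capacity:
            # Evict the least recently used mapping and release its handles.
            _, (old_fh, old_mm) = self._maps.popitem(last=False)
            old_mm.close()
            old_fh.close()
        return mm

    def open_paths(self):
        return list(self._maps)

    def close(self):
        while self._maps:
            _, (fh, mm) = self._maps.popitem()
            mm.close()
            fh.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


# Demo: three small files, but only two mappings may stay open at once.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmpdir, f"f{i}.bin")
    with open(p, "wb") as f:
        f.write(bytes([i]) * 8)
    paths.append(p)

with MmapCache(capacity=2) as cache:
    first_bytes = [cache.get(p)[0] for p in paths]  # third get evicts paths[0]
    still_open = cache.open_paths()
```

Pairing the cache with a context manager, as in the sketch, gives deterministic cleanup even when an exception interrupts processing.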

Installation

The package uses uv for dependency management.

cd pypff
uv sync                   # core library
uv sync --extra zarr      # + Zarr v3 conversion (zarr-python, xarray, dask)

Quick Start

Run discovery

from pypff import PanosetiRun

run = PanosetiRun("path/to/run_directory.pffd")
run.show()                          # pretty-print structure

print(run.list_products())          # ['dp_img16.bpp_2.module_1', ...]
seq = run.get_product("dp_img16.bpp_2.module_1")

obs_cfg  = run.get_config("obs_config")   # returns Pydantic model
data_cfg = run.get_config("data_config")
hk       = run.get_hk()                   # dict[device, dict[field, np.ndarray]]

Streaming (memory-bounded)

# Iterate frame-by-frame (zero-copy views into mmap)
for img in seq:
    process(img)

# Batch iteration — never holds more than one batch in RAM
for batch in seq.iter_batches(size=256):
    batch  # shape (256, H, W), dtype matches file

# With timestamps in the same pass
for batch, ts in seq.iter_batches(size=256, with_timestamps=True):
    ts  # int64 nanoseconds, shape (256,)

# Distributed byte-range reads (Dask / Nextflow workers)
# `size` is a placeholder for the end offset in bytes, supplied by the caller
for batch in seq.iter_byte_range(file_idx=0, byte_start=0, byte_end=size):
    ...
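
To illustrate how workers can parse byte ranges without coordination, here is a sketch of one possible alignment rule: a worker owns every frame whose first byte falls inside its half-open range. It assumes a fixed record size (FRAME_BYTES is a made-up value) and uses hypothetical helper names; the real iter_byte_range may align frames differently.

```python
import numpy as np

FRAME_BYTES = 48  # assumption: fixed size of one header+image record


def frames_in_byte_range(byte_start, byte_end, frame_bytes=FRAME_BYTES):
    """Half-open frame index range whose start offsets lie in [byte_start, byte_end)."""
    first = -(-byte_start // frame_bytes)  # ceiling division
    stop = -(-byte_end // frame_bytes)
    return first, stop


def iter_frames_in_range(buf, byte_start, byte_end, batch=4, frame_bytes=FRAME_BYTES):
    """Yield batches of frames owned by this byte range as views into `buf`."""
    first, stop = frames_in_byte_range(byte_start, byte_end, frame_bytes)
    frames = np.frombuffer(buf, dtype=np.uint8).reshape(-1, frame_bytes)
    for lo in range(first, stop, batch):
        yield frames[lo:min(lo + batch, stop)]


# Synthetic "file": frame k is filled with the byte value k.
n_frames = 10
buf = np.repeat(np.arange(n_frames, dtype=np.uint8), FRAME_BYTES).tobytes()

# Arbitrary, non-aligned splits that together cover the whole file.
splits = [(0, 100), (100, 250), (250, len(buf))]
seen = [
    int(row[0])
    for start, end in splits
    for chunk in iter_frames_in_range(buf, start, end)
    for row in chunk
]
```

Because each frame start offset belongs to exactly one range, contiguous ranges cover every frame once with no overlap, whatever the split points are.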

Random access and slicing

import numpy as np

img   = seq[42]            # zero-copy view, shape (H, W)
imgs  = seq[0:100:2]       # every other frame, shape (50, H, W)
imgs  = seq.read_images(np.array([5, 0, 3]))   # unsorted — sorted internally for locality
imgs  = seq.read_images_range(start=0, count=500)
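
The sort + inverse-permutation trick mentioned for read_images can be sketched with plain NumPy. This is an illustration of the technique only; the function name read_in_disk_order is hypothetical, and an in-memory array stands in for the memory-mapped file.

```python
import numpy as np


def read_in_disk_order(frames, indices):
    """Gather frames in ascending index order, then restore the requested order."""
    indices = np.asarray(indices)
    order = np.argsort(indices, kind="stable")  # permutation that sorts the request
    gathered = frames[indices[order]]           # sequential, locality-friendly access
    out = np.empty_like(gathered)
    out[order] = gathered                       # scatter back: inverse permutation
    return out


frames = np.arange(6 * 2 * 2).reshape(6, 2, 2)  # stand-in for 6 frames of shape (2, 2)
requested = np.array([5, 0, 3])
result = read_in_disk_order(frames, requested)
```

The caller sees frames in the order requested, while the underlying reads proceed monotonically through the file.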

Timestamps — always int64 nanoseconds

# Full cached array — int64 ns, no float precision loss
ts = seq.timestamps()                      # np.ndarray[int64]

# Zero-copy datetime64[ns] view for matplotlib / pandas
ts_dt = seq.timestamps(as_datetime=True)   # np.ndarray[datetime64[ns]]

# Indexed subset
ts_sub = seq.timestamps(indices=np.arange(0, len(seq), 100))

# Single frame
t_ns = seq.timestamp_at(42)               # int, nanoseconds

# Time-based navigation
idx = seq.seek_time(t_ns + 1_000_000_000)  # 1 s later

# Arithmetic rule: subtract epoch in integer space before dividing
rel_s = (ts - ts[0]) / 1e9               # CORRECT — small values, no precision loss
# ts / 1e9 - t0_s                        # WRONG  — loses ns precision at ~1.7e9 s
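
The arithmetic rule above follows from float64 resolution: near 1.7e18 ns, adjacent representable values are 256 ns apart. The sketch below uses synthetic timestamps (the epoch value and 100 ns spacing are made up for illustration) and plain NumPy, with no pypff calls.

```python
import numpy as np

t0 = 1_700_000_000_000_000_000                 # about 2023, in ns since the Unix epoch
ts = t0 + np.arange(5, dtype=np.int64) * 100   # synthetic frames 100 ns apart

# Gap between adjacent float64 values at this magnitude, in nanoseconds.
ulp_ns = np.spacing(np.float64(t0))

# Subtract in integer space first: small values convert to float without loss.
rel_good = (ts - ts[0]) / 1e9

# Convert first, subtract later: the 100 ns steps are rounded away.
rel_bad = ts / 1e9 - ts[0] / 1e9

# Reinterpreting int64 ns as datetime64[ns] shares memory with the original.
ts_dt = ts.view("datetime64[ns]")
```

The same reasoning applies to any conversion to floating-point seconds before differencing, so keeping timestamps as int64 until after the subtraction is the safe default.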

Metadata extraction (single pass)

# One np.frombuffer call per file regardless of number of keys
meta = seq.get_metadata_arrays(["pkt_num", "tv_sec", "unix_t_ns"])
meta["unix_t_ns"]   # int64 ns, same as timestamps()
meta["pkt_num"]     # int64

# All fields at once
all_meta = seq.get_all_metadata()
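
As a rough sketch of single-pass extraction with a structured dtype, the example below parses a synthetic buffer of fixed-size records in one np.frombuffer call. The record layout, the tv_nsec field, and the metadata_arrays helper are assumptions made for illustration and do not describe the actual PFF on-disk format.

```python
import numpy as np

# Assumed layout: three header fields followed by a 4x4 uint16 image.
record = np.dtype([
    ("pkt_num", "<u4"),
    ("tv_sec", "<i8"),
    ("tv_nsec", "<u4"),
    ("image", "<u2", (4, 4)),
])

# Build a synthetic raw buffer of three records.
frames = np.zeros(3, dtype=record)
frames["pkt_num"] = [10, 11, 12]
frames["tv_sec"] = 1_700_000_000
frames["tv_nsec"] = [5, 105, 205]
raw = frames.tobytes()


def metadata_arrays(raw, keys):
    """Parse the buffer once, then hand out one strided field view per key."""
    parsed = np.frombuffer(raw, dtype=record)
    out = {}
    for key in keys:
        if key == "unix_t_ns":
            # Virtual key: combined in int64 so no precision is lost.
            out[key] = parsed["tv_sec"] * 1_000_000_000 + parsed["tv_nsec"]
        else:
            out[key] = parsed[key]
    return out


meta = metadata_arrays(raw, ["pkt_num", "unix_t_ns"])
```

Field access on a structured array returns a strided view, so requesting more keys adds no further passes over the buffer.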

Headers and Pydantic validation

# Fast path — raw dict, no Pydantic overhead
header, img = seq.get_frame(0)
header["pkt_num"]                 # quabo file
header["quabo_0"]["pkt_tai"]      # module file

# Validated path — full Pydantic model
from pypff.models import ModuleHeader
hdr, img = seq.get_frame_validated(0)
assert isinstance(hdr, ModuleHeader)

Context manager and multiprocessing

# Deterministic handle cleanup
with run.get_product("dp_ph256.bpp_2.module_254") as seq:
    data = seq.read_images_range(0, 1000)

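# PFFSequence is pickle-safe: handles are dropped on serialisation.
# The submitted callable must be picklable too, so use a module-level
# function rather than a lambda.
import concurrent.futures

def frame_sum(s, i):
    return s[i].sum()

with concurrent.futures.ProcessPoolExecutor() as ex:
    futures = [ex.submit(frame_sum, seq, i) for i in range(len(seq))]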
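
One common way to make an object holding file handles pickle-safe is to drop the handles in __getstate__ and reopen them lazily on first access. The sketch below shows that pattern with a hypothetical LazyFile class; it is not pypff's implementation, and PFFSequence may achieve the same result differently.

```python
import mmap
import os
import pickle
import tempfile


class LazyFile:
    """Hypothetical reader whose mmap is opened on first use and never pickled."""

    def __init__(self, path):
        self.path = path
        self._fh = None
        self._mm = None

    def _map(self):
        if self._mm is None:
            self._fh = open(self.path, "rb")
            self._mm = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
        return self._mm

    def __getitem__(self, i):
        return self._map()[i]

    def __getstate__(self):
        # mmap and file objects cannot be pickled, so ship only the path.
        state = self.__dict__.copy()
        state["_fh"] = None
        state["_mm"] = None
        return state

    def close(self):
        if self._mm is not None:
            self._mm.close()
            self._fh.close()
            self._fh = self._mm = None


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(16)))

src = LazyFile(path)
value_before = src[3]                      # opens the mapping in this process
clone = pickle.loads(pickle.dumps(src))    # succeeds although a mapping is open
clone_starts_closed = clone._mm is None    # the copy carries no handle
value_after = clone[3]                     # reopened lazily on first access

src.close()
clone.close()
os.remove(path)
```

A worker process that receives such an object pays the cost of opening the file only if it actually reads from it.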

PFF → Zarr v3 conversion (pypff[zarr])

Convert a .pffd observation run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, Julia, and Rust. See docs/zarr_v3_spec.md for the full layout specification.

Write

from pypff.io2 import PanosetiRun
from pypff.zarr import convert_run

run = PanosetiRun("path/to/obs.pffd")
stores = convert_run(run, "output/L0_zarr")
# → one .zarr per (data_product, module)
# → one .panoseti-meta/ sidecar bundle (configs, logs, hk.pff)

Or via the CLI:

uv run pypff zarr path/to/obs.pffd output/L0_zarr

Read

from pypff.zarr import PanosetiZarrRun
import xarray as xr

# High-level wrapper (mirrors PanosetiRun)
zrun = PanosetiZarrRun("output/L0_zarr")
store = zrun.get_product("dp_ph256.bpp_2.module_254")
ts    = store.timestamps()           # int64 ns array
ds    = store.to_dataset()           # xarray.Dataset
cfgs  = zrun.configs                 # parsed run configs

# Or open directly with xarray — all arrays visible as variables
ds = xr.open_zarr(str(stores[0]), consolidated=False)
# ds.images      (T, H, W)   int16 / uint16
# ds.unix_t_ns   (T,)        int64  — nanosecond timestamps
# ds.pkt_num     (T,)        uint32 ─┐ per-frame header fields
# ds.pkt_nsec    (T,)        uint32  │ (single-level: ph256)
# ds.quabo_num   (T,)        uint8  ─┘
# ds.quabo_0_pkt_num  (T,)   uint32 ─┐ module-level headers
# …                                  ─┘ (img16, ph1024)

Zarr store layout

All arrays live at the root of each store (no sub-groups), so xr.open_zarr(store) surfaces every variable — images, timestamps, and all header fields — as a single Dataset aligned on the shared time dimension. Logical grouping of headers is expressed via header_fields and quabo_fields root attributes. Full specification: docs/zarr_v3_spec.md.

Testing

Run the test suite via the built-in CLI:

uv run pypff test all

The test suite includes:

  • Tier 1 (Unit): Basic logic and timing tests.
  • Tier 2 (Logic): Higher-level I/O, slicing, and concurrency tests.
  • Legacy Integration: The original pypff test suite using provided sample data.

Dockerized CI

Build the CI environment:

docker build -t pypff-ci -f src/ci/Dockerfile.ci .
