Python PANOSETI File Format (PFF) I/O Library
pypff - Python PANOSETI File Format I/O Library
A high-performance Python package for reading and analyzing data files in the PANOSETI File Format (PFF), as generated by PanoSETI.
Features
- Streaming by default: iter_batches(size=256) and __iter__ yield zero-copy views without materializing the full sequence in RAM, so they are safe in both Jupyter notebooks and HPC pipelines.
- Distributed chunked reads: iter_byte_range(file_idx, byte_start, byte_end) lets Dask/Nextflow workers parse frame-aligned byte ranges in parallel without coordination.
- Zero-copy random access: seq[i] returns a strided mmap view; slicing and read_images(indices) sort the requested indices for disk locality, then apply the inverse permutation to return frames in the requested order.
- Single-pass metadata: get_metadata_arrays(keys) extracts any number of header fields in one np.frombuffer pass per file, using a composite NumPy structured dtype. The virtual unix_t_ns key is also supported.
- Nanosecond-precise timestamps: timestamps() returns int64 nanoseconds (no float precision loss); timestamps(as_datetime=True) returns a zero-copy datetime64[ns] view for matplotlib and pandas.
- PFF → Zarr v3 conversion: the optional pypff[zarr] extra converts any .pffd run into Zarr v3 stores readable by xarray, dask, numpy, TensorStore, and Julia. The conversion is lossless and compressed, and the output is HPC/ML-ready. See the Zarr v3 spec.
- Bounded resources: LRU mmap handle cache (default: 16 files); PFFSequence is a context manager.
- Multiprocessing-safe: pickle-compatible; file handles are dropped on serialisation and lazily reopened in workers.
- Pydantic validation: strict schema validation for all PFF headers and PANOSETI config files.
- Run discovery: PanosetiRun lazily scans a .pffd directory and exposes typed configs, housekeeping data, and data products.
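The "Bounded resources" bullet describes an LRU cache of mmap handles. A minimal standard-library sketch of that idea follows. The class name MmapLRU and its methods are hypothetical and are not pypff's implementation. The capacity of 16 only mirrors the default quoted above.

```python
import mmap
from collections import OrderedDict


class MmapLRU:
    """Hypothetical LRU cache of read-only mmap handles (illustration only)."""

    def __init__(self, capacity=16):
        self.capacity = capacity
        self._cache = OrderedDict()  # path -> (file object, mmap), oldest first

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path][1]
        f = open(path, "rb")
        try:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        except Exception:
            f.close()
            raise
        self._cache[path] = (f, mm)
        if len(self._cache) > self.capacity:
            # Evict and close the least recently used handle.
            _, (old_f, old_mm) = self._cache.popitem(last=False)
            old_mm.close()
            old_f.close()
        return mm

    def close(self):
        for f, mm in self._cache.values():
            mm.close()
            f.close()
        self._cache.clear()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
```

One caveat applies to any design like this. Python raises BufferError when you close an mmap that still backs live NumPy views, so a real cache must account for outstanding views before it evicts a handle.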
Installation
The package uses uv for dependency management.
cd pypff
uv sync # core library
uv sync --extra zarr # + Zarr v3 conversion (zarr-python, xarray, dask)
Quick Start
Run discovery
from pypff import PanosetiRun
run = PanosetiRun("path/to/run_directory.pffd")
run.show() # pretty-print structure
print(run.list_products()) # ['dp_img16.bpp_2.module_1', ...]
seq = run.get_product("dp_img16.bpp_2.module_1")
obs_cfg = run.get_config("obs_config") # returns Pydantic model
data_cfg = run.get_config("data_config")
hk = run.get_hk() # dict[device, dict[field, np.ndarray]]
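Lazy discovery can be illustrated with a small index that scans the directory only on first access and groups .pff files by product key. This sketch rests on two assumptions. First, it assumes filenames embed a dp_<type>.bpp_<n>.module_<m> key followed by a seqno_<k> counter. Second, the LazyRunIndex class is hypothetical and not pypff's PanosetiRun.

```python
import re
from functools import cached_property
from pathlib import Path

# Assumed naming convention, e.g. "start_X.dp_img16.bpp_2.module_1.seqno_0.pff".
PRODUCT_RE = re.compile(r"(dp_\w+\.bpp_\d+\.module_\d+)\.seqno_(\d+)\.pff$")


class LazyRunIndex:
    """Hypothetical lazy index of the data products in a .pffd directory."""

    def __init__(self, run_dir):
        self.run_dir = Path(run_dir)

    @cached_property
    def _products(self):
        # The directory is scanned only on first access, then cached.
        groups = {}
        for path in self.run_dir.glob("*.pff"):
            m = PRODUCT_RE.search(path.name)
            if m is None:
                continue  # e.g. housekeeping or unrelated files
            key, seqno = m.group(1), int(m.group(2))
            groups.setdefault(key, []).append((seqno, path))
        # Order each product's files by sequence number.
        return {k: [p for _, p in sorted(v)] for k, v in groups.items()}

    def list_products(self):
        return sorted(self._products)

    def files_for(self, key):
        return self._products[key]
```

Constructing the index touches no files. The first call to list_products() or files_for() performs the single scan.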
Streaming (memory-bounded)
# Iterate frame-by-frame (zero-copy views into mmap)
for img in seq:
    process(img)
# Batch iteration: never holds more than one batch in RAM
for batch in seq.iter_batches(size=256):
    batch  # shape (256, H, W), dtype matches file
# With timestamps and headers in one pass
for batch, ts in seq.iter_batches(size=256, with_timestamps=True):
    ts  # int64 nanoseconds, shape (256,)
# Distributed byte-range reads (Dask / Nextflow workers)
for batch in seq.iter_byte_range(file_idx=0, byte_start=0, byte_end=size):
    ...
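Both streaming modes come down to slicing a memory-mapped array of frames. The sketch below makes simplifying assumptions that do not hold for real PFF files: every frame is a fixed-size block of raw pixels with no header. The functions are hypothetical standalone helpers, not pypff's methods. Each byte range [byte_start, byte_end) is mapped to the frames whose first byte falls inside it. As a result, workers given adjacent ranges never parse the same frame twice, and no frame is skipped.

```python
import numpy as np


def iter_batches(frames, size=256):
    """Yield consecutive (<= size, H, W) slices; basic slicing returns views, not copies."""
    for start in range(0, len(frames), size):
        yield frames[start:start + size]


def frame_range_for_bytes(byte_start, byte_end, frame_bytes, n_frames):
    """Half-open frame range [first, last) whose first byte lies in [byte_start, byte_end)."""
    first = -(-byte_start // frame_bytes)  # ceil division in integer space
    last = min(-(-byte_end // frame_bytes), n_frames)
    return first, max(first, last)


def iter_byte_range(frames, byte_start, byte_end, size=256):
    """Stream only the frames assigned to one worker's byte range."""
    frame_bytes = frames.itemsize * int(np.prod(frames.shape[1:]))
    first, last = frame_range_for_bytes(byte_start, byte_end, frame_bytes, len(frames))
    yield from iter_batches(frames[first:last], size)
```

The frames argument can be an np.memmap. Slices of a memmap are views, so only the pages a batch touches are read from disk.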
Random access and slicing
import numpy as np
img = seq[42]  # zero-copy view, shape (H, W)
imgs = seq[0:100:2]  # every other frame, shape (50, H, W)
imgs = seq.read_images(np.array([5, 0, 3]))  # unsorted input; sorted internally for locality
imgs = seq.read_images_range(start=0, count=500)
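The "sort + inverse permutation" strategy can be sketched in a few lines of NumPy. The gather runs in ascending index order, which is the disk-friendly access pattern on a memmap. An inverse permutation then restores the caller's order. The read_images_sorted function is a hypothetical illustration, not pypff's code.

```python
import numpy as np


def read_images_sorted(frames, indices):
    """Gather frames[indices] in ascending order, then restore the caller's order."""
    indices = np.asarray(indices, dtype=np.intp)
    order = np.argsort(indices, kind="stable")  # positions that sort the request
    gathered = frames[indices[order]]           # one forward sweep over the data
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)      # inverse permutation of `order`
    return gathered[inverse]
```

Plain fancy indexing, frames[indices], already returns rows in the requested order. Sorting first only changes the I/O pattern, which matters when frames is an np.memmap over a large file. Duplicate indices are handled correctly.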
Timestamps — always int64 nanoseconds
# Full cached array — int64 ns, no float precision loss
ts = seq.timestamps() # np.ndarray[int64]
# Zero-copy datetime64[ns] view for matplotlib / pandas
ts_dt = seq.timestamps(as_datetime=True) # np.ndarray[datetime64[ns]]
# Indexed subset
ts_sub = seq.timestamps(indices=np.arange(0, len(seq), 100))
# Single frame
t_ns = seq.timestamp_at(42) # int, nanoseconds
# Time-based navigation
idx = seq.seek_time(t_ns + 1_000_000_000) # 1 s later
# Arithmetic rule: subtract epoch in integer space before dividing
rel_s = (ts - ts[0]) / 1e9 # CORRECT — small values, no precision loss
# ts / 1e9 - t0_s  # WRONG: float64 values near 1.7e9 s are ~238 ns apart, so ns precision is lost
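The arithmetic rule above can be checked directly with NumPy. The timestamps below are synthetic, 1 ns apart, near 1.7e9 s (late 2023). The seek_time helper is a hypothetical binary search over sorted int64 timestamps that assumes "seek" means the first frame at or after the target. pypff's exact semantics may differ.

```python
import numpy as np

# Synthetic timestamps near 1.7e9 s, spaced 1 ns apart.
t0_ns = 1_700_000_000 * 1_000_000_000
ts = np.array([t0_ns, t0_ns + 1, t0_ns + 2], dtype=np.int64)

# WRONG: converting absolute ns to float seconds first collapses the 1 ns steps,
# because float64 values near 1.7e9 are spaced 2**-22 s (about 238 ns) apart.
naive_rel = ts / 1e9 - ts[0] / 1e9

# CORRECT: subtract in int64 first, then convert the small differences.
rel_s = (ts - ts[0]) / 1e9


def seek_time(ts_ns, target_ns):
    """Hypothetical: index of the first frame with timestamp >= target_ns."""
    return int(np.searchsorted(ts_ns, target_ns, side="left"))
```

Here naive_rel comes out all zeros, while rel_s keeps the 1 ns spacing. np.searchsorted assumes ts_ns is sorted ascending, which holds for frames written in time order.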
Metadata extraction (single pass)
# One np.frombuffer call per file regardless of number of keys
meta = seq.get_metadata_arrays(["pkt_num", "tv_sec", "unix_t_ns"])
meta["unix_t_ns"] # int64 ns, same as timestamps()
meta["pkt_num"] # int64
# All fields at once
all_meta = seq.get_all_metadata()
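The single-pass idea works like this. Describe one whole frame record as a NumPy structured dtype and call np.frombuffer once; each requested key then becomes a strided view of the same buffer. The sketch assumes a hypothetical fixed-size binary record (16-byte header plus a 4x4 uint16 image), and extract_fields is an illustrative name. Real PFF frames are laid out differently. The virtual unix_t_ns key is shown as an assumed combination of seconds and nanoseconds.

```python
import numpy as np

# Hypothetical fixed-size binary record: a 16-byte header followed by a 4x4
# uint16 image. Real PFF frames are laid out differently; this only shows the
# structured-dtype technique.
H = W = 4
record_dtype = np.dtype([
    ("pkt_num", "<u4"),
    ("tv_sec", "<u4"),
    ("pkt_nsec", "<u4"),
    ("_pad", "<u4"),
    ("image", "<u2", (H, W)),
])


def make_buffer(n_frames):
    """Build a synthetic file image with n_frames records."""
    recs = np.zeros(n_frames, dtype=record_dtype)
    recs["pkt_num"] = np.arange(n_frames)
    recs["tv_sec"] = 1_700_000_000
    recs["pkt_nsec"] = np.arange(n_frames) * 1000
    return recs.tobytes()


def extract_fields(buf, keys):
    """Hypothetical single-pass extractor: one np.frombuffer call for all keys."""
    recs = np.frombuffer(buf, dtype=record_dtype)  # no copy, no per-frame loop
    out = {k: recs[k] for k in keys if k != "unix_t_ns"}  # strided field views
    if "unix_t_ns" in keys:
        # Virtual key (assumed definition): seconds and nanoseconds in int64 space.
        out["unix_t_ns"] = recs["tv_sec"].astype(np.int64) * 1_000_000_000 + recs["pkt_nsec"]
    return out
```

The cost is one buffer interpretation per file however many keys you request. Only the virtual key allocates a new array.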
Headers and Pydantic validation
# Fast path — raw dict, no Pydantic overhead
header, img = seq.get_frame(0)
header["pkt_num"] # quabo file
header["quabo_0"]["pkt_tai"] # module file
# Validated path — full Pydantic model
from pypff.models import ModuleHeader
hdr, img = seq.get_frame_validated(0)
assert isinstance(hdr, ModuleHeader)
Context manager and multiprocessing
# Deterministic handle cleanup
with run.get_product("dp_ph256.bpp_2.module_254") as seq:
    data = seq.read_images_range(0, 1000)
# PFFSequence is pickle-safe: handles are dropped on serialisation
import concurrent.futures
def frame_sum(s, i):  # module-level function: lambdas cannot be pickled for worker processes
    return s[i].sum()
with concurrent.futures.ProcessPoolExecutor() as ex:
    futures = [ex.submit(frame_sum, seq, i) for i in range(len(seq))]
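The pattern of dropping file handles on serialisation and lazily reopening them in workers can be sketched with __getstate__. LazyFrames is a hypothetical class over a file of raw uint16 frames, not pypff's PFFSequence. The mmap handle is excluded from the pickled state and recreated on first access in whichever process unpickles the object.

```python
import mmap
import pickle

import numpy as np


class LazyFrames:
    """Hypothetical pickle-safe view of raw (T, H, W) uint16 frames in a file."""

    def __init__(self, path, shape):
        self.path = path
        self.shape = tuple(shape)
        self._mm = None  # opened lazily on first access

    def _array(self):
        if self._mm is None:
            with open(self.path, "rb") as f:
                # The mmap keeps its own handle, so the file object can be closed.
                self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return np.frombuffer(self._mm, dtype=np.uint16).reshape(self.shape)

    def __getitem__(self, i):
        return self._array()[i]

    def __len__(self):
        return self.shape[0]

    def __getstate__(self):
        # mmap objects cannot be pickled: ship only the path and shape.
        state = self.__dict__.copy()
        state["_mm"] = None
        return state
```

Every worker process pays one open() and keeps its own mapping. Submitting one task per frame also re-pickles the object for each task, so batching indices per task is usually cheaper.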
PFF → Zarr v3 conversion (pypff[zarr])
Convert a .pffd observation run to Zarr v3 stores readable by xarray, dask,
numpy, TensorStore, Julia, and Rust. See docs/zarr_v3_spec.md
for the full layout specification.
Write
from pypff.io2 import PanosetiRun
from pypff.zarr import convert_run
run = PanosetiRun("path/to/obs.pffd")
stores = convert_run(run, "output/L0_zarr")
# → one .zarr per (data_product, module)
# → one .panoseti-meta/ sidecar bundle (configs, logs, hk.pff)
Or via the CLI:
uv run pypff zarr path/to/obs.pffd output/L0_zarr
Read
from pypff.zarr import PanosetiZarrRun
import xarray as xr
# High-level wrapper (mirrors PanosetiRun)
zrun = PanosetiZarrRun("output/L0_zarr")
store = zrun.get_product("dp_ph256.bpp_2.module_254")
ts = store.timestamps() # int64 ns array
ds = store.to_dataset() # xarray.Dataset
cfgs = zrun.configs # parsed run configs
# Or open directly with xarray — all arrays visible as variables
ds = xr.open_zarr(str(stores[0]), consolidated=False)
# ds.images (T, H, W) int16 / uint16
# ds.unix_t_ns (T,) int64 — nanosecond timestamps
# ds.pkt_num (T,) uint32 ─┐ per-frame header fields
# ds.pkt_nsec (T,) uint32 │ (single-level: ph256)
# ds.quabo_num (T,) uint8 ─┘
# ds.quabo_0_pkt_num (T,) uint32 ─┐ module-level headers
# … ─┘ (img16, ph1024)
Zarr store layout
All arrays live at the root of each store (no sub-groups), so
xr.open_zarr(store) surfaces every variable — images, timestamps, and all
header fields — as a single Dataset aligned on the shared time dimension.
Logical grouping of headers is expressed via header_fields and quabo_fields
root attributes. Full specification: docs/zarr_v3_spec.md.
Testing
Run the test suite via the built-in CLI:
uv run pypff test all
The test suite includes:
- Tier 1 (Unit): Basic logic and timing tests.
- Tier 2 (Logic): Higher-level I/O, slicing, and concurrency tests.
- Legacy Integration: The original pypff test suite, run against the provided sample data.
Dockerized CI
Build the CI environment:
docker build -t pypff-ci -f src/ci/Dockerfile.ci .
Download files
File details
Details for the file pypff-1.0.1.tar.gz.
File metadata
- Download URL: pypff-1.0.1.tar.gz
- Size: 4.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.14 (uv publish on macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 454dba30bbb0a8abc5599152f747d73fa4c49fc678e33f3e4ca97c2347fb5673 |
| MD5 | 7e35d22832998f79024d2f7e8b7ca040 |
| BLAKE2b-256 | 67848194670f44a8decf5b1398ba530fcf8e827f742146c27b08c58b098b6c47 |
File details
Details for the file pypff-1.0.1-py3-none-any.whl.
File metadata
- Download URL: pypff-1.0.1-py3-none-any.whl
- Size: 48.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.14 (uv publish on macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1cb6b9a00e0e297a8eb1f4e12f708562324236cd320983d8f05d63bc4a683656 |
| MD5 | 336c0c2221fd40725f006aaf59b0a97a |
| BLAKE2b-256 | ea3955529d3c5f16bb05d346e4560f34b655216a775f4be47223af7550418847 |