High-performance PanoSETI File Format (PFF) I/O library

pypff - High-performance PanoSETI I/O Library

A high-performance Python package for reading and analyzing data files in the PanoSETI File Format (PFF).

Features

Streaming by default: iter_batches(size=256) and __iter__ yield zero-copy views without materializing the full sequence into RAM — safe for Jupyter and HPC pipelines alike.
Distributed chunked reads: iter_byte_range(file_idx, byte_start, byte_end) lets Dask/Nextflow workers parse frame-aligned byte ranges in parallel without coordination.
Zero-copy random access: seq[i] returns a strided mmap view; slicing and read_images(indices) use a sort + inverse-permutation for disk locality.
Single-pass metadata: get_metadata_arrays(keys) extracts any number of header fields in one np.frombuffer pass per file via a composite NumPy structured dtype. It also supports the virtual key unix_t_ns.
Nanosecond-precise timestamps: timestamps() returns int64 ns (no float precision loss); timestamps(as_datetime=True) returns a zero-copy datetime64[ns] view for matplotlib and pandas.
PFF → Zarr v3 conversion: pypff[zarr] optional extra converts any .pffd run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, and Julia — lossless, compressed, HPC/ML-ready. See Zarr v3 spec.
Bounded resources: LRU mmap handle cache (default 16 files); PFFSequence is a context manager.
Multiprocessing-safe: pickle-compatible — file handles are dropped on serialisation and lazily reopened in workers.
Pydantic validation: strict schema validation for all PFF headers and PanoSETI config files.
Run discovery: PanosetiRun lazily scans a .pffd directory and exposes typed configs, housekeeping, and data products.
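
The bounded-resources feature above can be pictured as a small least-recently-used cache of mmap handles. The following is a minimal sketch of that idea only; `MmapLRU`, `max_open`, and `open_paths` are hypothetical names for illustration and are not taken from pypff's source.

```python
import mmap
import os
import tempfile
from collections import OrderedDict


class MmapLRU:
    """Hypothetical LRU cache that keeps at most `max_open` files mapped."""

    def __init__(self, max_open=16):
        self.max_open = max_open
        self._open = OrderedDict()   # path -> mmap, least recently used first

    def get(self, path):
        mm = self._open.pop(path, None)
        if mm is None:
            with open(path, "rb") as f:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        self._open[path] = mm        # re-insert as most recently used
        while len(self._open) > self.max_open:
            _, oldest = self._open.popitem(last=False)
            oldest.close()           # evict the least recently used handle
        return mm

    def open_paths(self):
        return list(self._open)

    def close(self):
        while self._open:
            self._open.popitem()[1].close()


# Three small temporary files stand in for PFF files.
paths = []
for payload in (b"a", b"b", b"c"):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    paths.append(tmp.name)

cache = MmapLRU(max_open=2)
cache.get(paths[0])
cache.get(paths[1])
cache.get(paths[0])              # touch file 0 so file 1 becomes the oldest
cache.get(paths[2])              # exceeds the bound: file 1 is evicted
still_open = cache.open_paths()  # [paths[0], paths[2]]

cache.close()
for p in paths:
    os.remove(p)
```

Closing evicted handles keeps the number of open mappings constant no matter how many files a run contains.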

Installation

The package uses uv for dependency management.

cd pypff
uv sync                   # core library
uv sync --extra zarr      # + Zarr v3 conversion (zarr-python, xarray, dask)

Quick Start

Run discovery

from pypff import PanosetiRun

run = PanosetiRun("path/to/run_directory.pffd")
run.show()                          # pretty-print structure

print(run.list_products())          # ['dp_img16.bpp_2.module_1', ...]
seq = run.get_product("dp_img16.bpp_2.module_1")

obs_cfg  = run.get_config("obs_config")   # returns Pydantic model
data_cfg = run.get_config("data_config")
hk       = run.get_hk()                   # dict[device, dict[field, np.ndarray]]

Streaming (memory-bounded)

# Iterate frame-by-frame (zero-copy views into mmap)
for img in seq:
    process(img)

# Batch iteration — never holds more than one batch in RAM
for batch in seq.iter_batches(size=256):
    batch  # shape (256, H, W), dtype matches file

# With timestamps in the same pass
for batch, ts in seq.iter_batches(size=256, with_timestamps=True):
    ts  # int64 nanoseconds, shape (256,)

# Distributed byte-range reads (Dask / Nextflow workers)
# `size` is a placeholder for the end offset, e.g. the length of file 0 in bytes
for batch in seq.iter_byte_range(file_idx=0, byte_start=0, byte_end=size):
    ...
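
The byte-range reader is described as working on frame-aligned ranges, so a scheduler has to hand each worker offsets that fall on record boundaries. Below is a minimal sketch of one way to compute such ranges, assuming every record (header plus image) has the same fixed size. `frame_aligned_ranges`, `record_bytes`, and `n_workers` are hypothetical names and not part of pypff; check the real file layout before relying on the fixed-size assumption.

```python
def frame_aligned_ranges(file_bytes, record_bytes, n_workers):
    """Split a file into at most n_workers contiguous, record-aligned byte ranges.

    Assumes fixed-size records (an illustration-only assumption).
    """
    if record_bytes <= 0 or n_workers <= 0:
        raise ValueError("record_bytes and n_workers must be positive")
    n_records = file_bytes // record_bytes      # ignore a trailing partial record
    per_worker, extra = divmod(n_records, n_workers)
    ranges = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional record each.
        count = per_worker + (1 if w < extra else 0)
        if count == 0:
            break
        end = start + count * record_bytes
        ranges.append((start, end))
        start = end
    return ranges


# Ten 100-byte records split across three workers.
ranges = frame_aligned_ranges(file_bytes=1000, record_bytes=100, n_workers=3)
print(ranges)  # [(0, 400), (400, 700), (700, 1000)]
```

Each `(start, end)` pair could then be passed as `byte_start` and `byte_end` to a separate worker.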

Random access and slicing

import numpy as np

img   = seq[42]            # zero-copy view, shape (H, W)
imgs  = seq[0:100:2]       # every other frame, shape (50, H, W)
imgs  = seq.read_images(np.array([5, 0, 3]))   # unsorted — sorted internally for locality
imgs  = seq.read_images_range(start=0, count=500)
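
The "sorted internally for locality" behaviour can be illustrated with a sort followed by an inverse permutation. This is a sketch of the general technique on a plain NumPy array, not pypff's implementation; `read_sorted` is a hypothetical helper and `frames` stands in for a memory-mapped array of shape (T, H, W).

```python
import numpy as np


def read_sorted(frames, indices):
    """Gather frames[indices] while visiting the source in ascending order."""
    indices = np.asarray(indices)
    order = np.argsort(indices, kind="stable")  # permutation that sorts the request
    sorted_idx = indices[order]                 # ascending offsets: sequential access
    gathered = frames[sorted_idx]               # one monotonic pass over the source
    out = np.empty_like(gathered)
    out[order] = gathered                       # inverse permutation: caller's order
    return out


frames = np.arange(6 * 2 * 2).reshape(6, 2, 2)
result = read_sorted(frames, [5, 0, 3])
print(np.array_equal(result, frames[[5, 0, 3]]))  # True
```

The caller still receives frames in the order requested; only the order in which the source is touched changes.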

Timestamps — always `int64` nanoseconds

# Full cached array — int64 ns, no float precision loss
ts = seq.timestamps()                      # np.ndarray[int64]

# Zero-copy datetime64[ns] view for matplotlib / pandas
ts_dt = seq.timestamps(as_datetime=True)   # np.ndarray[datetime64[ns]]

# Indexed subset
ts_sub = seq.timestamps(indices=np.arange(0, len(seq), 100))

# Single frame
t_ns = seq.timestamp_at(42)               # int, nanoseconds

# Time-based navigation
idx = seq.seek_time(t_ns + 1_000_000_000)  # 1 s later

# Arithmetic rule: subtract epoch in integer space before dividing
rel_s = (ts - ts[0]) / 1e9               # CORRECT — small values, no precision loss
# ts / 1e9 - t0_s                        # WRONG  — loses ns precision at ~1.7e9 s
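
The arithmetic rule above follows from float64 resolution: near 1.7e9 seconds, adjacent float64 values are about 238 ns apart, so converting to float seconds before subtracting discards nanosecond detail. A small standard-library sketch with made-up timestamp values:

```python
import math

t0_ns = 1_700_000_000_000_000_000   # an epoch-scale timestamp in nanoseconds
t1_ns = t0_ns + 1                   # one nanosecond later

# Integer subtraction first: the small difference converts to float exactly.
good = (t1_ns - t0_ns) / 1e9

# Float conversion first: both values round to the same float64.
bad = t1_ns / 1e9 - t0_ns / 1e9

# Gap between adjacent float64 values near 1.7e9 seconds.
spacing = math.ulp(1.7e9)

print(good)     # 1e-09
print(bad)      # 0.0
print(spacing)  # 2.384185791015625e-07
```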

Metadata extraction (single pass)

# One np.frombuffer call per file regardless of number of keys
meta = seq.get_metadata_arrays(["pkt_num", "tv_sec", "unix_t_ns"])
meta["unix_t_ns"]   # int64 ns, same as timestamps()
meta["pkt_num"]     # int64

# All fields at once
all_meta = seq.get_all_metadata()
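
The single-pass idea can be sketched with a composite NumPy structured dtype: one `np.frombuffer` call exposes every field as a strided view. The record layout below (field widths, padding, image size, and the `tv_nsec` field) is invented purely for illustration and is not the PFF on-disk format.

```python
import numpy as np

# Hypothetical record: 16-byte header followed by a 2x2 uint16 image.
record = np.dtype([
    ("pkt_num", "<u4"),
    ("tv_sec", "<u4"),
    ("tv_nsec", "<u4"),        # hypothetical field name
    ("_pad", "<u4"),
    ("image", "<u2", (2, 2)),
])

# Build three records in memory to stand in for a file's bytes.
src = np.zeros(3, dtype=record)
src["pkt_num"] = [10, 11, 12]
src["tv_sec"] = [1_700_000_000] * 3
src["tv_nsec"] = [0, 500, 1000]
buf = src.tobytes()

# One frombuffer call; each field access is a view, not a new parse.
recs = np.frombuffer(buf, dtype=record)
pkt_num = recs["pkt_num"]

# A derived ("virtual") key, computed in integer space to keep ns precision.
unix_t_ns = (recs["tv_sec"].astype(np.int64) * 1_000_000_000
             + recs["tv_nsec"].astype(np.int64))
```

Requesting more keys adds more field views over the same buffer, which is why the cost does not grow with a second parsing pass per key.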

Headers and Pydantic validation

# Fast path — raw dict, no Pydantic overhead
header, img = seq.get_frame(0)
header["pkt_num"]                 # quabo-level file: flat header
header["quabo_0"]["pkt_tai"]      # module-level file: one nested header per quabo

# Validated path — full Pydantic model
from pypff.models import ModuleHeader
hdr, img = seq.get_frame_validated(0)
assert isinstance(hdr, ModuleHeader)

Context manager and multiprocessing

# Deterministic handle cleanup
with run.get_product("dp_ph256.bpp_2.module_254") as seq:
    data = seq.read_images_range(0, 1000)

# PFFSequence is pickle-safe: handles are dropped on serialisation
import concurrent.futures

def frame_sum(s, i):          # module-level function; lambdas cannot be pickled
    return s[i].sum()

with concurrent.futures.ProcessPoolExecutor() as ex:
    futures = [ex.submit(frame_sum, seq, i) for i in range(len(seq))]
    sums = [f.result() for f in futures]
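
The "handles are dropped on serialisation and lazily reopened" behaviour can be sketched with `__getstate__`. The class below is a hypothetical stand-in, not pypff's `PFFSequence`; it only shows the general pattern of excluding an unpicklable mmap from the pickled state and reopening it on first use.

```python
import mmap
import os
import pickle
import tempfile


class LazyMappedFile:
    """Hypothetical object that owns an mmap handle and survives pickling."""

    def __init__(self, path):
        self.path = path
        self._mm = None                  # opened on first access

    def _ensure_open(self):
        if self._mm is None:
            with open(self.path, "rb") as f:
                self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        return self._mm

    def __getitem__(self, i):
        return self._ensure_open()[i]

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_mm"] = None              # mmap objects cannot be pickled
        return state

    def close(self):
        if self._mm is not None:
            self._mm.close()
            self._mm = None


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01\x02\x03")
path = tmp.name

original = LazyMappedFile(path)
first = original[0]                            # opens the mmap in this process
clone = pickle.loads(pickle.dumps(original))   # handle dropped, path kept
clone_starts_closed = clone._mm is None
second = clone[1]                              # reopened lazily on first access

original.close()
clone.close()
os.remove(path)
```

A worker process receives only the path and configuration, so each worker opens its own mapping instead of inheriting one.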

PFF → Zarr v3 conversion (`pypff[zarr]`)

Convert a .pffd observation run to Zarr v3 stores readable by xarray, dask, numpy, TensorStore, Julia, and Rust. See docs/zarr_v3_spec.md for the full layout specification.

Write

from pypff.io2 import PanosetiRun
from pypff.zarr import convert_run

run = PanosetiRun("path/to/obs.pffd")
stores = convert_run(run, "output/L0_zarr")
# → one .zarr per (data_product, module)
# → one .panoseti-meta/ sidecar bundle (configs, logs, hk.pff)

Or via the CLI:

uv run pypff zarr path/to/obs.pffd output/L0_zarr

Read

from pypff.zarr import PanosetiZarrRun
import xarray as xr

# High-level wrapper (mirrors PanosetiRun)
zrun = PanosetiZarrRun("output/L0_zarr")
store = zrun.get_product("dp_ph256.bpp_2.module_254")
ts    = store.timestamps()           # int64 ns array
ds    = store.to_dataset()           # xarray.Dataset
cfgs  = zrun.configs                 # parsed run configs

# Or open directly with xarray — all arrays visible as variables
ds = xr.open_zarr(str(stores[0]), consolidated=False)
# ds.images      (T, H, W)   int16 / uint16
# ds.unix_t_ns   (T,)        int64  — nanosecond timestamps
# ds.pkt_num     (T,)        uint32 ─┐ per-frame header fields
# ds.pkt_nsec    (T,)        uint32  │ (single-level: ph256)
# ds.quabo_num   (T,)        uint8  ─┘
# ds.quabo_0_pkt_num  (T,)   uint32 ─┐ module-level headers
# …                                  ─┘ (img16, ph1024)

Zarr store layout

All arrays live at the root of each store (no sub-groups), so xr.open_zarr(store) surfaces every variable — images, timestamps, and all header fields — as a single Dataset aligned on the shared time dimension. Logical grouping of headers is expressed via header_fields and quabo_fields root attributes. Full specification: docs/zarr_v3_spec.md.

Testing

Run the test suite via the built-in CLI:

uv run pypff test all

The test suite includes:

Tier 1 (Unit): Basic logic and timing tests.
Tier 2 (Logic): Higher-level I/O, slicing, and concurrency tests.
Legacy Integration: The original pypff test suite using provided sample data.

Dockerized CI

Build the CI environment:

docker build -t pypff-ci -f src/ci/Dockerfile.ci .

Release history

1.0.2 (May 14, 2026)
1.0.1 (May 14, 2026)
1.0.0 (May 14, 2026), this version

Download files

Source Distribution

pypff-1.0.0.tar.gz (4.8 MB), uploaded May 14, 2026, Source

Built Distribution

pypff-1.0.0-py3-none-any.whl (48.6 kB), uploaded May 14, 2026, Python 3

File details

Details for the file pypff-1.0.0.tar.gz.

File metadata

Download URL: pypff-1.0.0.tar.gz
Upload date: May 14, 2026
Size: 4.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 (uv publish, macOS)

File hashes

Hashes for pypff-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a38aadce2cd5d06b28553ce022f2bd025db2782a91e8a2ad0318afae50c6e500`
MD5	`573c9c5a5d52c33881a53fd10fa9d1b1`
BLAKE2b-256	`fb4bd0f0969f8517e08b7f67aab51c30130541b70947b8939fcbcb0573d791e1`

File details

Details for the file pypff-1.0.0-py3-none-any.whl.

File metadata

Download URL: pypff-1.0.0-py3-none-any.whl
Upload date: May 14, 2026
Size: 48.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 (uv publish, macOS)

File hashes

Hashes for pypff-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bd95d713e886dd004511eff37f36d7738b1fddb145485db81f382455d31003dc`
MD5	`436cd9832ce67ef8cca47d7df24f4dee`
BLAKE2b-256	`73fb8020bb97b5eee0eb8bb75a805feb1b30d524ddba0d718d34d0096fdd23e1`
