Self-Calibrating Probabilistic Peak Finding for Serial X-Ray Crystallography

Project description

probixi - Self-Calibrating (PROB)ab(I)listic Peak Detection for Serial (X)-Ray Crystallograph(I)c Data

Lifecycle: experimental | PyTorch | CUDA | Apple Silicon MPS | License: MIT

probixi proposes that Bragg peaks can be recovered from a detector image by observing, for each pixel, the shape of the background noise distribution over time and collecting peak candidates from the resulting set of outliers. Because this noise model is determined in an unsupervised fashion, the user does not need to tune hyperparameters to find peaks. We are still testing robustness to different types of data collection (synchrotron, FEL) and to random fluence changes; results will be included in this README as they arrive.
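To make the idea of a per-pixel noise model concrete, here is a minimal sketch in plain Python. It is not probixi's actual model: the class name `PixelNoiseModel`, the Gaussian-style z-score test, and the `z_cut` parameter are all assumptions chosen for illustration. It tracks a running mean and variance for each pixel (Welford's algorithm) and flags pixels in a new frame that sit far above their own background.

```python
import math


class PixelNoiseModel:
    """Running per-pixel mean/variance (Welford). Illustrative only, not probixi's model."""

    def __init__(self, n_pixels):
        self.count = 0
        self.mean = [0.0] * n_pixels
        self.m2 = [0.0] * n_pixels  # running sum of squared deviations per pixel

    def update(self, frame):
        """Fold one frame (a flat list of pixel values) into the statistics."""
        self.count += 1
        for i, value in enumerate(frame):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def outliers(self, frame, z_cut=5.0):
        """Return indices of pixels more than z_cut standard deviations above their mean."""
        if self.count < 2:
            return []
        hits = []
        for i, value in enumerate(frame):
            std = math.sqrt(self.m2[i] / (self.count - 1))
            if std > 0 and (value - self.mean[i]) / std > z_cut:
                hits.append(i)
        return hits


# Toy 4-pixel "detector": four background frames, then a frame with a bright pixel 2.
background = [
    [10.0, 11.0, 9.0, 10.0],
    [11.0, 10.0, 10.0, 9.0],
    [9.0, 10.0, 11.0, 10.0],
    [10.0, 9.0, 10.0, 11.0],
]
model = PixelNoiseModel(n_pixels=4)
for frame in background:
    model.update(frame)

candidates = model.outliers([10.0, 10.0, 60.0, 10.0])
print(candidates)  # [2]
```

The point of the sketch is only that each pixel is judged against its own history rather than against a single global threshold; the real package runs on GPU tensors and may use a different distributional model.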

Installing the Package

You can install from PyPI with pip:

pip install probixi

Or install the latest development version with:

pip install git+https://github.com/ryan-odea/probixi.git

Using probixi

probixi can be used either through the command line interface or through the Python API. In its current implementation, the Python API returns lazy iterables whose results stay on the GPU as PyTorch tensors until they are collected, so you can pass them on to any downstream processing. The CLI is currently a one-stop shop for peak finding and indexing. This may change in the future.
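As a rough illustration of what "lazy" means here, the following plain-Python sketch uses a generator. The function `lazy_results` and its toy frames are hypothetical and unrelated to probixi's internals; the sketch only shows that no frame is processed until the consumer asks for it.

```python
def lazy_results(frames, processed):
    """Yield (frame_index, brightest_value) one frame at a time."""
    for index, frame in enumerate(frames):
        processed.append(index)  # record when the work actually happens
        yield index, max(frame)


processed = []
stream = lazy_results([[1, 4], [7, 2], [3, 3]], processed)
print(processed)          # [] : creating the stream did no work

first = next(stream)      # only now is frame 0 processed
print(first, processed)   # (0, 4) [0]

rest = list(stream)       # consuming the stream processes the remaining frames
print(rest)               # [(1, 7), (2, 3)]
```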

probixi also has a 'burn-in' phase, during which the noise model settles to a stable state. This phase can be inspected with a diagnostic GIF.
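One simple way to think about burn-in is to watch a running estimate and wait until it stops moving. The sketch below is an assumption-laden illustration, not how probixi decides that its noise model is stable: the function `burn_in_length`, the relative-change criterion, and the `tol` and `patience` parameters are all hypothetical.

```python
def burn_in_length(values, tol=0.01, patience=3):
    """Return the frame count after which the running mean is considered stable.

    "Stable" means the relative change of the running mean stayed below
    `tol` for `patience` consecutive frames. Returns None if that never happens.
    """
    mean = 0.0
    calm = 0
    for n, value in enumerate(values, start=1):
        new_mean = mean + (value - mean) / n
        change = abs(new_mean - mean) / max(abs(new_mean), 1e-12)
        calm = calm + 1 if change < tol else 0  # reset the streak on any large jump
        mean = new_mean
        if calm >= patience:
            return n
    return None


print(burn_in_length([5.0] * 10))               # 4
print(burn_in_length([1.0, 100.0, 1.0, 100.0])) # None
```

A constant signal is declared stable after four frames (one to initialise, three calm updates), while a short, strongly fluctuating one never qualifies.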

Via the CLI:

probixi -i files.lst -g myGeometry.geom -p myCell.cell -o stream.stream --device cuda --gif myNoiseModel.gif

Or with Python:

import torch

from probixi import Probixi
from probixi.io import DataOffloader

pipeline = Probixi(
    list_file="files.lst",
    geometry_file="myGeometry.geom",
    cell_file="myCell.cell",
    device=torch.device("cuda"),
)

pipeline.noise_diagnostics("myNoiseModel.gif", stop=32)
cal = pipeline.calibrate(n_seed=1636)
print(f"kappa={cal.kappa:.2f}  prior_peak={cal.prior_peak:.4f}  "
      f"threshold={pipeline.threshold_calibration.threshold:.2f}")

# Stream every frame through detect -> index -> predict + integrate. The stream
# is lazy and each result stays on the GPU until you touch it, so you can branch
# off any downstream processing with torch
with DataOffloader(
    "stream.stream",
    geometry=pipeline.geometry,
    cell=pipeline.target_cell,
    geometry_file="myGeometry.geom",
    files=pipeline.metadata.files,
) as off:
    for result in pipeline.index_stream(pipeline.frames(), batch_size=8):
        off.write(result)  # or: pipeline.index_stream(...).to_stream(off)
        print(f"frame {result.frame_index}: "
              f"{result.n_indexed}/{result.n_peaks} indexed (rmsd {result.rmsd:.4f})")

Comparison with other works

Here we compare probixi with other peak-finding algorithms on real data, using 10,000 randomly sampled frames from experimentally collected data.

Notes:

  1. For wall time: because probixi optimizes its internal hyperparameters automatically, the timings for the reference tools include the time spent on loose manual hyperparameter tuning on 10% subsamples to find the optimal SNR, threshold, and minimum pixel count. The time for peak finding and indexing alone is given in brackets.
  2. Percent agreement is calculated as (number of crystals indexed by probixi) / (number of crystals indexed by the reference) * 100. A value greater than 100 indicates that probixi indexed more crystals than the reference.
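The percent agreement from note 2 can be sketched as a small helper. The function name `percent_agreement` is hypothetical, and the counts in the example are made-up numbers for illustration, not benchmark results.

```python
def percent_agreement(n_probixi, n_reference):
    """Crystals indexed by probixi as a percentage of those indexed by the reference."""
    if n_reference <= 0:
        raise ValueError("the reference must have indexed at least one crystal")
    return 100.0 * n_probixi / n_reference


# Made-up counts, for illustration only.
print(percent_agreement(850, 1000))   # 85.0  -> probixi indexed fewer crystals
print(percent_agreement(1100, 1000))  # 110.0 -> probixi indexed more crystals
```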

Benchmarks were run on:

  • GPU: A100
  • CPU: TODO which CPU do Ra nodes use?

peakfinder8 + indexamajig

Dataset | Percent Indexed (probixi) | GPU Time (probixi) | Percent Indexed (peakfinder8+indexamajig) | CPU Time (peakfinder8+indexamajig) [No-Tuning] | Percent Agreement
Lysozyme-Synchrotron | | | | |
Lysozyme-FEL | | | | |
BacterioRhodopsin-Synchrotron | | | | |
BacterioRhodopsin-FEL | | | | |
Randomly Dimmed Lysozyme-FEL | | | | |
Randomly Dimmed BacterioRhodopsin-FEL | | | | |

pyFAI + TORO

Perhaps a fairer comparison, especially with respect to speed, is pyFAI (azimuthal integration and peak picking) paired with the TORO indexer, both of which run on the GPU.

Dataset | Percent Indexed (probixi) | GPU Time (probixi) | Percent Indexed (pyFAI+TORO) | GPU Time (pyFAI+TORO) [No-Tuning] | Percent Agreement
Lysozyme-Synchrotron | | | | |
Lysozyme-FEL | | | | |
BacterioRhodopsin-Synchrotron | | | | |
BacterioRhodopsin-FEL | | | | |
Randomly Dimmed Lysozyme-FEL | | | | |
Randomly Dimmed BacterioRhodopsin-FEL | | | | |

Using probixi as only a peakfinder

If you only want to use probixi as a peak finder and prefer your own indexing routine, you can do so through the CLI's --peaks-only flag or the Python API's peak_stream.

Via the CLI:

probixi -i files.lst -g myGeometry.geom -o peaks.stream --peaks-only --device cuda

Or with Python:

import torch

from probixi import Probixi
from probixi.io import PeakOffloader

pipeline = Probixi(
    list_file="files.lst",
    geometry_file="myGeometry.geom",
    device=torch.device("cuda"),
)

# Calibrate the noise model + detection threshold on the seed frames, as usual.
pipeline.calibrate(n_seed=1636)

peaks = pipeline.peak_stream(pipeline.frames(), estimate_scale=False)
with PeakOffloader(
    "peaks.stream",
    geometry=pipeline.geometry,
    geometry_file="myGeometry.geom",
    files=pipeline.metadata.files,
) as off:
    for result in peaks:
        if len(result):       # skip blanks; export only frames with peaks
            off.write(result)

Dependencies

  • python >= 3.9
    • click
    • h5py
    • hdf5plugin
    • numpy
    • torch
    • matplotlib
    • pillow

Contributing

There are many different ways to contribute to further development of this tool. If you experience a bug or would like an additional feature, please open up a ticket.

If you would like to contribute actively by merging code, please open a PR with the following:

  1. Code is formatted with isort, then black, followed by ruff check. These checks run automatically on every PR, so it is best to run them locally beforehand.
  2. Docstrings, in NumPy style, are present at least on user-facing functions.
  3. Comments, or some explanation in the PR, for the additions, limited to the scope of the project. If you are fixing a bug, put the explanation in the PR rather than in the code itself.
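For contributors unfamiliar with the NumPy docstring style mentioned in the list above, here is a small example of its layout. The function `count_above_threshold` is hypothetical and exists only to show the section headings; it is not part of probixi.

```python
def count_above_threshold(values, threshold):
    """Count how many values exceed a threshold.

    Parameters
    ----------
    values : list of float
        Pixel intensities (or any other numbers) to test.
    threshold : float
        Values strictly greater than this are counted.

    Returns
    -------
    int
        Number of entries in ``values`` greater than ``threshold``.

    Examples
    --------
    >>> count_above_threshold([1, 5, 7], 4)
    2
    """
    return sum(1 for value in values if value > threshold)


print(count_above_threshold([1, 5, 7], 4))  # 2
```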

