Self-Calibrating Probabilistic Peak Finding for Serial X-Ray Crystallography
probixi - Self-Calibrating (PROB)ab(I)listic Peak Detection for Serial (X)-Ray Crystallograph(I)c Data
probixi proposes that Bragg peaks can be found (or recovered) in a detector image by observing the shape of the background noise distribution over time, per pixel, and collecting peak candidates from the resulting outlier set. Because this noise model is determined in an unsupervised fashion, the user does not need to tune hyperparameters to find peaks. We are still testing robustness to different types of data collection (synchrotron, FEL) and to random fluence changes; results will be added to this README as they arrive.
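To make the idea concrete, here is a minimal sketch of a per-pixel noise model with outlier-based peak candidates. It is not probixi's implementation: it swaps in a plain Gaussian running mean and variance (Welford's algorithm) on Python lists, and the class name `PixelNoiseModel`, the `n_sigma` cutoff, and the synthetic frames are all assumptions made for illustration.

```python
import math
import random


class PixelNoiseModel:
    """Per-pixel running mean/variance (Welford's algorithm). Illustrative only."""

    def __init__(self, n_pixels):
        self.count = 0
        self.mean = [0.0] * n_pixels
        self.m2 = [0.0] * n_pixels  # running sum of squared deviations

    def update(self, frame):
        """Fold one frame (a flat list of pixel values) into the model."""
        self.count += 1
        for i, value in enumerate(frame):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def sigma(self, i):
        """Sample standard deviation of pixel i (0.0 until two frames are seen)."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2[i] / (self.count - 1))

    def outliers(self, frame, n_sigma=6.0):
        """Indices of pixels brighter than mean + n_sigma * sigma."""
        found = []
        for i, value in enumerate(frame):
            s = self.sigma(i)
            if s > 0.0 and value > self.mean[i] + n_sigma * s:
                found.append(i)
        return found


rng = random.Random(0)
model = PixelNoiseModel(n_pixels=16)

# Background-only frames: Gaussian noise around a pedestal of 10 counts.
for _ in range(200):
    model.update([rng.gauss(10.0, 1.0) for _ in range(16)])

# A new frame with one bright pixel standing in for a Bragg peak.
frame = [rng.gauss(10.0, 1.0) for _ in range(16)]
frame[5] = 60.0
candidates = model.outliers(frame)
print(candidates)
```

In this sketch every frame is folded into the model unconditionally, so a real peak would contaminate the background estimate of its pixel; a practical model would need some way to exclude or down-weight outliers during the update.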
Installing the Package
You can install from PyPI with pip:
pip install probixi
Or the latest development version with
pip install git+https://github.com/ryan-odea/probixi.git
Using probixi
probixi can be used either through the command-line interface or through the Python API. In its current implementation, the Python API returns lazy iterables whose results remain on the GPU as PyTorch tensors until they are collected, so you can pass them directly to any downstream processing. The CLI is currently a one-stop shop for peak finding and indexing. This may change in the future.
probixi also has a 'burn-in' phase, during which the noise model settles to a stable state. You can inspect this phase with a diagnostic GIF.
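As a rough illustration of what "reaching a stable point" can mean, the sketch below declares burn-in finished once a running estimate stops moving. The criterion used here (relative change of a running mean below `tol` for `patience` consecutive samples) and the function name `burn_in_length` are assumptions for illustration; this README does not state the criterion probixi actually uses.

```python
import random


def burn_in_length(values, tol=1e-2, patience=10):
    """Return the number of samples after which the running mean has settled.

    "Settled" means the relative change of the running mean stayed below
    `tol` for `patience` consecutive samples. Returns None if that never
    happens. This is a hypothetical criterion, not probixi's.
    """
    mean = 0.0
    stable = 0
    for n, value in enumerate(values, start=1):
        previous = mean
        mean += (value - mean) / n
        change = abs(mean - previous) / max(abs(mean), 1e-12)
        # The first sample has nothing to compare against, so it never counts.
        stable = stable + 1 if n > 1 and change < tol else 0
        if stable >= patience:
            return n
    return None


rng = random.Random(1)
# Synthetic background values for a single pixel (illustrative, not real data).
background = [rng.gauss(10.0, 1.0) for _ in range(500)]
n_burn = burn_in_length(background)
print(n_burn)
```

A diagnostic animation of the model over the first frames serves the same purpose visually: the estimate should visibly stop changing once burn-in is over.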
Via the CLI:
probixi -i files.lst -g myGeometry.geom -p myCell.cell -o stream.stream --device cuda --gif myNoiseModel.gif
Or with python:
import torch
from probixi import Probixi
from probixi.io import DataOffloader

pipeline = Probixi(
    list_file="files.lst",
    geometry_file="myGeometry.geom",
    cell_file="myCell.cell",
    device=torch.device("cuda"),
)
pipeline.noise_diagnostics("myNoiseModel.gif", stop=32)
cal = pipeline.calibrate(n_seed=1636)
print(f"kappa={cal.kappa:.2f} prior_peak={cal.prior_peak:.4f} "
      f"threshold={pipeline.threshold_calibration.threshold:.2f}")

# Stream every frame through detect -> index -> predict + integrate. The stream
# is lazy and each result stays on the GPU until you touch it, so you can branch
# off any downstream processing with torch
with DataOffloader(
    "stream.stream",
    geometry=pipeline.geometry,
    cell=pipeline.target_cell,
    geometry_file="myGeometry.geom",
    files=pipeline.metadata.files,
) as off:
    for result in pipeline.index_stream(pipeline.frames(), batch_size=8):
        off.write(result)  # or: pipeline.index_stream(...).to_stream(off)
        print(f"frame {result.frame_index}: "
              f"{result.n_indexed}/{result.n_peaks} indexed (rmsd {result.rmsd:.4f})")
Comparison with other works
Here we compare probixi with other peak-finding algorithms on real data, using 10,000 frames randomly sampled from experimentally collected data.
Notes:
- For wall time, because probixi optimizes its internal hyperparameters automatically, I have included the time spent on loose manual hyperparameter tuning of the reference method on 10% subsamples to find the optimal SNR, threshold, and minimum pixels. CPU time for peak finding and indexing alone is given in brackets.
- Percent agreement is calculated as (number of crystals indexed by probixi) / (number of crystals indexed by the reference) * 100. A value greater than 100 indicates that probixi indexed more crystals.
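The percent agreement defined above can be computed as follows. This is a small sketch: the function name `percent_agreement` and the frame identifiers are hypothetical and are not benchmark results.

```python
def percent_agreement(indexed_probixi, indexed_reference):
    """Percent agreement as defined in the notes: |probixi| / |reference| * 100."""
    if not indexed_reference:
        raise ValueError("reference indexed no crystals; the ratio is undefined")
    return len(indexed_probixi) / len(indexed_reference) * 100.0


# Hypothetical frame identifiers, for illustration only (not benchmark data).
probixi_frames = {"f001", "f002", "f003", "f005", "f008", "f009"}
reference_frames = {"f001", "f002", "f003", "f004"}

agreement = percent_agreement(probixi_frames, reference_frames)
print(agreement)  # 150.0
```

Because this is a ratio of counts, it says how many crystals each method indexed, not whether both methods indexed the same frames.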
Benchmarks were run on:
- GPU: A100
- CPU: TODO which CPU do Ra nodes use?
peakfinder8 + indexamajig
| Dataset | Percent Indexed (probixi) | GPU Time (probixi) | Percent Indexed (peakfinder8+indexamajig) | CPU Time (peakfinder8+indexamajig) [No-Tuning] | Percent Agreement |
|---|---|---|---|---|---|
| Lysozyme-Synchrotron | | | | | |
| Lysozyme-FEL | | | | | |
| BacterioRhodopsin-Synchrotron | | | | | |
| BacterioRhodopsin-FEL | | | | | |
| Randomly Dimmed Lysozyme-FEL | | | | | |
| Randomly Dimmed BacterioRhodopsin-FEL | | | | | |
pyFAI + TORO
A fairer comparison, especially with respect to speed, is perhaps pyFAI (azimuthal integration and peak picking) paired with the TORO indexer, both of which run on the GPU.
| Dataset | Percent Indexed (probixi) | GPU Time (probixi) | Percent Indexed (pyFAI+TORO) | GPU Time (pyFAI+TORO) [No-Tuning] | Percent Agreement |
|---|---|---|---|---|---|
| Lysozyme-Synchrotron | | | | | |
| Lysozyme-FEL | | | | | |
| BacterioRhodopsin-Synchrotron | | | | | |
| BacterioRhodopsin-FEL | | | | | |
| Randomly Dimmed Lysozyme-FEL | | | | | |
| Randomly Dimmed BacterioRhodopsin-FEL | | | | | |
Using probixi as only a peakfinder
If you only want to use probixi as a peak finder and prefer your own indexing routine, you can do so through the CLI's --peaks-only flag or the Python API's peak_stream.
Via the CLI:
probixi -i files.lst -g myGeometry.geom -o peaks.stream --peaks-only --device cuda
Or with python:
import torch
from probixi import Probixi
from probixi.io import PeakOffloader

pipeline = Probixi(
    list_file="files.lst",
    geometry_file="myGeometry.geom",
    device=torch.device("cuda"),
)

# Calibrate the noise model + detection threshold on the seed frames, as usual.
pipeline.calibrate(n_seed=1636)
peaks = pipeline.peak_stream(pipeline.frames(), estimate_scale=False)

with PeakOffloader(
    "peaks.stream",
    geometry=pipeline.geometry,
    geometry_file="myGeometry.geom",
    files=pipeline.metadata.files,
) as off:
    for result in peaks:
        if len(result):  # skip blanks; export only frames with peaks
            off.write(result)
Dependencies
- python >= 3.9
- click
- h5py
- hdf5plugin
- numpy
- torch
- matplotlib
- pillow
Contributing
There are many different ways to contribute to further development of this tool. If you experience a bug or would like an additional feature, please open up a ticket.
If you would like to contribute code, please open a PR that meets the following:
- Code is formatted with isort, then black, followed by a ruff --check. These checks run automatically on each PR, so it is best to run them beforehand.
- Docstrings are present, at a minimum, on user-facing functions, in numpy style.
- Comments, or some explanation (in the PR) of the additions, limited to the scope of the project. If fixing a bug, put the explanation in the PR rather than in the code itself.