Cryo-EM Pose-Assignment for Related Experiments via Supervision
CryoPARES: Cryo-EM Pose Assignment for Related Experiments via Supervised deep learning
CryoPARES is a software package for assigning poses to 2D cryo-electron microscopy (cryo-EM) particle images. It uses a supervised deep learning approach to accelerate 3D reconstruction in related cryo-EM experiments. The key idea is to train a neural network on a high-quality reference reconstruction, and then reuse this trained model to rapidly estimate particle poses in other, similar datasets.
This workflow is divided into two main phases:
- Training: In this phase, you use a pre-existing, high-resolution dataset (where particle poses have already been determined by traditional methods like RELION refine) to train a cryoPARES model. This process creates a model that can recognize and assign poses to particles of that specific type of macromolecule.
- Inference: Once the model is trained, you can use it for inference on new datasets of the same or very similar molecules (e.g., the same protein with a different ligand bound). Because the model has already learned the features of the molecule, it can predict particle poses almost instantly, bypassing the computationally expensive and time-consuming alignment steps of traditional workflows. This is especially powerful for applications like drug screening, where many similar datasets need to be processed quickly.
This "train once, infer many times" paradigm allows for near real-time 3D reconstruction, providing rapid feedback during data collection and analysis.
For a detailed explanation of the method, please refer to our paper: Supervised Deep Learning for Efficient Cryo-EM Image Alignment in Drug Discovery
Documentation: See the full documentation for detailed instructions on training, configuration, CLI reference, troubleshooting, and API reference.
Installation
It is strongly recommended to use a virtual environment (e.g., conda) to avoid conflicts with other packages. CryoPARES has been tested on Ubuntu 20.04+ and Rocky Linux 8+ systems. NVIDIA Ampere or newer GPUs are recommended for running the code.
- Create and activate a conda environment:
conda create -n cryopares python=3.12
conda activate cryopares
Option 1: Install from GitHub (Recommended for Users)
This is the simplest way to install cryoPARES.
pip install git+https://github.com/rsanchezgarc/cryoPARES.git
Option 2: Install from a Local Clone (Recommended for Developers)
This method is recommended if you want to modify the cryoPARES source code.
- Clone the repository:
git clone https://github.com/rsanchezgarc/cryoPARES.git
cd cryoPARES
- Install the package in editable mode:
This allows you to make changes to the code without having to reinstall the package.
pip install -e .
Installation should take no more than a few minutes.
Usage
IMPORTANT: CryoPARES keeps a file handle open for each .mrcs file referenced in the .star file. This can lead
to a "Too many open files" error if the number of particle files is larger than the system's limit.
Before running training or inference, we strongly recommend raising the open-file limit by running
the following command in your terminal:
ulimit -n 65536  # You are now able to deal with more than 30K .mrcs files.
This command does not generally require sudo. If you are not allowed to raise this limit, merge the .mrcs files from different micrographs into fewer stacks to reduce the number of files required.
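As a rough pre-flight check, the number of distinct stack files referenced by a .star file can be compared with the limit currently in effect. The sketch below assumes the RELION convention that rlnImageName entries look like index@path/to/stack.mrcs; the helper names are hypothetical and not part of cryoPARES. It relies on Python's standard resource module (Unix only), and a limit raised this way applies only to the current Python process and its children, so it complements the ulimit command rather than replacing it.

```python
import resource


def count_particle_stacks(image_names):
    """Count distinct stack files among RELION-style 'index@stack.mrcs' names."""
    return len({name.split("@", 1)[-1] for name in image_names})


def raise_open_file_limit(target=65536):
    """Raise the soft open-file limit towards `target`, capped by the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # Not permitted on this system: keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


# Hypothetical rlnImageName values: three particles spread over two stacks
names = ["000001@mic_a.mrcs", "000002@mic_a.mrcs", "000001@mic_b.mrcs"]
n_stacks = count_particle_stacks(names)
soft_limit = raise_open_file_limit()
```

If the number of stacks exceeds the returned limit, merging stacks as described above is the remaining option.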
CryoPARES has two main modes of operation: training and inference. Particles need to be provided as RELION 3.1+ starfile(s).
Training
The cryoPARES.train.train module is used to train a new model for pose estimation. Training must be done first, using
a pre-aligned dataset of particles. While not mandatory, we encourage using particle alignments estimated with RELION.
Usage:
cryopares_train [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]
Key Arguments:
Required Parameters:
- --symmetry: Point group symmetry of the molecule (e.g., C1, D7, I, O, T)
- --particles_star_fname: Path(s) to RELION 3.1+ format .star file(s) containing pre-aligned particles. Can accept multiple files
- --train_save_dir: Output directory where model checkpoints, logs, and training artifacts will be saved
Optional Parameters:
- --image_size_px_for_nnet: Target image size in pixels for neural network input. After rescaling to target sampling rate, images are cropped or padded to this size. We recommend tight box-sizes
- --particles_dir: Root directory for particle image paths. If paths in .star file are relative, this directory is prepended (similar to RELION project directory concept)
- --n_epochs: Number of training epochs. More epochs allow better convergence, although it does not help beyond a certain point (Default:100)
- --batch_size: Number of particles per batch. Try to make it as large as possible before running out of GPU memory. We advise using batch sizes of at least 32 images (Default:32)
- --num_dataworkers: Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Try not to oversubscribe by asking for more workers than CPUs (Default:8)
- --sampling_rate_angs_for_nnet: Target sampling rate in Angstroms/pixel for neural network input. Particle images are first rescaled to this sampling rate before processing (Default:1.5)
- --mask_radius_angs: Radius of circular mask in Angstroms applied to particle images. If not provided, defaults to half the box size
- --split_halves: If True (default), trains two separate models on data half-sets for cross-validation. Use --NOT_split_halves to train a single model on all data (Default:True)
- --continue_checkpoint_dir: Path to checkpoint directory to resume training from a previous run
- --finetune_checkpoint_dir: Path to checkpoint directory to fine-tune a pre-trained model on a new dataset
- --compile_model: Enable torch.compile for faster training (experimental) (Default:False)
- --val_check_interval: Fraction of epoch between validation checks. You generally don't want to touch it, but you can set it to smaller values (0.1-0.5) for large datasets to get quicker feedback
- --overfit_batches: Number of batches to use for overfitting test (debugging feature to verify the model can memorize a small dataset)
- --map_fname_for_simulated_pretraining: Path(s) to reference map(s) for simulated projection warmup before training on real data. The number of maps must match the number of particle star files
- --junk_particles_star_fname: Optional star file(s) with junk-only particles for estimating confidence z-score thresholds
- --junk_particles_dir: Root directory for junk particle image paths (analogous to particles_dir)
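The interplay between --sampling_rate_angs_for_nnet, --image_size_px_for_nnet and --mask_radius_angs is plain arithmetic, so a candidate box size can be checked before launching a run. This is a minimal sketch with hypothetical helper names; the rounding choices are assumptions and may differ from what cryoPARES does internally.

```python
import math


def rescaled_box_px(box_px, pixel_size_angs, target_sampling_angs=1.5):
    """Box size in pixels after resampling to the target sampling rate."""
    return round(box_px * pixel_size_angs / target_sampling_angs)


def min_image_size_px(mask_radius_angs, target_sampling_angs=1.5):
    """Smallest even box (in pixels) that still contains the circular mask."""
    diameter_px = math.ceil(2 * mask_radius_angs / target_sampling_angs)
    return diameter_px + (diameter_px % 2)


# Hypothetical dataset: 256 px boxes at 1.0 A/px, particle radius about 100 A
box_after_rescaling = rescaled_box_px(256, 1.0)   # 256 * 1.0 / 1.5 -> 171
tight_box = min_image_size_px(100.0)              # 200 / 1.5 = 133.3 -> 134
```

In this example the 171 px rescaled image would then be cropped to whatever --image_size_px_for_nnet is set to, and a value close to 134 px keeps the box tight around the particle.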
Additional relevant Parameters (via --config):
You can override configuration parameters using --config KEY=VALUE. Multiple key-value pairs can be provided. The --config flag should be the last argument. To see all available configuration options, run cryopares_train --show-config.
- train.learning_rate: Initial learning rate (Default:1e-3). It needs to be tuned to get the best performance.
- train.weight_decay: Weight decay for the optimizer, which regularizes the model (Default:1e-5). Increase it if you suffer from overfitting.
- train.accumulate_grad_batches: Number of gradient accumulation batches, used to simulate larger batch sizes (Default:16). The effective batch size is batch_size * accumulate_grad_batches. We recommend training with effective batch sizes between 512 and 2048.
- models.image2sphere.lmax: Maximum spherical harmonic degree. The larger it is, the more expressive the network (Default:12). Reduce it if you see overfitting.
- datamanager.num_augmented_copies_per_batch: Number of augmented copies per particle. Each copy undergoes a different data augmentation. The batch_size must be divisible by this number. Large batches with large num_augmented_copies_per_batch values help stabilize training, but require a lot of GPU memory (Default:4)
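The two batch-related constraints above (effective batch size and divisibility by the number of augmented copies) can be verified before starting a long run. The helper below is a hypothetical sketch, not part of cryoPARES; it treats the recommended 512 to 2048 range as inclusive, which is an assumption made so that the documented defaults (32 x 16 = 512) pass.

```python
def check_batch_settings(batch_size=32, accumulate_grad_batches=16,
                         num_augmented_copies_per_batch=4,
                         recommended=(512, 2048)):
    """Return the effective batch size and a list of warnings."""
    warnings = []
    if batch_size % num_augmented_copies_per_batch != 0:
        warnings.append(
            "batch_size must be divisible by num_augmented_copies_per_batch")
    effective = batch_size * accumulate_grad_batches
    low, high = recommended
    if not low <= effective <= high:
        warnings.append(
            f"effective batch size {effective} is outside {low}-{high}")
    return effective, warnings


effective, warnings = check_batch_settings()          # documented defaults
small, small_warnings = check_batch_settings(30, 4)   # deliberately bad values
```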
For comprehensive training guidance including monitoring with TensorBoard and avoiding overfitting/underfitting, see the Training Guide.
For a complete list of all configuration parameters, see the Configuration Guide.
Once training is done, you can use the checkpoint directory inside --train_save_dir to infer poses for new datasets.
The checkpoint directory is named version_0. If you run another training experiment with the same --train_save_dir, another
checkpoint directory named version_1 will be created.
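When several runs share one --train_save_dir, scripts often need the most recent version_N directory. A minimal sketch, assuming only the version_N naming described above; the function name is hypothetical. It sorts numerically, so version_10 is correctly treated as newer than version_9.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint_dir(train_save_dir):
    """Return the version_N sub-directory with the highest N, or None."""
    versions = []
    for path in Path(train_save_dir).iterdir():
        match = re.fullmatch(r"version_(\d+)", path.name)
        if path.is_dir() and match:
            versions.append((int(match.group(1)), path))
    return max(versions)[1] if versions else None


# Demonstration on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("version_0", "version_1", "version_10"):
        (Path(tmp) / name).mkdir()
    newest = latest_checkpoint_dir(tmp).name
```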
Inference
The cryoPARES.inference.infer module is used to predict poses for a new set of particles using a trained model. It can be run in two modes: static and daemon.
Static Mode
In static mode, inference is run on a fixed set of particles, which again must be provided as RELION 3.1+ starfiles.
Usage:
cryopares_infer [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]
Key Arguments:
Required Parameters:
- --particles_star_fname: Path to input RELION particles .star file
- --checkpoint_dir: Path to training directory (or .zip file) containing half-set models with checkpoints and hyperparameters. By default they are called version_0, version_1, etc.
- --results_dir: Output directory for inference results including predicted poses and optional reconstructions
Optional Parameters:
- --data_halfset: Which particle half-set(s) to process: "half1", "half2", or "allParticles" (Default:allParticles)
- --model_halfset: Model half-set selection policy: "half1", "half2", "allCombinations", or "matchingHalf" (uses matching data/model pairs) (Default:matchingHalf)
- --particles_dir: Root directory for particle image paths. If provided, overrides paths in the .star file
- --batch_size: Number of particles per batch for inference (Default:32)
- --n_jobs: Number of worker processes. Defaults to number of GPUs if CUDA enabled, otherwise 1
- --num_dataworkers: Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Try not to oversubscribe by asking for more workers than CPUs (Default:8)
- --use_cuda: Enable GPU acceleration for inference. If False, runs on CPU only (Default:True)
- --n_cpus_if_no_cuda: Maximum CPU threads per worker when CUDA is disabled (Default:4)
- --compile_model: Compile model with torch.compile for faster inference (experimental, requires PyTorch 2.0+) (Default:False)
- --top_k_poses_nnet: Number of top pose predictions to retrieve from neural network before local refinement (Default:1)
- --top_k_poses_localref: Number of best matching poses to keep after local refinement (Default:1)
- --grid_distance_degs: Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose (Default:4.0)
- --reference_map: Path to reference map (.mrc) for FSC computation during validation
- --reference_mask: Path to reference mask (.mrc) for masked FSC calculation
- --directional_zscore_thr: Confidence z-score threshold for filtering particles. Particles with scores below this are discarded as low-confidence
- --skip_localrefinement: Skip local pose refinement step and use only neural network predictions (Default:False)
- --skip_reconstruction: Skip 3D reconstruction step and output only predicted poses (Default:False)
- --subset_idxs: List of particle indices to process (for debugging or partial processing)
- --n_first_particles: Process only the first N particles from dataset (debug feature)
- --check_interval_secs: Polling interval in seconds for parent loop in distributed processing (Default:2.0)
- --merge_halves_output: No description available (Default:False)
Half-Set Selection (--data_halfset and --model_halfset)
To avoid overfitting and to ensure a fair evaluation, cryo-EM datasets are often split into two halves (half1 and half2). CryoPARES uses this concept for both the data and the model.
- --data_halfset: Specifies which half of the data to use for inference.
  - half1: Use only the particles from the first half of the dataset.
  - half2: Use only the particles from the second half of the dataset.
  - allParticles: Use all particles from the dataset. (Default)
- --model_halfset: Specifies which trained model to use for inference. During training, CryoPARES creates two models, one for each half of the training data.
  - half1: Use the model trained on the first half of the training data.
  - half2: Use the model trained on the second half of the training data.
  - matchingHalf: Use the model from the corresponding half of the data (e.g., half1 data with half1 model). This is the default and recommended setting.
  - allCombinations: Run inference for all possible combinations of data and model halves (e.g., half1 data with half1 model, half1 data with half2 model, etc.).
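The way the two flags combine can be made concrete by listing the (data half, model half) pairs each setting implies. This is an illustrative sketch of the semantics described above, not cryoPARES code; it assumes that allParticles simply expands to both halves.

```python
def halfset_pairs(data_halfset="allParticles", model_halfset="matchingHalf"):
    """List the (data half, model half) combinations implied by the two flags."""
    halves = ["half1", "half2"]
    data = halves if data_halfset == "allParticles" else [data_halfset]
    if model_halfset == "matchingHalf":
        return [(d, d) for d in data]
    if model_halfset == "allCombinations":
        return [(d, m) for d in data for m in halves]
    return [(d, model_halfset) for d in data]


default_pairs = halfset_pairs()
all_pairs = halfset_pairs(model_halfset="allCombinations")
```

With the defaults, each half of the data is processed only by the model trained on the matching half of the training data.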
Note: Many of these parameters can also be set via --config (e.g., --config projmatching.grid_step_degs=2.0). However, using the direct CLI flags is recommended for commonly adjusted parameters.
To see all available configuration options, run cryopares_infer --show-config.
For detailed API documentation, see the API Reference.
Daemon Mode (On-the-fly)
In daemon mode, the inference script runs continuously and watches for new particles to be added to a directory. This is useful for processing particles as they are being generated.
The daemon workflow consists of three main components:
- Queue Manager: A central server that hosts one or more named queues on a single port.
- Spooling Filler: A script that monitors a directory for new .star files and adds them to a queue. You could implement other filler protocols using this module as an example.
- Daemon Inferencer: One or more worker processes that consume jobs from a queue and perform inference.
All three components communicate over the network. The Spooling Filler and Daemon Inferencer must be
configured with ip/port/authkey/queue_name values that match the Queue Manager.
The Queue Manager does not take a queue_name argument — it hosts all named queues, creating them
on demand when a client first requests a given name.
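The "created on demand" behaviour of named queues can be pictured with a small in-process stand-in. This sketch does not touch the network and is not the cryoPARES implementation; the class and method names are hypothetical, and only Python's standard queue module is used.

```python
import queue
from collections import defaultdict


class NamedQueues:
    """In-process stand-in: one queue per name, created on first request."""

    def __init__(self, maxsize=0):  # 0 means unlimited for queue.Queue
        self._queues = defaultdict(lambda: queue.Queue(maxsize=maxsize))

    def get_queue(self, name="default"):
        return self._queues[name]

    def names(self):
        return sorted(self._queues)


manager = NamedQueues(maxsize=100)
manager.get_queue("my_pipeline").put("/path/to/particles.star")
manager.get_queue().put("/path/to/other.star")
pending = manager.get_queue("my_pipeline").qsize()
```

Because the server itself holds no fixed list of queue names, independent pipelines only need to agree on a name with their own fillers and workers.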
Workflow:
- Start the Queue Manager:
This script creates the central queue server. It should be run once and kept running in the background. A single server instance can host multiple independent named queues on the same port, which is useful when running several independent inference pipelines simultaneously.
python -m cryoPARES.inference.daemon.queueManager [--ip IP] [--port PORT] [--authkey KEY] [--queue_maxsize N]
| Argument | Default | Description |
|---|---|---|
| --ip | localhost | IP address to bind to. Use 0.0.0.0 to accept remote connections. |
| --port | 50000 | TCP port to listen on. |
| --authkey | shared_queue_key | Authentication passphrase shared by all clients. |
| --queue_maxsize | unlimited | Max pending jobs per queue (None = no limit). |
# Default settings
python -m cryoPARES.inference.daemon.queueManager
# Remote server, custom port/key, bounded queues
python -m cryoPARES.inference.daemon.queueManager \
--ip 0.0.0.0 --port 51000 --authkey mysecret --queue_maxsize 100
- Start the Spooling Filler:
This script watches a directory for new .star files and adds them to a named queue.
python -m cryoPARES.inference.daemon.spoolingFiller --directory DIR \
[--ip IP] [--port PORT] [--authkey KEY] [--queue_name NAME] \
[--pattern GLOB] [--check_interval SECS]
| Argument | Default | Description |
|---|---|---|
| --directory | (required) | Directory to monitor for new .star files. |
| --ip | localhost | IP address of the queue manager server. |
| --port | 50000 | Port of the queue manager server. |
| --authkey | shared_queue_key | Authentication key (must match the server). |
| --queue_name | default | Name of the queue to submit jobs to. |
| --pattern | *.star | Glob pattern for files to watch. |
| --check_interval | 10 | Seconds between directory scans. |
# Default queue, local server
python -m cryoPARES.inference.daemon.spoolingFiller --directory /path/to/watch
# Named queue on a remote server
python -m cryoPARES.inference.daemon.spoolingFiller \
--directory /path/to/watch \
--ip 192.168.1.10 --port 51000 --authkey mysecret \
--queue_name my_pipeline
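For anyone writing a custom filler, the core of a spooling loop is a scan that reports only files not seen before. The sketch below is a hypothetical illustration of that idea using pathlib; it is not the spoolingFiller code. A real filler would call it every --check_interval seconds and put each new path on the queue, and would also need to guard against files that are still being written.

```python
import tempfile
from pathlib import Path


def scan_once(directory, seen, pattern="*.star"):
    """Return files matching `pattern` that were not seen in earlier scans."""
    new_files = sorted(
        p for p in Path(directory).glob(pattern) if p not in seen)
    seen.update(new_files)
    return new_files


with tempfile.TemporaryDirectory() as tmp:
    seen = set()
    (Path(tmp) / "batch_001.star").touch()
    first = [p.name for p in scan_once(tmp, seen)]
    (Path(tmp) / "batch_002.star").touch()
    (Path(tmp) / "notes.txt").touch()          # ignored: does not match *.star
    second = [p.name for p in scan_once(tmp, seen)]
```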
Alternative: Manually Submit Jobs
You can also submit jobs programmatically using queue_connection.
from cryoPARES.inference.daemon.queueManager import queue_connection

# Submit a single .star file (default queue, default ip/port/authkey)
with queue_connection(ip="localhost", port=50000, authkey="shared_queue_key") as queue:
    queue.put("/path/to/particles.star")

# Submit to a named queue on a remote server
with queue_connection(ip="192.168.1.10", port=51000, authkey="mysecret",
                      queue_name="my_pipeline") as queue:
    for star_file in ["/path/to/particles1.star", "/path/to/particles2.star"]:
        queue.put(star_file)

# Submit particles already loaded in memory (no disk I/O on the worker side)
import starfile
star = starfile.read("/path/to/particles.star")
# Optionally filter star["particles"] here before submitting
with queue_connection() as queue:
    queue.put((star["optics"], star["particles"]))  # (optics_df, particles_df)

# Send poison pill to terminate workers of a specific queue gracefully
with queue_connection(queue_name="my_pipeline") as queue:
    queue.put(None)
Input formats accepted by the Daemon Inferencer:
- String: Path to a .star file
- Tuple (optics_df, particles_df): pandas DataFrames already loaded or filtered in memory; no disk I/O required by the worker
- None: Poison pill that signals workers to terminate gracefully
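The three input formats map naturally onto a consumer loop that dispatches on the type of each job. The following is a minimal sketch of such a loop using the standard queue module; it is not the daemonInference implementation, the function name is hypothetical, and plain Python containers stand in for the pandas DataFrames.

```python
import queue


def drain_jobs(job_queue):
    """Consume jobs until the poison pill (None) arrives; return what was seen."""
    handled = []
    while True:
        job = job_queue.get()
        if job is None:                      # poison pill: stop gracefully
            break
        if isinstance(job, str):             # path to a .star file
            handled.append(("star_path", job))
        elif isinstance(job, tuple) and len(job) == 2:
            optics, particles = job          # tables already in memory
            handled.append(("in_memory", len(particles)))
        else:
            raise TypeError(f"Unsupported job type: {type(job).__name__}")
    return handled


jobs = queue.Queue()
jobs.put("/path/to/particles.star")
jobs.put(({"rlnOpticsGroup": [1]}, [{"rlnImageName": "000001@stack.mrcs"}]))
jobs.put(None)
handled = drain_jobs(jobs)
```

Note that a single None stops only the worker that receives it, so one poison pill per worker is needed in a setup like this one.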
- Start the Daemon Inferencer(s):
You can start as many inference workers as you want. Each worker will take jobs from the queue and process them. Important: each worker must have its own --results_dir. The inference arguments are the same as for cryopares_infer, plus the network arguments below.
python -m cryoPARES.inference.daemon.daemonInference \
--checkpoint_dir DIR --results_dir DIR \
[--net_address IP] [--net_port PORT] [--net_authkey KEY] [--net_queue_name NAME] \
[inference options…]
| Argument | Default | Description |
|---|---|---|
| --net_address | localhost | IP address of the queue manager server. |
| --net_port | 50000 | Port of the queue manager server. |
| --net_authkey | shared_queue_key | Authentication key (must match the server). |
| --net_queue_name | default | Name of the queue to consume jobs from. |
# Two workers on the default queue
python -m cryoPARES.inference.daemon.daemonInference \
--checkpoint_dir /path/to/checkpoint --results_dir /path/to/results_worker1 \
--particles_dir /path/to/particles
python -m cryoPARES.inference.daemon.daemonInference \
--checkpoint_dir /path/to/checkpoint --results_dir /path/to/results_worker2 \
--particles_dir /path/to/particles
# Worker on a named queue from a remote server
python -m cryoPARES.inference.daemon.daemonInference \
--checkpoint_dir /path/to/checkpoint --results_dir /path/to/results_pipe2 \
--net_address 192.168.1.10 --net_port 51000 --net_authkey mysecret \
--net_queue_name my_pipeline
- Materialize the Volume:
You can materialize the final 3D volume from the partial results at any time, even while the inference workers are still running. The script will combine all the available partial results.
python -m cryoPARES.inference.daemon.materializePartialResults \
--partial_outputs_dirs /path/to/results_worker1/ /path/to/results_worker2 \
--output_mrc /path/to/final_map.mrc --output_star /path/to/final_particles.star
Utility Tools
CryoPARES includes standalone utility tools for projection matching and reconstruction.
Note: These tools are automatically used within the cryopares_infer workflow, but can also be run independently if needed.
Projection Matching
The projection matching utility performs local pose refinement by searching around existing particle orientations to find the best match against reference volume projections. This is used automatically during inference for local refinement, but can also be run standalone.
Usage:
cryopares_projmatching [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]
Key Arguments:
Required Parameters:
- --reference_vol: Path to reference 3D volume (.mrc file) for generating projection templates
- --particles_star_fname: Path to input STAR file with particle metadata
- --out_fname: Path for output STAR file with aligned particle poses
- --particles_dir: Root directory for particle image paths. If provided, overrides paths in the .star file
Optional Parameters:
- --mask_radius_angs: Radius of circular mask in Angstroms applied to particle images
- --grid_distance_degs: Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose (Default:4.0)
- --grid_step_degs: Angular step size in degrees for grid search during local refinement (Default:2.0)
- --return_top_k_poses: Number of top matching poses to save per particle (Default:1)
- --filter_resolution_angst: Low-pass filter resolution in Angstroms applied to reference volume before matching
- --n_jobs: Number of parallel worker processes for distributed projection matching (Default:1)
- --num_dataworkers: Number of CPU workers per PyTorch DataLoader for data loading (Default:1)
- --batch_size: Number of particles to process simultaneously per job (Default:32)
- --use_cuda: Enable GPU acceleration. If False, runs on CPU only (Default:True)
- --verbose: Enable verbose logging output (Default:False)
- --float32_matmul_precision: PyTorch float32 matrix multiplication precision mode ("highest", "high", or "medium") (Default:high)
- --gpu_id: Specific GPU device ID to use (if multiple GPUs available)
- --n_first_particles: Process only the first N particles from dataset (for testing or validation)
- --correct_ctf: Apply CTF correction during projection matching (Default:True)
- --halfmap_subset: Select half-map subset (1 or 2) for half-map validation
For additional details, see the Command-Line Interface documentation.
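The cost of the local search grows quickly with --grid_distance_degs and shrinks with --grid_step_degs. The sketch below enumerates the offsets along one angle and counts candidates under the assumption that the same symmetric grid is applied independently to three Euler angles; that assumption, and the function name, are illustrative and not taken from the cryoPARES source.

```python
def grid_offsets(grid_distance_degs=4.0, grid_step_degs=2.0):
    """Angular offsets from -grid_distance_degs to +grid_distance_degs."""
    n_steps = int(round(grid_distance_degs / grid_step_degs))
    return [i * grid_step_degs for i in range(-n_steps, n_steps + 1)]


offsets = grid_offsets()              # defaults: five offsets, -4.0 to 4.0
n_candidates = len(offsets) ** 3      # if all three Euler angles are searched
wide = grid_offsets(12.0, 2.0)        # 13 offsets per angle
```

Under these assumptions, widening the search from 4 to 12 degrees at a 2 degree step raises the number of candidate orientations per particle from 125 to 2197.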
Post-processing
The post-processing utility sharpens reconstructed volumes using B-factor estimation (Guinier analysis) and FSC weighting. Run it after reconstruction to improve map interpretability.
Usage:
# --auto_mask can be used instead of --mask
cryopares_postprocess bfactor \
--half1 /path/to/half1.mrc \
--half2 /path/to/half2.mrc \
--mask /path/to/mask.mrc \
--output_dir /path/to/postprocess_output
For all options, see the CLI Reference.
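For readers unfamiliar with Guinier analysis, the B-factor is obtained from the slope of ln(amplitude) plotted against 1/d^2, where d is the resolution in Angstroms. The sketch below fits that line with ordinary least squares on synthetic data. It assumes the decay model F(d) = F0 * exp(-B / (4 d^2)) with B positive for decay (some packages report the sharpening B-factor with the opposite sign), leaves out FSC weighting, and is not the cryopares_postprocess implementation.

```python
import math


def estimate_bfactor(resolutions_angs, amplitudes):
    """Fit ln(F) = ln(F0) - (B / 4) * (1 / d^2) by least squares; return B."""
    xs = [1.0 / d ** 2 for d in resolutions_angs]
    ys = [math.log(a) for a in amplitudes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -4.0 * slope


def sharpening_factor(bfactor, resolution_angs):
    """Multiplier that undoes the exp(-B / (4 d^2)) amplitude decay."""
    return math.exp(bfactor / (4.0 * resolution_angs ** 2))


# Synthetic radially averaged amplitudes that decay with B = 80 A^2
shells = [10.0, 8.0, 6.0, 5.0, 4.0, 3.5]
amps = [math.exp(-80.0 / (4.0 * d ** 2)) for d in shells]
b_est = estimate_bfactor(shells, amps)
```

On this noise-free input the fit recovers the B-factor used to generate the data, and multiplying each shell by sharpening_factor restores a flat amplitude profile.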
Reconstruction
The reconstruction utility creates a 3D volume from particles with known poses using direct Fourier inversion. This is used automatically during inference to generate the final 3D map, but can also be run standalone for particles aligned by other methods (e.g., RELION).
Usage:
cryopares_reconstruct [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]
Key Arguments:
Required Parameters:
- --particles_star_fname: Path to input STAR file with particle metadata and poses to reconstruct
- --symmetry: Point group symmetry of the volume for reconstruction (e.g., C1, D2, I, O, T)
- --output_fname: Path for output reconstructed 3D volume (.mrc file)
Optional Parameters:
- --particles_dir: Root directory for particle image paths. If provided, overrides paths in the .star file
- --n_jobs: Number of parallel worker processes for distributed reconstruction (Default:1)
- --num_dataworkers: Number of CPU workers per PyTorch DataLoader for data loading (Default:1)
- --batch_size: Number of particles to backproject simultaneously per job (Default:128)
- --use_cuda: Enable GPU acceleration for reconstruction. If False, runs on CPU only (Default:True)
- --correct_ctf: Apply CTF correction during reconstruction (Default:True)
- --eps: Regularization mode and strength. Sign selects mode: eps >= 0 uses Tikhonov regularization, eps < 0 uses RELION-style radial averaging. Magnitude sets scale: for Tikhonov, eps is the regularization constant (ideally 1/SNR); for radial averaging, abs(eps) is the divisor for radial weights (RELION uses 1000). Recommended: -1000 for radial averaging, 1e-3 for Tikhonov (Default:-1000.0)
- --min_denominator_value: Minimum denominator threshold for numerical stability (prevents division by zero). Applied as final safety clamp regardless of regularization mode. RELION uses 1e-6 (Default:1e-06)
- --use_only_n_first_batches: Reconstruct using only first N batches (for testing or quick validation)
- --float32_matmul_precision: PyTorch float32 matrix multiplication precision mode ("highest", "high", or "medium") (Default:high)
- --weight_with_confidence: Apply per-particle confidence weighting during backprojection. If True, particles with higher confidence contribute more to reconstruction. It reads the confidence from the metadata label "rlnParticleFigureOfMerit" (Default:False)
- --halfmap_subset: Select half-map subset (1 or 2) for half-map reconstruction and validation
- --apply_soft_mask: Apply soft spherical masking after reconstruction to reduce edge artifacts (RELION-style) (Default:True)
- --mask_radius_pix: Radius for soft mask in pixels. If negative, defaults to box_size/2 (Default:-1.0)
- --mask_edge_width: Width of cosine falloff edge in pixels (Default:3)
For additional details, see the Command-Line Interface documentation.
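The two regularization modes selected by the sign of --eps can be illustrated for a single Fourier voxel. The sketch below is one plausible reading of the parameter descriptions above, not the cryoPARES code: it assumes that Tikhonov mode adds eps to the accumulated weight, that radial-averaging mode floors the weight at the shell average divided by abs(eps), and that --min_denominator_value is applied last. The function name and the shell_average argument are hypothetical.

```python
def regularized_denominator(weight, eps=-1000.0, shell_average=None,
                            min_denominator_value=1e-6):
    """Denominator used when dividing accumulated data by accumulated weights.

    eps >= 0: Tikhonov, add eps to the weight.
    eps < 0 : radial averaging, floor the weight at shell_average / abs(eps).
    """
    if eps >= 0:
        denom = weight + eps
    else:
        denom = max(weight, shell_average / abs(eps))
    return max(denom, min_denominator_value)   # final safety clamp


tikhonov = regularized_denominator(0.0, eps=1e-3)
radial = regularized_denominator(0.2, eps=-1000.0, shell_average=500.0)
empty = regularized_denominator(0.0, eps=-1000.0, shell_average=0.0)
```

In both modes the purpose is the same: poorly sampled voxels, whose accumulated weight is close to zero, are prevented from amplifying noise when the division is carried out.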
Checkpoint Compactification
After training, you can package your checkpoint into a compact ZIP file for easy distribution and storage. This reduces the checkpoint size from ~40 GB to ~10 GB by removing training logs, metrics, and intermediate files while keeping everything needed for inference.
Compactify a checkpoint:
python -m cryoPARES.scripts.compactify_checkpoint \
--checkpoint_dir /path/to/training_output/version_0
This creates version_0_compact.zip containing only the essential files.
Use the compactified checkpoint for inference:
cryopares_infer \
--particles_star_fname /path/to/particles.star \
--checkpoint_dir /path/to/version_0_compact.zip \
--results_dir /path/to/results
The ZIP file is used directly without extraction, making it ideal for:
- Sharing models with collaborators
- Archiving trained models efficiently
- Deploying to inference servers with limited storage
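Reading files straight out of a ZIP archive, without unpacking it to disk, is supported by Python's standard zipfile module. The sketch below shows the general technique on a tiny archive built in memory; the member name version_0/hparams.json and the helper name are hypothetical, since the internal layout of a compact checkpoint is not described here.

```python
import io
import json
import zipfile


def read_json_from_zip(zip_source, member):
    """Read and parse one JSON member of a ZIP archive without unpacking it."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as handle:
            return json.load(handle)


# Build a tiny in-memory archive; the member name is hypothetical
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("version_0/hparams.json", json.dumps({"symmetry": "C1"}))
buffer.seek(0)
hparams = read_json_from_zip(buffer, "version_0/hparams.json")
```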
Documentation
- Training Guide - Comprehensive guide on training models, monitoring with TensorBoard, and avoiding overfitting/underfitting
- API Reference - Auto-generated API documentation with type hints (hosted on GitHub Pages)
- Configuration Guide - Complete reference for all configuration parameters
- Troubleshooting Guide - Solutions to common issues
- CLI Reference - Command-line interface documentation
Building Documentation Locally:
cd docs
pip install -r requirements.txt
make html
# Open _build/html/index.html in your browser
Configuration System
CryoPARES uses a flexible configuration system that allows you to manage settings from multiple sources.
- --show-config: To see all available options, run any main script with the --show-config flag. This will print a comprehensive list of all parameters, their current values, and their paths.
cryopares_train --show-config
- YAML Files: Create a .yaml file with your desired parameters.
- Command-Line Overrides: Pass KEY=VALUE pairs to the program. Use dot notation to specify nested parameters (e.g., models.image2sphere.lmax=6).
- Direct Arguments: Use standard command-line flags (e.g., --batch_size 32).
Precedence: Direct command-line arguments override --config overrides, which override YAML files, which override the default configuration.
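The precedence order can be pictured as applying each source on top of the previous one, from lowest to highest priority, with dotted keys addressing nested entries. The sketch below is a generic illustration of that idea and not the cryoPARES configuration code; the function names are hypothetical, and the train.batch_size path is assumed for the example (train.learning_rate and models.image2sphere.lmax are the paths mentioned in this document).

```python
import copy


def set_by_dotted_key(config, dotted_key, value):
    """Set config['a']['b']['c'] = value for dotted_key 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def resolve_config(defaults, yaml_values=(), config_overrides=(), cli_args=()):
    """Apply sources from lowest to highest precedence; later sources win."""
    config = copy.deepcopy(defaults)
    for source in (yaml_values, config_overrides, cli_args):
        for dotted_key, value in dict(source).items():
            set_by_dotted_key(config, dotted_key, value)
    return config


defaults = {"train": {"batch_size": 32, "learning_rate": 1e-3},
            "models": {"image2sphere": {"lmax": 12}}}
resolved = resolve_config(
    defaults,
    yaml_values={"train.learning_rate": 5e-4, "models.image2sphere.lmax": 8},
    config_overrides={"models.image2sphere.lmax": 6},
    cli_args={"train.batch_size": 64},
)
```

Here lmax ends up as 6 because the --config override outranks the YAML value of 8, while the learning rate keeps the YAML value since no higher-priority source sets it.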
For a complete reference of all configuration parameters, see the Configuration Guide.
Example Workflow
Quick Start with Test Dataset
Before running on your own data, we recommend testing cryoPARES with a small dataset. If you don't have a small particles .star file, you can download some examples from CESPED (Cryo-EM Supervised Pose Estimation Dataset). CESPED provides benchmark datasets specifically designed for supervised pose estimation.
Install CESPED (Optional)
pip install cesped
Download a Test Dataset
For a quick test, use the small TEST dataset (subset of EMPIAR-11120):
python -m cesped.particlesDataset download_entry -t TEST --benchmarkDir /path/to/your/data
Note that you won't be able to train an accurate model with this small dataset, but it is enough to check that you can run the full workflow.
For a full benchmark dataset, you can download other CESPED entries such as EMPIAR-10166 (Human 26S proteasome, C1 symmetry, 238K particles):
# Download both half-sets
python -m cesped.particlesDataset download_entry -t 10166 --benchmarkDir /path/to/your/data
Training and Inference Example
Once you have downloaded a CESPED dataset, you can train and test cryoPARES:
- Train a model on an existing, aligned dataset:
cryopares_train \
--symmetry C1 \
--particles_star_fname /path/to/your/data/CESPED/TEST/particles_merged.star \
--particles_dir /path/to/cesped_benchmark/TEST/ \
--train_save_dir /path/to/training_output \
--n_epochs 3 \
--batch_size 32 \
--sampling_rate_angs_for_nnet 1.5 \
--image_size_px_for_nnet 64 \
--config models.image2sphere.lmax=6 models.image2sphere.so3components.so3outputgrid.hp_order=3 models.image2sphere.so3components.i2sprojector.sphere_fdim=64 models.image2sphere.so3components.s2conv.f_out=16 models.image2sphere.imageencoder.unet.out_channels_first=4
Notice that we have added several --config overrides to create a small model that will not perform well but trains quickly.
We are also using an --image_size_px_for_nnet much smaller than advisable (we recommend 128 to 256, depending on the particle).
For production use:
cryopares_train \
--symmetry C1 \
--particles_star_fname /path/to/particles.star \
--particles_dir /path/to/particles/ \
--train_save_dir /path/to/training_output \
--n_epochs 100 \
--batch_size 32 \
--image_size_px_for_nnet 160 \
--sampling_rate_angs_for_nnet 1.5 #We are using the default model, hence no --config
You can tweak the neural network by setting different values with the --config flag. Use --show-config to get the list of all available options.
After training, there should be a directory called /path/to/training_output/version_* containing the checkpoint. This directory must be passed to the inference command.
- Run inference on a new dataset with local refinement and reconstruction:
# --reference_map: if not provided, it is automatically generated from the training data
# --grid_distance_degs 12: local search will be from -12º to +12º
# --directional_zscore_thr 1.0: remove all particles with directional z-score < 1.0
cryopares_infer \
--particles_star_fname /path/to/new_particles.star \
--particles_dir /path/to/particles \
--checkpoint_dir /path/to/training_output/version_0 \
--results_dir /path/to/inference_results \
--reference_map /path/to/initial_model.mrc \
--batch_size 32 \
--grid_distance_degs 12 \
--directional_zscore_thr 1.0
Getting Help
If you encounter issues:
- Check the Troubleshooting Guide for common problems and solutions
- Review the Training Guide for training best practices
- Consult the Configuration Guide for parameter details
- See the API Reference for programmatic usage
For bugs or feature requests, please open an issue on GitHub.
License and Attribution
CryoPARES is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.
Third-Party Code
This project incorporates code derived from the following open-source projects:
- torch-fourier-slice (Copyright © 2023 Alister Burt, BSD 3-Clause License)
  - Used in: cryoPARES/reconstruction/insert_central_slices_rfft_3d.py
  - Used in: cryoPARES/projmatching/projmatchingUtils/extract_central_slices_as_real.py
See THIRD-PARTY-LICENSES for complete license texts and attribution details.