Cryo-EM Pose-Assignment for Related Experiments via Supervision

CryoPARES: Cryo-EM Pose Assignment for Related Experiments via Supervised deep learning

CryoPARES is a software package for assigning poses to 2D cryo-electron microscopy (cryo-EM) particle images. It uses a supervised deep learning approach to accelerate 3D reconstruction in related cryo-EM experiments. The key idea is to train a neural network on a high-quality reference reconstruction, and then reuse this trained model to rapidly estimate particle poses in other, similar datasets.

This workflow is divided into two main phases:

Training: In this phase, you use a pre-existing, high-resolution dataset (where particle poses have already been determined by traditional methods like RELION refine) to train a cryoPARES model. This process creates a model that can recognize and assign poses to particles of that specific type of macromolecule.
Inference: Once the model is trained, you can use it for inference on new datasets of the same or very similar molecules (e.g., the same protein with a different ligand bound). Because the model has already learned the features of the molecule, it can predict particle poses almost instantly, bypassing the computationally expensive and time-consuming alignment steps of traditional workflows. This is especially powerful for applications like drug screening, where many similar datasets need to be processed quickly.

This "train once, infer many times" paradigm allows for near real-time 3D reconstruction, providing rapid feedback during data collection and analysis.

For a detailed explanation of the method, please refer to our paper: Supervised Deep Learning for Efficient Cryo-EM Image Alignment in Drug Discovery

Documentation: See the full documentation for detailed instructions on training, configuration, CLI reference, troubleshooting, and API reference.

Installation
Usage
Documentation
Example Workflow
Getting Help
License and Attribution

Installation

It is strongly recommended to use a virtual environment (e.g., conda) to avoid conflicts with other packages. CryoPARES has been tested on Ubuntu 20.04+ and Rocky Linux 8+ systems. NVIDIA Ampere or newer GPUs are recommended for running the code.

Create and activate a conda environment:

conda create -n cryopares python=3.12
conda activate cryopares

Option 1: Install from GitHub (Recommended for Users)

This is the simplest way to install cryoPARES.

pip install git+https://github.com/rsanchezgarc/cryoPARES.git

Option 2: Install from a Local Clone (Recommended for Developers)

This method is recommended if you want to modify the cryoPARES source code.

Clone the repository:

git clone https://github.com/rsanchezgarc/cryoPARES.git
cd cryoPARES

Install the package in editable mode:

This allows you to make changes to the code without having to reinstall the package.
```
pip install -e .
```

Installation should take no more than a few minutes.

Usage

IMPORTANT: CryoPARES keeps a file handle open for each .mrcs file referenced in the .star file. This can lead to a "Too many open files" error if the number of particle files exceeds the system's limit. Before running training or inference, we strongly recommend raising the open-file limit by running the following command in your terminal:

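ulimit -n 65536  # Enough for more than 30K .mrcs files

This command applies to the current shell session (and the programs launched from it) and generally does not require sudo, as long as the new value does not exceed the hard limit, which you can check with ulimit -Hn. If you are not allowed to raise the limit, merge the .mrcs files from different micrographs into fewer stacks to reduce the number of files that must be kept open.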
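Before launching a large job, you can check how many stack files a .star file references and compare that number with the current limit. The sketch below is a minimal illustration, not part of cryoPARES: the helper names are hypothetical, and it assumes the RELION convention that each rlnImageName entry has the form index@path/to/stack.mrcs. Raising the soft limit with Python's resource.setrlimit, up to but not beyond the hard limit, is the programmatic counterpart of ulimit -n and likewise needs no sudo.

```python
import resource


def count_stack_files(image_names):
    """Count distinct stack files in RELION rlnImageName entries ("index@path")."""
    # Everything after the first "@" is the stack path; the prefix is the slice index.
    return len({name.split("@", 1)[1] for name in image_names})


def ensure_open_file_limit(n_needed, margin=1024):
    """Raise the soft RLIMIT_NOFILE towards n_needed + margin if the hard limit allows it.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = n_needed + margin
    if soft != resource.RLIM_INFINITY and soft < wanted:
        # An unprivileged process may raise its soft limit up to, but not beyond, the hard limit.
        new_soft = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft


# Toy rlnImageName values: three particles stored in two stacks
names = ["000001@Particles/mic001.mrcs", "000002@Particles/mic001.mrcs",
         "000001@Particles/mic002.mrcs"]
n_stacks = count_stack_files(names)
limit = ensure_open_file_limit(n_stacks)
print(f"{n_stacks} stack files, open-file soft limit is now {limit}")
```

In practice the image names would come from the rlnImageName column of the particles table, for example as read with the starfile package used elsewhere in this README. A limit raised this way only affects the current process and its children, so for the command-line tools the ulimit command above remains the simpler route.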

CryoPARES has two main modes of operation: training and inference. Particles need to be provided as RELION 3.1+ starfile(s).

Training

The cryoPARES.train.train module is used to train a new model for pose estimation. Training must be done first, using a pre-aligned dataset of particles. While not mandatory, we encourage using particle alignments estimated with RELION.

Usage:

cryopares_train [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]

Key Arguments:

Required Parameters:

--symmetry: Point group symmetry of the molecule (e.g., C1, D7, I, O, T)
--particles_star_fname: Path(s) to RELION 3.1+ format .star file(s) containing pre-aligned particles. Can accept multiple files
--train_save_dir: Output directory where model checkpoints, logs, and training artifacts will be saved

Optional Parameters:

--image_size_px_for_nnet: Target image size in pixels for neural network input. After rescaling to the target sampling rate, images are cropped or padded to this size. We recommend tight box sizes
--particles_dir: Root directory for particle image paths. If paths in .star file are relative, this directory is prepended (similar to RELION project directory concept)
--n_epochs: Number of training epochs. More epochs allow better convergence, although additional epochs stop helping beyond a certain point (Default: 100)
--batch_size: Number of particles per batch. Use the largest value that fits in GPU memory; we advise batch sizes of at least 32 images (Default: 32)
--num_dataworkers: Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Try not to oversubscribe by requesting more workers than available CPUs (Default: 8)
--sampling_rate_angs_for_nnet: Target sampling rate in Angstroms/pixel for neural network input. Particle images are first rescaled to this sampling rate before processing (Default: 1.5)
--mask_radius_angs: Radius of circular mask in Angstroms applied to particle images. If not provided, defaults to half the box size
--split_halves: If True (default), trains two separate models on data half-sets for cross-validation. Use --NOT_split_halves to train single model on all data (Default: True)
--continue_checkpoint_dir: Path to checkpoint directory to resume training from a previous run
--finetune_checkpoint_dir: Path to checkpoint directory to fine-tune a pre-trained model on new dataset
--compile_model: Enable torch.compile for faster training (experimental) (Default: False)
--val_check_interval: Fraction of epoch between validation checks. You generally don't want to touch it, but you can set it to smaller values (0.1-0.5) for large datasets to get quicker feedback
--overfit_batches: Number of batches to use for overfitting test (debugging feature to verify model can memorize small dataset)
--map_fname_for_simulated_pretraining: Path(s) to reference map(s) for simulated projection warmup before training on real data. The number of maps must match the number of particle star files
--junk_particles_star_fname: Optional star file(s) with junk-only particles for estimating confidence z-score thresholds
--junk_particles_dir: Root directory for junk particle image paths (analogous to particles_dir)
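Because images are first rescaled to --sampling_rate_angs_for_nnet and then cropped or padded to --image_size_px_for_nnet, the width of the region seen by the network is image_size_px_for_nnet × sampling_rate_angs_for_nnet Å. The small sketch below (hypothetical helper names; illustrative numbers, not recommendations from the authors) computes both quantities, which can help when choosing a tight box.

```python
def rescaled_box_px(original_box_px, original_apix, target_apix):
    """Box size in pixels after rescaling an image to a new sampling rate (A/px)."""
    return original_box_px * original_apix / target_apix


def network_field_of_view_angs(image_size_px_for_nnet, sampling_rate_angs_for_nnet):
    """Physical width in Angstroms of the crop fed to the network."""
    return image_size_px_for_nnet * sampling_rate_angs_for_nnet


# Illustrative: a 400 px box at 0.75 A/px rescaled to the default 1.5 A/px
print(rescaled_box_px(400, 0.75, 1.5))       # 200.0 px before cropping/padding
print(network_field_of_view_angs(160, 1.5))  # 240.0 A seen by the network
```

If the rescaled box is larger than --image_size_px_for_nnet, the image is cropped; if it is smaller, it is padded.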

Additional relevant Parameters (via --config):

You can override configuration parameters using --config KEY=VALUE. Multiple key-value pairs can be provided. The --config flag should be the last argument. To see all available configuration options, run cryopares_train --show-config.

train.learning_rate: Initial learning rate. (Default: 1e-3). It needs to be tuned to get the best performance.
train.weight_decay: Weight decay for the optimizer, which regularizes the model. (Default: 1e-5). Increase it if you suffer from overfitting.
train.accumulate_grad_batches: Number of batches over which gradients are accumulated to simulate larger batch sizes. (Default: 16). The effective batch size is batch_size * accumulate_grad_batches. We recommend an effective batch size between 512 and 2048.
models.image2sphere.lmax: Maximum spherical harmonic degree. The larger it is, the more expressive the network (Default: 12). Reduce it if you see overfitting.
datamanager.num_augmented_copies_per_batch: Number of augmented copies per particle. Each copy undergoes a different data augmentation. batch_size must be divisible by this number. Large batches with large num_augmented_copies_per_batch values help stabilize training, but require a lot of GPU memory (Default: 4)
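The arithmetic behind accumulate_grad_batches and the divisibility constraint on num_augmented_copies_per_batch can be checked before a run. The hypothetical helper below is a sketch that simply encodes the two rules stated above, using the documented defaults; it is not part of cryoPARES.

```python
def effective_batch_size(batch_size=32, accumulate_grad_batches=16,
                         num_augmented_copies_per_batch=4):
    """Validate the batch settings and return batch_size * accumulate_grad_batches."""
    if batch_size % num_augmented_copies_per_batch != 0:
        raise ValueError(
            f"batch_size={batch_size} is not divisible by "
            f"num_augmented_copies_per_batch={num_augmented_copies_per_batch}")
    return batch_size * accumulate_grad_batches


print(effective_batch_size())        # 512 with the defaults
print(effective_batch_size(64, 16))  # 1024
```

With the defaults, the effective batch size sits at the lower end of the recommended 512 to 2048 range.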

For comprehensive training guidance including monitoring with TensorBoard and avoiding overfitting/underfitting, see the Training Guide. For a complete list of all configuration parameters, see the Configuration Guide.
Once training is done, you can use the checkpoint directory inside --train_save_dir to infer poses for new datasets. The checkpoint directory is named version_0. If you run another training experiment with the same --train_save_dir, a new checkpoint directory named version_1 will be created.
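When several experiments share one --train_save_dir, a small helper can pick the most recent checkpoint directory. This is a sketch with a hypothetical function name; it assumes the version_N naming described above and compares N numerically, so version_10 is correctly treated as newer than version_9.

```python
import re
from pathlib import Path


def latest_version_dir(train_save_dir):
    """Return the version_N subdirectory with the largest N, or None if there is none."""
    pattern = re.compile(r"^version_(\d+)$")
    candidates = []
    for child in Path(train_save_dir).iterdir():
        match = pattern.match(child.name)
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Numeric comparison: version_10 > version_9 (a plain string sort would get this wrong)
    return max(candidates, key=lambda item: item[0])[1]


# Example (hypothetical path):
# checkpoint_dir = latest_version_dir("/path/to/training_output")
```

The returned path could then be passed to cryopares_infer as --checkpoint_dir.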

Inference

The cryoPARES.inference.infer module is used to predict poses for a new set of particles using a trained model. It can be run in two modes: static and daemon.

Static Mode

In static mode, inference is run on a fixed set of particles, which, again, must be provided as RELION 3.1+ star files.

Usage:

cryopares_infer [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]

Key Arguments:

Required Parameters:

--particles_star_fname: Path to input RELION particles .star file
--checkpoint_dir: Path to training directory (or .zip file) containing half-set models with checkpoints and hyperparameters. By default they are called version_0, version_1, etc.
--results_dir: Output directory for inference results including predicted poses and optional reconstructions

Optional Parameters:

--data_halfset: Which particle half-set(s) to process: "half1", "half2", or "allParticles" (Default: allParticles)
--model_halfset: Model half-set selection policy: "half1", "half2", "allCombinations", or "matchingHalf" (uses matching data/model pairs) (Default: matchingHalf)
--particles_dir: Root directory for particle image paths. If provided, overrides paths in the .star file
--batch_size: Number of particles per batch for inference (Default: 32)
--n_jobs: Number of worker processes. Defaults to number of GPUs if CUDA enabled, otherwise 1
--num_dataworkers: Number of parallel data loading workers per GPU. Each worker is a separate CPU process. Set to 0 to load data in the main thread (useful only for debugging). Try not to oversubscribe by requesting more workers than available CPUs (Default: 8)
--use_cuda: Enable GPU acceleration for inference. If False, runs on CPU only (Default: True)
--n_cpus_if_no_cuda: Maximum CPU threads per worker when CUDA is disabled (Default: 4)
--compile_model: Compile model with torch.compile for faster inference (experimental, requires PyTorch 2.0+) (Default: False)
--top_k_poses_nnet: Number of top pose predictions to retrieve from neural network before local refinement (Default: 1)
--top_k_poses_localref: Number of best matching poses to keep after local refinement (Default: 1)
--grid_distance_degs: Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose (Default: 4.0)
--reference_map: Path to reference map (.mrc) for FSC computation during validation
--reference_mask: Path to reference mask (.mrc) for masked FSC calculation
--directional_zscore_thr: Confidence z-score threshold for filtering particles. Particles with scores below this are discarded as low-confidence
--skip_localrefinement: Skip local pose refinement step and use only neural network predictions (Default: False)
--skip_reconstruction: Skip 3D reconstruction step and output only predicted poses (Default: False)
--subset_idxs: List of particle indices to process (for debugging or partial processing)
--n_first_particles: Process only the first N particles from dataset (debug feature)
--check_interval_secs: Polling interval in seconds for parent loop in distributed processing (Default: 2.0)
--merge_halves_output: No description available (Default: False)

Half-Set Selection (--data_halfset and --model_halfset)

To avoid overfitting and to ensure a fair evaluation, cryo-EM datasets are often split into two halves (half1 and half2). CryoPARES uses this concept for both the data and the model.

--data_halfset: Specifies which half of the data to use for inference.
- half1: Use only the particles from the first half of the dataset.
- half2: Use only the particles from the second half of the dataset.
- allParticles: Use all particles from the dataset. (Default)
--model_halfset: Specifies which trained model to use for inference. During training, CryoPARES creates two models, one for each half of the training data.
- half1: Use the model trained on the first half of the training data.
- half2: Use the model trained on the second half of the training data.
- matchingHalf: Use the model from the corresponding half of the data (e.g., half1 data with half1 model). This is the default and recommended setting.
- allCombinations: Run inference for all possible combinations of data and model halves (e.g., half1 data with half1 model, half1 data with half2 model, etc.).
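The two options combine into a list of (data half, model half) pairs. The sketch below is our reading of the rules above, with a hypothetical function name; in particular, it assumes that allParticles is processed as half1 and half2 separately, so matchingHalf pairs each data half with its own model.

```python
VALID_DATA = {"half1", "half2", "allParticles"}
VALID_MODEL = {"half1", "half2", "matchingHalf", "allCombinations"}


def halfset_pairs(data_halfset="allParticles", model_halfset="matchingHalf"):
    """List the (data half, model half) pairs implied by --data_halfset and --model_halfset."""
    if data_halfset not in VALID_DATA or model_halfset not in VALID_MODEL:
        raise ValueError(f"Unknown option: {data_halfset!r}, {model_halfset!r}")
    # Assumption: allParticles means both halves, each handled separately
    data_halves = ["half1", "half2"] if data_halfset == "allParticles" else [data_halfset]
    if model_halfset == "matchingHalf":
        return [(d, d) for d in data_halves]
    if model_halfset == "allCombinations":
        return [(d, m) for d in data_halves for m in ("half1", "half2")]
    return [(d, model_halfset) for d in data_halves]


print(halfset_pairs())  # default settings
print(halfset_pairs("half1", "allCombinations"))
```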

Note: Many of these parameters can also be set via --config (e.g., --config projmatching.grid_step_degs=2.0). However, using the direct CLI flags is recommended for commonly adjusted parameters. To see all available configuration options, run cryopares_infer --show-config.

For detailed API documentation, see the API Reference.

Daemon Mode (On-the-fly)

In daemon mode, the inference script runs continuously and watches for new particles to be added to a directory. This is useful for processing particles as they are being generated.

The daemon workflow consists of three main components:

Queue Manager: A central server that hosts one or more named queues on a single port.
Spooling Filler: A script that monitors a directory for new .star files and adds them to a queue. You could implement other filler protocols using this module as an example.
Daemon Inferencer: One or more worker processes that consume jobs from a queue and perform inference.

All three components communicate over the network. The Spooling Filler and Daemon Inferencer must be configured with ip/port/authkey/queue_name values that match the Queue Manager. The Queue Manager does not take a queue_name argument — it hosts all named queues, creating them on demand when a client first requests a given name.
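The "named queues created on demand" design can be illustrated with the standard library's multiprocessing.managers. The sketch below is one possible way to build such a server and client; it is not the cryoPARES queueManager implementation, and get_named_queue, serve and connect are hypothetical names. The server handles each client connection in its own thread, hence the lock around the queue registry.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

QUEUE_MAXSIZE = 0  # 0 means unbounded, analogous to "--queue_maxsize None"
_queues = {}
_registry_lock = threading.Lock()


def get_named_queue(name="default"):
    """Return the queue called `name`, creating it the first time it is requested."""
    with _registry_lock:
        if name not in _queues:
            _queues[name] = queue.Queue(maxsize=QUEUE_MAXSIZE)
        return _queues[name]


class QueueServerManager(BaseManager):
    """Server side: exposes get_named_queue() to remote clients."""


QueueServerManager.register("get_named_queue", callable=get_named_queue)


def serve(ip="localhost", port=50000, authkey="shared_queue_key"):
    """Block forever, hosting every named queue on a single port."""
    manager = QueueServerManager(address=(ip, port), authkey=authkey.encode())
    manager.get_server().serve_forever()


class QueueClientManager(BaseManager):
    """Client side: same method name, no callable (the queues live on the server)."""


QueueClientManager.register("get_named_queue")


def connect(ip="localhost", port=50000, authkey="shared_queue_key", queue_name="default"):
    """Return a proxy to the named queue on a running server."""
    client = QueueClientManager(address=(ip, port), authkey=authkey.encode())
    client.connect()
    return client.get_named_queue(queue_name)
```

Running serve() in one process and connect(queue_name="my_pipeline").put("/path/to/particles.star") in another would give each queue name its own independent job stream on a single port.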

Workflow:

Start the Queue Manager:

This script creates the central queue server. It should be run once and kept running in the background. A single server instance can host multiple independent named queues on the same port — useful when running several independent inference pipelines simultaneously.

python -m cryoPARES.inference.daemon.queueManager [--ip IP] [--port PORT] [--authkey KEY] [--queue_maxsize N]

Argument	Default	Description
`--ip`	`localhost`	IP address to bind to. Use `0.0.0.0` to accept remote connections.
`--port`	`50000`	TCP port to listen on.
`--authkey`	`shared_queue_key`	Authentication passphrase shared by all clients.
`--queue_maxsize`	unlimited	Max pending jobs per queue (`None` = no limit).

# Default settings
python -m cryoPARES.inference.daemon.queueManager

# Remote server, custom port/key, bounded queues
python -m cryoPARES.inference.daemon.queueManager \
    --ip 0.0.0.0 --port 51000 --authkey mysecret --queue_maxsize 100

Start the Spooling Filler:

This script watches a directory for new .star files and adds them to a named queue.

python -m cryoPARES.inference.daemon.spoolingFiller --directory DIR \
    [--ip IP] [--port PORT] [--authkey KEY] [--queue_name NAME] \
    [--pattern GLOB] [--check_interval SECS]

Argument	Default	Description
`--directory`	(required)	Directory to monitor for new `.star` files.
`--ip`	`localhost`	IP address of the queue manager server.
`--port`	`50000`	Port of the queue manager server.
`--authkey`	`shared_queue_key`	Authentication key (must match the server).
`--queue_name`	`default`	Name of the queue to submit jobs to.
`--pattern`	`*.star`	Glob pattern for files to watch.
`--check_interval`	`10`	Seconds between directory scans.

# Default queue, local server
python -m cryoPARES.inference.daemon.spoolingFiller --directory /path/to/watch

# Named queue on a remote server
python -m cryoPARES.inference.daemon.spoolingFiller \
    --directory /path/to/watch \
    --ip 192.168.1.10 --port 51000 --authkey mysecret \
    --queue_name my_pipeline

Alternative: Manually Submit Jobs

You can also submit jobs programmatically using queue_connection.

from cryoPARES.inference.daemon.queueManager import queue_connection

# Submit a single .star file (default queue, default ip/port/authkey)
with queue_connection(ip="localhost", port=50000, authkey="shared_queue_key") as queue:
    queue.put("/path/to/particles.star")

# Submit to a named queue on a remote server
with queue_connection(ip="192.168.1.10", port=51000, authkey="mysecret",
                      queue_name="my_pipeline") as queue:
    for star_file in ["/path/to/particles1.star", "/path/to/particles2.star"]:
        queue.put(star_file)

# Submit particles already loaded in memory (no disk I/O on the worker side)
import starfile
star = starfile.read("/path/to/particles.star")
# Optionally filter star["particles"] here before submitting
with queue_connection() as queue:
    queue.put((star["optics"], star["particles"]))  # (optics_df, particles_df)

# Send poison pill to terminate workers of a specific queue gracefully
with queue_connection(queue_name="my_pipeline") as queue:
    queue.put(None)

Input formats accepted by the Daemon Inferencer:

String: Path to a .star file
Tuple (optics_df, particles_df): pandas DataFrames already loaded or filtered in memory; no disk I/O required by the worker
None: Poison pill — signals workers to terminate gracefully
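The three accepted inputs map naturally onto a simple consumer loop. This is a hedged sketch, not the daemonInference code: process_star_file and process_dataframes are hypothetical callbacks standing in for the actual inference step. In this sketch one None stops exactly one worker, so several workers on one queue would each need their own pill; the README does not say whether cryoPARES forwards the pill to other workers.

```python
import queue


def run_worker(job_queue, process_star_file, process_dataframes):
    """Consume jobs until a None poison pill arrives; return the number of jobs handled."""
    n_done = 0
    while True:
        job = job_queue.get()  # blocks until a job is available
        if job is None:        # poison pill: stop gracefully
            break
        if isinstance(job, str):  # path to a .star file on disk
            process_star_file(job)
        elif isinstance(job, tuple) and len(job) == 2:
            optics_df, particles_df = job  # tables already in memory, no disk I/O
            process_dataframes(optics_df, particles_df)
        else:
            raise TypeError(f"Unsupported job type: {type(job).__name__}")
        n_done += 1
    return n_done


# Toy run: plain strings stand in for the optics/particles DataFrames
jobs = queue.Queue()
for item in ["/path/to/a.star", ("optics", "particles"), None]:
    jobs.put(item)
seen = []
n = run_worker(jobs, seen.append, lambda optics, parts: seen.append((optics, parts)))
print(n, seen)  # 2 ['/path/to/a.star', ('optics', 'particles')]
```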

Start the Daemon Inferencer(s):

You can start as many inference workers as you want. Each worker will take jobs from the queue and process them. Important: each worker must have its own --results_dir. The inference arguments are the same as for cryopares_infer, plus the network arguments below.

python -m cryoPARES.inference.daemon.daemonInference \
    --checkpoint_dir DIR --results_dir DIR \
    [--net_address IP] [--net_port PORT] [--net_authkey KEY] [--net_queue_name NAME] \
    [inference options…]

Argument	Default	Description
`--net_address`	`localhost`	IP address of the queue manager server.
`--net_port`	`50000`	Port of the queue manager server.
`--net_authkey`	`shared_queue_key`	Authentication key (must match the server).
`--net_queue_name`	`default`	Name of the queue to consume jobs from.

# Two workers on the default queue
python -m cryoPARES.inference.daemon.daemonInference \
    --checkpoint_dir /path/to/checkpoint --results_dir /path/to/results_worker1 \
    --particles_dir /path/to/particles
python -m cryoPARES.inference.daemon.daemonInference \
    --checkpoint_dir /path/to/checkpoint --results_dir /path/to/results_worker2 \
    --particles_dir /path/to/particles

# Worker on a named queue from a remote server
python -m cryoPARES.inference.daemon.daemonInference \
    --checkpoint_dir /path/to/checkpoint --results_dir /path/to/results_pipe2 \
    --net_address 192.168.1.10 --net_port 51000 --net_authkey mysecret \
    --net_queue_name my_pipeline

Materialize the Volume:

You can materialize the final 3D volume from the partial results at any time, even while the inference workers are still running. The script will combine all the available partial results.

python -m cryoPARES.inference.daemon.materializePartialResults \
    --partial_outputs_dirs /path/to/results_worker1/ /path/to/results_worker2 \
    --output_mrc /path/to/final_map.mrc --output_star /path/to/final_particles.star
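Materializing at any time works because Fourier-space backprojection only accumulates sums. The NumPy sketch below illustrates this with toy data, assuming that each worker keeps a running numerator (sum of CTF-weighted particle Fourier components) and weight (sum of squared CTF values); the actual format of the partial outputs is not described here, and merge_partial_accumulators is a hypothetical name. The merged sums are then divided once to obtain the volume, exactly as if a single worker had processed every particle.

```python
import numpy as np


def merge_partial_accumulators(partials):
    """Sum per-worker (numerator, weight) Fourier accumulators."""
    partials = list(partials)
    if not partials:
        raise ValueError("No partial results to merge")
    numerators, weights = zip(*partials)
    # Backprojection only adds terms, so summing partial sums equals one big sum.
    return np.sum(numerators, axis=0), np.sum(weights, axis=0)


# Toy data: 6 "particles", each already mapped onto the same 8 Fourier voxels
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8)) + 1j * rng.normal(size=(6, 8))
ctf = rng.uniform(0.1, 1.0, size=(6, 8))


def accumulate(idx):
    return (ctf[idx] * x[idx]).sum(axis=0), (ctf[idx] ** 2).sum(axis=0)


full = accumulate(slice(0, 6))
merged = merge_partial_accumulators([accumulate(slice(0, 4)), accumulate(slice(4, 6))])
print(np.allclose(full[0], merged[0]) and np.allclose(full[1], merged[1]))  # True
```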

Utility Tools

CryoPARES includes standalone utility tools for projection matching and reconstruction. Note: These tools are automatically used within the cryopares_infer workflow, but can also be run independently if needed.

Projection Matching

The projection matching utility performs local pose refinement by searching around existing particle orientations to find the best match against reference volume projections. This is used automatically during inference for local refinement, but can also be run standalone.

Usage:

cryopares_projmatching [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]

Key Arguments:

Required Parameters:

--reference_vol: Path to reference 3D volume (.mrc file) for generating projection templates
--particles_star_fname: Path to input STAR file with particle metadata
--out_fname: Path for output STAR file with aligned particle poses
--particles_dir: Root directory for particle image paths. If provided, overrides paths in the .star file

Optional Parameters:

--mask_radius_angs: Radius of circular mask in Angstroms applied to particle images
--grid_distance_degs: Maximum angular distance in degrees for local refinement search. Grid ranges from -grid_distance_degs to +grid_distance_degs around predicted pose (Default: 4.0)
--grid_step_degs: Angular step size in degrees for grid search during local refinement (Default: 2.0)
--return_top_k_poses: Number of top matching poses to save per particle (Default: 1)
--filter_resolution_angst: Low-pass filter resolution in Angstroms applied to reference volume before matching
--n_jobs: Number of parallel worker processes for distributed projection matching (Default: 1)
--num_dataworkers: Number of CPU workers per PyTorch DataLoader for data loading (Default: 1)
--batch_size: Number of particles to process simultaneously per job (Default: 32)
--use_cuda: Enable GPU acceleration. If False, runs on CPU only (Default: True)
--verbose: Enable verbose logging output (Default: False)
--float32_matmul_precision: PyTorch float32 matrix multiplication precision mode ("highest", "high", or "medium") (Default: high)
--gpu_id: Specific GPU device ID to use (if multiple GPUs available)
--n_first_particles: Process only the first N particles from dataset (for testing or validation)
--correct_ctf: Apply CTF correction during projection matching (Default: True)
--halfmap_subset: Select half-map subset (1 or 2) for half-map validation
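The search-grid parameters determine how many candidate orientations are scored per particle. The sketch below assumes a regular grid of offsets applied independently to each of the three Euler angles; the README does not state how cryoPARES builds its local grid, so treat the counts as an illustration of how cost scales rather than exact figures. The function names are hypothetical.

```python
import numpy as np


def local_grid_offsets(grid_distance_degs=4.0, grid_step_degs=2.0):
    """Offsets from -grid_distance_degs to +grid_distance_degs, rounded to whole steps."""
    n_steps = int(round(grid_distance_degs / grid_step_degs))
    return np.linspace(-n_steps * grid_step_degs, n_steps * grid_step_degs, 2 * n_steps + 1)


def n_candidate_poses(grid_distance_degs=4.0, grid_step_degs=2.0, n_angles=3):
    """Candidates per starting pose if every Euler angle is searched on the same grid."""
    return len(local_grid_offsets(grid_distance_degs, grid_step_degs)) ** n_angles


print(local_grid_offsets().tolist())  # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(n_candidate_poses())            # 125 with the defaults
print(n_candidate_poses(12.0, 2.0))   # 2197, e.g. for --grid_distance_degs 12
```

Under this assumption, the cost grows with the cube of grid_distance_degs / grid_step_degs, and if each of several predicted poses is refined separately, the total scales with their number as well.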

For additional details, see the Command-Line Interface documentation.

Post-processing

The post-processing utility sharpens reconstructed volumes using B-factor estimation (Guinier analysis) and FSC weighting. Run it after reconstruction to improve map interpretability.

Usage:

cryopares_postprocess bfactor \
    --half1 /path/to/half1.mrc \
    --half2 /path/to/half2.mrc \
    --mask /path/to/mask.mrc \
    --output_dir /path/to/postprocess_output

To generate the mask automatically, replace --mask /path/to/mask.mrc with --auto_mask.

For all options, see the CLI Reference.
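B-factor sharpening with FSC weighting is a standard technique: a straight line fitted to ln(amplitude) against 1/d² (the Guinier plot) has slope -B/4, the fall-off is then undone by multiplying by exp(B/(4d²)), and the Rosenthal and Henderson weight sqrt(2·FSC/(1+FSC)) attenuates noisy shells. The NumPy sketch below illustrates that general recipe on shell-averaged amplitudes; it is not the cryopares_postprocess code, and the function names are hypothetical.

```python
import numpy as np


def fsc_weight(fsc):
    """Rosenthal & Henderson weight sqrt(2*FSC / (1 + FSC)); negative FSC is clipped to 0."""
    fsc = np.clip(fsc, 0.0, 1.0)
    return np.sqrt(2.0 * fsc / (1.0 + fsc))


def guinier_bfactor(resolution_angs, amplitudes):
    """Estimate B from the slope of ln(amplitude) versus 1/d^2 (slope = -B/4)."""
    s2 = 1.0 / np.asarray(resolution_angs, dtype=float) ** 2
    slope, _intercept = np.polyfit(s2, np.log(amplitudes), 1)
    return -4.0 * slope


def sharpen_amplitudes(amplitudes, resolution_angs, b_factor, fsc):
    """Undo the exp(-B/(4 d^2)) fall-off and down-weight noisy shells."""
    s2 = 1.0 / np.asarray(resolution_angs, dtype=float) ** 2
    return amplitudes * np.exp(b_factor * s2 / 4.0) * fsc_weight(fsc)


# Synthetic shells from 10 A to 4 A with a true B-factor of 80 A^2
d = np.linspace(10.0, 4.0, 20)
amp = 50.0 * np.exp(-80.0 / (4.0 * d ** 2))
b_est = guinier_bfactor(d, amp)
print(round(b_est, 3))
```

With real maps the fit is commonly restricted to shells between roughly 10 Å and the FSC resolution, where the Guinier plot is approximately linear.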

Reconstruction

The reconstruction utility creates a 3D volume from particles with known poses using direct Fourier inversion. This is used automatically during inference to generate the final 3D map, but can also be run standalone for particles aligned by other methods (e.g., RELION).

Usage:

cryopares_reconstruct [ARGUMENTS] [--config [CONFIG_OVERRIDES]] [--show-config]

Key Arguments:

Required Parameters:

--particles_star_fname: Path to input STAR file with particle metadata and poses to reconstruct
--symmetry: Point group symmetry of the volume for reconstruction (e.g., C1, D2, I, O, T)
--output_fname: Path for output reconstructed 3D volume (.mrc file)

Optional Parameters:

--particles_dir: Root directory for particle image paths. If provided, overrides paths in the .star file
--n_jobs: Number of parallel worker processes for distributed reconstruction (Default: 1)
--num_dataworkers: Number of CPU workers per PyTorch DataLoader for data loading (Default: 1)
--batch_size: Number of particles to backproject simultaneously per job (Default: 128)
--use_cuda: Enable GPU acceleration for reconstruction. If False, runs on CPU only (Default: True)
--correct_ctf: Apply CTF correction during reconstruction (Default: True)
--eps: Regularization mode and strength. Sign selects mode: eps >= 0 uses Tikhonov regularization, eps < 0 uses RELION-style radial averaging. Magnitude sets scale: for Tikhonov, eps is the regularization constant (ideally 1/SNR); for radial averaging, abs(eps) is the divisor for radial weights (RELION uses 1000). Recommended: -1000 for radial averaging, 1e-3 for Tikhonov (Default: -1000.0)
--min_denominator_value: Minimum denominator threshold for numerical stability (prevents division by zero). Applied as final safety clamp regardless of regularization mode. RELION uses 1e-6 (Default: 1e-06)
--use_only_n_first_batches: Reconstruct using only first N batches (for testing or quick validation)
--float32_matmul_precision: PyTorch float32 matrix multiplication precision mode ("highest", "high", or "medium") (Default: high)
--weight_with_confidence: Apply per-particle confidence weighting during backprojection. If True, particles with higher confidence contribute more to reconstruction. It reads the confidence from the metadata label "rlnParticleFigureOfMerit" (Default: False)
--halfmap_subset: Select half-map subset (1 or 2) for half-map reconstruction and validation
--apply_soft_mask: Apply soft spherical masking after reconstruction to reduce edge artifacts (RELION-style) (Default: True)
--mask_radius_pix: Radius for soft mask in pixels. If negative, defaults to box_size/2 (Default: -1.0)
--mask_edge_width: Width of cosine falloff edge in pixels (Default: 3)
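The sign convention of --eps can be summarized in a few lines. This is a minimal NumPy sketch under stated assumptions: the accumulated numerator is taken to be the sum of CTF-weighted particle Fourier components and the weight the sum of squared CTF values, and the RELION-style branch floors each weight at its Fourier shell's mean divided by |eps|, which is our reading of the description above rather than cryoPARES's actual code. regularized_divide is a hypothetical name.

```python
import numpy as np


def regularized_divide(numerator, weight, radius, eps=-1000.0, min_denominator_value=1e-6):
    """Divide an accumulated Fourier numerator by its weights.

    numerator: summed CTF-weighted Fourier components (assumed)
    weight:    summed squared CTF values, same shape (assumed)
    radius:    non-negative integer Fourier-shell index for every voxel
    """
    if eps >= 0:
        # Tikhonov: add a constant (ideally about 1/SNR) to every weight
        denom = weight + eps
    else:
        # RELION-style: floor each weight at (shell-average weight) / |eps|
        n_shells = int(radius.max()) + 1
        shell_sum = np.bincount(radius.ravel(), weights=weight.ravel(), minlength=n_shells)
        shell_count = np.bincount(radius.ravel(), minlength=n_shells)
        shell_mean = shell_sum / np.maximum(shell_count, 1)
        denom = np.maximum(weight, shell_mean[radius] / abs(eps))
    # Final safety clamp, applied in both modes
    denom = np.maximum(denom, min_denominator_value)
    return numerator / denom


num = np.array([2.0, 4.0])
w = np.array([1.0, 0.0])
shells = np.array([0, 0])
print(regularized_divide(num, w, shells, eps=1e-3))  # Tikhonov mode
print(regularized_divide(num, w, shells))            # radial mode, eps=-1000
```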

For additional details, see the Command-Line Interface documentation.

Checkpoint Compactification

After training, you can package your checkpoint into a compact ZIP file for easy distribution and storage. This reduces the checkpoint size from ~40 GB to ~10 GB by removing training logs, metrics, and intermediate files while keeping everything needed for inference.

Compactify a checkpoint:

python -m cryoPARES.scripts.compactify_checkpoint \
    --checkpoint_dir /path/to/training_output/version_0

This creates version_0_compact.zip containing only the essential files.

Use the compactified checkpoint for inference:

cryopares_infer \
    --particles_star_fname /path/to/particles.star \
    --checkpoint_dir /path/to/version_0_compact.zip \
    --results_dir /path/to/results

The ZIP file is used directly without extraction, making it ideal for:

Sharing models with collaborators
Archiving trained models efficiently
Deploying to inference servers with limited storage
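Python's zipfile module shows how files inside an archive can be read straight into memory without extracting it. The sketch below only illustrates that idea; the member name is hypothetical and says nothing about the actual layout of a compactified checkpoint.

```python
import io
import zipfile


def list_members(zip_source):
    """Names of all files stored in the archive (path or file-like object)."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.namelist()


def read_member_bytes(zip_source, member_name):
    """Read one archive member into memory without writing anything to disk."""
    with zipfile.ZipFile(zip_source) as archive:
        return archive.read(member_name)


# Build a tiny in-memory archive with a hypothetical member, then read it back
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("version_0/hypothetical_config.yaml", "symmetry: C1\n")

print(list_members(buffer))
print(read_member_bytes(buffer, "version_0/hypothetical_config.yaml"))
```

Bytes read this way can be wrapped in io.BytesIO and handed to any loader that accepts a file-like object.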

Documentation

Training Guide - Comprehensive guide on training models, monitoring with TensorBoard, and avoiding overfitting/underfitting
API Reference - Auto-generated API documentation with type hints (hosted on GitHub Pages)
Configuration Guide - Complete reference for all configuration parameters
Troubleshooting Guide - Solutions to common issues
CLI Reference - Command-line interface documentation

Building Documentation Locally:

cd docs
pip install -r requirements.txt
make html
# Open _build/html/index.html in your browser

Configuration System

CryoPARES uses a flexible configuration system that allows you to manage settings from multiple sources.

--show-config: To see all available options, run any main script with the --show-config flag. This will print a comprehensive list of all parameters, their current values, and their paths.
```
cryopares_train --show-config
```
YAML Files: Create a .yaml file with your desired parameters.
Command-Line Overrides: Pass KEY=VALUE pairs after the --config flag. Use dot notation to specify nested parameters (e.g., models.image2sphere.lmax=6).
Direct Arguments: Use standard command-line flags (e.g., --batch_size 32).

Precedence: Direct command-line arguments override --config overrides, which override YAML files, which override the default configuration.
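The precedence rule can be pictured as successive merges, from the lowest-priority source to the highest. The sketch below mirrors that rule with plain dictionaries; it is not cryoPARES's configuration code, all function names are hypothetical, and the mapping from direct flags such as --batch_size to configuration keys is assumed rather than documented here.

```python
import ast
import copy


def parse_value(raw):
    """Interpret "6", "1e-3" or "True" as literals; keep other text (e.g. "C1") as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def set_dotted(config, dotted_key, value):
    """Set a nested value such as "models.image2sphere.lmax" in a dict of dicts."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_config(defaults, yaml_cfg, config_overrides, direct_args):
    """defaults < YAML file < --config KEY=VALUE < direct command-line arguments."""
    cfg = merge(defaults, yaml_cfg)
    for item in config_overrides:
        key, raw = item.split("=", 1)
        set_dotted(cfg, key, parse_value(raw))
    return merge(cfg, direct_args)


defaults = {"train": {"learning_rate": 1e-3}, "models": {"image2sphere": {"lmax": 12}}}
yaml_cfg = {"models": {"image2sphere": {"lmax": 8}}}
cfg = resolve_config(defaults, yaml_cfg, ["models.image2sphere.lmax=6"],
                     {"train": {"learning_rate": 5e-4}})
print(cfg["models"]["image2sphere"]["lmax"], cfg["train"]["learning_rate"])  # 6 0.0005
```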

For a complete reference of all configuration parameters, see the Configuration Guide.

Example Workflow

Quick Start with Test Dataset

Before running on your own data, we recommend testing cryoPARES with a small dataset. If you don't have a small particles .star file, you can download some examples from CESPED (Cryo-EM Supervised Pose Estimation Dataset). CESPED provides benchmark datasets specifically designed for supervised pose estimation.

Install CESPED (Optional)

pip install cesped

Download a Test Dataset

For a quick test, use the small TEST dataset (subset of EMPIAR-11120):

python -m cesped.particlesDataset download_entry -t TEST --benchmarkDir /path/to/your/data

Note that this small dataset is not enough to train an accurate model, but it is sufficient to check that you can run the full workflow.

For a full benchmark dataset, you can download other CESPED entries, such as EMPIAR-10166 (human 26S proteasome, C1 symmetry, 238K particles):

# Download both half-sets
python -m cesped.particlesDataset download_entry -t 10166 --benchmarkDir /path/to/your/data

Training and Inference Example

Once you have downloaded a CESPED dataset, you can train and test cryoPARES:

Train a model on an existing, aligned dataset:

cryopares_train  \
   --symmetry C1  \
   --particles_star_fname /path/to/your/data/CESPED/TEST/particles_merged.star  \
   --particles_dir /path/to/cesped_benchmark/TEST/   \
   --train_save_dir /path/to/training_output   \
   --n_epochs 3  \
   --batch_size 32  \
   --sampling_rate_angs_for_nnet 1.5 \
   --image_size_px_for_nnet 64 \
   --config models.image2sphere.lmax=6  models.image2sphere.so3components.so3outputgrid.hp_order=3  models.image2sphere.so3components.i2sprojector.sphere_fdim=64 models.image2sphere.so3components.s2conv.f_out=16 models.image2sphere.imageencoder.unet.out_channels_first=4

Note that the --config overrides above create a small model that will not perform well but trains quickly. We also use an --image_size_px_for_nnet much smaller than advisable (we recommend 128 to 256 pixels, depending on the particle).

For production use:

cryopares_train \
    --symmetry C1 \
    --particles_star_fname /path/to/particles.star \
    --particles_dir /path/to/particles/ \
    --train_save_dir /path/to/training_output \
    --n_epochs 100 \
    --batch_size 32 \
    --image_size_px_for_nnet 160 \
    --sampling_rate_angs_for_nnet 1.5  #We are using the default model, hence no --config

You can tweak the neural network by setting different values with the --config flag. Use --show-config to list all available options.

After training, there should be a directory called /path/to/training_output/version_* containing the checkpoint. Pass this directory to the inference command via --checkpoint_dir.

Run inference on a new dataset with local refinement and reconstruction:

cryopares_infer \
    --particles_star_fname /path/to/new_particles.star \
    --particles_dir /path/to/particles \
    --checkpoint_dir /path/to/training_output/version_0 \
    --results_dir /path/to/inference_results \
    --reference_map /path/to/initial_model.mrc \
    --batch_size 32 \
    --grid_distance_degs 12 \
    --directional_zscore_thr 1.0

Here, --reference_map is optional (if it is not provided, it is automatically generated from the training data), --grid_distance_degs 12 makes the local search span -12° to +12° around each predicted pose, and --directional_zscore_thr 1.0 removes all particles with a directional z-score below 1.0.

Getting Help

If you encounter issues:

Check the Troubleshooting Guide for common problems and solutions
Review the Training Guide for training best practices
Consult the Configuration Guide for parameter details
See the API Reference for programmatic usage

For bugs or feature requests, please open an issue on GitHub.

License and Attribution

CryoPARES is licensed under the GNU General Public License v3.0 (GPL-3.0). See the LICENSE file for details.

Third-Party Code

This project incorporates code derived from the following open-source projects:

torch-fourier-slice (Copyright © 2023 Alister Burt, BSD 3-Clause License)
- Used in: cryoPARES/reconstruction/insert_central_slices_rfft_3d.py
- Used in: cryoPARES/projmatching/projmatchingUtils/extract_central_slices_as_real.py

See THIRD-PARTY-LICENSES for complete license texts and attribution details.

Project details

Maintainers

rsanchez1369

Release history

This version

0.1.0

Jun 4, 2026

Download files

Source Distribution

cryopares-0.1.0.tar.gz (369.4 kB)

Uploaded Jun 4, 2026 Source

Built Distribution

cryopares-0.1.0-py3-none-any.whl (352.9 kB)

Uploaded Jun 4, 2026 Python 3

File details

Details for the file cryopares-0.1.0.tar.gz.

File metadata

Download URL: cryopares-0.1.0.tar.gz
Upload date: Jun 4, 2026
Size: 369.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cryopares-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ac928c722daa01b4bf5aa91b96a6b689ca0f0effb66bb6d5ea01e640e27c0b69`
MD5	`3c14f9c1e0d84f4c0d6416f44ab2dec7`
BLAKE2b-256	`d0d45e07552e2676ae75f41a33bf65df0bfa6a104390869eb8888d79be567ca3`
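
You can use the digests listed above to check that a downloaded archive was not corrupted or altered. Below is a minimal sketch. It assumes GNU coreutils sha256sum and falls back to shasum -a 256, which is available on macOS. verify_sha256 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit status 0) only if FILE's SHA256 digest equals EXPECTED.
verify_sha256() {
    local file="$1" expected="$2" actual
    [ -f "$file" ] || return 2                 # missing file: distinct status
    if command -v sha256sum >/dev/null 2>&1; then
        actual="$(sha256sum "$file")"          # GNU coreutils
    else
        actual="$(shasum -a 256 "$file")"      # e.g. macOS
    fi
    actual="${actual%% *}"                     # keep the digest, drop "  FILE"
    [ "$actual" = "$expected" ]
}
```

For example: `verify_sha256 cryopares-0.1.0.tar.gz ac928c722daa01b4bf5aa91b96a6b689ca0f0effb66bb6d5ea01e640e27c0b69 && echo "hash OK"`.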

Provenance

The following attestation bundles were made for cryopares-0.1.0.tar.gz:

Publisher: publish.yml on rsanchezgarc/cryoPARES

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cryopares-0.1.0.tar.gz
- Subject digest: ac928c722daa01b4bf5aa91b96a6b689ca0f0effb66bb6d5ea01e640e27c0b69
- Sigstore transparency entry: 1726033911
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: rsanchezgarc/cryoPARES@821b5932af7f9972c29e30a00cbe032d2aed9220
- Branch / Tag: refs/tags/0.1.0
- Owner: https://github.com/rsanchezgarc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@821b5932af7f9972c29e30a00cbe032d2aed9220
- Trigger Event: push

File details

Details for the file cryopares-0.1.0-py3-none-any.whl.

File metadata

Download URL: cryopares-0.1.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 352.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cryopares-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`84b95db47053f0215863f73239b2d453457d886c97c4a112f36b147ee82e97bf`
MD5	`76ecdb0fa1bb4357000a7a06b7ab1bbb`
BLAKE2b-256	`a6e5f0548e584576cfdd14ca8a938a295102951c24d79942b2561830d676d000`

Provenance

The following attestation bundles were made for cryopares-0.1.0-py3-none-any.whl:

Publisher: publish.yml on rsanchezgarc/cryoPARES

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cryopares-0.1.0-py3-none-any.whl
- Subject digest: 84b95db47053f0215863f73239b2d453457d886c97c4a112f36b147ee82e97bf
- Sigstore transparency entry: 1726034537
- Sigstore integration time: Jun 4, 2026
Source repository:
- Permalink: rsanchezgarc/cryoPARES@821b5932af7f9972c29e30a00cbe032d2aed9220
- Branch / Tag: refs/tags/0.1.0
- Owner: https://github.com/rsanchezgarc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@821b5932af7f9972c29e30a00cbe032d2aed9220
- Trigger Event: push
