copick-torch

Torch utilities for copick

Dataset classes

SimpleCopickDataset: Main dataset class with caching and augmentation support
MinimalCopickDataset: Simpler dataset implementation with optional preloading

MinimalCopickDataset Usage

Direct usage in Python

from copick_torch import MinimalCopickDataset
from torch.utils.data import DataLoader

# Create a minimal dataset - no caching, no augmentation
dataset = MinimalCopickDataset(
    dataset_id=10440,                 # Dataset ID from CZ portal
    overlay_root='/tmp/test/',        # Overlay root directory
    boxsize=(48, 48, 48),             # Size of the subvolumes
    voxel_spacing=10.012,             # Voxel spacing
    include_background=True,          # Include background samples
    background_ratio=0.2,             # Background ratio
    min_background_distance=48,       # Minimum distance from particles for background
    max_samples=None                  # No limit on samples
)

# Print dataset information
print(f"Dataset size: {len(dataset)}")
print(f"Classes: {dataset.keys()}")
print(f"Class distribution: {dataset.get_class_distribution()}")

# Create a DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=4,
    pin_memory=True
)

# Training loop
for volume, label in dataloader:
    # volume shape: [batch_size, 1, depth, height, width]
    # label: [batch_size] class indices
    # Your training code here
    pass
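As a rough illustration of what the training loop body might contain, the sketch below runs one optimization step per batch on synthetic tensors that mimic the shapes noted in the comments above. The tiny network, the `TensorDataset` stand-in, the 16-voxel box size, and the class count of 3 are all assumptions made for illustration; none of them come from copick-torch.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for a copick dataset (assumption): 8 subvolumes of
# shape [1, 16, 16, 16] with integer class labels in {0, 1, 2}.
num_classes = 3
volumes = torch.randn(8, 1, 16, 16, 16)
labels = torch.randint(0, num_classes, (8,))
loader = DataLoader(TensorDataset(volumes, labels), batch_size=4, shuffle=True)

# Hypothetical minimal 3D classifier: one conv layer, global pooling, linear head.
model = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(4, num_classes),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

losses = []
for volume, label in loader:
    # volume: [batch_size, 1, depth, height, width]; label: [batch_size]
    optimizer.zero_grad()
    logits = model(volume)            # [batch_size, num_classes]
    loss = criterion(logits, label)   # expects integer class indices
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"Ran {len(losses)} training steps")
```

With a real dataset, the `DataLoader` created earlier would take the place of `loader`, and the number of output classes would need to match the dataset's class mapping.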

Saving and loading datasets

MinimalCopickDataset can preload all subvolumes into memory and save the resulting tensors to disk. A saved dataset can then be shared and loaded without access to the original tomogram data:

from copick_torch import MinimalCopickDataset

# Create a dataset with preloading enabled (default)
dataset = MinimalCopickDataset(
    dataset_id=10440,
    overlay_root='/tmp/copick_overlay',
    preload=True  # This preloads all subvolumes into memory
)

# Save the dataset with preloaded tensors
dataset.save('/path/to/save')

# Load the dataset from the saved tensors (no need for original tomogram data)
loaded_dataset = MinimalCopickDataset.load('/path/to/save')

You can also use the provided utility script to save a dataset directly from the command line:

# Save with preloading (default)
python scripts/save_torch_dataset.py --dataset_id 10440 --output_dir /path/to/save

# Save without preloading (not recommended)
python scripts/save_torch_dataset.py --dataset_id 10440 --output_dir /path/to/save --no-preload

Options:

  --dataset_id DATASET_ID   Dataset ID from the CZ cryoET Data Portal
  --output_dir OUTPUT_DIR   Directory to save the dataset
  --overlay_root OVERLAY_ROOT
                            Root directory for overlay storage (default: /tmp/copick_overlay)
  --boxsize Z Y X           Size of subvolumes to extract (default: 48 48 48)
  --voxel_spacing SPACING   Voxel spacing to use (default: 10.012)
  --include_background      Include background samples in the dataset
  --background_ratio RATIO  Ratio of background to particle samples (default: 0.2)
  --no-preload              Disable preloading tensors (not recommended)
  --verbose                 Enable verbose output
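For readers who want to wrap or reproduce this interface, here is a sketch of how a parser with the options listed above could be declared using `argparse`. It is not taken from `save_torch_dataset.py`: the function name `build_parser`, the choice to mark `--dataset_id` and `--output_dir` as required, and the `preload` destination name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(description="Save a copick dataset to disk")
    parser.add_argument("--dataset_id", type=int, required=True,
                        help="Dataset ID from the CZ cryoET Data Portal")
    parser.add_argument("--output_dir", required=True,
                        help="Directory to save the dataset")
    parser.add_argument("--overlay_root", default="/tmp/copick_overlay",
                        help="Root directory for overlay storage")
    parser.add_argument("--boxsize", type=int, nargs=3, default=[48, 48, 48],
                        metavar=("Z", "Y", "X"),
                        help="Size of subvolumes to extract")
    parser.add_argument("--voxel_spacing", type=float, default=10.012,
                        help="Voxel spacing to use")
    parser.add_argument("--include_background", action="store_true",
                        help="Include background samples in the dataset")
    parser.add_argument("--background_ratio", type=float, default=0.2,
                        help="Ratio of background to particle samples")
    # store_false means preload defaults to True and the flag turns it off.
    parser.add_argument("--no-preload", dest="preload", action="store_false",
                        help="Disable preloading tensors (not recommended)")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose output")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(
    ["--dataset_id", "10440", "--output_dir", "/path/to/save",
     "--boxsize", "32", "32", "32"]
)
print(args.boxsize, args.preload, args.background_ratio)
```

Declaring `--no-preload` with `action="store_false"` keeps preloading on by default, which matches the documented behavior.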

Inspecting saved datasets

You can display detailed information about a saved dataset using the provided utility script:

python scripts/info_torch_dataset.py --input_dir /path/to/saved/dataset

This will display:

Basic dataset metadata (dataset ID, box size, voxel spacing, etc.)
Class mapping information
Total number of samples
Class distribution (counts and percentages)
Tomogram information
Sample volume shape
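The class distribution item above (counts and percentages) is simple to compute by hand when you only have a list of labels. The following is a minimal sketch and not the script's own code; the label names are made up for illustration.

```python
from collections import Counter

# Made-up labels for illustration; a real dataset would supply these.
labels = ["ribosome", "ribosome", "background", "apoferritin", "ribosome"]

counts = Counter(labels)
total = sum(counts.values())

# Map each class to its count and its share of the whole dataset.
distribution = {
    name: (count, 100.0 * count / total)
    for name, count in counts.most_common()
}
for name, (count, percent) in distribution.items():
    print(f"{name}: {count} samples ({percent:.1f}%)")
```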

The script can also generate visualizations:

python scripts/info_torch_dataset.py --input_dir /path/to/dataset --output_pdf dataset_report.pdf --samples_per_class 5

Options:

  --input_dir INPUT_DIR     Directory where the dataset is saved
  --output_pdf OUTPUT_PDF   Path to save visualization PDF (default: input_dir/dataset_overview.pdf)
  --samples_per_class SAMPLES_PER_CLASS
                            Number of sample visualizations per class (default: 3)
  --verbose                 Enable verbose output

Quick demo

# Simple training example
uv run examples/simple_training.py

# Fourier augmentation demo
uv run examples/fourier_augmentation_demo.py

# MONAI-based augmentation demo
uv run examples/monai_augmentation_demo.py

# SplicedMixup with Gaussian blur visualization
uv run examples/spliced_mixup_example.py

# SplicedMixup with Fourier augmentation visualization
uv run examples/spliced_mixup_fourier_example.py

# Generate augmentation documentation
python scripts/generate_augmentation_docs.py

# Generate dataset documentation
python scripts/generate_dataset_examples.py

# Save dataset to disk with preloaded tensors
python scripts/save_torch_dataset.py --dataset_id 10440 --output_dir /path/to/save

# Display information about a saved dataset
python scripts/info_torch_dataset.py --input_dir /path/to/save

# Visualize dataset with orthogonal views and projections
python examples/visualize_dataset.py --dataset_dir /path/to/save --output_file report.png

# Create enhanced visual report with sum projections
python examples/visualize_dataset_enhanced.py --dataset_dir /path/to/save --output_file report_enhanced.png

Dataset Visualization

The repository includes two scripts for visualizing datasets:

Basic Visualization

The visualize_dataset.py script creates a simple visualization of dataset samples with orthogonal views and maximum intensity projections:

python examples/visualize_dataset.py --dataset_dir /path/to/saved/dataset --output_file report.png

Options:

  --dataset_dir DATASET_DIR   Directory where the dataset was saved
  --output_file OUTPUT_FILE   Output file for the visualization (default: dataset_visualization.png)
  --samples_per_class SAMPLES_PER_CLASS
                            Number of samples to display per class (default: 2)
  --dpi DPI                 DPI for the output image (default: 150)
  --verbose                 Enable verbose output

Enhanced Visualization

The visualize_dataset_enhanced.py script creates a more polished visualization that uses sum projections and an improved layout:

python examples/visualize_dataset_enhanced.py --dataset_dir /path/to/saved/dataset --output_file report_enhanced.png

Options:

  --dataset_dir DATASET_DIR   Directory where the dataset was saved
  --output_file OUTPUT_FILE   Output file for the visualization (default: dataset_visualization_enhanced.png)
  --samples_per_class SAMPLES_PER_CLASS
                            Number of samples to display per class (default: 2)
  --dpi DPI                 DPI for the output image (default: 150)
  --cmap CMAP               Colormap to use for visualization (default: viridis)
  --verbose                 Enable verbose output
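The two scripts differ mainly in the projection they use. As a sketch of the underlying idea, the NumPy snippet below computes orthogonal central slices, a maximum intensity projection, and a sum projection for a small synthetic volume. The (Z, Y, X) axis order and the variable names are assumptions; the scripts themselves may compute these differently.

```python
import numpy as np

# Synthetic subvolume in (Z, Y, X) order with two bright voxels (assumption).
volume = np.zeros((4, 5, 6), dtype=np.float32)
volume[1, 2, 3] = 7.0
volume[2, 2, 3] = 1.0

# Orthogonal central slices, one per axis.
z, y, x = (size // 2 for size in volume.shape)
slice_xy = volume[z]          # shape (Y, X)
slice_xz = volume[:, y, :]    # shape (Z, X)
slice_yz = volume[:, :, x]    # shape (Z, Y)

# A maximum intensity projection keeps the brightest voxel along the axis;
# a sum projection accumulates every voxel along it.
max_proj_z = volume.max(axis=0)   # shape (Y, X)
sum_proj_z = volume.sum(axis=0)   # shape (Y, X)

print(max_proj_z[2, 3], sum_proj_z[2, 3])
```

Maximum projections emphasize isolated bright features, while sum projections reflect the total density along the viewing axis.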

Features

Augmentations

copick-torch includes various MONAI-based data augmentation techniques for 3D tomographic data:

MixupTransform: MONAI-compatible implementation of the Mixup technique (Zhang et al., 2018), creating virtual training examples by mixing pairs of inputs and their labels with a random proportion.
FourierAugment3D: MONAI-compatible implementation of Fourier-based augmentation that operates in the frequency domain, including random frequency dropout, phase noise injection, and intensity scaling.
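For readers unfamiliar with Mixup, the blending it describes can be sketched in a few lines of NumPy. This is not the `MixupTransform` implementation: the function name `mixup`, the `alpha` default of 0.2, and the use of one-hot labels are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples and their one-hot labels with a Beta-distributed weight."""
    lam = rng.beta(alpha, alpha)          # mixing proportion in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # blended input
    y = lam * y1 + (1.0 - lam) * y2       # blended (soft) label
    return x, y, lam


# Two synthetic subvolumes with one-hot labels for a 3-class problem.
vol_a, vol_b = np.ones((8, 8, 8)), np.zeros((8, 8, 8))
lab_a, lab_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])

mixed_vol, mixed_lab, lam = mixup(vol_a, lab_a, vol_b, lab_b)
print(f"lambda={lam:.3f}, soft label={mixed_lab}")
```

Because the same weight is applied to inputs and labels, the soft label always sums to 1.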

Example usage of MONAI-based Fourier augmentation:

from copick_torch.monai_augmentations import FourierAugment3D

# Create the augmenter
fourier_aug = FourierAugment3D(
    freq_mask_prob=0.3,        # Probability of masking frequency components
    phase_noise_std=0.1,       # Standard deviation of phase noise
    intensity_scaling_range=(0.8, 1.2),  # Range for random intensity scaling
    prob=1.0                   # Probability of applying the transform
)

# Apply to a 3D volume (volume_tensor is a PyTorch tensor defined elsewhere)
augmented_volume = fourier_aug(volume_tensor)
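To make the three frequency-domain operations named above more concrete (frequency dropout, phase noise, and intensity scaling), here is a standalone NumPy sketch. It reuses the parameter names from the example for readability, but it is an independent illustration and not the `FourierAugment3D` implementation; the function name `fourier_augment` and the `seed` argument are assumptions.

```python
import numpy as np


def fourier_augment(volume, freq_mask_prob=0.3, phase_noise_std=0.1,
                    intensity_scaling_range=(0.8, 1.2), seed=0):
    """Illustrative frequency-domain augmentation of a real 3D volume."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fftn(volume)

    # Random frequency dropout: zero out a random subset of components.
    keep = rng.random(spectrum.shape) >= freq_mask_prob
    spectrum = spectrum * keep

    # Phase noise: rotate each component by a small random angle.
    noise = rng.normal(0.0, phase_noise_std, size=spectrum.shape)
    spectrum = spectrum * np.exp(1j * noise)

    # Back to real space (keeping the real part), then random intensity scaling.
    augmented = np.fft.ifftn(spectrum).real
    scale = rng.uniform(*intensity_scaling_range)
    return augmented * scale


volume = np.random.default_rng(1).normal(size=(16, 16, 16))
augmented = fourier_augment(volume)
print(augmented.shape)
```

Setting `freq_mask_prob=0.0`, `phase_noise_std=0.0`, and `intensity_scaling_range=(1.0, 1.0)` in this sketch returns the input volume up to floating-point error, which is a convenient sanity check.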

Documentation

See the docs directory for documentation and examples:

Augmentation Examples: Visualizations of various augmentations applied to different classes from the dataset used in the spliced_mixup_example.py example.
Dataset Examples: Examples of volumes from each class in the dataset used by the CopickDataset classes.

Citation

If you use copick-torch in your research, please cite:

@article{harrington2024open,
  title={Open-source Tools for CryoET Particle Picking Machine Learning Competitions},
  author={Harrington, Kyle I. and Zhao, Zhuowen and Schwartz, Jonathan and Kandel, Saugat and Ermel, Utz and Paraan, Mohammadreza and Potter, Clinton and Carragher, Bridget},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.11.04.621608}
}

This software was introduced at the NeurIPS 2024 Workshop on Machine Learning in Structural Biology as "Open-source Tools for CryoET Particle Picking Machine Learning Competitions".

Development

Install development dependencies

pip install ".[test]"

Run tests

pytest

View coverage report

# Generate terminal, HTML and XML coverage reports
pytest --cov=copick_torch --cov-report=term --cov-report=html --cov-report=xml

Or use the self-contained coverage script:

# Run tests and generate coverage reports with badge
python scripts/coverage_report.py --term

After running the tests with coverage, you can:

View the terminal report directly in your console
Open htmlcov/index.html in a browser to see the detailed HTML report
View the generated coverage badge (coverage-badge.svg)
Check the Codecov dashboard for the project's coverage metrics

Code of Conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.

Reporting Security Issues

If you believe you have found a security issue, please disclose it responsibly by contacting us at security@chanzuckerberg.com.

Release history

1.1.1: Jun 29, 2026 (this version)
1.1.0: Jun 26, 2026
1.0.1: Dec 3, 2025
1.0.0: Oct 10, 2025
0.2.1: Jul 17, 2025
0.2.0: Jul 16, 2025
