toile

CLI tools for working with astrocyte dynamics data

Toile is a Python package for converting microscopy TIFF stacks into WebDataset format for machine learning pipelines. It handles OME-TIFF metadata extraction and batch processing, and it creates sharded tar archives optimized for distributed training.

—❤️‍🔥 Forecast

Features

  • OME-TIFF Support: Automatic extraction of spatial, temporal, and experimental metadata from OME-TIFF XML annotations
  • Batch Processing: Process multiple recordings using glob patterns or YAML configuration files
  • Custom Metadata Parsing: Flexible filename parsing system for extracting experimental identifiers
  • Sharded Archives: Configurable shard sizes for WebDataset format (850MB standard, 38MB for Bluesky PDS)
  • ML-Ready: Optional uint8 normalization for efficient model training
  • atdata Integration: Built on the atdata PackableSample framework for data transformation pipelines

Installation

Install using uv (recommended) or pip:

# Using uv
uv add toile

# Using pip
pip install toile

For development:

git clone https://github.com/forecast-bio/toile.git
cd toile
uv sync --all-extras --dev

Quick Start

Export a TIFF stack to WebDataset format:

# Basic usage - export frames from a single recording
toile export frames /path/to/recording/ /output/dataset

# With uint8 normalization for ML
toile export frames /path/to/recording/ /output/dataset --uint8 --verbose

# Batch processing with glob patterns
toile export frames "/data/*/recording*/" /output/dataset --stem my_dataset

# Using PDS-compatible shard size for Bluesky
toile export frames /data/recordings/ /output/dataset --pds

CLI Commands

toile export frames

Convert TIFF stacks to WebDataset format as individual frames.

toile export frames INPUT OUTPUT [OPTIONS]

Arguments:

  • INPUT: Path to a TIFF directory, a quoted glob pattern, or a YAML config file
  • OUTPUT: Output directory for tar archives

Options:

  • --stem TEXT: Custom stem for output filenames (default: output directory name)
  • --shard-size INT: Maximum shard size in bytes (default: auto-selected)
  • --pds: Use PDS-compatible shard size (38MB for Bluesky)
  • --uint8: Normalize images to uint8 (0-255) range
  • --compressed: Enable compression (not yet implemented)
  • --verbose: Print detailed progress information
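
To make the --uint8 option concrete, here is a minimal sketch of one common way to map a higher-bit-depth image onto the 0-255 range. It assumes plain min-max scaling over the whole array; the scaling toile actually applies (per frame, per stack, percentile-based, or otherwise) is not stated here, and the function name to_uint8 is hypothetical.

```python
import numpy as np


def to_uint8(stack: np.ndarray) -> np.ndarray:
    """Min-max scale an image or stack to 0-255 and cast to uint8.

    Assumption: global min-max scaling. This is an illustration only and
    may differ from what `toile export frames --uint8` does internally.
    """
    data = stack.astype(np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # A constant image has no dynamic range; map it to all zeros.
        return np.zeros(stack.shape, dtype=np.uint8)
    scaled = (data - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


# Hypothetical 16-bit frame, such as a microscope camera might produce.
frame = np.array([[100, 600], [1100, 4100]], dtype=np.uint16)
normalized = to_uint8(frame)
print(normalized.dtype, normalized.min(), normalized.max())
```

Scaling this way discards absolute intensity, so it suits model input better than quantitative analysis.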

Examples:

# Export single recording with verbose output
toile export frames /data/mouse_123/recording_001/ /output/dataset --verbose

# Batch export with custom naming
toile export frames "/data/experiment_*/*.tif" /output/dataset --stem exp2024

# ML-ready export with normalization
toile export frames /data/recordings/ /output/dataset --uint8 --pds

toile export test-frames

Generate a synthetic test dataset for development and testing.

toile export test-frames OUTPUT [OPTIONS]

Arguments:

  • OUTPUT: Output directory for test dataset

Options:

  • --stem TEXT: Custom stem for output filenames
  • --compressed: Enable gzip compression

Example:

toile export test-frames /tmp/test_dataset --compressed

Configuration Files

For complex batch processing, use YAML configuration files:

# config.yaml
inputs:
  - "/data/experiment1/**/*.tif"
  - "/data/experiment2/**/*.tif"

output_stem: "astrocyte_dataset"
shard_size: 38000000  # 38MB for PDS compatibility
to_uint8: true

# Optional: Extract metadata from filenames
filename_spec:
  template: "mouse_{mouse_id}_slice_{slice_id}_{date}.tif"
  transforms:
    mouse_id: int
    slice_id: identity
    date: date_compact

Then run:

toile export frames config.yaml /output/dataset
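
The filename_spec block above can be read as "match the template, then convert each captured field". The sketch below illustrates that idea in plain Python. It is not toile's parser: the helper names are hypothetical, and the meaning of each transform is assumed (in particular, date_compact is taken to mean a YYYYMMDD string).

```python
import re
from datetime import datetime

# Transform names mirror the config above. Their semantics are assumptions;
# date_compact is assumed to mean YYYYMMDD.
TRANSFORMS = {
    "int": int,
    "identity": lambda s: s,
    "date_compact": lambda s: datetime.strptime(s, "%Y%m%d").date(),
}


def template_to_regex(template: str) -> re.Pattern:
    """Turn '{name}' placeholders into named groups; escape literal text."""
    # re.split with one capture group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>.+?)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def parse_filename(name: str, template: str, transforms: dict) -> dict:
    """Extract fields from a filename and apply the named transforms."""
    match = template_to_regex(template).fullmatch(name)
    if match is None:
        raise ValueError(f"{name!r} does not match template {template!r}")
    return {
        field: TRANSFORMS[transforms.get(field, "identity")](raw)
        for field, raw in match.groupdict().items()
    }


# Hypothetical filename following the template from the config above.
meta = parse_filename(
    "mouse_123_slice_A2_20240315.tif",
    "mouse_{mouse_id}_slice_{slice_id}_{date}.tif",
    {"mouse_id": "int", "slice_id": "identity", "date": "date_compact"},
)
print(meta)
```

A filename that does not fit the template raises ValueError in this sketch; how toile handles a mismatch is not documented here.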

Data Schema

Toile uses structured schemas built on the atdata framework:

  • Movie: Full TIFF stack with metadata
  • Frame: Individual image frame with combined metadata
  • SliceRecordingFrame: Experimental frames with mouse/slice identifiers
  • ImageSample: Minimal image data for ML pipelines

Metadata includes acquisition timestamps, physical scales, stage positions, and channel information extracted from OME-TIFF annotations.

Output Format

WebDataset tar archives contain samples with the following structure:

sample-000000-000.npy    # Image data as numpy array
sample-000000-000.json   # Metadata dictionary
sample-000000-001.npy
sample-000000-001.json
...
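
Because each sample is just a pair of tar members sharing a basename, a shard can be inspected with the standard library and NumPy alone. The sketch below first writes a tiny stand-in shard in the layout shown above (the frame_index metadata field is hypothetical, not a documented toile field) and then reads it back by pairing .npy and .json members.

```python
import io
import json
import tarfile
import tempfile
from pathlib import Path

import numpy as np


def write_demo_shard(path: Path) -> None:
    """Write a tiny stand-in shard using the .npy/.json layout shown above."""
    with tarfile.open(path, "w") as tar:
        for i in range(2):
            key = f"sample-000000-{i:03d}"
            npy = io.BytesIO()
            np.save(npy, np.full((4, 4), i, dtype=np.uint8))
            # "frame_index" is a hypothetical field used for illustration.
            meta = json.dumps({"frame_index": i}).encode()
            for suffix, payload in ((".npy", npy.getvalue()), (".json", meta)):
                info = tarfile.TarInfo(key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_samples(path: Path):
    """Yield (key, image, metadata), pairing members that share a basename."""
    pending = {}
    with tarfile.open(path, "r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.rpartition(".")
            sample = pending.setdefault(key, {})
            sample[ext] = tar.extractfile(member).read()
            if {"npy", "json"} <= sample.keys():
                del pending[key]
                image = np.load(io.BytesIO(sample["npy"]))
                yield key, image, json.loads(sample["json"])


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "dataset-000000.tar"
    write_demo_shard(shard)
    samples = list(read_samples(shard))

for key, image, meta in samples:
    print(key, image.shape, image.dtype, meta)
```

For training, the webdataset library streams the same layout directly; the manual reader is mainly useful for spot-checking a shard.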

Shards are numbered sequentially (e.g., dataset-000000.tar, dataset-000001.tar); a new shard is started whenever the current one reaches the size limit.
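
The rollover behaviour can be sketched with the standard library. This is an illustration under stated assumptions, not toile's writer: the ShardedTarWriter class is hypothetical, it counts payload bytes only (ignoring tar header overhead), and it keeps all files of one sample in the same shard.

```python
import io
import tarfile
import tempfile
from pathlib import Path


class ShardedTarWriter:
    """Write samples to stem-000000.tar, stem-000001.tar, ... under a byte cap."""

    def __init__(self, out_dir: Path, stem: str, max_bytes: int):
        self.out_dir = out_dir
        self.stem = stem
        self.max_bytes = max_bytes
        self.index = -1   # no shard opened yet
        self.written = 0  # payload bytes in the current shard
        self.tar = None

    def _next_shard(self) -> None:
        self.close()
        self.index += 1
        self.written = 0
        name = f"{self.stem}-{self.index:06d}.tar"
        self.tar = tarfile.open(self.out_dir / name, "w")

    def write_sample(self, key: str, files: dict) -> None:
        """Add one sample; roll over first if it would exceed the cap."""
        size = sum(len(payload) for payload in files.values())
        would_overflow = self.written > 0 and self.written + size > self.max_bytes
        if self.tar is None or would_overflow:
            self._next_shard()
        for ext, payload in files.items():
            info = tarfile.TarInfo(f"{key}.{ext}")
            info.size = len(payload)
            self.tar.addfile(info, io.BytesIO(payload))
        self.written += size

    def close(self) -> None:
        if self.tar is not None:
            self.tar.close()
            self.tar = None


with tempfile.TemporaryDirectory() as tmp:
    # A tiny 250-byte cap so the rollover is visible with dummy payloads.
    writer = ShardedTarWriter(Path(tmp), "dataset", max_bytes=250)
    for i in range(5):
        files = {"npy": bytes(100), "json": b"{}"}  # placeholder payloads
        writer.write_sample(f"sample-000000-{i:03d}", files)
    writer.close()
    shard_names = sorted(p.name for p in Path(tmp).glob("*.tar"))

print(shard_names)
```

With 102-byte samples and a 250-byte cap, two samples fit per shard, so five samples fill three shards.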

Development

Run tests:

uv run pytest

Build package:

uv build

License

This project is licensed under the Mozilla Public License 2.0 (MPL-2.0) - see the LICENSE file for details.

Acknowledgments

Built with:

  • atdata - Streaming schematized datasets framework
  • webdataset - Efficient streaming datasets for ML and more
  • scikit-image - Standard implementations of basic image-processing routines

Claude wrote the majority of the docs; if it hallucinated anything, let us know in the Issues!
