# toile

CLI tools for working with astrocyte dynamics data.

Toile is a Python package for converting microscopy TIFF stacks into WebDataset format for machine learning pipelines. It handles OME-TIFF metadata extraction and batch processing, and creates sharded tar archives optimized for distributed training.

❤️🔥 Forecast
## Features
- OME-TIFF Support: Automatic extraction of spatial, temporal, and experimental metadata from OME-TIFF XML annotations
- Batch Processing: Process multiple recordings using glob patterns or YAML configuration files
- Custom Metadata Parsing: Flexible filename parsing system for extracting experimental identifiers
- Sharded Archives: Configurable shard sizes for WebDataset format (850MB standard, 38MB for Bluesky PDS)
- ML-Ready: Optional uint8 normalization for efficient model training
- atdata Integration: Built on the atdata PackableSample framework for data transformation pipelines
## Installation
Install using uv (recommended) or pip:
```shell
# Using uv
uv add toile

# Using pip
pip install toile
```
For development:
```shell
git clone https://github.com/forecast-bio/toile.git
cd toile
uv sync --all-extras --dev
```
## Quick Start
Export a TIFF stack to WebDataset format:
```shell
# Basic usage: export frames from a single recording
toile export frames /path/to/recording/ /output/dataset

# With uint8 normalization for ML
toile export frames /path/to/recording/ /output/dataset --uint8 --verbose

# Batch processing with glob patterns
toile export frames "/data/*/recording*/" /output/dataset --stem my_dataset

# Using the PDS-compatible shard size for Bluesky
toile export frames /data/recordings/ /output/dataset --pds
```
## CLI Commands
### `toile export frames`
Convert TIFF stacks to WebDataset format as individual frames.
```shell
toile export frames INPUT OUTPUT [OPTIONS]
```
Arguments:

- `INPUT`: Path to a TIFF directory or YAML config file
- `OUTPUT`: Output directory for tar archives

Options:

- `--stem TEXT`: Custom stem for output filenames (default: output directory name)
- `--shard-size INT`: Maximum shard size in bytes (default: auto-selected)
- `--pds`: Use the PDS-compatible shard size (38MB for Bluesky)
- `--uint8`: Normalize images to the uint8 (0-255) range
- `--compressed`: Enable compression (not yet implemented)
- `--verbose`: Print detailed progress information
Examples:
```shell
# Export a single recording with verbose output
toile export frames /data/mouse_123/recording_001/ /output/dataset --verbose

# Batch export with custom naming
toile export frames "/data/experiment_*/*.tif" /output/dataset --stem exp2024

# ML-ready export with normalization
toile export frames /data/recordings/ /output/dataset --uint8 --pds
```
### `toile export test-frames`
Generate a synthetic test dataset for development and testing.
```shell
toile export test-frames OUTPUT [OPTIONS]
```
Arguments:

- `OUTPUT`: Output directory for the test dataset

Options:

- `--stem TEXT`: Custom stem for output filenames
- `--compressed`: Enable gzip compression
Example:
```shell
toile export test-frames /tmp/test_dataset --compressed
```
## Configuration Files
For complex batch processing, use YAML configuration files:
```yaml
# config.yaml
inputs:
  - "/data/experiment1/**/*.tif"
  - "/data/experiment2/**/*.tif"

output_stem: "astrocyte_dataset"
shard_size: 38000000  # 38MB for PDS compatibility
to_uint8: true

# Optional: extract metadata from filenames
filename_spec:
  template: "mouse_{mouse_id}_slice_{slice_id}_{date}.tif"
  transforms:
    mouse_id: int
    slice_id: identity
    date: date_compact
```
Then run:
```shell
toile export frames config.yaml /output/dataset
```
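A `filename_spec` like the one above can be emulated in a few lines of plain Python. This is a hypothetical sketch of template-to-regex field extraction, not toile's actual implementation, and the `date_compact` format is an assumption:

```python
import re
from datetime import datetime

def parse_filename(template: str, name: str, transforms: dict) -> dict:
    """Extract fields from a filename by turning '{field}' placeholders into named regex groups."""
    # re.split with a capturing group alternates literal text (even indices)
    # and field names (odd indices).
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{p}>.+?)" if i % 2 else re.escape(p) for i, p in enumerate(parts)
    )
    match = re.fullmatch(pattern, name)
    if match is None:
        raise ValueError(f"{name!r} does not match template {template!r}")
    return {k: transforms.get(k, str)(v) for k, v in match.groupdict().items()}

# Hypothetical transforms mirroring the YAML config above
transforms = {
    "mouse_id": int,
    "slice_id": str,  # 'identity'
    "date": lambda s: datetime.strptime(s, "%Y%m%d").date(),  # assumed 'date_compact' format
}
meta = parse_filename(
    "mouse_{mouse_id}_slice_{slice_id}_{date}.tif",
    "mouse_123_slice_a2_20240115.tif",
    transforms,
)
# meta == {"mouse_id": 123, "slice_id": "a2", "date": date(2024, 1, 15)}
```

Non-greedy groups (`.+?`) keep each field from swallowing the literal separators between placeholders.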
## Data Schema
Toile uses structured schemas built on the atdata framework:
- `Movie`: Full TIFF stack with metadata
- `Frame`: Individual image frame with combined metadata
- `SliceRecordingFrame`: Experimental frames with mouse/slice identifiers
- `ImageSample`: Minimal image data for ML pipelines
Metadata includes acquisition timestamps, physical scales, stage positions, and channel information extracted from OME-TIFF annotations.
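To make the shape of such a record concrete, here is a hypothetical per-frame schema as a plain dataclass; the field names are illustrative assumptions, not atdata's actual `Frame` definition:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Frame:
    """Illustrative per-frame record (assumed fields, not toile's real schema)."""
    frame_index: int
    timestamp_s: float        # acquisition time from OME-TIFF annotations
    pixel_size_um: float      # physical scale
    channel: str
    mouse_id: Optional[int] = None   # present for SliceRecordingFrame-style samples
    slice_id: Optional[str] = None

frame = Frame(frame_index=0, timestamp_s=0.0, pixel_size_um=0.65, channel="ch0")
meta = asdict(frame)  # dict form, like the .json sidecar written next to each .npy
```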
## Output Format
WebDataset tar archives contain samples with the following structure:
```text
sample-000000-000.npy   # Image data as numpy array
sample-000000-000.json  # Metadata dictionary
sample-000000-001.npy
sample-000000-001.json
...
```
Shards are automatically numbered (e.g., `dataset-000000.tar`, `dataset-000001.tar`); a new shard is started whenever the size limit is reached.
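The shard-rollover behavior can be sketched with the standard library alone. This hypothetical writer (not toile's actual code) pairs each payload with a JSON sidecar and starts a new numbered tar once the size limit would be exceeded:

```python
import io
import json
import tarfile

def write_shards(samples, out_dir, stem="dataset", shard_size=38_000_000):
    """Write (payload_bytes, metadata) samples into numbered WebDataset-style tar shards."""
    shard_paths, tar, written = [], None, 0

    def open_next():
        nonlocal tar, written
        path = f"{out_dir}/{stem}-{len(shard_paths):06d}.tar"
        shard_paths.append(path)
        tar = tarfile.open(path, "w")
        written = 0

    open_next()
    for i, (payload, meta) in enumerate(samples):
        # Same basename for .npy and .json so WebDataset groups them into one sample
        members = {
            f"sample-000000-{i:03d}.npy": payload,
            f"sample-000000-{i:03d}.json": json.dumps(meta).encode(),
        }
        size = sum(len(b) for b in members.values())
        if written and written + size > shard_size:
            tar.close()
            open_next()
        for name, blob in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))
        written += size
    tar.close()
    return shard_paths
```

With a tiny `shard_size`, every sample lands in its own shard, which is an easy way to exercise the rollover path in tests.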
## Development

Run tests:

```shell
uv run pytest
```

Build the package:

```shell
uv build
```
## License
This project is licensed under the Mozilla Public License 2.0 (MPL-2.0) - see the LICENSE file for details.
## Acknowledgments
Built with:
- atdata - Streaming schematized datasets framework
- webdataset - Efficient streaming datasets for ML and more
- scikit-image - Standard implementations of image-processing basics
Claude wrote the majority of these docs; if it hallucinated anything, let us know in the Issues!
## File details

### `toile-0.1.1b1.tar.gz` (source distribution)

- Size: 22.1 kB
- Uploaded via: uv/0.9.11 (Ubuntu 24.04, CI)
- Uploaded using Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `2ae6521104ea61bacf370359a6fb0070870ae1f5fd8c40081f33afb92ad64e5e` |
| MD5 | `9682e058bd1b3b5d319dc043977ebc4e` |
| BLAKE2b-256 | `2e65c0c270a5f1715adad29ff3f133148a109f77b6c909315624593ed3bc424e` |
### `toile-0.1.1b1-py3-none-any.whl` (built distribution, Python 3)

- Size: 21.5 kB
- Uploaded via: uv/0.9.11 (Ubuntu 24.04, CI)
- Uploaded using Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | `f40244e040b659a8cf6c0a5773e3a50ff57fc79acbe0901788c4795fa27613d6` |
| MD5 | `e0965171f7de4ece88a1ab07c247170f` |
| BLAKE2b-256 | `fa09df82e98e713188df0b1b74f281e38524102d0cb102f32365233a762f3ea1` |