Fast, memory-efficient image tiling and reconstruction for deep learning and scientific computing
TileFlow
High-performance tile-based image processing for scientific computing
Process gigapixel images with minimal memory footprint through intelligent tiling and reconstruction. Designed for microscopy, whole-slide imaging, and large-scale computer vision workflows.
🚀 Key Features
- 🧠 Memory Efficient: Process images larger than RAM using intelligent tiling
- 🔬 Multi-Channel Ready: Native CHW format support for microscopy workflows
- ⚡ Zero-Copy Views: Leverages numpy slicing for maximum performance
- 🔧 Seamless Reconstruction: Intelligent overlap handling eliminates artifacts
- ☁️ Cloud-Scale: Built-in zarr integration for massive datasets
- 🎯 Pluggable Pipeline: Custom processing functions integrate seamlessly
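The zero-copy claim above rests on standard NumPy behavior: basic slicing returns a view onto the parent buffer. The sketch below uses only NumPy, not TileFlow's internals, to show the difference between a sliced view and a fancy-indexed copy.

```python
import numpy as np

image = np.zeros((1024, 1024), dtype=np.float32)

# Basic slicing returns a view onto the same buffer, not a copy.
tile = image[256:512, 256:512]
is_view = np.shares_memory(tile, image)

# Writing through the view changes the parent array.
tile[...] = 1.0
changed_pixels = int(image.sum())

# Fancy (integer-array) indexing, by contrast, always copies.
rows = np.arange(256, 512)
copied = image[rows]
is_copy = not np.shares_memory(copied, image)

print(is_view, changed_pixels, is_copy)
```

Because a view aliases the input, a processing function that modifies its tile in place would also modify the source image. Returning a new array, as the examples in this README do, avoids that.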
📦 Installation
pip install tileflow
🔥 Quick Start
Basic Image Processing
from tileflow import TileFlow
import numpy as np

# Define your processing function
def enhance_contrast(tile):
    """Enhance contrast using histogram stretching."""
    p2, p98 = np.percentile(tile, (2, 98))
    return np.clip((tile - p2) / (p98 - p2 + 1e-8), 0, 1)

# Placeholder input; substitute your own 2D array
large_image = np.random.rand(2048, 2048)

# Configure and run processor
processor = TileFlow(tile_size=(256, 256), overlap=(16, 16))
processor.configure(function=enhance_contrast)
result = processor.run(large_image)
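One point to keep in mind with functions like `enhance_contrast`: the percentiles are computed from whatever array the function receives. Applied to one tile, it therefore stretches that tile by its own statistics, not the whole image's. The NumPy-only sketch below illustrates this with a synthetic image (an assumption made for illustration) whose left half is darker than its right half.

```python
import numpy as np

def enhance_contrast(tile):
    """Histogram stretching between the 2nd and 98th percentiles."""
    p2, p98 = np.percentile(tile, (2, 98))
    return np.clip((tile - p2) / (p98 - p2 + 1e-8), 0, 1)

# Synthetic image: the left half is darker than the right half.
rng = np.random.default_rng(42)
image = rng.normal(loc=100.0, scale=20.0, size=(512, 512))
image[:, :256] *= 0.5

# Stretch using whole-image statistics versus left-half statistics only.
global_result = enhance_contrast(image)
left_only = enhance_contrast(image[:, :256])

# The same pixels get different output values in the two cases.
max_difference = float(np.abs(global_result[:, :256] - left_only).max())
print(f"Largest per-pixel difference: {max_difference:.2f}")
```

If a globally consistent result matters, one option is to compute the percentiles once on the full image (or a subsample) and pass them into the tile function as fixed parameters.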
Multi-Channel Microscopy
import numpy as np
from tileflow import generate_multichannel_image, SobelEdgeDetector

# Generate realistic 8-channel microscopy data [C, H, W]
image_chw = generate_multichannel_image(shape=(8, 2048, 2048))

# Process DAPI channel for nuclei detection
dapi_channel = image_chw[0]  # Extract nuclei channel
sobel = SobelEdgeDetector(tile_size=(256, 256), overlap=(16, 16))
nuclei_edges = sobel.process(dapi_channel)

# Apply different processing to each channel
for i, channel in enumerate(image_chw):
    if i == 0:  # DAPI - nuclei segmentation
        processed = sobel.process(channel)
        nuclei_mask = processed > np.percentile(processed, 95)
    else:  # Other channels - Sobel again here; substitute a channel-specific processor as needed
        processed = sobel.process(channel)
    print(f"Channel {i}: {processed.max():.3f} max intensity")
Zarr for Massive Datasets
import zarr
import numpy as np
from tileflow import TileFlow

# Create zarr dataset for efficient storage
# (mode='w' creates a new array and overwrites any existing store at this path)
dataset = zarr.open('microscopy.zarr', mode='w',
                    shape=(16, 8192, 8192), chunks=(1, 1024, 1024))

# Process channels individually to manage memory
processor = TileFlow(tile_size=(512, 512), overlap=(32, 32))
processor.configure(function=your_analysis_function)  # placeholder: supply your own function

for channel_idx in range(16):
    # Load a single channel from zarr; only this channel is held in memory
    channel_data = np.array(dataset[channel_idx])
    # Process with TileFlow
    result = processor.run(channel_data)
    # Save or analyze result
    print(f"Channel {channel_idx} processed: {result.shape}")
🧪 Advanced Examples
Custom Multi-Channel Pipeline
import numpy as np
from tileflow import TileFlow

class NucleiSegmentationPipeline:
    """Specialized pipeline for DAPI nuclei segmentation."""

    def __init__(self, sensitivity=0.95):
        self.sensitivity = sensitivity
        self.processor = TileFlow(tile_size=(256, 256), overlap=(16, 16))

    def segment_nuclei(self, tile):
        """Apply gradient-magnitude edge detection + thresholding for nuclei detection."""
        # Edge strength from central-difference gradients (a simple stand-in for Sobel)
        gx = np.gradient(tile, axis=1)
        gy = np.gradient(tile, axis=0)
        edges = np.sqrt(gx*gx + gy*gy)
        # Percentile threshold, computed independently for each tile
        threshold = np.percentile(edges, self.sensitivity * 100)
        return (edges > threshold).astype(np.uint8)

    def process(self, dapi_channel):
        """Process DAPI channel for nuclei segmentation."""
        self.processor.configure(function=self.segment_nuclei)
        return self.processor.run(dapi_channel)

# Use the pipeline (image_chw is the [C, H, W] array from the multi-channel example)
pipeline = NucleiSegmentationPipeline(sensitivity=0.95)
nuclei_mask = pipeline.process(image_chw[0])
print(f"Detected {nuclei_mask.sum()} nuclei pixels")
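The `segment_nuclei` method above computes gradients with `np.gradient`, which uses central differences, not the 3×3 Sobel kernels. For readers who want an actual Sobel operator inside a custom tile function, here is a minimal NumPy-only sketch. The function name `sobel_magnitude` and the edge-replicating padding are illustrative assumptions, and this is not necessarily how TileFlow's `SobelEdgeDetector` is implemented.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated borders)."""
    img = np.asarray(img, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")
    # Horizontal kernel [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]:
    # weighted right column minus weighted left column.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # Vertical kernel [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]:
    # weighted bottom row minus weighted top row.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
       - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

# A vertical step edge between columns 2 and 3.
step = np.zeros((5, 6))
step[:, 3:] = 1.0
edges = sobel_magnitude(step)
print(edges[0])
```

Each output pixel depends on its 3×3 neighborhood, so when a function like this runs per tile, an overlap of at least one pixel is needed for tile borders to see their true neighbors.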
Concurrent Multi-Channel Processing
from concurrent.futures import ThreadPoolExecutor
from tileflow import SobelEdgeDetector

# Reuses NucleiSegmentationPipeline and image_chw from the examples above
def process_channel_pair(args):
    """Process a single channel with appropriate algorithm."""
    channel_idx, channel_data = args
    if channel_idx == 0:  # DAPI
        processor = NucleiSegmentationPipeline()
        return channel_idx, processor.process(channel_data)
    else:  # Other channels
        sobel = SobelEdgeDetector()
        return channel_idx, sobel.process(channel_data)

# Process multiple channels concurrently
with ThreadPoolExecutor(max_workers=4) as executor:
    tasks = [(i, image_chw[i]) for i in range(8)]
    futures = [executor.submit(process_channel_pair, task) for task in tasks]
    results = {}
    for future in futures:
        channel_idx, result = future.result()
        results[channel_idx] = result
        print(f"✓ Channel {channel_idx} complete")
🎯 Use Cases
| Domain | Application | Image Size | Channels |
|---|---|---|---|
| 🔬 Microscopy | Fluorescence imaging, pathology | 2K-16K | 4-32 |
| 🧠 Deep Learning | Model inference, preprocessing | 1K-8K | 1-3 |
| 🛰️ Remote Sensing | Satellite analysis, multispectral | 4K-32K | 8-256 |
| 📱 Computer Vision | Panoramic stitching, high-res analysis | 2K-16K | 1-4 |
📊 Performance Benchmarks
| Dataset | Memory Usage | Processing Time | Zarr Compression |
|---|---|---|---|
| 2K × 2K × 8ch | 128 MB | 2.1s | 77% reduction |
| 4K × 4K × 16ch | 256 MB | 8.4s | 75% reduction |
| 8K × 8K × 32ch | 512 MB | 33.2s | 76% reduction |
| 16K × 16K × 8ch | 1.2 GB | 45.6s | 78% reduction |
Benchmark conditions: consumer hardware (16 GB RAM, 8-core CPU), Sobel edge detection.
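To put the memory column in context, the arithmetic below computes the uncompressed size of each dataset. It assumes that "2K" means 2048 pixels and that samples are 4-byte `float32`. Neither assumption is stated in the table, so treat the output as a rough reference and not as part of the benchmark.

```python
def raw_size_mib(height, width, channels, bytes_per_sample=4):
    """Uncompressed in-memory size in MiB (float32 samples assumed)."""
    return height * width * channels * bytes_per_sample / 1024**2

# Dataset shapes from the table, reading "2K" as 2048 (an assumption).
datasets = {
    "2K x 2K x 8ch": (2048, 2048, 8),
    "4K x 4K x 16ch": (4096, 4096, 16),
    "8K x 8K x 32ch": (8192, 8192, 32),
    "16K x 16K x 8ch": (16384, 16384, 8),
}

sizes = {name: raw_size_mib(*shape) for name, shape in datasets.items()}
for name, mib in sizes.items():
    print(f"{name}: {mib:,.0f} MiB uncompressed")
```

Under these assumptions the three larger datasets occupy 1 to 8 GiB uncompressed, well above the reported memory usage. That would be consistent with only part of each image being resident at a time.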
🔍 Enhanced Monitoring & Callbacks
TileFlow provides a comprehensive callback system for monitoring processing performance, memory usage, and energy consumption:
Progress & Performance Tracking
from tileflow import TileFlow, ProgressCallback, MetricsCallback
# Basic progress tracking
processor = TileFlow(tile_size=(256, 256), overlap=(32, 32))
processor.configure(function=your_function)
progress = ProgressCallback(verbose=True, show_rate=True)
metrics = MetricsCallback(verbose=True)
result = processor.run(image, callbacks=[progress, metrics])
Memory Usage Monitoring
from tileflow import MemoryTracker
# Track memory usage during processing
memory_tracker = MemoryTracker(detailed=True)
result = processor.run(large_image, callbacks=[memory_tracker])
# Get detailed statistics
memory_stats = memory_tracker.get_memory_stats()
print(f"Peak memory: {memory_stats['peak_delta_bytes'] / 1024**2:.1f} MB")
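As an independent cross-check on reported memory figures, Python's standard-library `tracemalloc` can measure peak allocations around any call. The sketch below does not use TileFlow. The helper name `peak_memory_mib` is hypothetical, and `allocate` stands in for a real processing call.

```python
import tracemalloc
import numpy as np

def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced allocation in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1024**2

def allocate(shape):
    # Stand-in workload: one float64 array, 8 bytes per element.
    return np.ones(shape, dtype=np.float64)

result, peak_mib = peak_memory_mib(allocate, (1024, 1024))
print(f"Peak traced memory: {peak_mib:.1f} MiB")
```

`tracemalloc` only sees allocations made through Python's allocator hooks, which NumPy reports to, so memory allocated directly by other native libraries may not appear in the total.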
Energy Consumption Tracking
from tileflow import CodeCarbonTracker
# Track CO₂ emissions (requires: pip install codecarbon)
carbon_tracker = CodeCarbonTracker(
    project_name="my-analysis",
    output_dir="./carbon_logs"
)
result = processor.run(image, callbacks=[carbon_tracker])
emissions = carbon_tracker.get_emissions_data()
print(f"CO₂ emissions: {emissions['emissions_kg']:.6f} kg")
Comprehensive Monitoring Suite
from tileflow import CompositeCallback, ProgressCallback, MemoryTracker, CodeCarbonTracker
# Combine multiple monitoring callbacks
monitoring_suite = CompositeCallback([
    ProgressCallback(verbose=True),
    MemoryTracker(detailed=False),
    CodeCarbonTracker(project_name="scientific-analysis")
])
result = processor.run(image, callbacks=[monitoring_suite])
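For intuition about what a composite callback does, here is a self-contained sketch of the fan-out pattern. The classes `Callback`, `Composite`, `TileCounter`, and `Logger` are hypothetical stand-ins written for this illustration. Only the hook names `on_tile_end` and `on_processing_end` are taken from this README, and TileFlow's actual `CompositeCallback` may work differently.

```python
class Callback:
    """Hypothetical base class: every hook defaults to a no-op."""
    def on_tile_end(self, tile, tile_index, total_tiles):
        pass

    def on_processing_end(self, stats):
        pass

class Composite(Callback):
    """Forwards each event to every wrapped callback, in order."""
    def __init__(self, callbacks):
        self.callbacks = list(callbacks)

    def on_tile_end(self, tile, tile_index, total_tiles):
        for cb in self.callbacks:
            cb.on_tile_end(tile, tile_index, total_tiles)

    def on_processing_end(self, stats):
        for cb in self.callbacks:
            cb.on_processing_end(stats)

class TileCounter(Callback):
    def __init__(self):
        self.count = 0

    def on_tile_end(self, tile, tile_index, total_tiles):
        self.count += 1

class Logger(Callback):
    def __init__(self):
        self.messages = []

    def on_processing_end(self, stats):
        self.messages.append(f"done: {stats}")

# Simulate a four-tile run by firing the events manually.
counter, logger = TileCounter(), Logger()
suite = Composite([counter, logger])
for i in range(4):
    suite.on_tile_end(tile=None, tile_index=i, total_tiles=4)
suite.on_processing_end({"tiles": 4})
print(counter.count, logger.messages)
```

The no-op defaults on the base class let each callback override only the hooks it cares about, which keeps the composite loop free of per-callback checks.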
Custom Scientific Callbacks
import numpy as np
from tileflow import TileFlowCallback

class ImageQualityCallback(TileFlowCallback):
    """Custom callback for scientific image analysis."""

    def __init__(self):
        super().__init__()
        self.quality_metrics = []  # per-tile SNR values

    def on_tile_end(self, tile, tile_index, total_tiles):
        # Analyze each processed tile
        data = tile.image_data[0]
        snr = np.mean(data) / (np.std(data) + 1e-8)  # guard against constant tiles
        self.quality_metrics.append(snr)

    def on_processing_end(self, stats):
        print(f"Average SNR: {np.mean(self.quality_metrics):.2f}")

processor.run(image, callbacks=[ImageQualityCallback()])
📚 Complete Examples
Run comprehensive examples from the scripts/ directory:
# Basic CHW format processing
uv run python scripts/basic_usage.py
# Advanced multi-channel workflows
uv run python scripts/multichannel_processing.py
# Zarr integration for large datasets
uv run python scripts/zarr_integration.py
Example Scripts:
- basic_usage.py - Interface validation with CHW arrays
- multichannel_processing.py - Specialized channel processors
- zarr_integration.py - Cloud-scale dataset handling
🏗️ Architecture
TileFlow processes images hierarchically for optimal memory usage:
Input Image (16GB) → Grid Generation (lazy iterator) → Tile Processing (64MB) → Reconstruction (overlap handling) → Output (16GB)
Processing Modes:
- Direct Tiling: Image → Tiles → Process → Reconstruct
- Hierarchical: Image → Chunks → Tiles → Process → Reconstruct
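The two modes above differ only in the order tiles are visited. The sketch below enumerates tile windows lazily with generators, once directly and once chunk by chunk. The function names `iter_windows` and `iter_hierarchical` are assumptions for illustration, and the sketch ignores overlap for brevity.

```python
from itertools import product

def iter_windows(shape, window):
    """Lazily yield (row_slice, col_slice) windows covering a 2D shape."""
    height, width = shape
    win_h, win_w = window
    for top, left in product(range(0, height, win_h), range(0, width, win_w)):
        yield (slice(top, min(top + win_h, height)),
               slice(left, min(left + win_w, width)))

def iter_hierarchical(shape, chunk, tile):
    """Yield tile windows chunk by chunk, in full-image coordinates."""
    for rows, cols in iter_windows(shape, chunk):
        chunk_shape = (rows.stop - rows.start, cols.stop - cols.start)
        for t_rows, t_cols in iter_windows(chunk_shape, tile):
            # Shift chunk-local tile coordinates back to image coordinates.
            yield (slice(rows.start + t_rows.start, rows.start + t_rows.stop),
                   slice(cols.start + t_cols.start, cols.start + t_cols.stop))

direct = list(iter_windows((4096, 4096), (512, 512)))
hierarchical = list(iter_hierarchical((4096, 4096), (2048, 2048), (512, 512)))
print(len(direct), len(hierarchical))
```

Both orderings cover the same 64 tiles here. The hierarchical order finishes one 2048×2048 chunk before moving to the next, which lets a loader read and release one chunk at a time.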
Core Components:
- GridSpec: Defines tiling strategy with intelligent overlap
- TileFlow: Main processor with configure/run interface
- Reconstruction: Seamless merge with artifact elimination
- Backends: Support for numpy, zarr, and custom data sources
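To illustrate why overlap matters for reconstruction, the sketch below processes each tile together with a halo of surrounding pixels and then keeps only the tile's core region. It uses NumPy alone. The crop-to-core strategy and the names `run_tiled` and `box_blur` are assumptions for illustration, and TileFlow's own reconstruction may merge overlaps differently.

```python
import numpy as np

def box_blur(img):
    """3x3 mean filter with edge padding; each output needs 1 px of context."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[dy:dy + h, dx:dx + w]
               for dy in range(3) for dx in range(3)) / 9.0

def run_tiled(img, func, tile=64, overlap=8):
    """Process overlapping tiles and keep only each tile's core region."""
    h, w = img.shape
    out = np.empty_like(img)
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            bottom, right = min(top + tile, h), min(left + tile, w)
            # Expand the core by the overlap, clipped to the image bounds.
            t0, l0 = max(top - overlap, 0), max(left - overlap, 0)
            t1, l1 = min(bottom + overlap, h), min(right + overlap, w)
            processed = func(img[t0:t1, l0:l1])
            # Crop the halo away and write the core into the output.
            out[top:bottom, left:right] = processed[top - t0:bottom - t0,
                                                    left - l0:right - l0]
    return out

rng = np.random.default_rng(0)
image = rng.random((200, 300))
reference = box_blur(image)  # whole-image result

seamless = np.allclose(run_tiled(image, box_blur, overlap=8), reference)
has_seams = not np.allclose(run_tiled(image, box_blur, overlap=0), reference)
print(seamless, has_seams)
```

With an overlap at least as large as the filter's reach, the tiled result matches the whole-image result. With no overlap, pixels along tile borders are computed from padded values instead of their real neighbors, which produces visible seams.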
🛠️ Development
# Clone and setup
git clone <repository>
cd TileFlow
uv sync
# Run tests
uv run pytest
# Code quality
uv run ruff check
uv run ruff format
uv run mypy src/tileflow
# Build package
uv build
🧬 Scientific Applications
Fluorescence Microscopy:
# Multi-fluorophore analysis
# (specialized_processor and analyze_fluorophore_distribution are user-supplied placeholders)
channels = ["DAPI", "FITC", "TRITC", "Cy5"]
for i, name in enumerate(channels):
    channel_data = image_chw[i]
    processed = specialized_processor[name].process(channel_data)
    analyze_fluorophore_distribution(processed, name)
Whole-Slide Pathology:
# Process gigapixel pathology slides
wsi_processor = TileFlow(
    tile_size=(512, 512),
    overlap=(64, 64),
    chunk_size=(2048, 2048)  # Hierarchical processing
)
wsi_processor.configure(function=tissue_classifier)
classification_map = wsi_processor.run(whole_slide_image)
Satellite Imagery:
# Multispectral satellite analysis
# (calculate_ndvi is a user-supplied placeholder; NDVI needs both the red and near-infrared bands)
spectral_bands = ["red", "green", "blue", "nir", "swir1", "swir2"]
bands = {name: satellite_image[idx] for idx, name in enumerate(spectral_bands)}
vegetation_index = calculate_ndvi(bands["red"], bands["nir"])
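NDVI is defined from two bands, red and near-infrared, as (NIR − Red) / (NIR + Red). The snippet above leaves `calculate_ndvi` undefined, so here is a minimal sketch of such a helper. The two-argument signature and the small `eps` guard against division by zero are assumptions made for illustration.

```python
import numpy as np

def calculate_ndvi(red, nir, eps=1e-8):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    # eps avoids division by zero where both bands are zero.
    return (nir - red) / (nir + red + eps)

# Tiny reflectance arrays standing in for real bands.
red = np.array([[0.1, 0.2], [0.3, 0.4]])
nir = np.array([[0.5, 0.2], [0.3, 0.8]])
ndvi = calculate_ndvi(red, nir)
print(np.round(ndvi, 3))
```

The computation is purely per-pixel, so it needs no overlap between tiles and could be passed to a tile processor unchanged once the two bands are stacked or supplied together.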
🤝 Contributing
TileFlow is built for the scientific computing community. We welcome contributions for:
- New Backends: TIFF, HDF5, cloud storage adapters
- Processing Algorithms: Segmentation, enhancement, feature extraction
- Performance: GPU acceleration, distributed processing
- Documentation: Tutorials, use case examples
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
Valentin Poque - Core development and architecture (July-September 2025)
Process any image, any size, any channel count.
TileFlow scales with your data.