TDCSophiread

High-performance TDC-only TPX3 neutron imaging data processor
High-performance Python and C++ library for processing TPX3 neutron imaging data with 96M+ hits/sec throughput. TDCSophiread provides complete hit extraction and neutron clustering capabilities using TDC-only timing (detector-expert approved).
🚀 Key Features
- 🏃 High Performance: 96M+ hits/sec with Intel TBB parallel processing
- 🧠 Smart Clustering: 4 algorithms (ABS, Graph, DBSCAN, Grid) for neutron event reconstruction
- ⚡ Zero-Copy Processing: Memory-efficient temporal batching with structured numpy arrays
- 🔍 TDC-Only Timing: Detector-expert-approved approach (no unreliable GDC)
- 🐍 Python Integration: Complete Python API with Jupyter notebook examples
- 📊 Production Ready: Real-world performance validated on 12GB datasets
Quick Start
Installation
```bash
# Clone repository
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread

# Set up environment (pixi recommended)
pixi install && pixi shell

# Build and install
pixi run cmake -B build && pixi run cmake --build build && pip install -e .
```
Get Sample Data (12GB)
```bash
# Download real TPX3 datasets for testing
git submodule update --init notebooks/data
```
Python Usage
```python
import tdcsophiread

# 1. Extract hits from TPX3 file
hits = tdcsophiread.process_tpx3("data.tpx3", parallel=True)
print(f"Extracted {len(hits):,} hits")

# 2. Process hits to neutrons using clustering
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Found {len(neutrons):,} neutrons")

# 3. Try different clustering algorithms
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
config.clustering.algorithm = "dbscan"  # or "abs", "graph", "grid"
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)
```
Performance Monitoring
```python
# Get detailed performance statistics
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
processor = tdcsophiread.TemporalNeutronProcessor(config)
neutrons = processor.processHits(hits)

stats = processor.getStatistics()
print(f"Hit rate: {stats.hits_per_second / 1e6:.1f} M hits/sec")
print(f"Neutron efficiency: {stats.neutron_efficiency:.3f}")
print(f"Parallel efficiency: {stats.parallel_efficiency:.2f}")
```
🧬 Architecture
TDCSophiread implements a modern, high-performance pipeline with parallel temporal processing:
```mermaid
flowchart TD
    A[TPX3 Raw Data] --> B[TDCProcessor]
    B --> |Memory-mapped I/O<br/>Section-aware processing| C[std::vector<TDCHit><br/>Temporally ordered hits]
    C --> D[TemporalNeutronProcessor]
    subgraph TemporalNeutronProcessor
        direction TB
        E[Phase 1: Statistical Analysis<br/>• Analyze hit distribution<br/>• Calculate optimal batch sizes<br/>• Determine overlaps]
        E --> F[Phase 2: Parallel Worker Pool]
        subgraph ParallelWorkerPool
            direction LR
            Worker0[Worker 0]
            Worker1[Worker 1]
            WorkerN[Worker N]
        end
        subgraph Worker0Details
            direction TB
            G1[Hit Clustering<br/>Algorithm Selection]
            G1 --> G1a["ABS<br/>O(n) - Fastest"]
            G1 --> G1b["Graph<br/>O(n log n) - Balanced"]
            G1 --> G1c["DBSCAN<br/>O(n log n) - Noise handling"]
            G1 --> G1d["Grid<br/>O(n) - Geometry optimized"]
            G1a --> G2[Neutron Extraction<br/>TOT-weighted centroids]
            G1b --> G2
            G1c --> G2
            G1d --> G2
        end
        subgraph Worker1Details
            direction TB
            H1[Hit Clustering] --> H2[Neutron Extraction]
        end
        subgraph WorkerNDetails
            direction TB
            I1[Hit Clustering] --> I2[Neutron Extraction]
        end
        Worker0 --> Worker0Details
        Worker1 --> Worker1Details
        WorkerN --> WorkerNDetails
        F --> J[Phase 3: Result Aggregation<br/>• Combine worker results<br/>• Remove overlap duplicates<br/>• Generate statistics]
    end
    J --> K[std::vector<TDCNeutron><br/>Final neutron events<br/>96M+ hits/sec performance]
    style A fill:#e1f5fe
    style K fill:#e8f5e8
    style TemporalNeutronProcessor fill:#f3e5f5
    style G1a fill:#ffecb3
    style G1b fill:#fff3e0
    style G1c fill:#fce4ec
    style G1d fill:#e0f2f1
```
Phase 1: Hit Extraction
- Memory-mapped I/O: Efficient processing of large TPX3 files
- Section-aware processing: Respects TPX3 data structure constraints
- TDC state propagation: Sequential processing for reliable timing
- Parallel chunk processing: Intel TBB for maximum throughput
Phase 2: Temporal Neutron Processing
- Statistical analysis: Optimal batching based on hit distribution
- Parallel worker pool: Each worker has dedicated algorithm instances
- 4 clustering algorithms: ABS, Graph, DBSCAN, Grid with different performance characteristics
- Zero-copy processing: Iterator-based interfaces minimize memory overhead
Phase 3: Result Aggregation
- Parallel result combination: Efficient merging from multiple workers
- Overlap deduplication: Remove duplicate neutrons from batch boundaries
- Performance statistics: Detailed metrics for optimization
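The overlap-deduplication step in Phase 3 can be sketched as follows. This is an illustration only, not the library's actual logic: the structured array, field names (following the neutron dtype shown in the Data Format section), and tolerance values are assumptions.

```python
import numpy as np

def dedup_overlap(batch_a, batch_b, tof_tol=2, xy_tol=0.5):
    """Merge two adjacent-batch results, dropping neutrons in batch_b
    that duplicate one already kept from batch_a.

    A neutron counts as a duplicate if it lies within tof_tol (25 ns
    units) and xy_tol (sub-pixels) of a neutron in batch_a.
    """
    keep = np.ones(len(batch_b), dtype=bool)
    for i, n in enumerate(batch_b):
        close = (
            (np.abs(batch_a['tof'].astype(np.int64) - int(n['tof'])) <= tof_tol)
            & (np.abs(batch_a['x'] - n['x']) <= xy_tol)
            & (np.abs(batch_a['y'] - n['y']) <= xy_tol)
        )
        keep[i] = not close.any()
    return np.concatenate([batch_a, batch_b[keep]])

dtype = np.dtype([('x', 'f8'), ('y', 'f8'), ('tof', 'u4')])
a = np.array([(10.0, 20.0, 100), (30.0, 40.0, 200)], dtype=dtype)
b = np.array([(10.2, 20.1, 101), (50.0, 60.0, 300)], dtype=dtype)
merged = dedup_overlap(a, b)
print(len(merged))  # the near-duplicate of (10.0, 20.0, 100) is dropped → 3
```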
🎯 Clustering Algorithms
| Algorithm | Performance | Use Case | Complexity |
|---|---|---|---|
| ABS | Fastest | General purpose, high throughput | O(n) |
| Graph | Fast | Balanced speed/accuracy | O(n log n) |
| DBSCAN | Medium | Noise handling, complex patterns | O(n log n) |
| Grid | Fast | Detector geometry optimization | O(n) |
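To make the DBSCAN row concrete, here is a minimal, self-contained sketch of the DBSCAN idea applied to 2D hit coordinates. It is pure NumPy and O(n²) for brevity; the library's implementation is spatially indexed, and the parameter values here are illustrative only.

```python
import numpy as np

def dbscan_2d(points, eps=4.0, min_points=3):
    """Label each point with a cluster id; -1 marks noise."""
    n = len(points)
    # Pairwise distances (O(n^2); fine for an illustration).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_points:
            continue  # already assigned, or not a core point
        # Grow a new cluster outward from core point i.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if len(neighbors[j]) >= min_points:  # only core points expand
                for k in neighbors[j]:
                    if labels[k] == -1:
                        labels[k] = cluster
                        stack.append(k)
        cluster += 1
    return labels

# Two tight blobs of hits plus one isolated (noise) hit.
pts = np.array([[0, 0], [1, 0], [0, 1],
                [50, 50], [51, 50], [50, 51],
                [200, 200]], dtype=float)
print(dbscan_2d(pts))  # two clusters (0 and 1) and one noise label (-1)
```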
Algorithm Configuration
```python
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()

# ABS (Adaptive Bucket Sort) - fastest
config.clustering.algorithm = "abs"
config.clustering.abs.radius = 5.0
config.clustering.abs.neutron_correlation_window = 75.0  # nanoseconds

# DBSCAN - best noise handling
config.clustering.algorithm = "dbscan"
config.clustering.dbscan.epsilon = 4.0
config.clustering.dbscan.min_points = 3

# Process with custom configuration
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)
```
📊 Performance
Measured Performance (Real Hardware)
| System | Hit Rate | Clustering | Notes |
|---|---|---|---|
| M2 Max | 20M+ hits/sec | ABS | Development system |
| AMD EPYC 9174F | 96M+ hits/sec | ABS | Production target |

Memory usage is ~40-60 bytes/hit for all algorithms, including clustering.
Performance by File Size
- < 100MB: 20-40 M hits/sec (single-threaded sufficient)
- 100MB-1GB: 50-80 M hits/sec (parallel recommended)
- 1GB-10GB: 80-96 M hits/sec (optimal parallel)
- > 10GB: 90-96 M hits/sec (streaming mode)
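Assuming each TPX3 packet is 8 bytes (the 64-bit Timepix3 packet format) and roughly one hit per packet, the rate tiers above translate file size into an expected processing time. The helper below is a back-of-the-envelope sketch, not part of the API.

```python
def estimate_seconds(file_bytes, bytes_per_hit=8):
    """Rough processing-time estimate from the measured rate tiers above."""
    hits = file_bytes / bytes_per_hit
    mb = file_bytes / 1e6
    if mb < 100:
        rate = 30e6   # midpoint of 20-40 M hits/sec
    elif mb < 1000:
        rate = 65e6   # midpoint of 50-80 M hits/sec
    elif mb < 10000:
        rate = 88e6   # midpoint of 80-96 M hits/sec
    else:
        rate = 93e6   # midpoint of 90-96 M hits/sec
    return hits / rate

# A 12 GB dataset: ~1.5 billion hits at ~93 M hits/sec.
print(f"{estimate_seconds(12e9):.1f} s")  # ≈ 16.1 s
```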
🔧 Build System
Development Workflow
```bash
# Core workflow
pixi run build        # Configure and build C++
pixi run test         # Run C++ tests
pixi run install      # Install Python bindings (editable)
pixi run python-test  # Test Python import

# Data setup (12GB sample data)
pixi run setup-data   # Download sample TPX3 files
pixi run notebooks    # Launch Jupyter with notebooks
```
Build Options
```bash
# Debug build (if needed)
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# Legacy components (not recommended)
cmake -B build -DBUILD_LEGACY=ON
```
⚠️ Legacy Warning: Legacy components use unreliable GDC timing and will be removed in the next major release.
📚 Documentation & Examples
Jupyter Notebooks (Real Data)
```bash
# Start Jupyter with sample notebooks
pixi run notebooks
```
Available Notebooks:
- `notebooks/hits_extraction_from_tpx3_Ni.ipynb` - Hit extraction (96M+ hits/sec)
- `notebooks/neutrons_extraction_from_tpx3_Ni.ipynb` - Complete neutron processing
- `notebooks/clustering_abs_ni.ipynb` - ABS clustering demo
- `notebooks/clustering_graph_ni.ipynb` - Graph clustering demo
- `notebooks/clustering_dbscan_Ni.ipynb` - DBSCAN clustering demo
- `notebooks/clustering_grid_Ni.ipynb` - Grid clustering demo
Documentation
- 📖 Quick Start: `docs/quickstart.md`
- 📋 API Reference: `docs/api_reference.md`
- 🏗️ Architecture: `TDCSOPHIREAD_ARCHITECTURE_2025.md`
- 🧬 TPX3 Format: `TPX3.md`
🗂️ Data Format
Hit Data (Structured NumPy Array)
```python
hits = tdcsophiread.process_tpx3("data.tpx3")
print(f"Fields: {hits.dtype.names}")
# ('tof', 'x', 'y', 'timestamp', 'tot', 'chip_id', 'cluster_id')

# Access hit properties
x_coords = hits['x']        # Global X coordinates (uint16)
y_coords = hits['y']        # Global Y coordinates (uint16)
tof_values = hits['tof']    # Time-of-flight (uint32, 25ns units)
tot_values = hits['tot']    # Time-over-threshold (uint16)
chip_ids = hits['chip_id']  # Chip ID 0-3 (uint8)
```
Neutron Data (Structured NumPy Array)
```python
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Fields: {neutrons.dtype.names}")
# ('x', 'y', 'tof', 'tot', 'n_hits', 'chip_id', 'reserved')

# Access neutron properties
x_subpixel = neutrons['x']        # Sub-pixel X coordinates (float64)
y_subpixel = neutrons['y']        # Sub-pixel Y coordinates (float64)
tof_neutron = neutrons['tof']     # Representative TOF (uint32, 25ns units)
cluster_size = neutrons['n_hits'] # Number of hits in cluster (uint16)
```
Unit Conversions
```python
# Time conversions
tof_ms = hits['tof'] * 25 / 1e6             # 25ns units → milliseconds
timestamp_s = hits['timestamp'] * 25 / 1e9  # 25ns units → seconds

# Coordinate conversions
pixel_x = neutrons['x'] / 8.0  # Sub-pixel → pixel (factor = 8)
pixel_y = neutrons['y'] / 8.0
```
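These conversions are typically the first step toward a TOF spectrum. A minimal sketch with synthetic hits (the dtype fields match those listed above; the hit data itself is fabricated for illustration):

```python
import numpy as np

# Synthetic hits in the structured layout described above.
rng = np.random.default_rng(0)
dtype = np.dtype([('tof', 'u4'), ('x', 'u2'), ('y', 'u2'), ('tot', 'u2')])
hits = np.zeros(100_000, dtype=dtype)
hits['tof'] = rng.integers(0, 667_000, size=len(hits))  # 0-16.7 ms in 25ns units

# Convert to milliseconds and bin into a TOF spectrum.
tof_ms = hits['tof'] * 25 / 1e6
counts, edges = np.histogram(tof_ms, bins=100, range=(0, 16.7))
print(counts.sum())  # every hit lands in a bin → 100000
```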
⚙️ Configuration
JSON Configuration
```json
{
  "clustering": {
    "algorithm": "abs",
    "abs": {
      "radius": 5.0,
      "neutron_correlation_window": 75.0
    }
  },
  "extraction": {
    "algorithm": "simple_centroid",
    "super_resolution_factor": 8.0,
    "weighted_by_tot": true
  },
  "temporal": {
    "num_workers": 0,
    "max_batch_size": 100000
  }
}
```
Detector Configuration
```json
{
  "detector": {
    "timing": {
      "tdc_frequency_hz": 60.0,
      "enable_missing_tdc_correction": true
    },
    "chip_layout": {
      "chip_size_x": 256,
      "chip_size_y": 256
    }
  }
}
```
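Configurations like these are plain JSON, so they can be sanity-checked before handing them to the processor. A minimal sketch; the checks shown are assumptions based on the examples above, not a documented schema:

```python
import json

detector_json = """
{
  "detector": {
    "timing": {"tdc_frequency_hz": 60.0, "enable_missing_tdc_correction": true},
    "chip_layout": {"chip_size_x": 256, "chip_size_y": 256}
  }
}
"""

cfg = json.loads(detector_json)
timing = cfg["detector"]["timing"]
assert timing["tdc_frequency_hz"] > 0, "TDC frequency must be positive"

# At 60 Hz, TDC pulses arrive every ~16.67 ms.
period_ms = 1000.0 / timing["tdc_frequency_hz"]
print(f"Nominal TDC period: {period_ms:.2f} ms")
```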
🔬 Scientific Context
TPX3 Data Constraints
TDCSophiread respects the physical constraints of TPX3 data:
- Variable section sizes: No padding or fixed boundaries
- Local time disorder: Packets within sections not time-ordered
- Missing TDC packets: Hardware may drop TDC packets (corrected automatically)
- Sequential dependencies: TDC state must propagate in order
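The missing-TDC correction can be illustrated with a small sketch: at a nominal 60 Hz, TDC pulses arrive every ~16.67 ms, so a gap of roughly two periods implies a dropped packet whose timestamp can be interpolated. This is an illustration of the idea only, not the library's actual algorithm:

```python
def fill_missing_tdc(tdc_times_ms, period_ms=1000.0 / 60.0):
    """Insert synthetic TDC timestamps wherever a gap spans >1 nominal period."""
    out = [tdc_times_ms[0]]
    for t in tdc_times_ms[1:]:
        gap = t - out[-1]
        missing = round(gap / period_ms) - 1  # whole periods skipped
        for _ in range(missing):
            out.append(out[-1] + period_ms)   # synthesize the dropped pulse
        out.append(t)
    return out

# One TDC pulse was dropped between 16.7 ms and 50.0 ms.
observed = [0.0, 16.7, 50.0, 66.7]
corrected = fill_missing_tdc(observed)
print(len(corrected))  # one synthetic pulse inserted → 5
```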
🛠️ Development
Requirements
- C++20 compiler (GCC 10+, Clang 11+, MSVC 2019+)
- Intel TBB for parallel processing
- HDF5 for data I/O
- Python 3.8+ with NumPy
- CMake 3.20+
Environment Setup
```bash
# Install pixi (cross-platform package manager)
curl -sSL https://pixi.sh/install | bash

# Clone and setup
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread
pixi install
```
Code Style
- C++20 with modern practices
- Google C++ Style (2-space indentation)
- Test-Driven Development with Google Test
- Zero-copy design patterns
- Stateless algorithms for parallelization
🔗 Legacy Components
Previous implementations (FastSophiread, CLI/GUI applications) have been moved to legacy/ and are deprecated:
- ❌ Unreliable GDC timing (disapproved by detector experts)
- ❌ Template complexity (hard to maintain)
Migration: All legacy functionality is available in TDCSophiread with improved performance and reliability.
📈 Benchmarks
Real-World Performance
Using sample data from notebooks/data/:
```python
import time

import tdcsophiread

# Ni powder diffraction data (>1M hits)
sample_file = "notebooks/data/Run_8217_April25_2025_Ni_Powder_MCP_TPX3_0_8C_1_9_AngsMin_serval_000000.tpx3"

start = time.time()
hits = tdcsophiread.process_tpx3(sample_file, parallel=True)
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
elapsed = time.time() - start

print(f"Performance: {len(hits) / elapsed / 1e6:.1f} M hits/sec")
print(f"Found {len(neutrons):,} neutrons from {len(hits):,} hits")
```
Memory Efficiency
- Before optimization: 48GB peak memory
- After optimization: 20GB peak memory (58% reduction)
- Current streaming: 512MB chunks for any file size
🤝 Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
Issue Reporting
- 🐛 Bugs: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Contact: neutronimaging@ornl.gov
📄 License
GPL-3.0+ License - see LICENSE file for details.
Ready to process neutron data at 96M+ hits/sec? 🚀
Get started: docs/quickstart.md
Download files
Download the file for your platform.
Source Distribution
Built Distributions
File details
Details for the file tdcsophiread-3.0.0.tar.gz.
File metadata
- Download URL: tdcsophiread-3.0.0.tar.gz
- Upload date:
- Size: 232.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b2d1451eb5ae7632e7e91d4db4ebc563e3fe6d398c57c6bb57fc0c87e175847b` |
| MD5 | `13147489fae579b9ba5c3b3f8d9b382f` |
| BLAKE2b-256 | `b7eaea83f225adf63c3dc39a7e915b4576ce20a5c0977393a30fd2037095c47b` |
File details
Details for the file tdcsophiread-3.0.0-cp312-abi3-manylinux_2_39_x86_64.whl.
File metadata
- Download URL: tdcsophiread-3.0.0-cp312-abi3-manylinux_2_39_x86_64.whl
- Upload date:
- Size: 548.5 kB
- Tags: CPython 3.12+, manylinux: glibc 2.39+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a2de0a42d0be4fd00cee314b92cc1fb1ae8d8af6e1087dcd29b83fd24829b4df` |
| MD5 | `853b6695d521aa0b61dd8a3037f24241` |
| BLAKE2b-256 | `40e7945a3577de4cbe957ea09356baecabe8b0b92bab6741c9ea4872787328cd` |
File details
Details for the file tdcsophiread-3.0.0-cp312-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: tdcsophiread-3.0.0-cp312-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 526.2 kB
- Tags: CPython 3.12+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a5453a570cda898bd48c4b31334db0964857518c3567cf0d913b3b28d69f1e2c` |
| MD5 | `d5357d0ca31742c5e7b9e977d3d49df6` |
| BLAKE2b-256 | `794615a02dc22a1cd71665786f805dd7776f53f7e479bf296922b35f5bb1c8b1` |
File details
Details for the file tdcsophiread-3.0.0-cp312-abi3-macosx_15_0_arm64.whl.
File metadata
- Download URL: tdcsophiread-3.0.0-cp312-abi3-macosx_15_0_arm64.whl
- Upload date:
- Size: 6.4 MB
- Tags: CPython 3.12+, macOS 15.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b1fd36ca1e295b85aa96d9bc19afa476494a4611d812886038569997530c89f3` |
| MD5 | `bab46f36cbfdbdb998468e0215b68c4f` |
| BLAKE2b-256 | `c97ea11394bbf306a1a6f8a54e72b78980e52e78d747c51a163b4bce99e6f03b` |