High-performance TDC-only TPX3 neutron imaging data processor

TDCSophiread

High-performance Python and C++ library for processing TPX3 neutron imaging data with 96M+ hits/sec throughput. TDCSophiread provides complete hit extraction and neutron clustering capabilities using TDC-only timing (detector-expert approved).

🚀 Key Features

  • 🏃 High Performance: 96M+ hits/sec with Intel TBB parallel processing
  • 🧠 Smart Clustering: 4 algorithms (ABS, Graph, DBSCAN, Grid) for neutron event reconstruction
  • ⚡ Zero-Copy Processing: Memory-efficient temporal batching with structured numpy arrays
  • 🔍 TDC-Only Timing: Detector-expert-approved approach (no unreliable GDC)
  • 🐍 Python Integration: Complete Python API with Jupyter notebook examples
  • 📊 Production Ready: Real-world performance validated on 12GB datasets

Quick Start

Installation

# Clone repository
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread

# Set up environment (pixi recommended)
pixi install

# Build and install
pixi run dev-install

Note: if you prefer to build in stages, you can issue the following commands:

# configure with CMake
pixi run configure
# build with CMake
pixi run build
# run tests
pixi run test
# install Python bindings
pixi run pip install -e . --no-build-isolation

Get Sample Data (12GB)

# Make sure you have git lfs installed, then run:
git lfs install
# Initialize the git submodule
git submodule init
# Download real TPX3 datasets for testing
git submodule update --init resources/sophiread_data

Python Usage

import tdcsophiread

# 1. Extract hits from TPX3 file
hits = tdcsophiread.process_tpx3("data.tpx3", parallel=True)
print(f"Extracted {len(hits):,} hits")

# 2. Process hits to neutrons using clustering
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Found {len(neutrons):,} neutrons")

# 3. Try different clustering algorithms
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
config.clustering.algorithm = "dbscan"  # or "abs", "graph", "grid"
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)

Performance Monitoring

# Get detailed performance statistics
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
processor = tdcsophiread.TemporalNeutronProcessor(config)
neutrons = processor.processHits(hits)

stats = processor.getStatistics()
print(f"Hit rate: {stats.hits_per_second/1e6:.1f} M hits/sec")
print(f"Neutron efficiency: {stats.neutron_efficiency:.3f}")
print(f"Parallel efficiency: {stats.parallel_efficiency:.2f}")

🧬 Architecture

TDCSophiread implements a modern, high-performance pipeline with parallel temporal processing:

flowchart TD
    A[TPX3 Raw Data] --> B[TDCProcessor]
    B --> |Memory-mapped I/O<br/>Section-aware processing| C[std::vector&lt;TDCHit&gt;<br/>Temporally ordered hits]

    C --> D[TemporalNeutronProcessor]

    subgraph TemporalNeutronProcessor
        direction TB
        E[Phase 1: Statistical Analysis<br/>• Analyze hit distribution<br/>• Calculate optimal batch sizes<br/>• Determine overlaps]

        E --> F[Phase 2: Parallel Worker Pool]

        subgraph ParallelWorkerPool
            direction LR
            Worker0[Worker 0]
            Worker1[Worker 1]
            WorkerN[Worker N]
        end

        subgraph Worker0Details
            direction TB
            G1[Hit Clustering<br/>Algorithm Selection]
            G1 --> G1a["ABS<br/>O(n) - Fastest"]
            G1 --> G1b["Graph<br/>O(n log n) - Balanced"]
            G1 --> G1c["DBSCAN<br/>O(n log n) - Noise handling"]
            G1 --> G1d["Grid<br/>O(n) - Geometry optimized"]
            G1a --> G2[Neutron Extraction<br/>TOT-weighted centroids]
            G1b --> G2
            G1c --> G2
            G1d --> G2
        end

        subgraph Worker1Details
            direction TB
            H1[Hit Clustering] --> H2[Neutron Extraction]
        end

        subgraph WorkerNDetails
            direction TB
            I1[Hit Clustering] --> I2[Neutron Extraction]
        end

        Worker0 --> Worker0Details
        Worker1 --> Worker1Details
        WorkerN --> WorkerNDetails

        F --> J[Phase 3: Result Aggregation<br/>• Combine worker results<br/>• Remove overlap duplicates<br/>• Generate statistics]
    end

    J --> K[std::vector&lt;TDCNeutron&gt;<br/>Final neutron events<br/>96M+ hits/sec performance]

    style A fill:#e1f5fe
    style K fill:#e8f5e8
    style TemporalNeutronProcessor fill:#f3e5f5
    style G1a fill:#ffecb3
    style G1b fill:#fff3e0
    style G1c fill:#fce4ec
    style G1d fill:#e0f2f1

Phase 1: Hit Extraction

  • Memory-mapped I/O: Efficient processing of large TPX3 files
  • Section-aware processing: Respects TPX3 data structure constraints
  • TDC state propagation: Sequential processing for reliable timing
  • Parallel chunk processing: Intel TBB for maximum throughput

Phase 2: Temporal Neutron Processing

  • Statistical analysis: Optimal batching based on hit distribution
  • Parallel worker pool: Each worker has dedicated algorithm instances
  • 4 clustering algorithms: ABS, Graph, DBSCAN, Grid with different performance characteristics
  • Zero-copy processing: Iterator-based interfaces minimize memory overhead

Phase 3: Result Aggregation

  • Parallel result combination: Efficient merging from multiple workers
  • Overlap deduplication: Remove duplicate neutrons from batch boundaries
  • Performance statistics: Detailed metrics for optimization
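The overlap-deduplication step in Phase 3 can be sketched in plain Python. This is an illustrative toy, not the library's internal implementation; the dict fields and exact-match keying are assumptions made for the sketch:

```python
# Illustrative sketch: workers process temporally overlapping batches, so
# the same neutron event can appear in two consecutive result sets and
# must be emitted only once during aggregation.

def merge_batches(batches):
    """Concatenate per-worker results, dropping exact duplicates that
    fall in the overlap region between consecutive batches."""
    seen = set()
    merged = []
    for batch in batches:
        for neutron in batch:
            key = (neutron["x"], neutron["y"], neutron["tof"])
            if key in seen:
                continue  # already emitted by the previous (overlapping) batch
            seen.add(key)
            merged.append(neutron)
    # final output is time-ordered, matching the temporally ordered pipeline
    return sorted(merged, key=lambda n: n["tof"])

batch_a = [{"x": 10.5, "y": 20.25, "tof": 100},
           {"x": 30.0, "y": 40.0, "tof": 190}]   # tof=190 sits in the overlap
batch_b = [{"x": 30.0, "y": 40.0, "tof": 190},   # duplicate from the overlap
           {"x": 50.0, "y": 60.0, "tof": 250}]

merged = merge_batches([batch_a, batch_b])
print(len(merged))  # 3
```

The real pipeline aggregates `std::vector<TDCNeutron>` results in C++; this sketch only shows why the overlap makes deduplication necessary.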

🎯 Clustering Algorithms

| Algorithm | Performance | Use Case | Complexity |
|-----------|-------------|----------|------------|
| ABS | Fastest | General purpose, high throughput | O(n) |
| Graph | Fast | Balanced speed/accuracy | O(n log n) |
| DBSCAN | Medium | Noise handling, complex patterns | O(n log n) |
| Grid | Fast | Detector geometry optimization | O(n) |

Algorithm Configuration

config = tdcsophiread.NeutronProcessingConfig.venus_defaults()

# ABS (Adaptive Bucket Sort) - Fastest
config.clustering.algorithm = "abs"
config.clustering.abs.radius = 5.0
config.clustering.abs.neutron_correlation_window = 75.0  # nanoseconds

# DBSCAN - Best noise handling
config.clustering.algorithm = "dbscan"
config.clustering.dbscan.epsilon = 4.0
config.clustering.dbscan.min_points = 3

# Process with custom configuration
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)

📊 Performance

Measured Performance (Real Hardware)

| System | Hit Rate | Clustering | Notes |
|--------|----------|------------|-------|
| M2 Max | 20M+ hits/sec | ABS | Development system |
| AMD EPYC 9174F | 96M+ hits/sec | ABS | Production target |
| Memory Usage | ~40-60 bytes/hit | All | Including clustering |

Performance by File Size

  • < 100MB: 20-40 M hits/sec (single-threaded sufficient)
  • 100MB-1GB: 50-80 M hits/sec (parallel recommended)
  • 1GB-10GB: 80-96 M hits/sec (optimal parallel)
  • > 10GB: 90-96 M hits/sec (streaming mode)
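A small helper can turn the guidance above into a `parallel` flag for `process_tpx3`. The helper itself is hypothetical (not part of the library's API), and the ~100 MB threshold is simply the cutoff from the list above:

```python
import os

# ~100 MB: below this, single-threaded extraction is sufficient per the
# guidance above (this helper is an assumption, not a library function).
PARALLEL_THRESHOLD = 100 * 1024 * 1024

def should_parallelize(path):
    """Return True when the TPX3 file is large enough that parallel
    extraction pays off."""
    return os.path.getsize(path) >= PARALLEL_THRESHOLD

# Usage (assumes tdcsophiread is installed and data.tpx3 exists):
# import tdcsophiread
# hits = tdcsophiread.process_tpx3("data.tpx3",
#                                  parallel=should_parallelize("data.tpx3"))
```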

🔧 Build System

Development Workflow

# Core workflow
pixi run build        # Configure and build C++
pixi run test         # Run C++ tests
pixi run install      # Install Python bindings (editable)
pixi run python-test  # Test Python import

# Data setup (12GB sample data)
pixi run setup-data   # Download sample TPX3 files
pixi run notebooks    # Launch Jupyter with notebooks

Build Options

# Enter the pixi environment shell
pixi shell

# Debug build (if needed)
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# Legacy components (not recommended)
cmake -B build -DBUILD_LEGACY=ON

⚠️ Legacy Warning: Legacy components use unreliable GDC timing and will be removed in the next major release.

📚 Documentation & Examples

Jupyter Notebooks (Real Data)

# Start Jupyter with sample notebooks
pixi run notebooks

Available Notebooks:

  • notebooks/hits_extraction_from_tpx3_Ni.ipynb - Hit extraction (96M+ hits/sec)
  • notebooks/neutrons_extraction_from_tpx3_Ni.ipynb - Complete neutron processing
  • notebooks/clustering_abs_ni.ipynb - ABS clustering demo
  • notebooks/clustering_graph_ni.ipynb - Graph clustering demo
  • notebooks/clustering_dbscan_Ni.ipynb - DBSCAN clustering demo
  • notebooks/clustering_grid_Ni.ipynb - Grid clustering demo

🗂️ Data Format

Hit Data (Structured NumPy Array)

hits = tdcsophiread.process_tpx3("data.tpx3")
print(f"Fields: {hits.dtype.names}")
# ('tof', 'x', 'y', 'timestamp', 'tot', 'chip_id', 'cluster_id')

# Access hit properties
x_coords = hits['x']          # Global X coordinates (uint16)
y_coords = hits['y']          # Global Y coordinates (uint16)
tof_values = hits['tof']      # Time-of-flight (uint32, 25ns units)
tot_values = hits['tot']      # Time-over-threshold (uint16)
chip_ids = hits['chip_id']    # Chip ID 0-3 (uint8)

Neutron Data (Structured NumPy Array)

neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Fields: {neutrons.dtype.names}")
# ('x', 'y', 'tof', 'tot', 'n_hits', 'chip_id', 'reserved')

# Access neutron properties
x_subpixel = neutrons['x']     # Sub-pixel X coordinates (float64)
y_subpixel = neutrons['y']     # Sub-pixel Y coordinates (float64)
tof_neutron = neutrons['tof']  # Representative TOF (uint32, 25ns units)
cluster_size = neutrons['n_hits'] # Number of hits in cluster (uint16)

Unit Conversions

# Time conversions
tof_ms = hits['tof'] * 25 / 1e6        # 25ns units → milliseconds
timestamp_s = hits['timestamp'] * 25 / 1e9  # 25ns units → seconds

# Coordinate conversions
pixel_x = neutrons['x'] / 8.0          # Sub-pixel → pixel (factor=8)
pixel_y = neutrons['y'] / 8.0
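As a sanity check of the 25 ns clock factor, the same conversion applied to a tiny synthetic array (the values here are made up; real arrays come from `process_tpx3`):

```python
import numpy as np

# Synthetic TOF values in 25 ns clock units, same dtype as hits['tof'].
tof_raw = np.array([40000, 400000], dtype=np.uint32)

# 25 ns units -> milliseconds: 40000 * 25 ns = 1,000,000 ns = 1 ms.
tof_ms = tof_raw * 25 / 1e6
print(tof_ms)  # 1 ms and 10 ms
```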

⚙️ Configuration

JSON Configuration

{
  "clustering": {
    "algorithm": "abs",
    "abs": {
      "radius": 5.0,
      "neutron_correlation_window": 75.0
    }
  },
  "extraction": {
    "algorithm": "simple_centroid",
    "super_resolution_factor": 8.0,
    "weighted_by_tot": true
  },
  "temporal": {
    "num_workers": 0,
    "max_batch_size": 100000
  }
}
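The configuration above can be produced with the standard `json` module. Whether the library loads a JSON file directly or the fields must be mapped onto `NeutronProcessingConfig` attributes (as in the Python examples earlier) is left to the project docs; this snippet only writes the file:

```python
import json

# Same structure as the JSON example above; the filename is arbitrary.
config = {
    "clustering": {
        "algorithm": "abs",
        "abs": {"radius": 5.0, "neutron_correlation_window": 75.0},
    },
    "extraction": {
        "algorithm": "simple_centroid",
        "super_resolution_factor": 8.0,
        "weighted_by_tot": True,
    },
    "temporal": {"num_workers": 0, "max_batch_size": 100000},
}

with open("processing_config.json", "w") as f:
    json.dump(config, f, indent=2)
```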

Detector Configuration

{
  "detector": {
    "timing": {
      "tdc_frequency_hz": 60.0,
      "enable_missing_tdc_correction": true
    },
    "chip_layout": {
      "chip_size_x": 256,
      "chip_size_y": 256
    }
  }
}

🔬 Scientific Context

TPX3 Data Constraints

TDCSophiread respects the physical constraints of TPX3 data:

  • Variable section sizes: No padding or fixed boundaries
  • Local time disorder: Packets within sections not time-ordered
  • Missing TDC packets: Hardware may drop TDC packets (corrected automatically)
  • Sequential dependencies: TDC state must propagate in order
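The missing-TDC correction can be illustrated with a toy interpolation: TDC pulses arrive at a fixed frequency (60 Hz in the detector configuration above), so a gap of roughly N periods between consecutive pulses implies N-1 dropped packets whose timestamps can be reconstructed. This is a sketch of the idea only, not the library's algorithm; the function name and tolerance are assumptions:

```python
TDC_PERIOD_S = 1.0 / 60.0  # expected pulse spacing at tdc_frequency_hz = 60

def fill_missing_tdc(timestamps, period=TDC_PERIOD_S, tol=0.25):
    """Insert synthetic TDC timestamps wherever consecutive pulses are
    separated by roughly an integer multiple (> 1) of the period."""
    filled = [timestamps[0]]
    for t in timestamps[1:]:
        gap = t - filled[-1]
        n_missing = round(gap / period) - 1
        if n_missing > 0 and abs(gap - (n_missing + 1) * period) < tol * period:
            for _ in range(n_missing):
                filled.append(filled[-1] + period)  # reconstructed pulse
        filled.append(t)
    return filled

pulses = [0.0, TDC_PERIOD_S, 3 * TDC_PERIOD_S]  # one pulse dropped
print(len(fill_missing_tdc(pulses)))  # 4
```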

🛠️ Development

Requirements

  • C++20 compiler (GCC 10+, Clang 11+, MSVC 2019+)
  • Intel TBB for parallel processing
  • HDF5 for data I/O
  • Python 3.8+ with NumPy
  • CMake 3.20+

Environment Setup

# Install pixi (cross-platform package manager)
curl -sSL https://pixi.sh/install | bash

# Clone and setup
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread
pixi install

Code Style

  • C++20 with modern practices
  • Google C++ Style (2-space indentation)
  • Test-Driven Development with Google Test
  • Zero-copy design patterns
  • Stateless algorithms for parallelization

🔗 Legacy Components

Previous implementations (FastSophiread, CLI/GUI applications) have been moved to legacy/ and are deprecated:

  • Unreliable GDC timing (disapproved by detector experts)
  • Template complexity (hard to maintain)

Migration: All legacy functionality is available in TDCSophiread with improved performance and reliability.

📈 Benchmarks

Real-World Performance

Using sample data from notebooks/data/:

# Ni powder diffraction data (>1M hits)
sample_file = "notebooks/data/Run_8217_April25_2025_Ni_Powder_MCP_TPX3_0_8C_1_9_AngsMin_serval_000000.tpx3"

import time
start = time.time()
hits = tdcsophiread.process_tpx3(sample_file, parallel=True)
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
elapsed = time.time() - start

print(f"Performance: {len(hits) / elapsed / 1e6:.1f} M hits/sec")
print(f"Found {len(neutrons):,} neutrons from {len(hits):,} hits")

Memory Efficiency

  • Before optimization: 48GB peak memory
  • After optimization: 20GB peak memory (58% reduction)
  • Current streaming: 512MB chunks for any file size
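The 512MB streaming approach reduces to reading fixed-size chunks so peak memory stays bounded regardless of total file size. A conceptual sketch (illustrative only; real TPX3 chunking must also respect the variable section boundaries noted under Scientific Context):

```python
CHUNK_SIZE = 512 * 1024 * 1024  # 512 MB, matching the streaming figure above

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield a file in fixed-size chunks; peak memory is bounded by
    chunk_size no matter how large the file is."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```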

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

📄 License

GPL-3.0+ License - see LICENSE file for details.


Ready to process neutron data at 96M+ hits/sec? 🚀

Get started: docs/quickstart.md


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tdcsophiread-3.1.0.tar.gz (1.1 MB)

Uploaded: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

tdcsophiread-3.1.0-cp314-cp314-manylinux_2_28_x86_64.whl (527.5 kB)

Uploaded: CPython 3.14, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp313-cp313-manylinux_2_28_x86_64.whl (527.3 kB)

Uploaded: CPython 3.13, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp312-cp312-manylinux_2_28_x86_64.whl (526.3 kB)

Uploaded: CPython 3.12, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp311-cp311-manylinux_2_28_x86_64.whl (527.9 kB)

Uploaded: CPython 3.11, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (526.5 kB)

Uploaded: CPython 3.10, manylinux: glibc 2.28+, x86-64

File details

Details for the file tdcsophiread-3.1.0.tar.gz.

File metadata

  • Download URL: tdcsophiread-3.1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tdcsophiread-3.1.0.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | ed9c201ccae9329723f1c013fecd49328866d5a66ddb23f3cbdb7f7870014bd2 |
| MD5 | 6dd066fa3696f5bb396b6780e5403210 |
| BLAKE2b-256 | 1fa348eda1b10eb8a331f6b8d6f53f808e6f345bacaec40a29df03e6c82e21e4 |

See more details on using hashes here.

File details

Details for the file tdcsophiread-3.1.0-cp314-cp314-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp314-cp314-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 3db35900637887bec8bf6187216dba7f6c4bec7e99b7ff03a0976ebba0deaa35 |
| MD5 | e0f7df3949659391522d84ee2f8858f9 |
| BLAKE2b-256 | 2aa2f71e5947ac03dc6a7b4721870f65ff1c90b6ddd6f62bdcc85c65f6837a76 |


File details

Details for the file tdcsophiread-3.1.0-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp313-cp313-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 61833e9d38394e9ff4333a3b55ccddf598006425dfb5300c6338b5d1a6fb0947 |
| MD5 | 92d4552bc7cf67ecfba0d8d5e1bf10ca |
| BLAKE2b-256 | e4a83ad703dc1fc16c2ef3f9add1760e7da8c87258605fbeaf7ca6353916e6a8 |


File details

Details for the file tdcsophiread-3.1.0-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp312-cp312-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 97c669ebf450df4faf7a77a0c6f38ed48802d19644e445e3fe65611524c71be1 |
| MD5 | 7d34672d13a7527e2dce89cc5d80f7aa |
| BLAKE2b-256 | ba0d5cd77c08cd9dc0ac44f135483d3716e95e591cc3efaa4f1a49433d0425d1 |


File details

Details for the file tdcsophiread-3.1.0-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp311-cp311-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 2fdb6e4ca6620ada61759d42aa77c4f2e971e449e6283f4fe8f415194e69559d |
| MD5 | 7334bec8c84eda67b15f86443521fa4d |
| BLAKE2b-256 | 599aee0d1c857f1e3936cc2d836455ae96b59f810c629a692f3e4cc74b51d7e9 |


File details

Details for the file tdcsophiread-3.1.0-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp310-cp310-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | a73874e8c4bf7239c713fbbf9ba8177d4011fa6e6be5f9b10221209fa0e982f9 |
| MD5 | 44b860a2ff776443e0b354ca8b923f86 |
| BLAKE2b-256 | 4851d6f6e4883249b7ffb35ff08add46937f3d54a53b75aded3303996e3bdff4 |

