High-performance TDC-only TPX3 neutron imaging data processor

Project description

TDCSophiread

High-performance Python and C++ library for processing TPX3 neutron imaging data with 96M+ hits/sec throughput. TDCSophiread provides complete hit extraction and neutron clustering capabilities using TDC-only timing (detector-expert approved).

🚀 Key Features

  • 🏃 High Performance: 96M+ hits/sec with Intel TBB parallel processing
  • 🧠 Smart Clustering: 4 algorithms (ABS, Graph, DBSCAN, Grid) for neutron event reconstruction
  • ⚡ Zero-Copy Processing: Memory-efficient temporal batching with structured numpy arrays
  • 🔍 TDC-Only Timing: Detector-expert-approved approach (no unreliable GDC)
  • 🐍 Python Integration: Complete Python API with Jupyter notebook examples
  • 📊 Production Ready: Real-world performance validated on 12GB datasets

Quick Start

Installation

Option 1: Install from PyPI (Recommended)

For most users, install the pre-built wheels:

# Using pip (works with any environment)
pip install tdcsophiread

# Using uv (modern package manager)
uv pip install tdcsophiread

# Using pixi (recommended for scientific computing)
pixi add pip
pixi run pip install tdcsophiread

⚠️ Known Issue with pixi add --pypi

When installing tdcsophiread with pixi add --pypi, pixi's embedded uv incorrectly attempts to build from source instead of using the available pre-built wheel. This causes installation failure due to missing C++ build dependencies.

Workaround: Use pixi run pip install tdcsophiread (recommended) or native uv pip install tdcsophiread; both correctly use the pre-built wheel.

Testing shows:

  • pip install tdcsophiread ✅ Uses wheel
  • uv pip install tdcsophiread ✅ Uses wheel
  • pixi run pip install tdcsophiread ✅ Uses wheel
  • pixi add --pypi tdcsophiread ❌ Builds from source (fails)

Status: This is a pixi-specific issue. The wheel works correctly with pip and uv natively. The root cause is unknown and may be specific to tdcsophiread's package configuration.

💡 Building from Source with pixi: If you need to build from source using pixi add --pypi (not recommended), see the "Building from Source" section below for required dependencies.

Option 2: Development Installation

For development or if you need the latest features:

# Clone repository
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread

# Set up environment (pixi recommended)
pixi install

# Build and install
pixi run dev-install

Note: if you prefer to build in separate stages, run the following commands:

# configure with CMake
pixi run configure
# build with CMake
pixi run build
# run tests with CMake
pixi run test
# install Python bindings
pixi run pip install -e . --no-build-isolation

Building from Source

If building from source (e.g., when no pre-built wheel is available for your platform), you need these C++ libraries:

Note: These C++ libraries are NOT available on PyPI. They must be installed through pixi/conda or your system package manager before building from source.

Using pixi (recommended):

# Install build dependencies first
pixi add nlohmann_json spdlog eigen hdf5 tbb-devel libtiff fmt pybind11

# Option A: Build from source with pip (recommended)
pixi add pip
pixi run pip install tdcsophiread --no-binary tdcsophiread

# Option B: Build from source with pixi add --pypi (if needed)
# Note: This triggers build automatically due to the pixi bug
pixi add --pypi tdcsophiread

Using system packages:

# RHEL/Rocky/AlmaLinux/Fedora
sudo dnf install nlohmann-json-devel spdlog-devel eigen3-devel \
                 hdf5-devel tbb-devel libtiff-devel fmt-devel \
                 pybind11-devel cmake gcc-c++

# Ubuntu/Debian
sudo apt install nlohmann-json3-dev libspdlog-dev libeigen3-dev \
                 libhdf5-dev libtbb-dev libtiff-dev libfmt-dev \
                 pybind11-dev cmake g++

# Then install with pip
pip install tdcsophiread --no-binary tdcsophiread

Get Sample Data (12GB)

# Make sure you have git lfs installed, then run:
git lfs install
# Initialize the git submodule
git submodule init
# Download real TPX3 datasets for testing
git submodule update --init resources/sophiread_data

Python Usage

import tdcsophiread

# 1. Extract hits from TPX3 file
hits = tdcsophiread.process_tpx3("data.tpx3", parallel=True)
print(f"Extracted {len(hits):,} hits")

# 2. Process hits to neutrons using clustering
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Found {len(neutrons):,} neutrons")

# 3. Try different clustering algorithms
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
config.clustering.algorithm = "dbscan"  # or "abs", "graph", "grid"
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)

Performance Monitoring

# Get detailed performance statistics
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
processor = tdcsophiread.TemporalNeutronProcessor(config)
neutrons = processor.processHits(hits)

stats = processor.getStatistics()
print(f"Hit rate: {stats.hits_per_second/1e6:.1f} M hits/sec")
print(f"Neutron efficiency: {stats.neutron_efficiency:.3f}")
print(f"Parallel efficiency: {stats.parallel_efficiency:.2f}")

🧬 Architecture

TDCSophiread implements a modern, high-performance pipeline with parallel temporal processing:

flowchart TD
    A[TPX3 Raw Data] --> B[TDCProcessor]
    B --> |Memory-mapped I/O<br/>Section-aware processing| C[std::vector&lt;TDCHit&gt;<br/>Temporally ordered hits]

    C --> D[TemporalNeutronProcessor]

    subgraph TemporalNeutronProcessor
        direction TB
        E[Phase 1: Statistical Analysis<br/>• Analyze hit distribution<br/>• Calculate optimal batch sizes<br/>• Determine overlaps]

        E --> F[Phase 2: Parallel Worker Pool]

        subgraph ParallelWorkerPool
            direction LR
            Worker0[Worker 0]
            Worker1[Worker 1]
            WorkerN[Worker N]
        end

        subgraph Worker0Details
            direction TB
            G1[Hit Clustering<br/>Algorithm Selection]
            G1 --> G1a["ABS<br/>O(n) - Fastest"]
            G1 --> G1b["Graph<br/>O(n log n) - Balanced"]
            G1 --> G1c["DBSCAN<br/>O(n log n) - Noise handling"]
            G1 --> G1d["Grid<br/>O(n) - Geometry optimized"]
            G1a --> G2[Neutron Extraction<br/>TOT-weighted centroids]
            G1b --> G2
            G1c --> G2
            G1d --> G2
        end

        subgraph Worker1Details
            direction TB
            H1[Hit Clustering] --> H2[Neutron Extraction]
        end

        subgraph WorkerNDetails
            direction TB
            I1[Hit Clustering] --> I2[Neutron Extraction]
        end

        Worker0 --> Worker0Details
        Worker1 --> Worker1Details
        WorkerN --> WorkerNDetails

        F --> J[Phase 3: Result Aggregation<br/>• Combine worker results<br/>• Remove overlap duplicates<br/>• Generate statistics]
    end

    J --> K[std::vector&lt;TDCNeutron&gt;<br/>Final neutron events<br/>96M+ hits/sec performance]

    style A fill:#e1f5fe
    style K fill:#e8f5e8
    style TemporalNeutronProcessor fill:#f3e5f5
    style G1a fill:#ffecb3
    style G1b fill:#fff3e0
    style G1c fill:#fce4ec
    style G1d fill:#e0f2f1

Phase 1: Hit Extraction

  • Memory-mapped I/O: Efficient processing of large TPX3 files
  • Section-aware processing: Respects TPX3 data structure constraints
  • TDC state propagation: Sequential processing for reliable timing
  • Parallel chunk processing: Intel TBB for maximum throughput

Phase 2: Temporal Neutron Processing

  • Statistical analysis: Optimal batching based on hit distribution
  • Parallel worker pool: Each worker has dedicated algorithm instances
  • 4 clustering algorithms: ABS, Graph, DBSCAN, Grid with different performance characteristics
  • Zero-copy processing: Iterator-based interfaces minimize memory overhead
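The batching idea can be sketched in a few lines. This is a hypothetical illustration, not the library's code; in the real pipeline, batch size and overlap come from the configuration (e.g. max_batch_size and the neutron correlation window):

```python
import numpy as np

def temporal_batches(timestamps, batch_size=100_000, overlap_ns=75.0):
    """Split temporally ordered hit timestamps into index ranges whose ends
    are extended by the correlation window, so clusters spanning a boundary
    appear in both batches (duplicates are removed during aggregation)."""
    batches = []
    n = len(timestamps)
    start = 0
    while start < n:
        end = min(start + batch_size, n)
        # extend the batch forward to include hits inside the overlap window
        t_edge = timestamps[end - 1]
        ext = end
        while ext < n and timestamps[ext] - t_edge <= overlap_ns:
            ext += 1
        batches.append((start, ext))
        start = end
    return batches
```

Each tuple is a half-open index range into the hit array; workers process these ranges independently.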

Phase 3: Result Aggregation

  • Parallel result combination: Efficient merging from multiple workers
  • Overlap deduplication: Remove duplicate neutrons from batch boundaries
  • Performance statistics: Detailed metrics for optimization
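The deduplication step can be illustrated with a naive O(n²) sketch (hypothetical, not the library's aggregation code, which only needs to compare events near batch-boundary windows):

```python
def deduplicate(neutrons, pos_tol=0.5, tof_tol_ns=1.0):
    """Drop neutrons reconstructed twice from overlapping batches: two events
    matching in position and TOF within tolerance count as one neutron."""
    unique = []
    for x, y, tof in neutrons:  # each neutron as an (x, y, tof) tuple
        is_dup = any(abs(x - ux) < pos_tol and abs(y - uy) < pos_tol
                     and abs(tof - utof) < tof_tol_ns
                     for ux, uy, utof in unique)
        if not is_dup:
            unique.append((x, y, tof))
    return unique
```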

🎯 Clustering Algorithms

Algorithm  Performance  Use Case                          Complexity
ABS        Fastest      General purpose, high throughput  O(n)
Graph      Fast         Balanced speed/accuracy           O(n log n)
DBSCAN     Medium       Noise handling, complex patterns  O(n log n)
Grid       Fast         Detector geometry optimization    O(n)

Algorithm Configuration

config = tdcsophiread.NeutronProcessingConfig.venus_defaults()

# ABS (Adaptive Bucket Sort) - Fastest
config.clustering.algorithm = "abs"
config.clustering.abs.radius = 5.0
config.clustering.abs.neutron_correlation_window = 75.0  # nanoseconds

# DBSCAN - Best noise handling
config.clustering.algorithm = "dbscan"
config.clustering.dbscan.epsilon = 4.0
config.clustering.dbscan.min_points = 3

# Process with custom configuration
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)

📊 Performance

Measured Performance (Real Hardware)

System          Hit Rate          Clustering  Notes
M2 Max          20M+ hits/sec     ABS         Development system
AMD EPYC 9174F  96M+ hits/sec     ABS         Production target
Memory Usage    ~40-60 bytes/hit  All         Including clustering

Performance by File Size

  • < 100MB: 20-40 M hits/sec (single-threaded sufficient)
  • 100MB-1GB: 50-80 M hits/sec (parallel recommended)
  • 1GB-10GB: 80-96 M hits/sec (optimal parallel)
  • > 10GB: 90-96 M hits/sec (streaming mode)
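These tiers can be encoded as a simple rule of thumb (hypothetical helper; the thresholds are taken from the list above):

```python
import os

def recommended_mode(size_bytes):
    """Map a TPX3 file size to the processing tier listed above."""
    MiB, GiB = 1024 ** 2, 1024 ** 3
    if size_bytes < 100 * MiB:
        return "single-threaded"
    if size_bytes < 10 * GiB:
        return "parallel"
    return "streaming"

# e.g. enable parallel processing for anything beyond the smallest tier:
# parallel = recommended_mode(os.path.getsize("data.tpx3")) != "single-threaded"
```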

🔧 Build System

Development Workflow

# Core workflow
pixi run build        # Configure and build C++
pixi run test         # Run C++ tests
pixi run install      # Install Python bindings (editable)
pixi run python-test  # Test Python import

# Data setup (12GB sample data)
pixi run setup-data   # Download sample TPX3 files
pixi run notebooks    # Launch Jupyter with notebooks

Build Options

# Enter the pixi environment shell
pixi shell

# Debug build (if needed)
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# Legacy components (not recommended)
cmake -B build -DBUILD_LEGACY=ON

⚠️ Legacy Warning: Legacy components use unreliable GDC timing and will be removed in the next major release.

📚 Documentation & Examples

Jupyter Notebooks (Real Data)

# Start Jupyter with sample notebooks
pixi run notebooks

Available Notebooks:

  • notebooks/hits_extraction_from_tpx3_Ni.ipynb - Hit extraction (96M+ hits/sec)
  • notebooks/neutrons_extraction_from_tpx3_Ni.ipynb - Complete neutron processing
  • notebooks/clustering_abs_ni.ipynb - ABS clustering demo
  • notebooks/clustering_graph_ni.ipynb - Graph clustering demo
  • notebooks/clustering_dbscan_Ni.ipynb - DBSCAN clustering demo
  • notebooks/clustering_grid_Ni.ipynb - Grid clustering demo

Documentation

🗂️ Data Format

Hit Data (Structured NumPy Array)

hits = tdcsophiread.process_tpx3("data.tpx3")
print(f"Fields: {hits.dtype.names}")
# ('tof', 'x', 'y', 'timestamp', 'tot', 'chip_id', 'cluster_id')

# Access hit properties
x_coords = hits['x']          # Global X coordinates (uint16)
y_coords = hits['y']          # Global Y coordinates (uint16)
tof_values = hits['tof']      # Time-of-flight (uint32, 25ns units)
tot_values = hits['tot']      # Time-over-threshold (uint16)
chip_ids = hits['chip_id']    # Chip ID 0-3 (uint8)

Neutron Data (Structured NumPy Array)

neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Fields: {neutrons.dtype.names}")
# ('x', 'y', 'tof', 'tot', 'n_hits', 'chip_id', 'reserved')

# Access neutron properties
x_subpixel = neutrons['x']     # Sub-pixel X coordinates (float64)
y_subpixel = neutrons['y']     # Sub-pixel Y coordinates (float64)
tof_neutron = neutrons['tof']  # Representative TOF (uint32, 25ns units)
cluster_size = neutrons['n_hits'] # Number of hits in cluster (uint16)
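Structured arrays support vectorized selection and statistics directly. A synthetic example using a subset of the documented fields (real arrays come from process_hits_to_neutrons and carry all fields listed above):

```python
import numpy as np

# Minimal stand-in for the neutron dtype, for demonstration only
neutron_dtype = np.dtype([("x", np.float64), ("y", np.float64),
                          ("tof", np.uint32), ("n_hits", np.uint16)])
neutrons = np.array([(12.5, 40.25, 4000, 4),
                     (100.0, 7.75, 4100, 6),
                     (55.125, 63.5, 4200, 5)], dtype=neutron_dtype)

mean_cluster_size = neutrons["n_hits"].mean()  # average hits per neutron
large = neutrons[neutrons["n_hits"] >= 5]      # boolean-mask selection
```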

Unit Conversions

# Time conversions
tof_ms = hits['tof'] * 25 / 1e6        # 25ns units → milliseconds
timestamp_s = hits['timestamp'] * 25 / 1e9  # 25ns units → seconds

# Coordinate conversions
pixel_x = neutrons['x'] / 8.0          # Sub-pixel → pixel (factor=8)
pixel_y = neutrons['y'] / 8.0
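Neutron imaging analyses often also need wavelength from TOF. A sketch via the de Broglie relation; the flight-path length below is an illustrative assumption, not a documented instrument value, so substitute your own:

```python
import numpy as np

# h / m_neutron in m^2/s (CODATA constants)
H_OVER_MN = 6.62607015e-34 / 1.67492749804e-27  # ~3.956e-7 m^2/s
FLIGHT_PATH_M = 25.0  # assumed source-to-detector distance

def tof_to_wavelength_angstrom(tof_25ns_units, flight_path_m=FLIGHT_PATH_M):
    """Convert TOF (25 ns units, as stored in the hit array) to wavelength."""
    tof_s = np.asarray(tof_25ns_units, dtype=np.float64) * 25e-9
    return H_OVER_MN * tof_s / flight_path_m * 1e10  # metres -> angstroms
```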

⚙️ Configuration

JSON Configuration

{
  "clustering": {
    "algorithm": "abs",
    "abs": {
      "radius": 5.0,
      "neutron_correlation_window": 75.0
    }
  },
  "extraction": {
    "algorithm": "simple_centroid",
    "super_resolution_factor": 8.0,
    "weighted_by_tot": true
  },
  "temporal": {
    "num_workers": 0,
    "max_batch_size": 100000
  }
}
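The schema above can be inspected with the standard json module; a small sketch of its structure (tdcsophiread's own JSON-loading path may differ, so this only demonstrates the field layout):

```python
import json

config_json = """
{
  "clustering": {"algorithm": "abs",
                 "abs": {"radius": 5.0, "neutron_correlation_window": 75.0}},
  "extraction": {"algorithm": "simple_centroid",
                 "super_resolution_factor": 8.0,
                 "weighted_by_tot": true},
  "temporal": {"num_workers": 0, "max_batch_size": 100000}
}
"""
cfg = json.loads(config_json)
# Each top-level key maps to one pipeline stage: clustering parameters,
# centroid extraction, and temporal batching / worker-pool settings.
```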

Detector Configuration

{
  "detector": {
    "timing": {
      "tdc_frequency_hz": 60.0,
      "enable_missing_tdc_correction": true
    },
    "chip_layout": {
      "chip_size_x": 256,
      "chip_size_y": 256
    }
  }
}

🔬 Scientific Context

TPX3 Data Constraints

TDCSophiread respects the physical constraints of TPX3 data:

  • Variable section sizes: No padding or fixed boundaries
  • Local time disorder: Packets within sections not time-ordered
  • Missing TDC packets: Hardware may drop TDC packets (corrected automatically)
  • Sequential dependencies: TDC state must propagate in order
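The missing-TDC correction idea (cf. enable_missing_tdc_correction and tdc_frequency_hz in the detector configuration) can be sketched as follows. This is a simplified illustration, not the library's actual implementation:

```python
def fill_missing_tdc(tdc_times_s, frequency_hz=60.0):
    """TDC pulses nominally arrive every 1/frequency seconds, so a gap of
    roughly N periods implies N-1 dropped pulses; synthesize them by even
    interpolation across the gap."""
    period = 1.0 / frequency_hz
    out = [tdc_times_s[0]]
    for t in tdc_times_s[1:]:
        gap = t - out[-1]
        n_missing = max(round(gap / period) - 1, 0)
        base, step = out[-1], gap / (n_missing + 1)
        for k in range(1, n_missing + 1):
            out.append(base + k * step)  # synthesized pulse
        out.append(t)
    return out
```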

🛠️ Development

Requirements

  • C++20 compiler (GCC 10+, Clang 11+, MSVC 2019+)
  • Intel TBB for parallel processing
  • HDF5 for data I/O
  • Python 3.8+ with NumPy
  • CMake 3.20+

Environment Setup

# Install pixi (cross-platform package manager)
curl -fsSL https://pixi.sh/install.sh | bash

# Clone and setup
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread
pixi install

Code Style

  • C++20 with modern practices
  • Google C++ Style (2-space indentation)
  • Test-Driven Development with Google Test
  • Zero-copy design patterns
  • Stateless algorithms for parallelization

🔗 Legacy Components

Previous implementations (FastSophiread, CLI/GUI applications) have been moved to legacy/ and are deprecated:

  • Unreliable GDC timing (disapproved by detector experts)
  • Template complexity (hard to maintain)

Migration: All legacy functionality is available in TDCSophiread with improved performance and reliability.

📈 Benchmarks

Real-World Performance

Using sample data from notebooks/data/:

# Ni powder diffraction data (>1M hits)
sample_file = "notebooks/data/Run_8217_April25_2025_Ni_Powder_MCP_TPX3_0_8C_1_9_AngsMin_serval_000000.tpx3"

import time
start = time.time()
hits = tdcsophiread.process_tpx3(sample_file, parallel=True)
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
elapsed = time.time() - start

print(f"Performance: {len(hits) / elapsed / 1e6:.1f} M hits/sec")
print(f"Found {len(neutrons):,} neutrons from {len(hits):,} hits")

Memory Efficiency

  • Before optimization: 48GB peak memory
  • After optimization: 20GB peak memory (58% reduction)
  • Current streaming: 512MB chunks for any file size
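The streaming approach can be sketched with a plain chunked reader (illustrative only; the library's I/O layer is memory-mapped and section-aware rather than a byte-level loop like this):

```python
def iter_chunks(path, chunk_bytes=512 * 1024 * 1024):
    """Yield a file in fixed-size chunks (the 512MB figure above), keeping
    peak memory bounded regardless of total file size."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_bytes)
            if not chunk:
                return
            yield chunk
```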

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

Issue Reporting

📄 License

GPL-3.0+ License - see LICENSE file for details.


Ready to process neutron data at 96M+ hits/sec? 🚀

Get started: docs/quickstart.md

Download files

Download the file for your platform.

Source Distribution

tdcsophiread-3.1.3.tar.gz (168.3 kB)

Uploaded: Source

Built Distributions

tdcsophiread-3.1.3-cp314-cp314-manylinux_2_34_x86_64.whl (2.0 MB)

Uploaded: CPython 3.14, manylinux (glibc 2.34+), x86-64

tdcsophiread-3.1.3-cp313-cp313-manylinux_2_34_x86_64.whl (2.0 MB)

Uploaded: CPython 3.13, manylinux (glibc 2.34+), x86-64

tdcsophiread-3.1.3-cp312-cp312-manylinux_2_34_x86_64.whl (2.0 MB)

Uploaded: CPython 3.12, manylinux (glibc 2.34+), x86-64

tdcsophiread-3.1.3-cp311-cp311-manylinux_2_34_x86_64.whl (2.0 MB)

Uploaded: CPython 3.11, manylinux (glibc 2.34+), x86-64

tdcsophiread-3.1.3-cp310-cp310-manylinux_2_34_x86_64.whl (2.0 MB)

Uploaded: CPython 3.10, manylinux (glibc 2.34+), x86-64

File details

Details for the file tdcsophiread-3.1.3.tar.gz.

File metadata

  • Download URL: tdcsophiread-3.1.3.tar.gz
  • Upload date:
  • Size: 168.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tdcsophiread-3.1.3.tar.gz
Algorithm Hash digest
SHA256 a2c57cb46c3d431ba545e3de46c3b5f39f14bef5d42297d119698b2bb63fbd9d
MD5 ed567fe734022577cc8775f43a6fdec6
BLAKE2b-256 73939d63b8fe1876152731c2b9ef5934060d26a4bd13858d5653f746ed6f23d5

See more details on using hashes here.

File details

Details for the file tdcsophiread-3.1.3-cp314-cp314-manylinux_2_34_x86_64.whl.

File hashes

Hashes for tdcsophiread-3.1.3-cp314-cp314-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 14519eea4e38abf8dcd0ae136d718885e066ce5761ad9fe0545be0159f54c6f2
MD5 1aaa98bc89ad10282843080d9a3bb558
BLAKE2b-256 699d6a2c7b889a80ce1cb2fec4cbc5637b0881f42b584ee56be816b6701b34e9

File details

Details for the file tdcsophiread-3.1.3-cp313-cp313-manylinux_2_34_x86_64.whl.

File hashes

Hashes for tdcsophiread-3.1.3-cp313-cp313-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 36d5aef2c915a15351d645e45eb888f0466cf96bda5dded8f19f2b5109995e6b
MD5 4337dc60f141483905d389f086ba7197
BLAKE2b-256 840f89d8747be435fb1fbfc9337107991995895b8c5cb1a69d844c254ca4138c

File details

Details for the file tdcsophiread-3.1.3-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for tdcsophiread-3.1.3-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 34ef21eb21b02b6d1c4b516dab9ab50a18010624715a965a6811a64a2c161eaf
MD5 b924a5a15cedf9aa71c251b491744fdb
BLAKE2b-256 5da1ae58110dd2bfd2a70598591af2c8250a1785cf6b1af171798299e49bb857

File details

Details for the file tdcsophiread-3.1.3-cp311-cp311-manylinux_2_34_x86_64.whl.

File hashes

Hashes for tdcsophiread-3.1.3-cp311-cp311-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 176116656e4d5972c39bda72ccf70c32abc7bfdfdb685b65ec62ba66d9e9184a
MD5 1dabe460ba6d339cec0a5cb16e7cae13
BLAKE2b-256 e6ce21beb7d493628e23633a4e321d18a7a7a30d426e10edad4e585a2cf95717

File details

Details for the file tdcsophiread-3.1.3-cp310-cp310-manylinux_2_34_x86_64.whl.

File hashes

Hashes for tdcsophiread-3.1.3-cp310-cp310-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 1dc44e3275c56cc40512e4135f2d2ac950e702149ccbb071a9912324e59aa2c0
MD5 0ec91911079af85c4343d36635bda180
BLAKE2b-256 635b1bca536de1a75cb1f898f06258dd69ff38dfa513782bf4ba94cfaae717ed
