High-performance TDC-only TPX3 neutron imaging data processor

TDCSophiread

High-performance Python and C++ library for processing TPX3 neutron imaging data with 96M+ hits/sec throughput. TDCSophiread provides complete hit extraction and neutron clustering capabilities using TDC-only timing (detector-expert approved).

🚀 Key Features

  • 🏃 High Performance: 96M+ hits/sec with Intel TBB parallel processing
  • 🧠 Smart Clustering: 4 algorithms (ABS, Graph, DBSCAN, Grid) for neutron event reconstruction
  • ⚡ Zero-Copy Processing: Memory-efficient temporal batching with structured numpy arrays
  • 🔍 TDC-Only Timing: Detector-expert-approved approach (no unreliable GDC)
  • 🐍 Python Integration: Complete Python API with Jupyter notebook examples
  • 📊 Production Ready: Real-world performance validated on 12GB datasets

Quick Start

Installation

# Clone repository
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread

# Set up environment (pixi recommended)
pixi install

# Build and install
pixi run dev-install

Note: if you prefer to build in stages, you can issue the following commands:

# configure with CMake
pixi run configure
# build with CMake
pixi run build
# run tests
pixi run test
# install Python bindings
pixi run pip install -e . --no-build-isolation

Get Sample Data (12GB)

# Make sure you have git lfs installed, then run:
git lfs install
# Initialize the git submodule
git submodule init
# Download real TPX3 datasets for testing
git submodule update --init resources/sophiread_data

Python Usage

import tdcsophiread

# 1. Extract hits from TPX3 file
hits = tdcsophiread.process_tpx3("data.tpx3", parallel=True)
print(f"Extracted {len(hits):,} hits")

# 2. Process hits to neutrons using clustering
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Found {len(neutrons):,} neutrons")

# 3. Try different clustering algorithms
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
config.clustering.algorithm = "dbscan"  # or "abs", "graph", "grid"
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)

Performance Monitoring

# Get detailed performance statistics
config = tdcsophiread.NeutronProcessingConfig.venus_defaults()
processor = tdcsophiread.TemporalNeutronProcessor(config)
neutrons = processor.processHits(hits)

stats = processor.getStatistics()
print(f"Hit rate: {stats.hits_per_second/1e6:.1f} M hits/sec")
print(f"Neutron efficiency: {stats.neutron_efficiency:.3f}")
print(f"Parallel efficiency: {stats.parallel_efficiency:.2f}")

🧬 Architecture

TDCSophiread implements a modern, high-performance pipeline with parallel temporal processing:

flowchart TD
    A[TPX3 Raw Data] --> B[TDCProcessor]
    B --> |Memory-mapped I/O<br/>Section-aware processing| C[std::vector&lt;TDCHit&gt;<br/>Temporally ordered hits]

    C --> D[TemporalNeutronProcessor]

    subgraph TemporalNeutronProcessor
        direction TB
        E[Phase 1: Statistical Analysis<br/>• Analyze hit distribution<br/>• Calculate optimal batch sizes<br/>• Determine overlaps]

        E --> F[Phase 2: Parallel Worker Pool]

        subgraph ParallelWorkerPool
            direction LR
            Worker0[Worker 0]
            Worker1[Worker 1]
            WorkerN[Worker N]
        end

        subgraph Worker0Details
            direction TB
            G1[Hit Clustering<br/>Algorithm Selection]
            G1 --> G1a["ABS<br/>O(n) - Fastest"]
            G1 --> G1b["Graph<br/>O(n log n) - Balanced"]
            G1 --> G1c["DBSCAN<br/>O(n log n) - Noise handling"]
            G1 --> G1d["Grid<br/>O(n) - Geometry optimized"]
            G1a --> G2[Neutron Extraction<br/>TOT-weighted centroids]
            G1b --> G2
            G1c --> G2
            G1d --> G2
        end

        subgraph Worker1Details
            direction TB
            H1[Hit Clustering] --> H2[Neutron Extraction]
        end

        subgraph WorkerNDetails
            direction TB
            I1[Hit Clustering] --> I2[Neutron Extraction]
        end

        Worker0 --> Worker0Details
        Worker1 --> Worker1Details
        WorkerN --> WorkerNDetails

        F --> J[Phase 3: Result Aggregation<br/>• Combine worker results<br/>• Remove overlap duplicates<br/>• Generate statistics]
    end

    J --> K[std::vector&lt;TDCNeutron&gt;<br/>Final neutron events<br/>96M+ hits/sec performance]

    style A fill:#e1f5fe
    style K fill:#e8f5e8
    style TemporalNeutronProcessor fill:#f3e5f5
    style G1a fill:#ffecb3
    style G1b fill:#fff3e0
    style G1c fill:#fce4ec
    style G1d fill:#e0f2f1

Phase 1: Hit Extraction

  • Memory-mapped I/O: Efficient processing of large TPX3 files
  • Section-aware processing: Respects TPX3 data structure constraints
  • TDC state propagation: Sequential processing for reliable timing
  • Parallel chunk processing: Intel TBB for maximum throughput

Phase 2: Temporal Neutron Processing

  • Statistical analysis: Optimal batching based on hit distribution
  • Parallel worker pool: Each worker has dedicated algorithm instances
  • 4 clustering algorithms: ABS, Graph, DBSCAN, Grid with different performance characteristics
  • Zero-copy processing: Iterator-based interfaces minimize memory overhead

Phase 3: Result Aggregation

  • Parallel result combination: Efficient merging from multiple workers
  • Overlap deduplication: Remove duplicate neutrons from batch boundaries
  • Performance statistics: Detailed metrics for optimization
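The overlap-deduplication step in Phase 3 can be sketched in plain Python. This is an illustrative toy, not the library's internal implementation; the dict fields and exact-match keying are assumptions made for the sketch:

```python
# Illustrative sketch: workers process temporally overlapping batches, so
# the same neutron event can appear in two consecutive result sets and
# must be emitted only once during aggregation.

def merge_batches(batches):
    """Concatenate per-worker results, dropping exact duplicates that
    fall in the overlap region between consecutive batches."""
    seen = set()
    merged = []
    for batch in batches:
        for neutron in batch:
            key = (neutron["x"], neutron["y"], neutron["tof"])
            if key in seen:
                continue  # already emitted by the previous (overlapping) batch
            seen.add(key)
            merged.append(neutron)
    # final output is time-ordered, matching the temporally ordered pipeline
    return sorted(merged, key=lambda n: n["tof"])

batch_a = [{"x": 10.5, "y": 20.25, "tof": 100},
           {"x": 30.0, "y": 40.0, "tof": 190}]   # tof=190 sits in the overlap
batch_b = [{"x": 30.0, "y": 40.0, "tof": 190},   # duplicate from the overlap
           {"x": 50.0, "y": 60.0, "tof": 250}]

merged = merge_batches([batch_a, batch_b])
print(len(merged))  # 3
```

The real pipeline aggregates `std::vector<TDCNeutron>` results in C++; this sketch only shows why the overlap makes deduplication necessary.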

🎯 Clustering Algorithms

| Algorithm | Performance | Use Case | Complexity |
|-----------|-------------|----------|------------|
| ABS | Fastest | General purpose, high throughput | O(n) |
| Graph | Fast | Balanced speed/accuracy | O(n log n) |
| DBSCAN | Medium | Noise handling, complex patterns | O(n log n) |
| Grid | Fast | Detector geometry optimization | O(n) |

Algorithm Configuration

config = tdcsophiread.NeutronProcessingConfig.venus_defaults()

# ABS (Adaptive Bucket Sort) - Fastest
config.clustering.algorithm = "abs"
config.clustering.abs.radius = 5.0
config.clustering.abs.neutron_correlation_window = 75.0  # nanoseconds

# DBSCAN - Best noise handling
config.clustering.algorithm = "dbscan"
config.clustering.dbscan.epsilon = 4.0
config.clustering.dbscan.min_points = 3

# Process with custom configuration
neutrons = tdcsophiread.process_hits_to_neutrons(hits, config)

📊 Performance

Measured Performance (Real Hardware)

| System | Hit Rate | Clustering | Notes |
|--------|----------|------------|-------|
| M2 Max | 20M+ hits/sec | ABS | Development system |
| AMD EPYC 9174F | 96M+ hits/sec | ABS | Production target |
| Memory Usage | ~40-60 bytes/hit | All | Including clustering |

Performance by File Size

  • < 100MB: 20-40 M hits/sec (single-threaded sufficient)
  • 100MB-1GB: 50-80 M hits/sec (parallel recommended)
  • 1GB-10GB: 80-96 M hits/sec (optimal parallel)
  • > 10GB: 90-96 M hits/sec (streaming mode)
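A small helper can turn the guidance above into a `parallel` flag for `process_tpx3`. The helper itself is hypothetical (not part of the library's API), and the ~100 MB threshold is simply the cutoff from the list above:

```python
import os

# ~100 MB: below this, single-threaded extraction is sufficient per the
# guidance above (this helper is an assumption, not a library function).
PARALLEL_THRESHOLD = 100 * 1024 * 1024

def should_parallelize(path):
    """Return True when the TPX3 file is large enough that parallel
    extraction pays off."""
    return os.path.getsize(path) >= PARALLEL_THRESHOLD

# Usage (assumes tdcsophiread is installed and data.tpx3 exists):
# import tdcsophiread
# hits = tdcsophiread.process_tpx3("data.tpx3",
#                                  parallel=should_parallelize("data.tpx3"))
```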

🔧 Build System

Development Workflow

# Core workflow
pixi run build        # Configure and build C++
pixi run test         # Run C++ tests
pixi run install      # Install Python bindings (editable)
pixi run python-test  # Test Python import

# Data setup (12GB sample data)
pixi run setup-data   # Download sample TPX3 files
pixi run notebooks    # Launch Jupyter with notebooks

Build Options

# Enter the pixi environment shell
pixi shell

# Debug build (if needed)
cmake -B build -DCMAKE_BUILD_TYPE=Debug

# Legacy components (not recommended)
cmake -B build -DBUILD_LEGACY=ON

⚠️ Legacy Warning: Legacy components use unreliable GDC timing and will be removed in the next major release.

📚 Documentation & Examples

Jupyter Notebooks (Real Data)

# Start Jupyter with sample notebooks
pixi run notebooks

Available Notebooks:

  • notebooks/hits_extraction_from_tpx3_Ni.ipynb - Hit extraction (96M+ hits/sec)
  • notebooks/neutrons_extraction_from_tpx3_Ni.ipynb - Complete neutron processing
  • notebooks/clustering_abs_ni.ipynb - ABS clustering demo
  • notebooks/clustering_graph_ni.ipynb - Graph clustering demo
  • notebooks/clustering_dbscan_Ni.ipynb - DBSCAN clustering demo
  • notebooks/clustering_grid_Ni.ipynb - Grid clustering demo

🗂️ Data Format

Hit Data (Structured NumPy Array)

hits = tdcsophiread.process_tpx3("data.tpx3")
print(f"Fields: {hits.dtype.names}")
# ('tof', 'x', 'y', 'timestamp', 'tot', 'chip_id', 'cluster_id')

# Access hit properties
x_coords = hits['x']          # Global X coordinates (uint16)
y_coords = hits['y']          # Global Y coordinates (uint16)
tof_values = hits['tof']      # Time-of-flight (uint32, 25ns units)
tot_values = hits['tot']      # Time-over-threshold (uint16)
chip_ids = hits['chip_id']    # Chip ID 0-3 (uint8)

Neutron Data (Structured NumPy Array)

neutrons = tdcsophiread.process_hits_to_neutrons(hits)
print(f"Fields: {neutrons.dtype.names}")
# ('x', 'y', 'tof', 'tot', 'n_hits', 'chip_id', 'reserved')

# Access neutron properties
x_subpixel = neutrons['x']     # Sub-pixel X coordinates (float64)
y_subpixel = neutrons['y']     # Sub-pixel Y coordinates (float64)
tof_neutron = neutrons['tof']  # Representative TOF (uint32, 25ns units)
cluster_size = neutrons['n_hits'] # Number of hits in cluster (uint16)

Unit Conversions

# Time conversions
tof_ms = hits['tof'] * 25 / 1e6        # 25ns units → milliseconds
timestamp_s = hits['timestamp'] * 25 / 1e9  # 25ns units → seconds

# Coordinate conversions
pixel_x = neutrons['x'] / 8.0          # Sub-pixel → pixel (factor=8)
pixel_y = neutrons['y'] / 8.0
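As a sanity check of the 25 ns clock factor, the same conversion applied to a tiny synthetic array (the values here are made up; real arrays come from `process_tpx3`):

```python
import numpy as np

# Synthetic TOF values in 25 ns clock units, same dtype as hits['tof'].
tof_raw = np.array([40000, 400000], dtype=np.uint32)

# 25 ns units -> milliseconds: 40000 * 25 ns = 1,000,000 ns = 1 ms.
tof_ms = tof_raw * 25 / 1e6
print(tof_ms)  # 1 ms and 10 ms
```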

⚙️ Configuration

JSON Configuration

{
  "clustering": {
    "algorithm": "abs",
    "abs": {
      "radius": 5.0,
      "neutron_correlation_window": 75.0
    }
  },
  "extraction": {
    "algorithm": "simple_centroid",
    "super_resolution_factor": 8.0,
    "weighted_by_tot": true
  },
  "temporal": {
    "num_workers": 0,
    "max_batch_size": 100000
  }
}
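The configuration above can be produced with the standard `json` module. Whether the library loads a JSON file directly or the fields must be mapped onto `NeutronProcessingConfig` attributes (as in the Python examples earlier) is left to the project docs; this snippet only writes the file:

```python
import json

# Same structure as the JSON example above; the filename is arbitrary.
config = {
    "clustering": {
        "algorithm": "abs",
        "abs": {"radius": 5.0, "neutron_correlation_window": 75.0},
    },
    "extraction": {
        "algorithm": "simple_centroid",
        "super_resolution_factor": 8.0,
        "weighted_by_tot": True,
    },
    "temporal": {"num_workers": 0, "max_batch_size": 100000},
}

with open("processing_config.json", "w") as f:
    json.dump(config, f, indent=2)
```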

Detector Configuration

{
  "detector": {
    "timing": {
      "tdc_frequency_hz": 60.0,
      "enable_missing_tdc_correction": true
    },
    "chip_layout": {
      "chip_size_x": 256,
      "chip_size_y": 256
    }
  }
}

🔬 Scientific Context

TPX3 Data Constraints

TDCSophiread respects the physical constraints of TPX3 data:

  • Variable section sizes: No padding or fixed boundaries
  • Local time disorder: Packets within sections not time-ordered
  • Missing TDC packets: Hardware may drop TDC packets (corrected automatically)
  • Sequential dependencies: TDC state must propagate in order
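The missing-TDC correction can be illustrated with a toy interpolation: TDC pulses arrive at a fixed frequency (60 Hz in the detector configuration above), so a gap of roughly N periods between consecutive pulses implies N-1 dropped packets whose timestamps can be reconstructed. This is a sketch of the idea only, not the library's algorithm; the function name and tolerance are assumptions:

```python
TDC_PERIOD_S = 1.0 / 60.0  # expected pulse spacing at tdc_frequency_hz = 60

def fill_missing_tdc(timestamps, period=TDC_PERIOD_S, tol=0.25):
    """Insert synthetic TDC timestamps wherever consecutive pulses are
    separated by roughly an integer multiple (> 1) of the period."""
    filled = [timestamps[0]]
    for t in timestamps[1:]:
        gap = t - filled[-1]
        n_missing = round(gap / period) - 1
        if n_missing > 0 and abs(gap - (n_missing + 1) * period) < tol * period:
            for _ in range(n_missing):
                filled.append(filled[-1] + period)  # reconstructed pulse
        filled.append(t)
    return filled

pulses = [0.0, TDC_PERIOD_S, 3 * TDC_PERIOD_S]  # one pulse dropped
print(len(fill_missing_tdc(pulses)))  # 4
```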

🛠️ Development

Requirements

  • C++20 compiler (GCC 10+, Clang 11+, MSVC 2019+)
  • Intel TBB for parallel processing
  • HDF5 for data I/O
  • Python 3.8+ with NumPy
  • CMake 3.20+

Environment Setup

# Install pixi (cross-platform package manager)
curl -sSL https://pixi.sh/install | bash

# Clone and setup
git clone https://github.com/ornlneutronimaging/mcpevent2hist.git
cd mcpevent2hist/sophiread
pixi install

Code Style

  • C++20 with modern practices
  • Google C++ Style (2-space indentation)
  • Test-Driven Development with Google Test
  • Zero-copy design patterns
  • Stateless algorithms for parallelization

🔗 Legacy Components

Previous implementations (FastSophiread, CLI/GUI applications) have been moved to legacy/ and are deprecated:

  • Unreliable GDC timing (disapproved by detector experts)
  • Template complexity (hard to maintain)

Migration: All legacy functionality is available in TDCSophiread with improved performance and reliability.

📈 Benchmarks

Real-World Performance

Using sample data from notebooks/data/:

# Ni powder diffraction data (>1M hits)
sample_file = "notebooks/data/Run_8217_April25_2025_Ni_Powder_MCP_TPX3_0_8C_1_9_AngsMin_serval_000000.tpx3"

import time
start = time.time()
hits = tdcsophiread.process_tpx3(sample_file, parallel=True)
neutrons = tdcsophiread.process_hits_to_neutrons(hits)
elapsed = time.time() - start

print(f"Performance: {len(hits) / elapsed / 1e6:.1f} M hits/sec")
print(f"Found {len(neutrons):,} neutrons from {len(hits):,} hits")

Memory Efficiency

  • Before optimization: 48GB peak memory
  • After optimization: 20GB peak memory (58% reduction)
  • Current streaming: 512MB chunks for any file size
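The 512MB streaming approach reduces to reading fixed-size chunks so peak memory stays bounded regardless of total file size. A conceptual sketch (illustrative only; real TPX3 chunking must also respect the variable section boundaries noted under Scientific Context):

```python
CHUNK_SIZE = 512 * 1024 * 1024  # 512 MB, matching the streaming figure above

def iter_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield a file in fixed-size chunks; peak memory is bounded by
    chunk_size no matter how large the file is."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk
```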

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

📄 License

GPL-3.0+ License - see LICENSE file for details.


Ready to process neutron data at 96M+ hits/sec? 🚀

Get started: docs/quickstart.md


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tdcsophiread-3.1.0.tar.gz (1.1 MB)

Uploaded: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

tdcsophiread-3.1.0-cp314-cp314-manylinux_2_28_x86_64.whl (527.5 kB)

Uploaded: CPython 3.14, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp313-cp313-manylinux_2_28_x86_64.whl (527.3 kB)

Uploaded: CPython 3.13, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp312-cp312-manylinux_2_28_x86_64.whl (526.3 kB)

Uploaded: CPython 3.12, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp311-cp311-manylinux_2_28_x86_64.whl (527.9 kB)

Uploaded: CPython 3.11, manylinux: glibc 2.28+, x86-64

tdcsophiread-3.1.0-cp310-cp310-manylinux_2_28_x86_64.whl (526.5 kB)

Uploaded: CPython 3.10, manylinux: glibc 2.28+, x86-64

File details

Details for the file tdcsophiread-3.1.0.tar.gz.

File metadata

  • Download URL: tdcsophiread-3.1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for tdcsophiread-3.1.0.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | ed9c201ccae9329723f1c013fecd49328866d5a66ddb23f3cbdb7f7870014bd2 |
| MD5 | 6dd066fa3696f5bb396b6780e5403210 |
| BLAKE2b-256 | 1fa348eda1b10eb8a331f6b8d6f53f808e6f345bacaec40a29df03e6c82e21e4 |

See more details on using hashes here.

File details

Details for the file tdcsophiread-3.1.0-cp314-cp314-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp314-cp314-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 3db35900637887bec8bf6187216dba7f6c4bec7e99b7ff03a0976ebba0deaa35 |
| MD5 | e0f7df3949659391522d84ee2f8858f9 |
| BLAKE2b-256 | 2aa2f71e5947ac03dc6a7b4721870f65ff1c90b6ddd6f62bdcc85c65f6837a76 |


File details

Details for the file tdcsophiread-3.1.0-cp313-cp313-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp313-cp313-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 61833e9d38394e9ff4333a3b55ccddf598006425dfb5300c6338b5d1a6fb0947 |
| MD5 | 92d4552bc7cf67ecfba0d8d5e1bf10ca |
| BLAKE2b-256 | e4a83ad703dc1fc16c2ef3f9add1760e7da8c87258605fbeaf7ca6353916e6a8 |


File details

Details for the file tdcsophiread-3.1.0-cp312-cp312-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp312-cp312-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 97c669ebf450df4faf7a77a0c6f38ed48802d19644e445e3fe65611524c71be1 |
| MD5 | 7d34672d13a7527e2dce89cc5d80f7aa |
| BLAKE2b-256 | ba0d5cd77c08cd9dc0ac44f135483d3716e95e591cc3efaa4f1a49433d0425d1 |


File details

Details for the file tdcsophiread-3.1.0-cp311-cp311-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp311-cp311-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 2fdb6e4ca6620ada61759d42aa77c4f2e971e449e6283f4fe8f415194e69559d |
| MD5 | 7334bec8c84eda67b15f86443521fa4d |
| BLAKE2b-256 | 599aee0d1c857f1e3936cc2d836455ae96b59f810c629a692f3e4cc74b51d7e9 |


File details

Details for the file tdcsophiread-3.1.0-cp310-cp310-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for tdcsophiread-3.1.0-cp310-cp310-manylinux_2_28_x86_64.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | a73874e8c4bf7239c713fbbf9ba8177d4011fa6e6be5f9b10221209fa0e982f9 |
| MD5 | 44b860a2ff776443e0b354ca8b923f86 |
| BLAKE2b-256 | 4851d6f6e4883249b7ffb35ff08add46937f3d54a53b75aded3303996e3bdff4 |

