Library to prepare histology and ephys for the IBL ephys alignment GUI

These details have not been verified by PyPI

Project links

Project description

AIND IBL Ephys Alignment Preprocessing

Preprocessing pipeline that prepares SmartSPIM histology and Neuropixels electrophysiology data for the IBL ephys alignment GUI. It registers histology volumes to the Allen Common Coordinate Framework (CCF), converts Neuroglancer probe-track annotations into atlas coordinates, and optionally extracts spike-sorted ephys data into IBL ALF format.

Installation

From PyPI:

pip install aind-ibl-ephys-alignment-preprocessing

From source:

git clone https://github.com/AllenNeuralDynamics/aind-ibl-ephys-alignment-preprocessing.git
cd aind-ibl-ephys-alignment-preprocessing
pip install .

Quick start

The package provides a CLI entry point, aind-ibl-preprocess:

aind-ibl-preprocess \
    --data-root /path/to/data \
    --results-root /path/to/results \
    --neuroglancer neuroglancer.json \
    --manifest manifest.csv

CLI options

Flag	Description
`--data-root PATH`	(required) Root directory containing all input data
`--results-root PATH`	(required) Root directory for output
`--neuroglancer PATH`	(required) Neuroglancer JSON file (absolute, or relative to `data_root`)
`--manifest PATH`	(required) Manifest CSV file (absolute, or relative to `data_root`)
`--scratch-root PATH`	Scratch directory for temporary files (default: system temp)
`--skip-ephys`	Skip electrophysiology extraction
`--validate-only`	Run pre-flight validation checks and exit
`--async`	Run the pipeline asynchronously with concurrency

Validation

Before running the full pipeline, you can check that all required inputs are present and correctly structured:

aind-ibl-preprocess \
    --data-root /path/to/data \
    --results-root /path/to/results \
    --neuroglancer neuroglancer.json \
    --manifest manifest.csv \
    --validate-only

This checks file existence, manifest structure, reference data availability, disk space (warns below 50 GB, errors below 10 GB), and available RAM (minimum 8 GB).

Python API

The package exposes a small public API:

from aind_ibl_ephys_alignment_preprocessing import (
    PipelineConfig,
    run_pipeline,
    run_pipeline_async,
)

config = PipelineConfig(
    data_root="/path/to/data",
    results_root="/path/to/results",
    neuroglancer_file="neuroglancer.json",
    manifest_csv="manifest.csv",
    skip_ephys=False,            # set True to skip ephys extraction
    desired_voxel_size_um=25.0,  # multiscale zarr level selection
    num_parallel_jobs=4,         # parallel workers for ephys
)

# Synchronous
results = run_pipeline(config)

# Asynchronous (structured concurrency with TaskGroups)
import asyncio
results = asyncio.run(run_pipeline_async(config))

Each element of results is a ProcessResult containing the probe ID, recording ID, list of files written, and an optional skip reason.

Input data requirements

Data root layout

All input paths can be absolute or relative to data_root. The expected directory structure is:

data_root/
|
|-- neuroglancer.json
|-- manifest.csv
|
|-- smartspim_lca_template/
|   +-- smartspim_lca_template_25.nii.gz
|
|-- allen_mouse_ccf/
|   +-- average_template/
|       +-- average_template_25.nii.gz
|
|-- allen_mouse_ccf_annotations_lateralized_compact/
|   |-- ccf_2017_annotation_25_lateralized_compact.nrrd
|   +-- ccf_2017_annotation_25_lateralized_unique_vals.npz
|
|-- iblatlas_allenatlas/
|   +-- ...
|
|-- spim_template_to_ccf/
|   |-- syn_0GenericAffine.mat
|   +-- syn_1InverseWarp.nii.gz
|
|-- <smartspim_asset>/
|   |-- image_tile_fusing/
|   |   +-- OMEZarr/
|   |       |-- <registration_channel>.zarr/
|   |       +-- <additional_channels>.zarr/
|   +-- image_atlas_alignment/
|       +-- <registration_channel_stem>/
|           |-- moved_ls_to_ccf.nii.gz
|           |-- ls_to_template_SyN_0GenericAffine.mat
|           +-- ls_to_template_SyN_1InverseWarp.nii.gz
|
|-- <sorted_recording>/
|   +-- ...  (spike sorting output)
|
+-- **/<probe_file>.json

Each of these is described in detail below.

Manifest CSV

A CSV file describing which probes to process. Each row represents one probe (or one shank of a multi-shank probe).

Required columns:

Column	Description
`mouseid`	Mouse identifier. All rows must reference the same mouse.
`sorted_recording`	Name of the spike-sorted recording folder under `data_root`. The recording ID is derived by stripping a `_sorted` suffix if present.
`probe_file`	Basename (without extension) of the Neuroglancer annotation file. Resolved via glob `*/<probe_file>.<annotation_format>` under `data_root`.
`probe_id`	Unique probe identifier.
`probe_name`	Subfolder name used for GUI output artifacts.

Optional columns:

Column	Default	Description
`annotation_format`	`json`	File extension for the annotation file (lowercase).
`probe_shank`	null	0-based shank index for multi-shank probes. Leave empty for single-shank.
`surface_finding`	null	Path (relative to `data_root`) to a surface-finding file.

Constraints:

All rows must have the same mouseid.
The tuple (recording_id, probe_name, probe_shank) must be unique across rows.
For multi-shank probes, multiple rows can share the same probe_name but must differ in probe_shank.

Example:

mouseid,sorted_recording,probe_file,probe_id,probe_name,probe_shank
mouse001,2024-06-01_rec_sorted,track_annotations_probeA,A0001,probeA,
mouse001,2024-06-01_rec_sorted,track_annotations_probeB_shank0,B0001_s0,probeB,0
mouse001,2024-06-01_rec_sorted,track_annotations_probeB_shank1,B0001_s1,probeB,1

Neuroglancer JSON

A Neuroglancer state JSON file that contains image source URIs pointing to the SmartSPIM OME-Zarr volumes. The pipeline extracts the first image source URI to locate the SmartSPIM asset directory and discover the registration channel and any additional channels.

The asset directory must contain:

image_tile_fusing/OMEZarr/<channel>.zarr/ -- fused OME-Zarr volumes
image_atlas_alignment/<registration_channel_stem>/ -- ANTs registration outputs including moved_ls_to_ccf.nii.gz (the precomputed light-sheet to CCF registration)

Reference volumes

These files are required for atlas registration. Default paths are relative to data_root and can be overridden in PipelineConfig.

File	Default path	Format	Description
SmartSPIM LCA template	`smartspim_lca_template/smartspim_lca_template_25.nii.gz`	NIfTI (.nii.gz)	25 um SmartSPIM template volume
CCF average template	`allen_mouse_ccf/average_template/average_template_25.nii.gz`	NIfTI (.nii.gz)	Allen CCF average template at 25 um
CCF lateralized labels	`allen_mouse_ccf_annotations_lateralized_compact/ccf_2017_annotation_25_lateralized_compact.nrrd`	NRRD	Lateralized CCF annotation labels at 25 um
CCF label lookup	`allen_mouse_ccf_annotations_lateralized_compact/ccf_2017_annotation_25_lateralized_unique_vals.npz`	NumPy (.npz)	Unique label values for the annotation volume
IBL atlas histology	`iblatlas_allenatlas/`	Directory	IBL atlas histology data used by `iblatlas.AllenAtlas`
Template-to-CCF transforms	`spim_template_to_ccf/`	Directory	ANTs SyN registration outputs (`syn_0GenericAffine.mat`, `syn_1InverseWarp.nii.gz`)

Probe annotation files

Per-probe Neuroglancer point annotation files. For each row in the manifest, the pipeline searches data_root for a file matching the glob pattern */<probe_file>.<annotation_format> (e.g., */track_annotations_probeA.json).

These files contain 3D point annotations marking the probe track in the SmartSPIM image space. They are converted through the transform chain:

Neuroglancer pixels --> SPIM (LPS) --> Template --> CCF (LPS/um) --> Bregma (IBL)

Electrophysiology data

Unless --skip-ephys is passed, the pipeline expects spike-sorted data at data_root/<sorted_recording>/ for each unique recording in the manifest. The ephys extraction (delegated to aind-ephys-ibl-gui-conversion) converts this into IBL ALF format (e.g., spikes.times.npy, spikes.clusters.npy, channels.localCoordinates.npy, etc.).

Ephys extraction runs once per unique recording ID (deduplicated across manifest rows).

Output structure

All outputs are written under results_root/<mouseid>/:

results_root/
|-- manifest.csv                              # Copy of input manifest
+-- <mouseid>/
    |-- ccf_space_histology/
    |   |-- histology_registration.nrrd       # Registration channel in CCF space
    |   +-- histology_<channel>.nrrd          # Additional channels in CCF space
    |
    |-- image_space_histology/
    |   |-- histology_registration.nrrd       # Registration channel in image space
    |   |-- histology_registration_pipeline.nrrd
    |   |-- ccf_in_mouse.nrrd                 # CCF template warped to image space
    |   +-- labels_in_mouse.nrrd              # CCF labels warped to image space
    |
    |-- track_data/
    |   |-- spim/<probe_id>.*                 # Track coordinates in SPIM space
    |   |-- template/<probe_id>.*             # Track coordinates in template space
    |   |-- ccf/<probe_id>.*                  # Track coordinates in CCF space
    |   |-- bregma_xyz/<probe_id>.*           # Track coordinates in IBL bregma space
    |   +-- datapackage.json                  # Machine-readable output manifest
    |
    +-- <recording_id>/
        +-- <probe_name>/
            |-- xyz_picks.json                # Probe track picks (CCF coordinates)
            |-- xyz_picks_image_space.json    # Probe track picks (image space)
            |-- xyz_picks_shank<N>.json       # Per-shank picks (multi-shank only)
            +-- spikes/                       # Ephys ALF output (if not skipped)

The datapackage.json file contains a structured manifest of all outputs, including transform chain paths, histology volume paths, and per-probe metadata. It can be loaded back with:

from aind_ibl_ephys_alignment_preprocessing.manifest import load_datapackage

dp = load_datapackage("/path/to/datapackage.json")

Pipeline overview

The pipeline performs the following steps:

Asset discovery -- Parse the Neuroglancer JSON to locate the SmartSPIM OME-Zarr volumes and ANTs registration outputs.
Validation -- Check that all required inputs exist and are correctly structured.
Histology processing -- Reorient the registration channel to CCF space, export additional channels, and compute inverse transforms (CCF template and labels warped back to image space).
Probe processing -- For each manifest row, load the Neuroglancer point annotations and convert them through the transform chain (SPIM -> template -> CCF -> bregma), writing coordinate files at each stage and producing xyz_picks.json files for the alignment GUI.
Ephys extraction (optional) -- Convert spike-sorted data into IBL ALF format.
Manifest generation -- Write datapackage.json summarizing all outputs.

Adapting to your own data

If you are not using the AIND SmartSPIM pipeline but want to use this package with your own histology and ephys data, the key requirements are:

OME-Zarr volumes -- Your fused histology images must be in OME-Zarr format, discoverable via a Neuroglancer JSON state file.
ANTs registration outputs -- You need ANTs SyN registration transforms mapping your histology images to a common template and from that template to the Allen CCF. The pipeline expects standard ANTs output files (*_0GenericAffine.mat, *_1InverseWarp.nii.gz).
Reference volumes -- The Allen CCF template, lateralized annotation labels, and the SmartSPIM LCA template at 25 um resolution. These can be overridden in PipelineConfig if your paths differ from the defaults.
Probe annotations -- Neuroglancer point annotation JSON files marking probe tracks in histology image space.
Spike-sorted data (optional) -- If you want ephys extraction, provide spike-sorted output compatible with aind-ephys-ibl-gui-conversion.

The coordinate convention used throughout is the IBL convention: x = ML (right positive), y = AP (anterior positive), z = DV (dorsal positive), with origin at bregma, in meters.

Development

To develop the code, run:

uv sync

Please test your changes using the full linting and testing suite:

./scripts/run_linters_and_checks.sh -c

Or run individual commands:

uv run --frozen ruff format          # Code formatting
uv run --frozen ruff check           # Linting
uv run --frozen mypy                 # Type checking
uv run --frozen interrogate -v src   # Documentation coverage
uv run --frozen codespell --check-filenames  # Spell checking
uv run --frozen pytest --cov aind_ibl_ephys_alignment_preprocessing  # Tests with coverage

Documentation

sphinx-build -b html docs/source/ docs/build/html

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Apr 26, 2026

0.1.3

Mar 24, 2026

0.1.2

Mar 24, 2026

0.1.1

Mar 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aind_ibl_ephys_alignment_preprocessing-0.2.0.tar.gz (309.9 kB view details)

Uploaded Apr 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

aind_ibl_ephys_alignment_preprocessing-0.2.0-py3-none-any.whl (45.2 kB view details)

Uploaded Apr 26, 2026 Python 3

File details

Details for the file aind_ibl_ephys_alignment_preprocessing-0.2.0.tar.gz.

File metadata

Download URL: aind_ibl_ephys_alignment_preprocessing-0.2.0.tar.gz
Upload date: Apr 26, 2026
Size: 309.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for aind_ibl_ephys_alignment_preprocessing-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`e8bf4ed9c38d1140e1d19c6f3cb36628016fd98d853f70f539012bf09c80e4e4`
MD5	`9dff0776ded8ad32f829ecc8f284cef8`
BLAKE2b-256	`ade3c419d5cade50d032f6166c87d181a286acdf817d8dd70e13b9fea1788488`

See more details on using hashes here.

File details

Details for the file aind_ibl_ephys_alignment_preprocessing-0.2.0-py3-none-any.whl.

File metadata

Download URL: aind_ibl_ephys_alignment_preprocessing-0.2.0-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 45.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.7 {"installer":{"name":"uv","version":"0.11.7","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for aind_ibl_ephys_alignment_preprocessing-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ee6fd2d363bdbbc09cbde02a11cee6cf2bd8ab1c4a5b45375b038f43aa549c9c`
MD5	`9bc09c98b0bce8ef83b8b0b0f405b1e1`
BLAKE2b-256	`45f83497bd3e343fa6593ffda282a199fa3c1e763a479e5fed5d59a725150efa`

See more details on using hashes here.

aind-ibl-ephys-alignment-preprocessing 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AIND IBL Ephys Alignment Preprocessing

Installation

Quick start

CLI options

Validation

Python API

Input data requirements

Data root layout

Manifest CSV

Neuroglancer JSON

Reference volumes

Probe annotation files

Electrophysiology data

Output structure

Pipeline overview

Adapting to your own data

Development

Documentation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes