Library to prepare histology and ephys for the IBL ephys alignment GUI
AIND IBL Ephys Alignment Preprocessing
Preprocessing pipeline that prepares SmartSPIM histology and Neuropixels electrophysiology data for the IBL ephys alignment GUI. It registers histology volumes to the Allen Common Coordinate Framework (CCF), converts Neuroglancer probe-track annotations into atlas coordinates, and optionally extracts spike-sorted ephys data into IBL ALF format.
Installation
From PyPI:
pip install aind-ibl-ephys-alignment-preprocessing
From source:
git clone https://github.com/AllenNeuralDynamics/aind-ibl-ephys-alignment-preprocessing.git
cd aind-ibl-ephys-alignment-preprocessing
pip install .
Quick start
The package provides a CLI entry point, aind-ibl-preprocess:
aind-ibl-preprocess \
--data-root /path/to/data \
--results-root /path/to/results \
--neuroglancer neuroglancer.json \
--manifest manifest.csv
CLI options
| Flag | Description |
|---|---|
| --data-root PATH | (required) Root directory containing all input data |
| --results-root PATH | (required) Root directory for output |
| --neuroglancer PATH | (required) Neuroglancer JSON file (absolute, or relative to data_root) |
| --manifest PATH | (required) Manifest CSV file (absolute, or relative to data_root) |
| --scratch-root PATH | Scratch directory for temporary files (default: system temp) |
| --skip-ephys | Skip electrophysiology extraction |
| --validate-only | Run pre-flight validation checks and exit |
| --async | Run the pipeline asynchronously with concurrency |
Validation
Before running the full pipeline, you can check that all required inputs are present and correctly structured:
aind-ibl-preprocess \
--data-root /path/to/data \
--results-root /path/to/results \
--neuroglancer neuroglancer.json \
--manifest manifest.csv \
--validate-only
This checks file existence, manifest structure, reference data availability, disk space (warns below 50 GB, errors below 10 GB), and available RAM (minimum 8 GB).
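The disk-space thresholds above can be illustrated with a small standalone check. This is a minimal sketch, not the package's own validation code: the helper names are hypothetical, and treating 1 GB as 10**9 bytes is an assumption.

```python
import shutil
from pathlib import Path

GB = 10**9  # assumption: decimal gigabytes; the package may count differently
WARN_BELOW_GB = 50   # threshold quoted in the text above
ERROR_BELOW_GB = 10  # threshold quoted in the text above


def classify_free_space(free_bytes: int) -> str:
    """Return 'error', 'warning', or 'ok' for a given amount of free disk space."""
    free_gb = free_bytes / GB
    if free_gb < ERROR_BELOW_GB:
        return "error"
    if free_gb < WARN_BELOW_GB:
        return "warning"
    return "ok"


def check_results_root(results_root: str) -> str:
    """Classify free space on the filesystem that would hold results_root (hypothetical helper)."""
    path = Path(results_root)
    # results_root may not exist yet, so walk up to the nearest existing ancestor.
    while not path.exists():
        path = path.parent
    return classify_free_space(shutil.disk_usage(path).free)
```

Keeping the threshold logic in a pure function makes it easy to test without touching the filesystem.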
Python API
The package exposes a small public API:
import asyncio

from aind_ibl_ephys_alignment_preprocessing import (
    PipelineConfig,
    run_pipeline,
    run_pipeline_async,
)

config = PipelineConfig(
    data_root="/path/to/data",
    results_root="/path/to/results",
    neuroglancer_file="neuroglancer.json",
    manifest_csv="manifest.csv",
    skip_ephys=False,            # set True to skip ephys extraction
    desired_voxel_size_um=25.0,  # multiscale zarr level selection
    num_parallel_jobs=4,         # parallel workers for ephys
)

# Synchronous
results = run_pipeline(config)

# Asynchronous (structured concurrency with TaskGroups)
results = asyncio.run(run_pipeline_async(config))
Each element of results is a ProcessResult containing the probe ID,
recording ID, list of files written, and an optional skip reason.
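As an illustration of how such results might be summarized, the sketch below uses a stand-in dataclass. The attribute names (probe_id, recording_id, wrote_files, skipped_reason) are hypothetical and chosen only to mirror the description above; check the real ProcessResult definition before relying on them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ProcessResultSketch:
    """Stand-in for ProcessResult; every field name here is hypothetical."""

    probe_id: str
    recording_id: str
    wrote_files: List[str] = field(default_factory=list)
    skipped_reason: Optional[str] = None


def summarize(results) -> Tuple[Dict[str, int], Dict[str, str]]:
    """Split results into {probe_id: file count} and {probe_id: skip reason}."""
    processed = {
        r.probe_id: len(r.wrote_files) for r in results if r.skipped_reason is None
    }
    skipped = {
        r.probe_id: r.skipped_reason for r in results if r.skipped_reason is not None
    }
    return processed, skipped
```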
Input data requirements
Data root layout
All input paths can be absolute or relative to data_root. The expected
directory structure is:
data_root/
|
|-- neuroglancer.json
|-- manifest.csv
|
|-- smartspim_lca_template/
|   +-- smartspim_lca_template_25.nii.gz
|
|-- allen_mouse_ccf/
|   +-- average_template/
|       +-- average_template_25.nii.gz
|
|-- allen_mouse_ccf_annotations_lateralized_compact/
|   |-- ccf_2017_annotation_25_lateralized_compact.nrrd
|   +-- ccf_2017_annotation_25_lateralized_unique_vals.npz
|
|-- iblatlas_allenatlas/
|   +-- ...
|
|-- spim_template_to_ccf/
|   |-- syn_0GenericAffine.mat
|   +-- syn_1InverseWarp.nii.gz
|
|-- <smartspim_asset>/
|   |-- image_tile_fusing/
|   |   +-- OMEZarr/
|   |       |-- <registration_channel>.zarr/
|   |       +-- <additional_channels>.zarr/
|   +-- image_atlas_alignment/
|       +-- <registration_channel_stem>/
|           |-- moved_ls_to_ccf.nii.gz
|           |-- ls_to_template_SyN_0GenericAffine.mat
|           +-- ls_to_template_SyN_1InverseWarp.nii.gz
|
|-- <sorted_recording>/
|   +-- ... (spike sorting output)
|
+-- **/<probe_file>.json
Each of these is described in detail below.
Manifest CSV
A CSV file describing which probes to process. Each row represents one probe (or one shank of a multi-shank probe).
Required columns:
| Column | Description |
|---|---|
| mouseid | Mouse identifier. All rows must reference the same mouse. |
| sorted_recording | Name of the spike-sorted recording folder under data_root. The recording ID is derived by stripping a _sorted suffix if present. |
| probe_file | Basename (without extension) of the Neuroglancer annotation file. Resolved via glob */<probe_file>.<annotation_format> under data_root. |
| probe_id | Unique probe identifier. |
| probe_name | Subfolder name used for GUI output artifacts. |
Optional columns:
| Column | Default | Description |
|---|---|---|
| annotation_format | json | File extension for the annotation file (lowercase). |
| probe_shank | null | 0-based shank index for multi-shank probes. Leave empty for single-shank. |
| surface_finding | null | Path (relative to data_root) to a surface-finding file. |
Constraints:
- All rows must have the same mouseid.
- The tuple (recording_id, probe_name, probe_shank) must be unique across rows.
- For multi-shank probes, multiple rows can share the same probe_name but must differ in probe_shank.
Example:
mouseid,sorted_recording,probe_file,probe_id,probe_name,probe_shank
mouse001,2024-06-01_rec_sorted,track_annotations_probeA,A0001,probeA,
mouse001,2024-06-01_rec_sorted,track_annotations_probeB_shank0,B0001_s0,probeB,0
mouse001,2024-06-01_rec_sorted,track_annotations_probeB_shank1,B0001_s1,probeB,1
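The manifest constraints can be checked ahead of time with the standard library alone. The following is a minimal sketch under the rules stated above, not the package's own validator; the function names are hypothetical.

```python
import csv
import io

REQUIRED = ("mouseid", "sorted_recording", "probe_file", "probe_id", "probe_name")


def recording_id_from(sorted_recording: str) -> str:
    """Strip a trailing '_sorted' suffix, if present."""
    suffix = "_sorted"
    if sorted_recording.endswith(suffix):
        return sorted_recording[: -len(suffix)]
    return sorted_recording


def check_manifest(text: str) -> list:
    """Return a list of problems found in manifest CSV text; empty means the checks passed."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing required columns: {missing}"]
    problems = []
    if len({r["mouseid"] for r in rows}) > 1:
        problems.append("rows reference more than one mouseid")
    seen = set()
    for line_no, r in enumerate(rows, start=2):  # line 1 is the header
        # An empty or absent probe_shank means a single-shank probe.
        shank = (r.get("probe_shank") or "").strip() or None
        key = (recording_id_from(r["sorted_recording"]), r["probe_name"], shank)
        if key in seen:
            problems.append(
                f"line {line_no}: duplicate (recording_id, probe_name, probe_shank) {key}"
            )
        seen.add(key)
    return problems
```

Passing the example manifest above as a string should yield an empty list, since the two probeB rows differ in probe_shank.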
Neuroglancer JSON
A Neuroglancer state JSON file that contains image source URIs pointing to the SmartSPIM OME-Zarr volumes. The pipeline extracts the first image source URI to locate the SmartSPIM asset directory and discover the registration channel and any additional channels.
The asset directory must contain:
- image_tile_fusing/OMEZarr/<channel>.zarr/ -- fused OME-Zarr volumes
- image_atlas_alignment/<registration_channel_stem>/ -- ANTs registration outputs including moved_ls_to_ccf.nii.gz (the precomputed light-sheet to CCF registration)
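To check which URI would be picked up from a Neuroglancer state, a sketch along these lines may help. It assumes the usual state layout (a top-level "layers" list whose image layers carry a "source" that is a string, an object with a "url" key, or a list of either). How the pipeline itself parses the file is not shown here, and the function name is hypothetical.

```python
import json


def first_image_source(state: dict) -> str:
    """Return the URI of the first image-layer source in a Neuroglancer state dict."""
    for layer in state.get("layers", []):
        if layer.get("type") != "image":
            continue
        source = layer.get("source")
        # Normalize to a list so strings, objects, and lists are handled alike.
        candidates = source if isinstance(source, list) else [source]
        for item in candidates:
            if isinstance(item, str):
                return item
            if isinstance(item, dict) and "url" in item:
                return item["url"]
    raise ValueError("no image layer with a source URI found")


def first_image_source_from_file(path: str) -> str:
    """Load a Neuroglancer state JSON file and return its first image source URI."""
    with open(path, encoding="utf-8") as handle:
        return first_image_source(json.load(handle))
```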
Reference volumes
These files are required for atlas registration. Default paths are relative to
data_root and can be overridden in PipelineConfig.
| File | Default path | Format | Description |
|---|---|---|---|
| SmartSPIM LCA template | smartspim_lca_template/smartspim_lca_template_25.nii.gz | NIfTI (.nii.gz) | 25 um SmartSPIM template volume |
| CCF average template | allen_mouse_ccf/average_template/average_template_25.nii.gz | NIfTI (.nii.gz) | Allen CCF average template at 25 um |
| CCF lateralized labels | allen_mouse_ccf_annotations_lateralized_compact/ccf_2017_annotation_25_lateralized_compact.nrrd | NRRD | Lateralized CCF annotation labels at 25 um |
| CCF label lookup | allen_mouse_ccf_annotations_lateralized_compact/ccf_2017_annotation_25_lateralized_unique_vals.npz | NumPy (.npz) | Unique label values for the annotation volume |
| IBL atlas histology | iblatlas_allenatlas/ | Directory | IBL atlas histology data used by iblatlas.AllenAtlas |
| Template-to-CCF transforms | spim_template_to_ccf/ | Directory | ANTs SyN registration outputs (syn_0GenericAffine.mat, syn_1InverseWarp.nii.gz) |
Probe annotation files
Per-probe Neuroglancer point annotation files. For each row in the manifest,
the pipeline searches data_root for a file matching the glob pattern
*/<probe_file>.<annotation_format> (e.g., */track_annotations_probeA.json).
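To see which file a manifest row would resolve to, the lookup can be approximated with pathlib. This is a sketch under the glob pattern quoted above; the function name is hypothetical, and treating multiple matches as an error is an assumption rather than documented pipeline behavior.

```python
from pathlib import Path


def find_annotation_file(data_root, probe_file, annotation_format="json"):
    """Return the single file matching */<probe_file>.<annotation_format> under data_root."""
    pattern = f"*/{probe_file}.{annotation_format}"
    matches = sorted(Path(data_root).glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r} under {data_root}")
    if len(matches) > 1:
        # Assumption: an ambiguous match is reported instead of silently picking one.
        raise ValueError(f"{pattern!r} is ambiguous: {[str(m) for m in matches]}")
    return matches[0]
```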
These files contain 3D point annotations marking the probe track in the SmartSPIM image space. They are converted through the transform chain:
Neuroglancer pixels --> SPIM (LPS) --> Template --> CCF (LPS/um) --> Bregma (IBL)
Electrophysiology data
Unless --skip-ephys is passed, the pipeline expects spike-sorted data at
data_root/<sorted_recording>/ for each unique recording in the manifest. The
ephys extraction (delegated to
aind-ephys-ibl-gui-conversion)
converts this into IBL ALF format (e.g., spikes.times.npy,
spikes.clusters.npy, channels.localCoordinates.npy, etc.).
Ephys extraction runs once per unique recording ID (deduplicated across manifest rows).
Output structure
Outputs are written under results_root/<mouseid>/, alongside a copy of the input manifest at the top level of results_root:
results_root/
|-- manifest.csv                                  # Copy of input manifest
+-- <mouseid>/
    |-- ccf_space_histology/
    |   |-- histology_registration.nrrd           # Registration channel in CCF space
    |   +-- histology_<channel>.nrrd              # Additional channels in CCF space
    |
    |-- image_space_histology/
    |   |-- histology_registration.nrrd           # Registration channel in image space
    |   |-- histology_registration_pipeline.nrrd
    |   |-- ccf_in_mouse.nrrd                     # CCF template warped to image space
    |   +-- labels_in_mouse.nrrd                  # CCF labels warped to image space
    |
    |-- track_data/
    |   |-- spim/<probe_id>.*                     # Track coordinates in SPIM space
    |   |-- template/<probe_id>.*                 # Track coordinates in template space
    |   |-- ccf/<probe_id>.*                      # Track coordinates in CCF space
    |   |-- bregma_xyz/<probe_id>.*               # Track coordinates in IBL bregma space
    |   +-- datapackage.json                      # Machine-readable output manifest
    |
    +-- <recording_id>/
        +-- <probe_name>/
            |-- xyz_picks.json                    # Probe track picks (CCF coordinates)
            |-- xyz_picks_image_space.json        # Probe track picks (image space)
            |-- xyz_picks_shank<N>.json           # Per-shank picks (multi-shank only)
            +-- spikes/                           # Ephys ALF output (if not skipped)
The datapackage.json file contains a structured manifest of all outputs,
including transform chain paths, histology volume paths, and per-probe
metadata. It can be loaded back with:
from aind_ibl_ephys_alignment_preprocessing.manifest import load_datapackage
dp = load_datapackage("/path/to/datapackage.json")
Pipeline overview
The pipeline performs the following steps:
- Asset discovery -- Parse the Neuroglancer JSON to locate the SmartSPIM OME-Zarr volumes and ANTs registration outputs.
- Validation -- Check that all required inputs exist and are correctly structured.
- Histology processing -- Reorient the registration channel to CCF space, export additional channels, and compute inverse transforms (CCF template and labels warped back to image space).
- Probe processing -- For each manifest row, load the Neuroglancer point annotations and convert them through the transform chain (SPIM -> template -> CCF -> bregma), writing coordinate files at each stage and producing xyz_picks.json files for the alignment GUI.
- Ephys extraction (optional) -- Convert spike-sorted data into IBL ALF format.
- Manifest generation -- Write datapackage.json summarizing all outputs.
Adapting to your own data
If you are not using the AIND SmartSPIM pipeline but want to use this package with your own histology and ephys data, the key requirements are:
- OME-Zarr volumes -- Your fused histology images must be in OME-Zarr format, discoverable via a Neuroglancer JSON state file.
- ANTs registration outputs -- You need ANTs SyN registration transforms mapping your histology images to a common template and from that template to the Allen CCF. The pipeline expects standard ANTs output files (*_0GenericAffine.mat, *_1InverseWarp.nii.gz).
- Reference volumes -- The Allen CCF template, lateralized annotation labels, and the SmartSPIM LCA template at 25 um resolution. These can be overridden in PipelineConfig if your paths differ from the defaults.
- Probe annotations -- Neuroglancer point annotation JSON files marking probe tracks in histology image space.
- Spike-sorted data (optional) -- If you want ephys extraction, provide spike-sorted output compatible with aind-ephys-ibl-gui-conversion.
The coordinate convention used throughout is the IBL convention: x = ML (right positive), y = AP (anterior positive), z = DV (dorsal positive), with origin at bregma, in meters.
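The axis and unit changes implied by this convention can be sketched as follows, for a point given in an LPS frame (x toward left, y toward posterior, z toward superior) in micrometres. This is an illustration only, not the pipeline's transform code: the function name is hypothetical, and the bregma position must be supplied by the caller in the same LPS frame.

```python
def lps_um_to_ibl_m(point_lps_um, bregma_lps_um):
    """Convert an LPS point in micrometres to IBL bregma coordinates in metres."""
    # Offset from bregma, still in LPS micrometres.
    dx, dy, dz = (p - b for p, b in zip(point_lps_um, bregma_lps_um))
    # LPS axes point left, posterior, superior; IBL axes point right, anterior, dorsal.
    # ML and AP therefore flip sign, DV keeps its sign, and um are scaled to m.
    return (-dx * 1e-6, -dy * 1e-6, dz * 1e-6)
```

For example, a point 1000 um to the left of bregma, 2000 um anterior to it, and 3000 um below it comes out with negative x, positive y, and negative z, each in metres.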
Development
To develop the code, run:
uv sync
Please test your changes using the full linting and testing suite:
./scripts/run_linters_and_checks.sh -c
Or run individual commands:
uv run --frozen ruff format # Code formatting
uv run --frozen ruff check # Linting
uv run --frozen mypy # Type checking
uv run --frozen interrogate -v src # Documentation coverage
uv run --frozen codespell --check-filenames # Spell checking
uv run --frozen pytest --cov aind_ibl_ephys_alignment_preprocessing # Tests with coverage
Documentation
sphinx-build -b html docs/source/ docs/build/html
License
This project is licensed under the MIT License - see the LICENSE file for details.