Skip to main content

An attempt to speed-up access to large NWB (Neurodata Without Borders) files stored in the cloud.

Project description

lazynwb

PyPI Python version

Coverage CI/CD GitHub issues

Purpose

1. Make NWB table access faster and/or consume less memory by reading only the data required, when it's needed

As of 2025 and pynwb==3.0, there are a couple of ways to access data stored in an NWB file as a DynamicTable (e.g. trials, units):

  • get the pandas dataframe for the table and access the desired column
  • or access specific columns as arrays from disk

The NWB schema for the units table includes columns for list-like or nested data, such as spike_times, waveform_mean, waveform_sd, which can become large for Neuropixels probes and often not needed in their entirety for analysis. Reading the entire table into memory may be unnecessary and, especially when reading from NWBs stored in the cloud, can be slow.

Accessing individual columns as arrays, on the other hand, means we no longer have the convenience of a DataFrame.

Ideally, we would filter our table based on metrics in some columns, then access the larger columns for the filtered subset of rows, seamlessly with a single command.

To this end, lazynwb.scan_nwb() provides a polars.LazyFrame() interface to NWB tables, which supports both predicate pushdown and projection of columns.

It also supports reading multiple NWB files in one operation, producing a concatenated table:

>>> import lazynwb
>>> import polars as pl

>>> (
  lazynwb.scan_nwb(
    [nwb_path_0, nwb_path_1, ...],  # single path or iterable
    table_path='/units',             # or '/intervals/trials' etc
  )
  .filter(
    pl.col('activity_drift') <= 0.2,
    pl.col('amplitude_cutoff') <= 0.1,
    pl.col('presence_ratio') >= 0.7,
    pl.col('isi_violations_ratio') <= 0.5,
    pl.col('decoder_label') != 'noise',
  )
  .select('unit_id', 'location', 'spike_times', '_nwb_path', '_table_row_index')
  # _nwb_path and _table_row_index are not columns in the NWB table: they're added to identify source of each row in a table that spans multiple NWBs
)
shape: (101, 4)
┌─────────┬─────────────────────────────────┬─────────────────────────────────┬──────────────┐
 unit_id  spike_times                      _nwb_path                        _table_index 
 ---      ---                              ---                              ---          
 i64      list[f64]                        str                              u32          
╞═════════╪═════════════════════════════════╪═════════════════════════════════╪══════════════╡
 193      [2722.628735, 2723.620493,  4  /data/ecephys_702960_2024-03-1  5            
 23       [1784.801304, 1784.804037,  3  /data/ecephys_725805_2024-07-1  4            
 0        [9.2712e6, 9.2712e6,  9.2731e  /data/ecephys_737812_2024-08-0  0            
 300      [9.2713e6, 9.2714e6,  9.2731e  /data/ecephys_737812_2024-08-0  6            
 19       [6115.424355, 6116.428649,  7  /data/ecephys_702960_2024-03-1  5            
                                                                                     
 437      [581.476385, 598.829113,  331  /data/ecephys_666859_2023-06-1  40           
 439      [929.656482, 1134.993272,  33  /data/ecephys_666859_2023-06-1  41           
 446      [626.940861, 661.785209,  331  /data/ecephys_666859_2023-06-1  42           
 449      [618.939192, 618.991564,  331  /data/ecephys_666859_2023-06-1  43           
 609      [594.415999, 646.51812,  3312  /data/ecephys_666859_2023-06-1  44           
└─────────┴─────────────────────────────────┴─────────────────────────────────┴──────────────┘

2. Quickly provide a summary of the metadata for all NWB files in a project

>>> lazynwb.get_metadata_df(nwb_paths, as_polars=True)
```Getting metadata: 100%|█████████████████████| 252/252 [00:17<00:00, 14.51file/s]
shape: (252, 28)
┌────────────┬────────────┬───────────┬───────────┬───┬────────┬───────────┬───────────┬───────────┐
 identifier  session_st  session_i  session_d    weight  strain     date_of_b  _nwb_path 
 ---         art_time    d          escriptio     ---     ---        irth       ---       
 str         ---         ---        n             null    str        ---        str       
             datetime[μ  str        ---                              datetime[            
             s, UTC]                str                              μs, UTC]             
╞════════════╪════════════╪═══════════╪═══════════╪═══╪════════╪═══════════╪═══════════╪═══════════╡
 0514cf12-2  2024-08-07  713655_20  ecephys      null    Sst-IRES-  2023-11-2  /data/dyn 
 41f-4ab2-a  19:03:44    24-08-07   session               Cre;Ai32   3          amicrouti 
 ce9-1c2619  UTC                    (day 3)                          08:00:00   ng_datacu 
                                   with b                          UTC        be_      
 5c032dff-e  2024-12-06  743199_20  ecephys      null    VGAT-ChR2  2024-05-1  /data/dyn 
 04f-4884-9  19:06:17    24-12-06   session               -YFP       8          amicrouti 
 85d-055ac7  UTC                    (day 4)                          07:00:00   ng_datacu 
                                   with b                          UTC        be_      
 4a7e9fdb-4  2022-09-27  636397_20  ecephys      null    C57BL6J(N  2022-06-0  /data/dyn 
 fab-4052-a  18:36:50    22-09-27   session               P)         2          amicrouti 
 7fc-f2d109  UTC                    (day 2)                          07:00:00   ng_datacu 
                                   with b                          UTC        be_      
 9b4aab77-5  2025-01-16  744279_20  ecephys      null    Sst-IRES-  2024-05-2  /data/dyn 
 021-43f3-9  22:01:37    25-01-16   session               Cre;Ai32   5          amicrouti 
 f18-b13291  UTC                    (day 4)                          07:00:00   ng_datacu 
                                   with b                          UTC        be_      
 b0ba34cb-4  2024-04-22  706401_20  ecephys      null    Sst-IRES-  2023-10-0  /data/dyn 
...
 971-495d-b  19:18:59    25-03-18   session               -YFP       6          amicrouti 
 6ed-dc7b08  UTC                    (day 1)                          07:00:00   ng_datacu 
                                   withou                          UTC        be_      
└────────────┴────────────┴───────────┴───────────┴───┴────────┴───────────┴───────────┴───────────┘

3. Quickly provide a summary of the contents of a single NWB file

>>> lazynwb.get_internal_paths(nwb_paths[0])
{
  '/acquisition/frametimes_eye_camera/timestamps': <HDF5 dataset "timestamps": shape (267399,), type "<f8">,
  '/acquisition/frametimes_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/acquisition/frametimes_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/acquisition/lick_sensor_events/data': <HDF5 dataset "data": shape (2734,), type "<f8">,
  '/acquisition/lick_sensor_events/timestamps': <HDF5 dataset "timestamps": shape (2734,), type "<f8">,
  '/intervals/aud_rf_mapping_trials': <HDF5 group "/intervals/aud_rf_mapping_trials" (10 members)>,
  '/intervals/epochs': <HDF5 group "/intervals/epochs" (9 members)>,
  '/intervals/performance': <HDF5 group "/intervals/performance" (21 members)>,
  '/intervals/trials': <HDF5 group "/intervals/trials" (48 members)>,
  '/intervals/vis_rf_mapping_trials': <HDF5 group "/intervals/vis_rf_mapping_trials" (12 members)>,
  '/processing/behavior/dlc_eye_camera': <HDF5 group "/processing/behavior/dlc_eye_camera" (110 members)>,
  '/processing/behavior/eye_tracking': <HDF5 group "/processing/behavior/eye_tracking" (26 members)>,
  '/processing/behavior/facemap_front_camera/data': <HDF5 dataset "data": shape (267204, 500), type "<f4">,
  '/processing/behavior/facemap_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/processing/behavior/facemap_side_camera/data': <HDF5 dataset "data": shape (267374, 500), type "<f4">,
  '/processing/behavior/facemap_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/processing/behavior/licks/data': <HDF5 dataset "data": shape (2707,), type "<f8">,
  '/processing/behavior/licks/timestamps': <HDF5 dataset "timestamps": shape (2707,), type "<f8">,
  '/processing/behavior/lp_front_camera': <HDF5 group "/processing/behavior/lp_front_camera" (57 members)>,
  '/processing/behavior/lp_side_camera': <HDF5 group "/processing/behavior/lp_side_camera" (57 members)>,
  '/processing/behavior/quiescent_interval_violations/timestamps': <HDF5 dataset "timestamps": shape (131,), type "<f8">,
  '/processing/behavior/rewards/timestamps': <HDF5 dataset "timestamps": shape (130,), type "<f8">,
  '/processing/behavior/running_speed/data': <HDF5 dataset "data": shape (251998,), type "<f8">,
  '/processing/behavior/running_speed/timestamps': <HDF5 dataset "timestamps": shape (251998,), type "<f8">
 }

4. Get the common schema for a table in one or more NWB files

>>> lazynwb.get_table_schema(nwb_paths, table_path="/intervals/trials")
# uses polars (arrow) datatypes
OrderedDict([('condition', String), ('id', Int64), ('start_time', Float64), ('stop_time', Float64), ('_nwb_path', String), ('_table_path', String), ('_table_index', UInt32)])

Development

See instructions in https://github.com/bjhardcastle/lazynwb/CONTRIBUTING.md and the original template: https://github.com/bjhardcastle/copier-pdm-npc/blob/main/README.md

notes

  • hdf5 access seems to have a mutex lock that threads spend a long time waiting to acquire (with remfile)

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lazynwb-0.2.80.tar.gz (57.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lazynwb-0.2.80-py3-none-any.whl (47.6 kB view details)

Uploaded Python 3

File details

Details for the file lazynwb-0.2.80.tar.gz.

File metadata

  • Download URL: lazynwb-0.2.80.tar.gz
  • Upload date:
  • Size: 57.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for lazynwb-0.2.80.tar.gz
Algorithm Hash digest
SHA256 385822983adeac206f5e1fecaffbff48f80ec267570db6b4c1769d8770e344ac
MD5 d0e0d6e477a24f5a3f24c21600ed82c7
BLAKE2b-256 1d1e7b57515cafcb14889c172ef4bb6a20eef4f60a168a5f9163552c69493d7f

See more details on using hashes here.

File details

Details for the file lazynwb-0.2.80-py3-none-any.whl.

File metadata

  • Download URL: lazynwb-0.2.80-py3-none-any.whl
  • Upload date:
  • Size: 47.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.10.4 {"installer":{"name":"uv","version":"0.10.4","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for lazynwb-0.2.80-py3-none-any.whl
Algorithm Hash digest
SHA256 d5a7aa77d3451c6641dd43b9510c0bee604a60af3350283af0f6df0d7257c8b9
MD5 665cbdda6e734260e6fc74e3259d1400
BLAKE2b-256 9c72366df63809d60e88f688db3e57de5a230aeeeb8b5bd418b3bf4332acd02a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page