
An attempt to speed up access to large NWB (Neurodata Without Borders) files stored in the cloud.


lazynwb



Purpose

1. Make NWB table access faster and/or consume less memory by reading only the data required, when it's needed

As of 2025 and pynwb==3.0, there are two main ways to access data stored in an NWB file as a DynamicTable (e.g. trials, units):

  • load the whole table as a pandas DataFrame and access the desired columns
  • access specific columns as arrays read from disk

The schema for the units table includes columns of list or nested-list type, such as spike_times, waveform_mean and waveform_sd. The waveform columns can be large for Neuropixels probes and are often not needed for analysis. Reading the entire table into memory may therefore be unnecessary and, especially when the NWB files are stored in the cloud, slow.

Accessing individual columns as arrays, on the other hand, means we no longer have the convenience of a DataFrame.

Ideally, we would filter the table on metrics in some columns, then access the larger columns for only the filtered subset of rows, seamlessly, with a single command.

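To this end, lazynwb.scan_nwb() provides a polars.LazyFrame interface to NWB tables, which supports both predicate pushdown (filters are applied while scanning, so large columns need only be read for matching rows) and projection pushdown (only the selected columns are read).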
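The filter-then-read idea can be sketched without polars or HDF5. Below, a hypothetical `ColumnStore` stands in for an NWB table and counts how many values are read from each column; `filter_then_select` (also hypothetical, not part of lazynwb) evaluates each predicate only on rows that survived the previous ones, then reads the selected columns for the final rows only. This is a toy illustration of predicate and projection pushdown, not how lazynwb or polars implement them.

```python
from typing import Callable, Dict, List, Sequence


class ColumnStore:
    """Hypothetical stand-in for an on-disk table that records how many
    values are read from each column."""

    def __init__(self, columns: Dict[str, list]) -> None:
        self._columns = columns
        self.values_read: Dict[str, int] = {name: 0 for name in columns}

    def n_rows(self) -> int:
        return len(next(iter(self._columns.values())))

    def read(self, name: str, rows: Sequence[int]) -> list:
        """Read one column, but only at the requested row indices."""
        self.values_read[name] += len(rows)
        column = self._columns[name]
        return [column[i] for i in rows]


def filter_then_select(
    store: ColumnStore,
    predicates: Dict[str, Callable[[object], bool]],
    select: List[str],
) -> Dict[str, list]:
    """Apply predicates column by column (predicate pushdown), then read only
    the selected columns for the surviving rows (projection)."""
    rows = list(range(store.n_rows()))
    for name, keep in predicates.items():
        values = store.read(name, rows)
        rows = [row for row, value in zip(rows, values) if keep(value)]
    return {name: store.read(name, rows) for name in select}


store = ColumnStore({
    "unit_id": [0, 1, 2, 3],
    "presence_ratio": [0.9, 0.5, 0.8, 0.95],
    "amplitude_cutoff": [0.05, 0.01, 0.2, 0.08],
    "spike_times": [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6], [0.7]],  # the "large" column
})
result = filter_then_select(
    store,
    {"presence_ratio": lambda v: v >= 0.7, "amplitude_cutoff": lambda v: v <= 0.1},
    ["unit_id", "spike_times"],
)
print(result)             # {'unit_id': [0, 3], 'spike_times': [[0.1, 0.2], [0.7]]}
print(store.values_read)  # {'unit_id': 2, 'presence_ratio': 4, 'amplitude_cutoff': 3, 'spike_times': 2}
```

Only two of the four spike_times entries are read. A real query engine may read predicate columns in full rather than row by row; the point is that the large column is touched only after filtering.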

lazynwb.scan_nwb() also supports reading multiple NWB files in one operation, producing a concatenated table:

>>> import lazynwb
>>> import polars as pl

>>> (
...     lazynwb.scan_nwb(
...         [nwb_path_0, nwb_path_1, ...],  # single path or iterable of paths
...         table_path='/units',             # or '/intervals/trials', etc.
...     )
...     .filter(
...         pl.col('activity_drift') <= 0.2,
...         pl.col('amplitude_cutoff') <= 0.1,
...         pl.col('presence_ratio') >= 0.7,
...         pl.col('isi_violations_ratio') <= 0.5,
...         pl.col('decoder_label') != 'noise',
...     )
...     # _nwb_path and _table_index are not columns in the NWB table: they're added
...     # to identify the source of each row in a table that spans multiple NWBs
...     .select('unit_id', 'spike_times', '_nwb_path', '_table_index')
...     .collect()  # execute the lazy query and return a DataFrame
... )
shape: (101, 4)
┌─────────┬─────────────────────────────────┬─────────────────────────────────┬──────────────┐
│ unit_id ┆ spike_times                     ┆ _nwb_path                       ┆ _table_index │
│ ---     ┆ ---                             ┆ ---                             ┆ ---          │
│ i64     ┆ list[f64]                       ┆ str                             ┆ u32          │
╞═════════╪═════════════════════════════════╪═════════════════════════════════╪══════════════╡
│ 193     ┆ [2722.628735, 2723.620493, … 4… ┆ /data/ecephys_702960_2024-03-1… ┆ 5            │
│ 23      ┆ [1784.801304, 1784.804037, … 3… ┆ /data/ecephys_725805_2024-07-1… ┆ 4            │
│ 0       ┆ [9.2712e6, 9.2712e6, … 9.2731e… ┆ /data/ecephys_737812_2024-08-0… ┆ 0            │
│ 300     ┆ [9.2713e6, 9.2714e6, … 9.2731e… ┆ /data/ecephys_737812_2024-08-0… ┆ 6            │
│ 19      ┆ [6115.424355, 6116.428649, … 7… ┆ /data/ecephys_702960_2024-03-1… ┆ 5            │
│ …       ┆ …                               ┆ …                               ┆ …            │
│ 437     ┆ [581.476385, 598.829113, … 331… ┆ /data/ecephys_666859_2023-06-1… ┆ 40           │
│ 439     ┆ [929.656482, 1134.993272, … 33… ┆ /data/ecephys_666859_2023-06-1… ┆ 41           │
│ 446     ┆ [626.940861, 661.785209, … 331… ┆ /data/ecephys_666859_2023-06-1… ┆ 42           │
│ 449     ┆ [618.939192, 618.991564, … 331… ┆ /data/ecephys_666859_2023-06-1… ┆ 43           │
│ 609     ┆ [594.415999, 646.51812, … 3312… ┆ /data/ecephys_666859_2023-06-1… ┆ 44           │
└─────────┴─────────────────────────────────┴─────────────────────────────────┴──────────────┘
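The _nwb_path and _table_index columns can be thought of as provenance added while concatenating per-file tables. A minimal sketch using plain dicts of columns, assuming each file's table is already loaded as {column name: values}; `concat_with_provenance` is a hypothetical helper, and filling columns missing from one file with `None` is an assumption rather than documented lazynwb behavior.

```python
from typing import Dict, List


def concat_with_provenance(tables: Dict[str, Dict[str, list]]) -> Dict[str, list]:
    """Concatenate per-file tables (column name -> values), tagging each row
    with the file it came from and its row index within that file's table."""
    columns: List[str] = []
    for table in tables.values():
        for name in table:
            if name not in columns:
                columns.append(name)
    out: Dict[str, list] = {name: [] for name in columns + ["_nwb_path", "_table_index"]}
    for path, table in tables.items():
        n_rows = len(next(iter(table.values()))) if table else 0
        for name in columns:
            # Assumption: a column missing from one file is filled with None.
            out[name].extend(table.get(name, [None] * n_rows))
        out["_nwb_path"].extend([path] * n_rows)
        # Indices are assigned here, before any filtering a caller applies,
        # so they keep pointing at the original row in the source table.
        out["_table_index"].extend(range(n_rows))
    return out


combined = concat_with_provenance({
    "/data/session_a.nwb": {"unit_id": [10, 11], "location": ["CA1", "VISp"]},
    "/data/session_b.nwb": {"unit_id": [5]},
})
print(combined["_nwb_path"])     # ['/data/session_a.nwb', '/data/session_a.nwb', '/data/session_b.nwb']
print(combined["_table_index"])  # [0, 1, 0]
print(combined["location"])      # ['CA1', 'VISp', None]
```

Because the index is attached before filtering, a filtered row such as unit 44 above still tells you exactly which row of which file to revisit.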

2. Quickly provide a summary of the metadata for all NWB files in a project

>>> lazynwb.get_metadata_df(nwb_paths, as_polars=True)
Getting metadata: 100%|█████████████████████| 252/252 [00:17<00:00, 14.51file/s]
shape: (252, 28)
┌────────────┬────────────┬───────────┬───────────┬───┬────────┬───────────┬───────────┬───────────┐
│ identifier ┆ session_st ┆ session_i ┆ session_d ┆ … ┆ weight ┆ strain    ┆ date_of_b ┆ _nwb_path │
│ ---        ┆ art_time   ┆ d         ┆ escriptio ┆   ┆ ---    ┆ ---       ┆ irth      ┆ ---       │
│ str        ┆ ---        ┆ ---       ┆ n         ┆   ┆ null   ┆ str       ┆ ---       ┆ str       │
│            ┆ datetime[μ ┆ str       ┆ ---       ┆   ┆        ┆           ┆ datetime[ ┆           │
│            ┆ s, UTC]    ┆           ┆ str       ┆   ┆        ┆           ┆ μs, UTC]  ┆           │
╞════════════╪════════════╪═══════════╪═══════════╪═══╪════════╪═══════════╪═══════════╪═══════════╡
│ 0514cf12-2 ┆ 2024-08-07 ┆ 713655_20 ┆ ecephys   ┆ … ┆ null   ┆ Sst-IRES- ┆ 2023-11-2 ┆ /data/dyn │
│ 41f-4ab2-a ┆ 19:03:44   ┆ 24-08-07  ┆ session   ┆   ┆        ┆ Cre;Ai32  ┆ 3         ┆ amicrouti │
│ ce9-1c2619 ┆ UTC        ┆           ┆ (day 3)   ┆   ┆        ┆           ┆ 08:00:00  ┆ ng_datacu │
│            ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ 5c032dff-e ┆ 2024-12-06 ┆ 743199_20 ┆ ecephys   ┆ … ┆ null   ┆ VGAT-ChR2 ┆ 2024-05-1 ┆ /data/dyn │
│ 04f-4884-9 ┆ 19:06:17   ┆ 24-12-06  ┆ session   ┆   ┆        ┆ -YFP      ┆ 8         ┆ amicrouti │
│ 85d-055ac7 ┆ UTC        ┆           ┆ (day 4)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│            ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ 4a7e9fdb-4 ┆ 2022-09-27 ┆ 636397_20 ┆ ecephys   ┆ … ┆ null   ┆ C57BL6J(N ┆ 2022-06-0 ┆ /data/dyn │
│ fab-4052-a ┆ 18:36:50   ┆ 22-09-27  ┆ session   ┆   ┆        ┆ P)        ┆ 2         ┆ amicrouti │
│ 7fc-f2d109 ┆ UTC        ┆           ┆ (day 2)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│            ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ 9b4aab77-5 ┆ 2025-01-16 ┆ 744279_20 ┆ ecephys   ┆ … ┆ null   ┆ Sst-IRES- ┆ 2024-05-2 ┆ /data/dyn │
│ 021-43f3-9 ┆ 22:01:37   ┆ 25-01-16  ┆ session   ┆   ┆        ┆ Cre;Ai32  ┆ 5         ┆ amicrouti │
│ f18-b13291 ┆ UTC        ┆           ┆ (day 4)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│            ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ b0ba34cb-4 ┆ 2024-04-22 ┆ 706401_20 ┆ ecephys   ┆ … ┆ null   ┆ Sst-IRES- ┆ 2023-10-0 ┆ /data/dyn │
...
│ 971-495d-b ┆ 19:18:59   ┆ 25-03-18  ┆ session   ┆   ┆        ┆ -YFP      ┆ 6         ┆ amicrouti │
│ 6ed-dc7b08 ┆ UTC        ┆           ┆ (day 1)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│            ┆            ┆           ┆ withou…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
└────────────┴────────────┴───────────┴───────────┴───┴────────┴───────────┴───────────┴───────────┘
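One way such a per-file summary could be assembled is sketched below, assuming a hypothetical `read_metadata(path)` callable that returns a dict of session-level fields for one file. Files are read on a thread pool because remote reads are mostly I/O-bound, and fields missing from some files are filled with `None` (compare the null weight column above). This is not lazynwb's implementation; `gather_metadata` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def gather_metadata(
    paths: Iterable[str],
    read_metadata: Callable[[str], Dict[str, object]],
    max_workers: int = 8,
) -> List[Dict[str, Optional[object]]]:
    """Read one metadata record per file concurrently, then align all records
    to a shared set of columns (missing fields become None)."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        records = list(pool.map(read_metadata, path_list))  # keeps input order
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = []
    for path, record in zip(path_list, records):
        row: Dict[str, Optional[object]] = {key: record.get(key) for key in columns}
        row["_nwb_path"] = path
        rows.append(row)
    return rows


# Hypothetical reader: a real one would open each file and read its
# session-level fields (identifier, session_start_time, subject fields, ...).
fake_files = {
    "/data/s1.nwb": {"identifier": "id-1", "strain": "VGAT-ChR2-YFP"},
    "/data/s2.nwb": {"identifier": "id-2", "weight": None},
}
rows = gather_metadata(fake_files, fake_files.__getitem__)
print(rows[1])  # {'identifier': 'id-2', 'strain': None, 'weight': None, '_nwb_path': '/data/s2.nwb'}
```

The list of row dicts can then be handed to any DataFrame constructor to get one row per file.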

3. Quickly provide a summary of the contents of a single NWB file

>>> lazynwb.get_internal_paths(nwb_paths[0])
{
  '/acquisition/frametimes_eye_camera/timestamps': <HDF5 dataset "timestamps": shape (267399,), type "<f8">,
  '/acquisition/frametimes_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/acquisition/frametimes_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/acquisition/lick_sensor_events/data': <HDF5 dataset "data": shape (2734,), type "<f8">,
  '/acquisition/lick_sensor_events/timestamps': <HDF5 dataset "timestamps": shape (2734,), type "<f8">,
  '/intervals/aud_rf_mapping_trials': <HDF5 group "/intervals/aud_rf_mapping_trials" (10 members)>,
  '/intervals/epochs': <HDF5 group "/intervals/epochs" (9 members)>,
  '/intervals/performance': <HDF5 group "/intervals/performance" (21 members)>,
  '/intervals/trials': <HDF5 group "/intervals/trials" (48 members)>,
  '/intervals/vis_rf_mapping_trials': <HDF5 group "/intervals/vis_rf_mapping_trials" (12 members)>,
  '/processing/behavior/dlc_eye_camera': <HDF5 group "/processing/behavior/dlc_eye_camera" (110 members)>,
  '/processing/behavior/eye_tracking': <HDF5 group "/processing/behavior/eye_tracking" (26 members)>,
  '/processing/behavior/facemap_front_camera/data': <HDF5 dataset "data": shape (267204, 500), type "<f4">,
  '/processing/behavior/facemap_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/processing/behavior/facemap_side_camera/data': <HDF5 dataset "data": shape (267374, 500), type "<f4">,
  '/processing/behavior/facemap_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/processing/behavior/licks/data': <HDF5 dataset "data": shape (2707,), type "<f8">,
  '/processing/behavior/licks/timestamps': <HDF5 dataset "timestamps": shape (2707,), type "<f8">,
  '/processing/behavior/lp_front_camera': <HDF5 group "/processing/behavior/lp_front_camera" (57 members)>,
  '/processing/behavior/lp_side_camera': <HDF5 group "/processing/behavior/lp_side_camera" (57 members)>,
  '/processing/behavior/quiescent_interval_violations/timestamps': <HDF5 dataset "timestamps": shape (131,), type "<f8">,
  '/processing/behavior/rewards/timestamps': <HDF5 dataset "timestamps": shape (130,), type "<f8">,
  '/processing/behavior/running_speed/data': <HDF5 dataset "data": shape (251998,), type "<f8">,
  '/processing/behavior/running_speed/timestamps': <HDF5 dataset "timestamps": shape (251998,), type "<f8">
}
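The listing above reports table-like groups (e.g. /intervals/trials) as single entries, while other containers are expanded down to their datasets. A minimal sketch of that kind of flattening over nested dicts, which stand in for HDF5 groups. Treating a group as a table when its attributes include "colnames" (as NWB DynamicTable groups do) is an assumption of this sketch, as is the "attrs" key; with h5py the same walk could be built on `Group.visititems`.

```python
from typing import Any, Dict


def internal_paths(group: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested group mapping into {path: node}.

    Assumptions for this sketch: a group is a dict, its attributes live under
    an "attrs" key, a group whose attrs include "colnames" is a table, and any
    non-dict value is a dataset.
    """
    found: Dict[str, Any] = {}
    for name, child in group.items():
        if name == "attrs":
            continue
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            if "colnames" in child.get("attrs", {}):
                found[path] = child  # report the whole table as one entry
            else:
                found.update(internal_paths(child, path))  # descend further
        else:
            found[path] = child  # dataset leaf
    return found


tree = {
    "acquisition": {
        "lick_sensor_events": {"data": [0.1, 0.2], "timestamps": [1.0, 2.0]},
    },
    "intervals": {
        "trials": {
            "attrs": {"colnames": ["start_time", "stop_time"]},
            "start_time": [0.0, 5.0],
            "stop_time": [4.0, 9.0],
        },
    },
}
for path in internal_paths(tree):
    print(path)
# /acquisition/lick_sensor_events/data
# /acquisition/lick_sensor_events/timestamps
# /intervals/trials
```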

4. Get the common schema for a table in one or more NWB files

>>> lazynwb.get_table_schema(nwb_paths, table_path="/intervals/trials")
# uses polars (arrow) datatypes
OrderedDict([('condition', String), ('id', Int64), ('start_time', Float64), ('stop_time', Float64), ('_nwb_path', String), ('_table_path', String), ('_table_index', UInt32)])
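A sketch of how a common schema might be derived from per-file schemas, assuming "common" means the union of columns in first-seen order, and that conflicting types are widened using a small, hypothetical promotion table (any other conflict raises). Type names are plain strings mirroring the polars names shown above, and the provenance columns are appended with the names and types from that output. `common_schema` and `SUPERTYPES` are illustrative names, not lazynwb API.

```python
from typing import Dict, FrozenSet, List

# Hypothetical promotion rules for a column whose type differs between files.
SUPERTYPES: Dict[FrozenSet[str], str] = {
    frozenset({"Int64", "Float64"}): "Float64",
    frozenset({"Int32", "Int64"}): "Int64",
}


def common_schema(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Union the columns of several per-file schemas, widening conflicting types."""
    merged: Dict[str, str] = {}
    for schema in schemas:
        for name, dtype in schema.items():
            if name not in merged or merged[name] == dtype:
                merged[name] = dtype
                continue
            pair = frozenset({merged[name], dtype})
            if pair not in SUPERTYPES:
                raise TypeError(f"no common type for {name!r}: {merged[name]} vs {dtype}")
            merged[name] = SUPERTYPES[pair]
    # Provenance columns (names and types as in the output above).
    merged.update({"_nwb_path": "String", "_table_path": "String", "_table_index": "UInt32"})
    return merged


schema = common_schema([
    {"condition": "String", "id": "Int64", "start_time": "Float64"},
    {"id": "Int64", "start_time": "Int64", "stop_time": "Float64"},
])
print(list(schema))
# ['condition', 'id', 'start_time', 'stop_time', '_nwb_path', '_table_path', '_table_index']
```

Here start_time is Float64 in one file and Int64 in the other, so it is widened to Float64; a String/Int64 clash would raise instead of silently casting.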

Development

See instructions in https://github.com/bjhardcastle/lazynwb/CONTRIBUTING.md and the original template: https://github.com/bjhardcastle/copier-pdm-npc/blob/main/README.md

Notes

  • HDF5 access seems to be guarded by a lock (mutex) that threads spend a long time waiting to acquire when files are read remotely with remfile
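The effect described in this note can be reproduced with a toy. If every read must hold one process-wide lock, and (as assumed here for the remfile case) the wait for remote data happens while that lock is held, extra threads add no concurrency. `global_lock` below merely stands in for such a lock, `locked_read` and `timed_run` are hypothetical names, and the timings are illustrative.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

global_lock = threading.Lock()   # stands in for a library-wide lock
counter_lock = threading.Lock()  # keeps the counts correct even without global_lock
inside = 0                       # threads currently inside the locked region
max_inside = 0                   # most threads ever seen inside at once


def locked_read(delay: float) -> None:
    """Simulated read that holds the global lock while waiting on I/O."""
    global inside, max_inside
    with global_lock:
        with counter_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(delay)  # e.g. waiting for a remote byte range to arrive
        with counter_lock:
            inside -= 1


def timed_run(n_reads: int = 8, delay: float = 0.02) -> float:
    """Run n_reads simulated reads on n_reads threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_reads) as pool:
        list(pool.map(locked_read, [delay] * n_reads))
    return time.perf_counter() - start


elapsed = timed_run()
print(f"{elapsed:.2f} s, max concurrent readers: {max_inside}")
```

With eight threads the run still takes about eight delays, and at most one reader is ever inside the locked region. Giving each call its own `threading.Lock()` instead lets the sleeps overlap, and the run finishes in roughly one delay.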




Download files


Source Distribution

lazynwb-0.2.69.tar.gz (47.4 kB)

Uploaded Source

Built Distribution


lazynwb-0.2.69-py3-none-any.whl (39.6 kB)

Uploaded Python 3

File details

Details for the file lazynwb-0.2.69.tar.gz.

File metadata

  • Download URL: lazynwb-0.2.69.tar.gz
  • Upload date:
  • Size: 47.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.5

File hashes

Hashes for lazynwb-0.2.69.tar.gz
Algorithm Hash digest
SHA256 1d48261b2a2f4618ae41a9660f778f13cd11cbd30db0e50e883a434e5a99890b
MD5 c54c228611b097cc3a3d3dcf54c6c070
BLAKE2b-256 5f73aabe1595dbb7e8ba71ac9ba15cd3c9c4e1f40697bbf9de38de25e3e80ffd


File details

Details for the file lazynwb-0.2.69-py3-none-any.whl.

File metadata

  • Download URL: lazynwb-0.2.69-py3-none-any.whl
  • Upload date:
  • Size: 39.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.5

File hashes

Hashes for lazynwb-0.2.69-py3-none-any.whl
Algorithm Hash digest
SHA256 ec3bcd6e1b99dcb8930a75ef6fcee42ef1c7f9cf6b4a968e93b48c3a499e1f34
MD5 a49a6e10b28c395a223b1d3348429c82
BLAKE2b-256 90b3398cfa27c0c68d645c234428d9d32d207e66d70e86a4c4690aac18c29ada

