Skip to main content

An attempt to speed-up access to large NWB (Neurodata Without Borders) files stored in the cloud.

Project description

lazynwb

PyPI Python version

Coverage CI/CD GitHub issues

Purpose

1. Make NWB table access faster and/or consume less memory by reading only the data required, when it's needed

As of 2025 and pynwb==3.0, there are a couple of ways to access data stored in an NWB file as a DynamicTable (e.g. trials, units):

  • get the pandas dataframe for the table and access the desired column
  • or access specific columns as arrays from disk

The schema for the units table includes columns for list or nested-list type data, including spike_times, and waveform_mean and waveform_sd which can be large for Neuropixels probes and often not needed for analysis. Reading the entire table into memory may be unnecessary and, especially when reading from NWBs stored in the cloud, can be slow.

Accessing individual columns as arrays, on the other hand, means we no longer have the convenience of a DataFrame.

Ideally, we would filter our table based on metrics in some columns, then access the larger columns for the filtered subset of rows, seemlessly with a single command.

To this end, lazynwb.scan_nwb() provides a polars.LazyFrame() interface to NWB tables, which supports both predicate pushdown and projection of columns.

It also supports reading multiple NWB files in one operation, producing a concatenated table:

>>> import lazynwb
>>> import polars as pl

>>> (
  lazynwb.scan_nwb(
    [nwb_path_0, nwb_path_1, ...],  # single path or iterable
    table_path='/units',             # or '/intervals/trials' etc
  )
  .filter(
    pl.col('activity_drift') <= 0.2,
    pl.col('amplitude_cutoff') <= 0.1,
    pl.col('presence_ratio') >= 0.7,
    pl.col('isi_violations_ratio') <= 0.5,
    pl.col('decoder_label') != 'noise',
  )
  .select('unit_id', 'location', 'spike_times', '_nwb_path', '_table_row_index')
  # _nwb_path and _table_row_index are not columns in the NWB table: they're added to identify source of each row in a table that spans multiple NWBs
)
shape: (101, 4)
┌─────────┬─────────────────────────────────┬─────────────────────────────────┬──────────────┐
 unit_id  spike_times                      _nwb_path                        _table_index 
 ---      ---                              ---                              ---          
 i64      list[f64]                        str                              u32          
╞═════════╪═════════════════════════════════╪═════════════════════════════════╪══════════════╡
 193      [2722.628735, 2723.620493,  4  /data/ecephys_702960_2024-03-1  5            
 23       [1784.801304, 1784.804037,  3  /data/ecephys_725805_2024-07-1  4            
 0        [9.2712e6, 9.2712e6,  9.2731e  /data/ecephys_737812_2024-08-0  0            
 300      [9.2713e6, 9.2714e6,  9.2731e  /data/ecephys_737812_2024-08-0  6            
 19       [6115.424355, 6116.428649,  7  /data/ecephys_702960_2024-03-1  5            
                                                                                     
 437      [581.476385, 598.829113,  331  /data/ecephys_666859_2023-06-1  40           
 439      [929.656482, 1134.993272,  33  /data/ecephys_666859_2023-06-1  41           
 446      [626.940861, 661.785209,  331  /data/ecephys_666859_2023-06-1  42           
 449      [618.939192, 618.991564,  331  /data/ecephys_666859_2023-06-1  43           
 609      [594.415999, 646.51812,  3312  /data/ecephys_666859_2023-06-1  44           
└─────────┴─────────────────────────────────┴─────────────────────────────────┴──────────────┘

2. Quickly provide a summary of the metadata for all NWB files in a project

>>> lazynwb.get_metadata_df(nwb_paths, as_polars=True)
```Getting metadata: 100%|█████████████████████| 252/252 [00:17<00:00, 14.51file/s]
shape: (252, 28)
┌────────────┬────────────┬───────────┬───────────┬───┬────────┬───────────┬───────────┬───────────┐
 identifier  session_st  session_i  session_d    weight  strain     date_of_b  _nwb_path 
 ---         art_time    d          escriptio     ---     ---        irth       ---       
 str         ---         ---        n             null    str        ---        str       
             datetime[μ  str        ---                              datetime[            
             s, UTC]                str                              μs, UTC]             
╞════════════╪════════════╪═══════════╪═══════════╪═══╪════════╪═══════════╪═══════════╪═══════════╡
 0514cf12-2  2024-08-07  713655_20  ecephys      null    Sst-IRES-  2023-11-2  /data/dyn 
 41f-4ab2-a  19:03:44    24-08-07   session               Cre;Ai32   3          amicrouti 
 ce9-1c2619  UTC                    (day 3)                          08:00:00   ng_datacu 
                                   with b                          UTC        be_      
 5c032dff-e  2024-12-06  743199_20  ecephys      null    VGAT-ChR2  2024-05-1  /data/dyn 
 04f-4884-9  19:06:17    24-12-06   session               -YFP       8          amicrouti 
 85d-055ac7  UTC                    (day 4)                          07:00:00   ng_datacu 
                                   with b                          UTC        be_      
 4a7e9fdb-4  2022-09-27  636397_20  ecephys      null    C57BL6J(N  2022-06-0  /data/dyn 
 fab-4052-a  18:36:50    22-09-27   session               P)         2          amicrouti 
 7fc-f2d109  UTC                    (day 2)                          07:00:00   ng_datacu 
                                   with b                          UTC        be_      
 9b4aab77-5  2025-01-16  744279_20  ecephys      null    Sst-IRES-  2024-05-2  /data/dyn 
 021-43f3-9  22:01:37    25-01-16   session               Cre;Ai32   5          amicrouti 
 f18-b13291  UTC                    (day 4)                          07:00:00   ng_datacu 
                                   with b                          UTC        be_      
 b0ba34cb-4  2024-04-22  706401_20  ecephys      null    Sst-IRES-  2023-10-0  /data/dyn 
...
 971-495d-b  19:18:59    25-03-18   session               -YFP       6          amicrouti 
 6ed-dc7b08  UTC                    (day 1)                          07:00:00   ng_datacu 
                                   withou                          UTC        be_      
└────────────┴────────────┴───────────┴───────────┴───┴────────┴───────────┴───────────┴───────────┘

3. Quickly provide a summary of the contents of a single NWB file

>>> lazynwb.get_internal_paths(nwb_paths[0])
{
  '/acquisition/frametimes_eye_camera/timestamps': <HDF5 dataset "timestamps": shape (267399,), type "<f8">,
  '/acquisition/frametimes_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/acquisition/frametimes_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/acquisition/lick_sensor_events/data': <HDF5 dataset "data": shape (2734,), type "<f8">,
  '/acquisition/lick_sensor_events/timestamps': <HDF5 dataset "timestamps": shape (2734,), type "<f8">,
  '/intervals/aud_rf_mapping_trials': <HDF5 group "/intervals/aud_rf_mapping_trials" (10 members)>,
  '/intervals/epochs': <HDF5 group "/intervals/epochs" (9 members)>,
  '/intervals/performance': <HDF5 group "/intervals/performance" (21 members)>,
  '/intervals/trials': <HDF5 group "/intervals/trials" (48 members)>,
  '/intervals/vis_rf_mapping_trials': <HDF5 group "/intervals/vis_rf_mapping_trials" (12 members)>,
  '/processing/behavior/dlc_eye_camera': <HDF5 group "/processing/behavior/dlc_eye_camera" (110 members)>,
  '/processing/behavior/eye_tracking': <HDF5 group "/processing/behavior/eye_tracking" (26 members)>,
  '/processing/behavior/facemap_front_camera/data': <HDF5 dataset "data": shape (267204, 500), type "<f4">,
  '/processing/behavior/facemap_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/processing/behavior/facemap_side_camera/data': <HDF5 dataset "data": shape (267374, 500), type "<f4">,
  '/processing/behavior/facemap_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/processing/behavior/licks/data': <HDF5 dataset "data": shape (2707,), type "<f8">,
  '/processing/behavior/licks/timestamps': <HDF5 dataset "timestamps": shape (2707,), type "<f8">,
  '/processing/behavior/lp_front_camera': <HDF5 group "/processing/behavior/lp_front_camera" (57 members)>,
  '/processing/behavior/lp_side_camera': <HDF5 group "/processing/behavior/lp_side_camera" (57 members)>,
  '/processing/behavior/quiescent_interval_violations/timestamps': <HDF5 dataset "timestamps": shape (131,), type "<f8">,
  '/processing/behavior/rewards/timestamps': <HDF5 dataset "timestamps": shape (130,), type "<f8">,
  '/processing/behavior/running_speed/data': <HDF5 dataset "data": shape (251998,), type "<f8">,
  '/processing/behavior/running_speed/timestamps': <HDF5 dataset "timestamps": shape (251998,), type "<f8">
 }

4. Get the common schema for a table in one or more NWB files

>>> lazynwb.get_table_schema(nwb_paths, table_path="/intervals/trials")
# uses polars (arrow) datatypes
OrderedDict([('condition', String), ('id', Int64), ('start_time', Float64), ('stop_time', Float64), ('_nwb_path', String), ('_table_path', String), ('_table_index', UInt32)])

Development

See instructions in https://github.com/bjhardcastle/lazynwb/CONTRIBUTING.md and the original template: https://github.com/bjhardcastle/copier-pdm-npc/blob/main/README.md

notes

  • hdf5 access seems to have a mutex lock that threads spend a long time waiting to acquire (with remfile)

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lazynwb-0.2.73.tar.gz (47.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lazynwb-0.2.73-py3-none-any.whl (39.9 kB view details)

Uploaded Python 3

File details

Details for the file lazynwb-0.2.73.tar.gz.

File metadata

  • Download URL: lazynwb-0.2.73.tar.gz
  • Upload date:
  • Size: 47.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.11

File hashes

Hashes for lazynwb-0.2.73.tar.gz
Algorithm Hash digest
SHA256 3f0a4381f7a3f4af77cce084ae3810196760d97a3b039243a31d76e4630fba35
MD5 87adbda3d9b4adaf2ab08159ce0458e9
BLAKE2b-256 6d7b4b354d61e08058541adc4cdc072ceb3f527d41423ffc2432ea35679e761a

See more details on using hashes here.

File details

Details for the file lazynwb-0.2.73-py3-none-any.whl.

File metadata

  • Download URL: lazynwb-0.2.73-py3-none-any.whl
  • Upload date:
  • Size: 39.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.11

File hashes

Hashes for lazynwb-0.2.73-py3-none-any.whl
Algorithm Hash digest
SHA256 2ab59f0324d0d4264aba247d1f9a517fd7cc33aa826ba3882a8f7b3499931a6f
MD5 7da529ffbc596703397c76ce5cac95a6
BLAKE2b-256 65c679b72b5ae13e0c27642049f38ba4a4a8ff60992bbf7b189aa0ee497987b8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page