
An attempt to speed-up access to large NWB (Neurodata Without Borders) files stored in the cloud.


lazynwb


Purpose

1. Make NWB table access faster and/or consume less memory by reading only the data required, when it's needed

As of 2025 and pynwb==3.0, there are a couple of ways to access data stored in an NWB file as a DynamicTable (e.g. trials, units):

  • get the pandas dataframe for the table and access the desired column
  • or access specific columns as arrays from disk

The schema for the units table includes columns of list or nested-list data, such as spike_times, waveform_mean and waveform_sd. The waveform columns can be large for Neuropixels probes and are often not needed for analysis. Reading the entire table into memory may be unnecessary and, especially when the NWB files are stored in the cloud, can be slow.

Accessing individual columns as arrays, on the other hand, means we no longer have the convenience of a DataFrame.

Ideally, we would filter the table on metrics in some columns, then access the larger columns for the filtered subset of rows, seamlessly and with a single command.

To this end, lazynwb.scan_nwb() provides a polars.LazyFrame interface to NWB tables, which supports both predicate pushdown (filtering rows) and projection pushdown (reading only the selected columns).

It also supports reading multiple NWB files in one operation, producing a concatenated table:

>>> import lazynwb
>>> import polars as pl

>>> (
...     lazynwb.scan_nwb(
...         [nwb_path_0, nwb_path_1, ...],  # single path or iterable of paths
...         table_path='/units',             # or '/intervals/trials', etc.
...     )
...     .filter(
...         pl.col('activity_drift') <= 0.2,
...         pl.col('amplitude_cutoff') <= 0.1,
...         pl.col('presence_ratio') >= 0.7,
...         pl.col('isi_violations_ratio') <= 0.5,
...         pl.col('decoder_label') != 'noise',
...     )
...     # _nwb_path and _table_index are not columns in the NWB table: they're added to
...     # identify the source of each row in a table that spans multiple NWB files
...     .select('unit_id', 'spike_times', '_nwb_path', '_table_index')
...     .collect()  # execute the lazy query and return a polars DataFrame
... )
shape: (101, 4)
┌─────────┬─────────────────────────────────┬─────────────────────────────────┬──────────────┐
│ unit_id ┆ spike_times                     ┆ _nwb_path                       ┆ _table_index │
│ ---     ┆ ---                             ┆ ---                             ┆ ---          │
│ i64     ┆ list[f64]                       ┆ str                             ┆ u32          │
╞═════════╪═════════════════════════════════╪═════════════════════════════════╪══════════════╡
│ 193     ┆ [2722.628735, 2723.620493, … 4… ┆ /data/ecephys_702960_2024-03-1… ┆ 5            │
│ 23      ┆ [1784.801304, 1784.804037, … 3… ┆ /data/ecephys_725805_2024-07-1… ┆ 4            │
│ 0       ┆ [9.2712e6, 9.2712e6, … 9.2731e… ┆ /data/ecephys_737812_2024-08-0… ┆ 0            │
│ 300     ┆ [9.2713e6, 9.2714e6, … 9.2731e… ┆ /data/ecephys_737812_2024-08-0… ┆ 6            │
│ 19      ┆ [6115.424355, 6116.428649, … 7… ┆ /data/ecephys_702960_2024-03-1… ┆ 5            │
│ …       ┆ …                               ┆ …                               ┆ …            │
│ 437     ┆ [581.476385, 598.829113, … 331… ┆ /data/ecephys_666859_2023-06-1… ┆ 40           │
│ 439     ┆ [929.656482, 1134.993272, … 33… ┆ /data/ecephys_666859_2023-06-1… ┆ 41           │
│ 446     ┆ [626.940861, 661.785209, … 331… ┆ /data/ecephys_666859_2023-06-1… ┆ 42           │
│ 449     ┆ [618.939192, 618.991564, … 331… ┆ /data/ecephys_666859_2023-06-1… ┆ 43           │
│ 609     ┆ [594.415999, 646.51812, … 3312… ┆ /data/ecephys_666859_2023-06-1… ┆ 44           │
└─────────┴─────────────────────────────────┴─────────────────────────────────┴──────────────┘

2. Quickly provide a summary of the metadata for all NWB files in a project

>>> lazynwb.get_metadata_df(nwb_paths, as_polars=True)
Getting metadata: 100%|█████████████████████| 252/252 [00:17<00:00, 14.51file/s]
shape: (252, 28)
┌────────────┬────────────┬───────────┬───────────┬───┬────────┬───────────┬───────────┬───────────┐
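│ identifier ┆ session_st ┆ session_i ┆ session_d ┆ … ┆ weight ┆ strain    ┆ date_of_b ┆ _nwb_path │
│ ---        ┆ art_time   ┆ d         ┆ escriptio ┆   ┆ ---    ┆ ---       ┆ irth      ┆ ---       │
│ str        ┆ ---        ┆ ---       ┆ n         ┆   ┆ null   ┆ str       ┆ ---       ┆ str       │
│            ┆ datetime[μ ┆ str       ┆ ---       ┆   ┆        ┆           ┆ datetime[ ┆           │
│            ┆ s, UTC]    ┆           ┆ str       ┆   ┆        ┆           ┆ μs, UTC]  ┆           │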
╞════════════╪════════════╪═══════════╪═══════════╪═══╪════════╪═══════════╪═══════════╪═══════════╡
│ 0514cf12-2 ┆ 2024-08-07 ┆ 713655_20 ┆ ecephys   ┆ … ┆ null   ┆ Sst-IRES- ┆ 2023-11-2 ┆ /data/dyn │
│ 41f-4ab2-a ┆ 19:03:44   ┆ 24-08-07  ┆ session   ┆   ┆        ┆ Cre;Ai32  ┆ 3         ┆ amicrouti │
│ ce9-1c2619 ┆ UTC        ┆           ┆ (day 3)   ┆   ┆        ┆           ┆ 08:00:00  ┆ ng_datacu │
│ …          ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ 5c032dff-e ┆ 2024-12-06 ┆ 743199_20 ┆ ecephys   ┆ … ┆ null   ┆ VGAT-ChR2 ┆ 2024-05-1 ┆ /data/dyn │
│ 04f-4884-9 ┆ 19:06:17   ┆ 24-12-06  ┆ session   ┆   ┆        ┆ -YFP      ┆ 8         ┆ amicrouti │
│ 85d-055ac7 ┆ UTC        ┆           ┆ (day 4)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│ …          ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ 4a7e9fdb-4 ┆ 2022-09-27 ┆ 636397_20 ┆ ecephys   ┆ … ┆ null   ┆ C57BL6J(N ┆ 2022-06-0 ┆ /data/dyn │
│ fab-4052-a ┆ 18:36:50   ┆ 22-09-27  ┆ session   ┆   ┆        ┆ P)        ┆ 2         ┆ amicrouti │
│ 7fc-f2d109 ┆ UTC        ┆           ┆ (day 2)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│ …          ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ 9b4aab77-5 ┆ 2025-01-16 ┆ 744279_20 ┆ ecephys   ┆ … ┆ null   ┆ Sst-IRES- ┆ 2024-05-2 ┆ /data/dyn │
│ 021-43f3-9 ┆ 22:01:37   ┆ 25-01-16  ┆ session   ┆   ┆        ┆ Cre;Ai32  ┆ 5         ┆ amicrouti │
│ f18-b13291 ┆ UTC        ┆           ┆ (day 4)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│ …          ┆            ┆           ┆ with b…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
│ b0ba34cb-4 ┆ 2024-04-22 ┆ 706401_20 ┆ ecephys   ┆ … ┆ null   ┆ Sst-IRES- ┆ 2023-10-0 ┆ /data/dyn │
...
│ 971-495d-b ┆ 19:18:59   ┆ 25-03-18  ┆ session   ┆   ┆        ┆ -YFP      ┆ 6         ┆ amicrouti │
│ 6ed-dc7b08 ┆ UTC        ┆           ┆ (day 1)   ┆   ┆        ┆           ┆ 07:00:00  ┆ ng_datacu │
│ …          ┆            ┆           ┆ withou…   ┆   ┆        ┆           ┆ UTC       ┆ be_…      │
└────────────┴────────────┴───────────┴───────────┴───┴────────┴───────────┴───────────┴───────────┘

3. Quickly provide a summary of the contents of a single NWB file

>>> lazynwb.get_internal_paths(nwb_paths[0])
{
  '/acquisition/frametimes_eye_camera/timestamps': <HDF5 dataset "timestamps": shape (267399,), type "<f8">,
  '/acquisition/frametimes_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/acquisition/frametimes_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/acquisition/lick_sensor_events/data': <HDF5 dataset "data": shape (2734,), type "<f8">,
  '/acquisition/lick_sensor_events/timestamps': <HDF5 dataset "timestamps": shape (2734,), type "<f8">,
  '/intervals/aud_rf_mapping_trials': <HDF5 group "/intervals/aud_rf_mapping_trials" (10 members)>,
  '/intervals/epochs': <HDF5 group "/intervals/epochs" (9 members)>,
  '/intervals/performance': <HDF5 group "/intervals/performance" (21 members)>,
  '/intervals/trials': <HDF5 group "/intervals/trials" (48 members)>,
  '/intervals/vis_rf_mapping_trials': <HDF5 group "/intervals/vis_rf_mapping_trials" (12 members)>,
  '/processing/behavior/dlc_eye_camera': <HDF5 group "/processing/behavior/dlc_eye_camera" (110 members)>,
  '/processing/behavior/eye_tracking': <HDF5 group "/processing/behavior/eye_tracking" (26 members)>,
  '/processing/behavior/facemap_front_camera/data': <HDF5 dataset "data": shape (267204, 500), type "<f4">,
  '/processing/behavior/facemap_front_camera/timestamps': <HDF5 dataset "timestamps": shape (267204,), type "<f8">,
  '/processing/behavior/facemap_side_camera/data': <HDF5 dataset "data": shape (267374, 500), type "<f4">,
  '/processing/behavior/facemap_side_camera/timestamps': <HDF5 dataset "timestamps": shape (267374,), type "<f8">,
  '/processing/behavior/licks/data': <HDF5 dataset "data": shape (2707,), type "<f8">,
  '/processing/behavior/licks/timestamps': <HDF5 dataset "timestamps": shape (2707,), type "<f8">,
  '/processing/behavior/lp_front_camera': <HDF5 group "/processing/behavior/lp_front_camera" (57 members)>,
  '/processing/behavior/lp_side_camera': <HDF5 group "/processing/behavior/lp_side_camera" (57 members)>,
  '/processing/behavior/quiescent_interval_violations/timestamps': <HDF5 dataset "timestamps": shape (131,), type "<f8">,
  '/processing/behavior/rewards/timestamps': <HDF5 dataset "timestamps": shape (130,), type "<f8">,
  '/processing/behavior/running_speed/data': <HDF5 dataset "data": shape (251998,), type "<f8">,
  '/processing/behavior/running_speed/timestamps': <HDF5 dataset "timestamps": shape (251998,), type "<f8">
 }
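A comparable listing can be produced by walking the HDF5 hierarchy. This sketch assumes that a group carrying a `colnames` attribute is an hdmf DynamicTable (as the trials and units tables are) and reports it as a single entry rather than listing each of its columns. `internal_paths` is a hypothetical helper and, unlike the output above, it does not filter out anything else.

```python
import tempfile
from pathlib import Path

import h5py


def internal_paths(group, found=None):
    """Map absolute HDF5 paths to datasets, or to table groups (not expanded)."""
    if found is None:
        found = {}
    for child in group.values():
        if isinstance(child, h5py.Dataset):
            found[child.name] = child
        elif isinstance(child, h5py.Group):
            if "colnames" in child.attrs:
                # Looks like a DynamicTable: list the group itself, not its columns
                found[child.name] = child
            else:
                internal_paths(child, found)  # descend into ordinary groups
    return found


# Demo on a tiny synthetic file (not a real NWB file)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.nwb"
    with h5py.File(path, "w") as f:
        # Intermediate groups are created automatically
        f.create_dataset("acquisition/licks/timestamps", data=[0.5, 1.2, 3.4])
        trials = f.create_group("intervals/trials")
        trials.attrs.create("colnames", ["start_time", "stop_time"],
                            dtype=h5py.string_dtype())
        trials.create_dataset("start_time", data=[0.0, 10.0])
        trials.create_dataset("stop_time", data=[5.0, 15.0])
    with h5py.File(path, "r") as f:
        paths = sorted(internal_paths(f))  # keys only; objects close with the file

print(paths)
```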

4. Get the common schema for a table in one or more NWB files

>>> lazynwb.get_table_schema(nwb_paths, table_path="/intervals/trials")
# uses polars (arrow) datatypes
OrderedDict([('condition', String), ('id', Int64), ('start_time', Float64), ('stop_time', Float64), ('_nwb_path', String), ('_table_path', String), ('_table_index', UInt32)])

Development

See instructions in https://github.com/bjhardcastle/lazynwb/CONTRIBUTING.md and the original template: https://github.com/bjhardcastle/copier-pdm-npc/blob/main/README.md

Notes

  • HDF5 access seems to be guarded by a mutex lock that threads spend a long time waiting to acquire (observed when reading with remfile)


Download files


Source Distribution

lazynwb-0.2.62.tar.gz (45.1 kB)

Uploaded Source

Built Distribution


lazynwb-0.2.62-py3-none-any.whl (37.5 kB)

Uploaded Python 3

File details

Details for the file lazynwb-0.2.62.tar.gz.

File metadata

  • Download URL: lazynwb-0.2.62.tar.gz
  • Upload date:
  • Size: 45.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.2

File hashes

Hashes for lazynwb-0.2.62.tar.gz
Algorithm Hash digest
SHA256 ced61b93b395354769559d31998602b634af03ffb6dcd18defdb21639ef7dc79
MD5 dec5d2d1968f9a762863b9be7b7b5852
BLAKE2b-256 7bd30a170c53efaf01c92d1baf877b2a5e777eebb1974e40ce49f19ec3ebe347


File details

Details for the file lazynwb-0.2.62-py3-none-any.whl.

File metadata

  • Download URL: lazynwb-0.2.62-py3-none-any.whl
  • Upload date:
  • Size: 37.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.8.2

File hashes

Hashes for lazynwb-0.2.62-py3-none-any.whl
Algorithm Hash digest
SHA256 3b723fcb1af6a6f2760b0c051edec8a5ae4ad46df81e0b0ebc6817e41713b6b8
MD5 55236faf0848c1fa7bf6ec07e68c670b
BLAKE2b-256 d5cce8f2f9b997f681a8474266f13ce9c80cc5a1b8443a90651f3a4130a1cd16

