bids2table

Index BIDS datasets fast, locally or in the cloud.

Installation

To install the latest release from PyPI, run

pip install bids2table

To install with S3 support, include the s3 extra (quoted so that shells such as zsh do not try to expand the brackets)

pip install "bids2table[s3]"

The latest development version can be installed with

pip install "bids2table[s3] @ git+https://github.com/childmindresearch/bids2table.git"

Usage

To run these examples, you will need to clone the bids-examples repo.

git clone -b 1.9.0 https://github.com/bids-standard/bids-examples.git

Finding BIDS datasets

You can search a directory for valid BIDS datasets using b2t2 find

(bids2table) clane$ b2t2 find bids-examples | head -n 10
bids-examples/asl002
bids-examples/ds002
bids-examples/ds005
bids-examples/asl005
bids-examples/ds051
bids-examples/eeg_rishikesh
bids-examples/asl004
bids-examples/asl003
bids-examples/ds003
bids-examples/eeg_cbm
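
For intuition about what a search like this involves, here is a minimal sketch in plain Python. It assumes that a directory counts as a BIDS dataset root when it contains a dataset_description.json file. The helper name find_dataset_roots is hypothetical, and this is not how b2t2 find is implemented; the real command may apply stricter validity checks.

```python
from pathlib import Path


def find_dataset_roots(root: str) -> list[Path]:
    """Return directories under `root` that contain a dataset_description.json.

    Assumption: the presence of dataset_description.json marks a BIDS dataset
    root. This is a simplified stand-in, not the logic used by `b2t2 find`.
    """
    # rglob walks the tree recursively, so nested datasets
    # (for example under a derivatives/ directory) are found as well.
    return sorted(p.parent for p in Path(root).rglob("dataset_description.json"))


if __name__ == "__main__":
    # Prints nothing if bids-examples has not been cloned.
    for path in find_dataset_roots("bids-examples"):
        print(path)
```

Because the walk is recursive, a sketch like this also reports datasets nested inside other datasets.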

Indexing datasets from the command line

Indexing datasets is done with b2t2 index. Here we index a single example dataset, saving the output as a parquet file.

(bids2table) clane$ b2t2 index -o ds102.parquet bids-examples/ds102
ds102: 100%|███████████████████████████████████████| 26/26 [00:00<00:00, 154.12it/s, sub=26, N=130]
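
To give a rough idea of the kind of information an index row can hold for each file, the sketch below splits a BIDS-style file name into its key-value entities, suffix, and extension. This is a simplified illustration, not bids2table's parser: the function name parse_bids_name, the output keys, and the example file name are all assumptions made for this example.

```python
def parse_bids_name(filename: str) -> dict[str, str]:
    """Split a BIDS-style file name into entities, suffix, and extension.

    Simplified sketch: assumes underscore-separated `key-value` entities
    followed by a suffix, and treats everything after the first dot as the
    extension (so ".nii.gz" stays in one piece).
    """
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")
    record: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("-")
        if sep:
            # A "key-value" entity such as "sub-01".
            record[key] = value
        else:
            # A bare token with no dash is taken to be the suffix.
            record["suffix"] = part
    record["ext"] = dot + ext
    return record


if __name__ == "__main__":
    # Hypothetical file name, used only for illustration.
    print(parse_bids_name("sub-01_task-rest_bold.nii.gz"))
```

A table of such records, one per file, is what makes the dataset easy to filter by subject, task, or suffix.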

You can also index a list of datasets. Note that each iteration in the progress bar represents one dataset.

(bids2table) clane$ b2t2 index -o bids-examples.parquet bids-examples/*
100%|████████████████████████████████████████████| 87/87 [00:00<00:00, 113.59it/s, ds=None, N=9727]

You can pipe the output of b2t2 find to b2t2 index to create an index of all datasets under a root directory.

(bids2table) clane$ b2t2 find bids-examples | b2t2 index -o bids-examples.parquet
97it [00:01, 96.05it/s, ds=ieeg_filtered_speech, N=10K]

The resulting index includes both top-level datasets (as in the previous command) and nested derivatives datasets.

Indexing datasets hosted on S3

bids2table supports indexing datasets hosted on S3 via cloudpathlib. To use this functionality, install bids2table with the s3 extra. Alternatively, install cloudpathlib directly

pip install "cloudpathlib[s3]"

As an example, here we index all datasets on OpenNeuro

(bids2table) clane$ b2t2 index -o openneuro.parquet \
  -j 8 --use-threads s3://openneuro.org/ds*
100%|█████████████████████████████████████| 1408/1408 [12:25<00:00,  1.89it/s, ds=ds006193, N=1.2M]

Using 8 threads, we can index all ~1400 OpenNeuro datasets (1.2M files) in less than 15 minutes.
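
The throughput in the progress bar can be checked by hand: 12:25 is 745 seconds, and 1408 / 745 ≈ 1.89 datasets per second. The sketch below reproduces that arithmetic and shows the general pattern of fanning work out over a thread pool, which tends to help when each task mostly waits on the network. It is not bids2table's implementation: index_one is a hypothetical stand-in that sleeps instead of listing a remote dataset.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def throughput(n_items: int, minutes: int, seconds: int) -> float:
    """Items processed per second over the given elapsed time."""
    return n_items / (minutes * 60 + seconds)


def index_one(dataset: str) -> int:
    """Hypothetical stand-in for indexing one dataset.

    Sleeps briefly to mimic network latency and returns a fake file count.
    """
    time.sleep(0.05)
    return len(dataset)


def index_many(datasets: list[str], max_workers: int = 8) -> list[int]:
    """Run index_one over all datasets using a pool of threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input list.
        return list(pool.map(index_one, datasets))


if __name__ == "__main__":
    print(f"{throughput(1408, 12, 25):.2f} datasets/s")
    print(index_many(["ds000001", "ds000002", "ds000003"]))
```

Threads are a reasonable fit for this kind of workload because the waiting happens in I/O rather than in Python code.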

Indexing datasets from Python

You can also index datasets using the Python API.

import bids2table as b2t2
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Index a single dataset.
tab = b2t2.index_dataset("bids-examples/ds102")

# Find and index a batch of datasets.
tabs = b2t2.batch_index_dataset(
    b2t2.find_bids_datasets("bids-examples"),
)
tab = pa.concat_tables(tabs)

# Index a dataset on S3.
tab = b2t2.index_dataset("s3://openneuro.org/ds000224")

# Save as parquet.
pq.write_table(tab, "ds000224.parquet")

# Convert to a pandas dataframe.
df = tab.to_pandas(types_mapper=pd.ArrowDtype)
