Package to query and download data from an index of ImagingDataCommons
idc-index
About
idc-index is a Python package that enables querying basic metadata and downloading DICOM files hosted by the NCI Imaging Data Commons (IDC).
👷 🚧 This package is in its early stages of development, and its functionality and API will change. Stay tuned for updates and documentation, and please share your feedback by opening issues in this repository or by starting a discussion in the IDC User forum. 🚧
Usage
There are no prerequisites - just install the package ...
```shell
$ pip install idc-index
```
... and download the files corresponding to any collection, or to a DICOM PatientID, study, or series, as follows:
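```python
from idc_index import index

client = index.IDCClient()

# List the IDs of all collections available in IDC
all_collection_ids = client.get_collections()

# Download the files of the "rider_pilot" collection into /some/dir
client.download_from_selection(collection_id="rider_pilot", downloadDir="/some/dir")
```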
... or run queries against the "mini" index of Imaging Data Commons data!
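```python
from idc_index import index

client = index.IDCClient()

# For each collection, list the distinct modalities and body parts it contains.
# The metadata table is referred to as `index` in the query.
query = """
SELECT
  collection_id,
  STRING_AGG(DISTINCT(Modality)) as modalities,
  STRING_AGG(DISTINCT(BodyPartExamined)) as body_parts
FROM
  index
GROUP BY
  collection_id
ORDER BY
  collection_id ASC
"""

# Run the query against the local index; the result is returned as a pandas DataFrame
client.sql_query(query)
```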
Details of the attributes included in the index are in the release notes.
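To make the semantics of the query concrete, here is a pure-Python sketch of what `GROUP BY collection_id` combined with `STRING_AGG(DISTINCT ...)` computes. The rows are made-up placeholders, not real IDC metadata, and the `summarize()` helper is hypothetical: it only illustrates the aggregation (assuming a comma separator) and is not how idc-index evaluates SQL.

```python
from collections import defaultdict

# Hypothetical rows standing in for the index table:
# (collection_id, Modality, BodyPartExamined). Values are invented for illustration.
rows = [
    ("collection_b", "CT", "CHEST"),
    ("collection_a", "MR", "BRAIN"),
    ("collection_a", "MR", "BRAIN"),
    ("collection_a", "CT", "HEAD"),
    ("collection_b", "PT", "CHEST"),
]


def summarize(rows):
    """Mimic GROUP BY collection_id with STRING_AGG(DISTINCT ...) on two columns.

    Returns (collection_id, modalities, body_parts) tuples ordered by
    collection_id, like ORDER BY collection_id ASC. Each aggregate is a
    comma-separated string of distinct values. Values are sorted here only to
    make the output deterministic; SQL STRING_AGG without ORDER BY does not
    guarantee an order. NULL handling is not modeled.
    """
    modalities = defaultdict(set)
    body_parts = defaultdict(set)
    for collection_id, modality, body_part in rows:
        # Sets keep only distinct values, like DISTINCT inside the aggregate
        modalities[collection_id].add(modality)
        body_parts[collection_id].add(body_part)
    return [
        (cid, ",".join(sorted(modalities[cid])), ",".join(sorted(body_parts[cid])))
        for cid in sorted(modalities)
    ]


for row in summarize(rows):
    print(row)
```

Each collection appears once in the result, and duplicate rows (such as the repeated MR/BRAIN entry) contribute a value only once.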
Tutorial
This package was first presented at the IDC session of the Deep Learning Lab at the 2023 Annual Meeting of the Radiological Society of North America (RSNA).
Please check out this tutorial notebook for an introduction to using idc-index to navigate IDC data.
Resources
- The Imaging Data Commons Portal can be used to explore IDC content from a web browser
- s5cmd is a highly efficient, open source, multi-platform S3 client that we use for downloading IDC data, which is hosted in public AWS and GCS buckets. Distributed on PyPI as s5cmd.
- SlicerIDCBrowser: a 3D Slicer extension that relies on idc-index to search and download IDC data
Acknowledgment
This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003l.
If this package helped your research, we would appreciate it if you could cite the IDC paper below.
Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Hashes for idc_index-0.5.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 40e2c0a0959c5d4747ca274c1a5ec2dd5057bd94730cead8e2994454d5baef5c |
| MD5 | dd0c0faca71ab93c72ba5e7a8c65d8a2 |
| BLAKE2b-256 | 160345b80a433733120c889cb7f7fa5405d995bb51c56d32dffa2ecc8d1b0257 |
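These digests let you check that a downloaded wheel was not corrupted or altered. Below is a minimal sketch using Python's standard `hashlib` module; the helper names `file_digest_hex()` and `verify_file()` are hypothetical and not part of idc-index.

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks to limit memory use.

    `algorithm` is any name accepted by hashlib.new() (e.g. "sha256", "md5"),
    or "blake2b-256" for BLAKE2b with a 32-byte (256-bit) digest.
    """
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path, expected_hex, algorithm="sha256"):
    """Return True if the file's digest matches the published hex digest."""
    return file_digest_hex(path, algorithm) == expected_hex.strip().lower()
```

For example, assuming the wheel was downloaded into the current directory, `verify_file("idc_index-0.5.1-py3-none-any.whl", "40e2c0a0959c5d4747ca274c1a5ec2dd5057bd94730cead8e2994454d5baef5c")` should return `True`. pip can also enforce such checks automatically in its hash-checking mode (`--require-hashes`).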