
Package to query and download data from an index of ImagingDataCommons


About

idc-index is a Python package for querying basic metadata and downloading DICOM files hosted by the NCI Imaging Data Commons (IDC).

👷‍♂️🚧 WARNING: this package is in the early stages of development. Its functionality and API will change. Stay tuned for updates and documentation, and please share your feedback by opening issues in this repository or by starting a discussion in the IDC User forum. 🚧

Usage

There are no prerequisites - just install the package ...

$ pip install idc-index

... and download files corresponding to any collection, DICOM PatientID/Study/Series as follows:

from idc_index import index

client = index.IDCClient()

client.download_from_selection(collection_id = 'rider_pilot', downloadDir = '/some/dir')
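To fetch several collections in one run, the call above can be wrapped in a small loop. This is a minimal sketch, not part of idc-index: the helper name `download_collections` and the one-subdirectory-per-collection layout are assumptions, and it relies only on the `download_from_selection(collection_id=..., downloadDir=...)` call shown above.

```python
import os


def download_collections(client, collection_ids, base_dir):
    """Download each collection into its own subdirectory of base_dir.

    `client` is expected to be an index.IDCClient() instance. This helper is
    hypothetical (not part of idc-index) and only uses the call shown above.
    """
    target_dirs = []
    for collection_id in collection_ids:
        target_dir = os.path.join(base_dir, collection_id)
        # Create the directory up front so the download has somewhere to land.
        os.makedirs(target_dir, exist_ok=True)
        client.download_from_selection(
            collection_id=collection_id, downloadDir=target_dir
        )
        target_dirs.append(target_dir)
    return target_dirs
```

With the client created above, `download_collections(client, ['rider_pilot'], '/some/dir')` would place the files under `/some/dir/rider_pilot`.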

... or run queries against the "mini" index of Imaging Data Commons data!

from idc_index import index

client = index.IDCClient()

query = """
SELECT
  collection_id,
  STRING_AGG(DISTINCT(Modality)) as modalities,
  STRING_AGG(DISTINCT(BodyPartExamined)) as body_parts
FROM
  index
GROUP BY
  collection_id
ORDER BY
  collection_id ASC
"""

client.sql_query(query)
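To see what shape of result this kind of aggregate query produces without downloading anything, the same grouping can be tried on a toy table. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module rather than idc-index, the rows are made up, and SQLite's `GROUP_CONCAT(DISTINCT ...)` stands in for `STRING_AGG(DISTINCT ...)`.

```python
import sqlite3

# Made-up rows standing in for the index; real column values will differ.
rows = [
    ("collection_a", "CT", "CHEST"),
    ("collection_a", "CT", "LUNG"),
    ("collection_a", "SEG", "CHEST"),
    ("collection_b", "MR", "BRAIN"),
]

conn = sqlite3.connect(":memory:")
# "index" is a reserved word in SQLite, so it is quoted here.
conn.execute(
    'CREATE TABLE "index" '
    "(collection_id TEXT, Modality TEXT, BodyPartExamined TEXT)"
)
conn.executemany('INSERT INTO "index" VALUES (?, ?, ?)', rows)

# GROUP_CONCAT(DISTINCT ...) is SQLite's counterpart of STRING_AGG(DISTINCT ...).
query = """
SELECT
  collection_id,
  GROUP_CONCAT(DISTINCT Modality) AS modalities,
  GROUP_CONCAT(DISTINCT BodyPartExamined) AS body_parts
FROM
  "index"
GROUP BY
  collection_id
ORDER BY
  collection_id ASC
"""

# One row per collection, with its distinct values joined by commas.
result = conn.execute(query).fetchall()
for collection_id, modalities, body_parts in result:
    print(collection_id, modalities, body_parts)
```

Each collection collapses to a single row, which is why the query is a convenient way to summarize which modalities and body parts a collection covers.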

Details of the attributes included in the index are in the release notes.

Tutorial

This package was first presented at the Deep Learning Lab IDC session of the 2023 Annual Meeting of the Radiological Society of North America (RSNA).

Please check out this tutorial notebook for an introduction to using idc-index to navigate IDC data.

Resources

  • Imaging Data Commons Portal can be used to explore the content of IDC from a web browser
  • s5cmd is a highly efficient, open-source, multi-platform S3 client that we use for downloading IDC data, which is hosted in public AWS and GCS buckets
  • SlicerIDCBrowser is a 3D Slicer extension that relies on idc-index for search and download of IDC data

Acknowledgment

This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003l.

If this package helped your research, we would appreciate it if you could cite the IDC paper below.

Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180


Download files


Source Distribution

idc_index-0.2.11.tar.gz (11.8 kB)

Uploaded Source

File details

Details for the file idc_index-0.2.11.tar.gz.

File metadata

  • Download URL: idc_index-0.2.11.tar.gz
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for idc_index-0.2.11.tar.gz
Algorithm Hash digest
SHA256 391d79075309ba4d06aa5b4012061dba79119320006e70d9d3091ec8bc9e0b0b
MD5 353b5d4f1b7a0c93b271aa28b96fb5d4
BLAKE2b-256 bf6af3b5efad5ddb357e3edf4be062ff2d35eb6f2f62c483322042283894a4b2
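A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical and not part of idc-index or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published above for idc_index-0.2.11.tar.gz.
EXPECTED_SHA256 = "391d79075309ba4d06aa5b4012061dba79119320006e70d9d3091ec8bc9e0b0b"


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_published_digest("idc_index-0.2.11.tar.gz")` on a local copy of the archive returns `True` only if its contents match the published digest.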
