
Package to query and download data from an index of ImagingDataCommons


idc-index


Warning: This package is in its early development stages. Its functionality and API will change.

Stay tuned for updates and documentation, and please share your feedback by opening issues in this repository or by starting a discussion in the IDC User forum.

About

idc-index is a Python package that enables basic operations for working with NCI Imaging Data Commons (IDC):

  • subsetting of the IDC data using selected metadata attributes
  • download of the files corresponding to the selection
  • generation of the viewer URLs for the selected data

Getting started

Install the latest version of the package.

$ pip install --upgrade idc-index

Instantiate IDCClient, which provides the interface for the main operations.

from idc_index import IDCClient

client = IDCClient.client()

You can use IDC Portal to browse collections, cases, studies and series, copy their identifiers and download the corresponding files using idc-index helper functions.

You can try this out with the rider_pilot collection, which is just 10.5 GB in size:

client.download_from_selection(collection_id="rider_pilot", downloadDir=".")

... or run queries against the "mini" index of Imaging Data Commons data, and download images that match your selection criteria! The following will select all Magnetic Resonance (MR) series, and will download the first 10.

from idc_index import index

client = index.IDCClient()

query = """
SELECT
  SeriesInstanceUID
FROM
  index
WHERE
  Modality = 'MR'
"""

selection_df = client.sql_query(query)

client.download_from_selection(
    seriesInstanceUID=list(selection_df["SeriesInstanceUID"].values[:10]),
    downloadDir=".",
)
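The query string above can also be assembled programmatically when the modality or the number of series varies. The following is a minimal sketch, not part of idc-index: build_series_query is a hypothetical helper, and the optional LIMIT clause assumes that the engine behind sql_query accepts standard SQL LIMIT syntax.

```python
import re


def build_series_query(modality, limit=None):
    """Return an SQL string selecting SeriesInstanceUID for one modality.

    Hypothetical helper for illustration; it is not provided by idc-index.
    """
    # DICOM code strings use uppercase letters, digits and underscores
    # (at most 16 characters); rejecting anything else keeps quotes
    # out of the SQL literal.
    if not re.fullmatch(r"[A-Z0-9_]{1,16}", modality):
        raise ValueError(f"unexpected modality code: {modality!r}")
    query = (
        "SELECT\n"
        "  SeriesInstanceUID\n"
        "FROM\n"
        "  index\n"
        "WHERE\n"
        f"  Modality = '{modality}'"
    )
    if limit is not None:
        # Assumption: the query engine supports a LIMIT clause.
        query += f"\nLIMIT {int(limit)}"
    return query


print(build_series_query("MR", limit=10))

# Possible usage with the client created earlier (requires idc-index):
# selection_df = client.sql_query(build_series_query("MR", limit=10))
```

Without an ORDER BY clause, the rows returned under LIMIT are not guaranteed to be the same ones obtained by slicing the full result, so treat the two approaches as equivalent only when any 10 matching series will do.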

The indices of idc-index

idc-index is named this way because it wraps indices of IDC data: tables containing the most important metadata attributes describing the files available in IDC. The main metadata index is available in the index variable (a pandas DataFrame) of IDCClient. Additional index tables cover more specialized metadata: clinical_index contains non-DICOM clinical data, and the slide microscopy tables (indicated by the prefix sm) contain metadata attributes specific to slide microscopy images. A description of the available attributes for all indices can be found here.

Tutorial

Please check out this tutorial notebook for an introduction to using idc-index.

Resources

  • Imaging Data Commons Portal can be used to explore the content of IDC from the web browser
  • s5cmd is a highly efficient, open source, multi-platform S3 client that we use for downloading IDC data, which is hosted in public AWS and GCS buckets. Distributed on PyPI as s5cmd.
  • SlicerIDCBrowser is a 3D Slicer extension that relies on idc-index for search and download of IDC data

Acknowledgment

This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003l.

If this package helped your research, we would appreciate it if you could cite the IDC paper below.

Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

idc_index-0.7.5.tar.gz (40.1 kB view details)

Uploaded Source

Built Distribution

idc_index-0.7.5-py3-none-any.whl (25.9 kB view details)

Uploaded Python 3

File details

Details for the file idc_index-0.7.5.tar.gz.

File metadata

  • Download URL: idc_index-0.7.5.tar.gz
  • Upload date:
  • Size: 40.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for idc_index-0.7.5.tar.gz
Algorithm Hash digest
SHA256 06fc5b129b05fd16a0d6d5008a3cad25be0d96d5bb8ddd14d7dfa302a24fe998
MD5 f68f1ec60e234f7896fdfc20627b097b
BLAKE2b-256 c6c0f5612a7c62530d1edaf9521cfe1c52aef6de4b408141ed5b18f68694ff24

See more details on using hashes here.
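As an illustration of how a published digest can be checked after a manual download, here is a minimal sketch using only the Python standard library. The function names sha256_of_file and matches_published_hash are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Possible usage (assumes the source distribution is in the current directory):
# matches_published_hash(
#     "idc_index-0.7.5.tar.gz",
#     "06fc5b129b05fd16a0d6d5008a3cad25be0d96d5bb8ddd14d7dfa302a24fe998",
# )
```

When installing with pip, the same check can be delegated to pip's hash-checking mode instead of being done by hand.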

Provenance

The following attestation bundles were made for idc_index-0.7.5.tar.gz:

Publisher: cd.yml on ImagingDataCommons/idc-index


File details

Details for the file idc_index-0.7.5-py3-none-any.whl.

File metadata

  • Download URL: idc_index-0.7.5-py3-none-any.whl
  • Upload date:
  • Size: 25.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for idc_index-0.7.5-py3-none-any.whl
Algorithm Hash digest
SHA256 51203eedde0e662c5d410fb80929d25bc152fd2e92b9de8e452206245b15a43b
MD5 169feb7ade8d5b68d99e527e85891a2b
BLAKE2b-256 c3b06abd834a95ebd4468d934679a7db70c227ca928d123d691106175f5d0bff

See more details on using hashes here.

Provenance

The following attestation bundles were made for idc_index-0.7.5-py3-none-any.whl:

Publisher: cd.yml on ImagingDataCommons/idc-index

