Python package to simplify access to the data available in NCI Imaging Data Commons

idc-index

Programmatic access to NCI Imaging Data Commons - the largest public collection of cancer imaging data

What is Imaging Data Commons?

NCI Imaging Data Commons (IDC) is a cloud-based platform providing researchers with free access to a large and growing collection of cancer imaging data. This includes radiology images (CT, MRI, PET), digital pathology slides, and more - all in standard DICOM format with rich clinical and research metadata.

idc-index is the official Python package for querying IDC metadata and downloading imaging data - no cloud credentials or complex setup required.

Features

  • Query metadata with SQL - Search the metadata index describing ~100 TB of imaging data using DuckDB-powered SQL queries
  • High-speed downloads - Parallel downloads from AWS and Google Cloud public buckets via s5cmd
  • Browse hierarchically - Navigate collections → patients → studies → series programmatically
  • Generate viewer URLs - Create links to view images in OHIF (radiology) or Slim (pathology) web viewers
  • Command line interface - Download data directly from the terminal with idc commands
  • No authentication required - All data is publicly accessible

Installation

pip install idc-index

Requires Python 3.10+. Downloads are powered by the bundled s5cmd tool.

Keeping Up to Date

The package version is updated with each new IDC data release. Upgrade regularly to access the latest collections and data:

pip install --upgrade idc-index
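
To see which release is currently installed before deciding whether to upgrade, a minimal sketch using only the Python standard library could look like this. The helper name installed_version is illustrative and not part of idc-index.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "idc-index") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("idc-index is not installed; run: pip install idc-index")
    else:
        print(f"idc-index {found} is installed")
```

Comparing the reported version with the latest release on PyPI shows whether an upgrade is due.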

Quick Start

Explore and Download a Collection

from idc_index import IDCClient

client = IDCClient.client()

# List all available collections
collections = client.get_collections()
print(f"IDC has {len(collections)} collections")

# Download a small collection (10.5 GB)
client.download_from_selection(collection_id="rider_pilot", downloadDir="./data")
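
When downloading several collections, it can help to check each ID against get_collections() first. The sketch below uses only the two client calls shown above. The helper name download_known_collections is hypothetical, and the sketch assumes get_collections() returns an iterable of collection ID strings.

```python
import os


def download_known_collections(client, wanted, download_dir="./data"):
    """Download each requested collection that IDC lists; report the rest.

    `client` is expected to be an IDCClient instance. Assumption: its
    get_collections() method returns an iterable of collection ID strings.
    """
    # Creating the directory up front is harmless if the library would
    # create it anyway.
    os.makedirs(download_dir, exist_ok=True)

    available = set(client.get_collections())
    downloaded, skipped = [], []
    for collection_id in wanted:
        if collection_id in available:
            client.download_from_selection(
                collection_id=collection_id, downloadDir=download_dir
            )
            downloaded.append(collection_id)
        else:
            skipped.append(collection_id)
    return downloaded, skipped
```

For example, download_known_collections(IDCClient.client(), ["rider_pilot"]) would download rider_pilot into ./data and return the lists of downloaded and skipped IDs.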

Query with SQL

Find up to 10 chest CT series and download them:

from idc_index import IDCClient

client = IDCClient.client()

query = """
SELECT
    collection_id,
    PatientID,
    SeriesInstanceUID,
    SeriesDescription,
    series_size_MB
FROM index
WHERE Modality = 'CT'
  AND BodyPartExamined = 'CHEST'
LIMIT 10
"""

results = client.sql_query(query)
print(results)

# Download the matching series
client.download_dicom_series(
    seriesInstanceUID=results["SeriesInstanceUID"].tolist(), downloadDir="./chest_ct"
)
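
To reuse the same query for other modalities or body parts, the SQL text can be generated by a small helper. This is a sketch under stated assumptions: build_series_query is a hypothetical name, and it relies on plain string interpolation guarded by an allow-list of characters, because the source does not say whether sql_query supports bound parameters.

```python
import re

# Allow only letters, digits, underscores, and spaces in filter values so
# that interpolating them into the SQL text cannot change the query shape.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_ ]+")


def build_series_query(modality: str, body_part: str, limit: int = 10) -> str:
    """Return the SQL text for a series search on the `index` table."""
    for value in (modality, body_part):
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsupported characters in filter value: {value!r}")
    if not isinstance(limit, int) or isinstance(limit, bool) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT collection_id, PatientID, SeriesInstanceUID, "
        "SeriesDescription, series_size_MB\n"
        "FROM index\n"
        f"WHERE Modality = '{modality}'\n"
        f"  AND BodyPartExamined = '{body_part}'\n"
        f"LIMIT {limit}"
    )
```

Passing the result to the client, as in client.sql_query(build_series_query("CT", "CHEST")), should run the same query as the example above.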

Browse Data Hierarchy and View Images

Navigate from collection to viewable images:

from idc_index import IDCClient

client = IDCClient.client()

# Get patients in a collection
patients = client.get_patients("tcga_luad", outputFormat="list")
print(f"Found {len(patients)} patients")

# Get studies for a patient
studies = client.get_dicom_studies(patients[0])

# Get series in that study
series = client.get_dicom_series(studies[0]["StudyInstanceUID"])

# Generate a viewer URL
viewer_url = client.get_viewer_URL(seriesInstanceUID=series[0]["SeriesInstanceUID"])
print(f"View in browser: {viewer_url}")
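
The walk above indexes the first element at each level, which fails if any level comes back empty. A slightly more defensive sketch follows. The function name first_viewer_url is hypothetical, and it assumes the client methods return sequences shaped as in the example above.

```python
from typing import Optional


def first_viewer_url(client, collection_id: str) -> Optional[str]:
    """Walk collection -> patient -> study -> series and return a viewer URL.

    Returns None as soon as any level of the hierarchy is empty.
    `client` is expected to be an IDCClient instance.
    """
    patients = client.get_patients(collection_id, outputFormat="list")
    if len(patients) == 0:
        return None

    studies = client.get_dicom_studies(patients[0])
    if len(studies) == 0:
        return None

    series = client.get_dicom_series(studies[0]["StudyInstanceUID"])
    if len(series) == 0:
        return None

    return client.get_viewer_URL(seriesInstanceUID=series[0]["SeriesInstanceUID"])
```

Calling first_viewer_url(IDCClient.client(), "tcga_luad") should give the same URL as the step-by-step version when every level has at least one entry.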

Command Line Interface

Download data directly from the terminal using idc download, which auto-detects the input type:

# Download a collection
idc download rider_pilot

# Download a specific series by UID
idc download 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860

# Download from a manifest file
idc download manifest.s5cmd

# Specify output directory
idc download rider_pilot --download-dir ./data

# See all options
idc --help
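
The same commands can be driven from a Python script, for example inside a larger pipeline. The sketch below uses only the idc download form and the --download-dir option shown above. The helper names build_download_command and run_download are illustrative and not part of idc-index.

```python
import shutil
import subprocess
from typing import List, Optional


def build_download_command(target: str, download_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for `idc download`.

    `target` is a collection ID, a series UID, or a manifest path.
    """
    command = ["idc", "download", target]
    if download_dir is not None:
        command += ["--download-dir", download_dir]
    return command


def run_download(target: str, download_dir: Optional[str] = None) -> int:
    """Run `idc download` and return its exit code."""
    if shutil.which("idc") is None:
        raise FileNotFoundError(
            "the idc command was not found on PATH; install idc-index first"
        )
    completed = subprocess.run(build_download_command(target, download_dir), check=False)
    return completed.returncode
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.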

Documentation

Resources

  • IDC Portal - Browse IDC data in your web browser
  • IDC Forum - Community discussions and support
  • idc-claude-skill - Claude AI skill for querying IDC with natural language
  • SlicerIDCBrowser - 3D Slicer extension using idc-index
  • s5cmd - The high-performance S3 client powering downloads

Citation

If idc-index helps your research, please cite:

Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180

Acknowledgment

This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003I.
