Python package to simplify access to the data available in NCI Imaging Data Commons
idc-index
Programmatic access to NCI Imaging Data Commons - the largest public collection of cancer imaging data
What is Imaging Data Commons?
NCI Imaging Data Commons (IDC) is a cloud-based platform providing researchers with free access to a large and growing collection of cancer imaging data. This includes radiology images (CT, MRI, PET), digital pathology slides, and more - all in standard DICOM format with rich clinical and research metadata.
idc-index is the official Python package for querying IDC metadata and
downloading imaging data - no cloud credentials or complex setup required.
Features
- Query metadata with SQL - Search across ~100TB of data using DuckDB-powered SQL queries
- High-speed downloads - Parallel downloads from AWS and Google Cloud public buckets via s5cmd
- Browse hierarchically - Navigate collections → patients → studies → series programmatically
- Generate viewer URLs - Create links to view images in OHIF (radiology) or Slim (pathology) web viewers
- Command line interface - Download data directly from the terminal with idc commands
- No authentication required - All data is publicly accessible
Installation
pip install idc-index
Requires Python 3.10+. Downloads are powered by the bundled s5cmd tool.
Keeping Up to Date
The package version is updated with each new IDC data release. Upgrade regularly to access the latest collections and data:
pip install --upgrade idc-index
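Before upgrading, it can help to check which release is currently installed. The following is a minimal sketch using only the Python standard library; it assumes the distribution name is idc-index, as in the pip commands above, and the helper name installed_version is illustrative rather than part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "idc-index") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"idc-index version: {found or 'not installed'}")
```

Comparing the printed version against the latest release on PyPI shows whether an upgrade would bring in newer IDC data.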
Quick Start
Explore and Download a Collection
from idc_index import IDCClient
client = IDCClient.client()
# List all available collections
collections = client.get_collections()
print(f"IDC has {len(collections)} collections")
# Download a small collection (10.5 GB)
client.download_from_selection(collection_id="rider_pilot", downloadDir="./data")
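Since even a small collection can occupy several gigabytes, checking free disk space before starting a download may save a failed transfer. This is a sketch using only the standard library; the helper name has_room is hypothetical, and sizes are treated as decimal gigabytes.

```python
import shutil
from pathlib import Path


def has_room(download_dir: str, required_gb: float) -> bool:
    """Return True if the filesystem holding download_dir has enough free space."""
    path = Path(download_dir).resolve()
    # The target directory may not exist yet, so walk up to the nearest
    # existing ancestor before asking the filesystem for its usage.
    while not path.exists():
        path = path.parent
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal gigabytes
    return free_gb >= required_gb


if __name__ == "__main__":
    # 10.5 GB is the collection size quoted in the comment above.
    print("Enough space:", has_room("./data", 10.5))
```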
Query with SQL
Find CT scans of the chest and download them:
from idc_index import IDCClient
client = IDCClient.client()
query = """
SELECT
collection_id,
PatientID,
SeriesInstanceUID,
SeriesDescription,
series_size_MB
FROM index
WHERE Modality = 'CT'
AND BodyPartExamined = 'CHEST'
LIMIT 10
"""
results = client.sql_query(query)
print(results)
# Download the matching series
client.download_dicom_series(
seriesInstanceUID=results["SeriesInstanceUID"].tolist(), downloadDir="./chest_ct"
)
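Because the query above selects series_size_MB, the total download size can be estimated before any data is fetched. The sketch below works on plain row dictionaries so that it runs without the package; it assumes the query result is a pandas DataFrame (consistent with the .tolist() call above), in which case rows could be obtained with results.to_dict("records"). The helper name total_size_mb and the sample rows are made up for illustration.

```python
import math
from typing import Any, Iterable, Mapping


def total_size_mb(rows: Iterable[Mapping[str, Any]]) -> float:
    """Sum series_size_MB over result rows, skipping missing or NaN values."""
    total = 0.0
    for row in rows:
        size = row.get("series_size_MB")
        if size is None:
            continue
        value = float(size)
        if math.isnan(value):
            continue
        total += value
    return total


if __name__ == "__main__":
    # Made-up rows for illustration; real rows would come from the query above.
    sample = [
        {"SeriesInstanceUID": "1.2.3", "series_size_MB": 120.5},
        {"SeriesInstanceUID": "1.2.4", "series_size_MB": 79.5},
    ]
    print(f"Total: {total_size_mb(sample):.1f} MB")
```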
Browse Data Hierarchy and View Images
Navigate from collection to viewable images:
from idc_index import IDCClient
client = IDCClient.client()
# Get patients in a collection
patients = client.get_patients("tcga_luad", outputFormat="list")
print(f"Found {len(patients)} patients")
# Get studies for a patient
studies = client.get_dicom_studies(patients[0])
# Get series in that study
series = client.get_dicom_series(studies[0]["StudyInstanceUID"])
# Generate a viewer URL
viewer_url = client.get_viewer_URL(seriesInstanceUID=series[0]["SeriesInstanceUID"])
print(f"View in browser: {viewer_url}")
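The walk above indexes the first element at each level, which fails with an IndexError if a level happens to be empty. A more defensive variant might look like the sketch below. It calls only the client methods used above, takes the client as a parameter so it can be exercised with a stand-in object, and the function name first_viewer_url is hypothetical.

```python
from typing import Any, Optional


def first_viewer_url(client: Any, collection_id: str) -> Optional[str]:
    """Walk collection -> patient -> study -> series and return a viewer URL.

    Returns None if any level of the hierarchy is empty.
    """
    patients = client.get_patients(collection_id, outputFormat="list")
    if not patients:
        return None
    studies = client.get_dicom_studies(patients[0])
    if not studies:
        return None
    series = client.get_dicom_series(studies[0]["StudyInstanceUID"])
    if not series:
        return None
    return client.get_viewer_URL(seriesInstanceUID=series[0]["SeriesInstanceUID"])
```

With the package installed, this could be called as first_viewer_url(IDCClient.client(), "tcga_luad").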
Command Line Interface
Download data directly from the terminal using idc download, which
auto-detects the input type:
# Download a collection
idc download rider_pilot
# Download a specific series by UID
idc download 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222365663678666836860
# Download from a manifest file
idc download manifest.s5cmd
# Specify output directory
idc download rider_pilot --download-dir ./data
# See all options
idc --help
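The same commands can be driven from a Python script, for example in a batch job. The sketch below builds the argument list from the options shown above only (the target and --download-dir) and runs it with subprocess; the function names build_download_command and run_download are illustrative and not part of idc-index.

```python
import shutil
import subprocess
from typing import List, Optional


def build_download_command(target: str, download_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for `idc download`, using only options shown above."""
    command = ["idc", "download", target]
    if download_dir is not None:
        command += ["--download-dir", download_dir]
    return command


def run_download(target: str, download_dir: Optional[str] = None) -> int:
    """Run the command and return its exit code; raise if the CLI is not on PATH."""
    if shutil.which("idc") is None:
        raise FileNotFoundError("idc executable not found; is idc-index installed?")
    completed = subprocess.run(build_download_command(target, download_dir), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when a manifest path or output directory contains spaces.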
Documentation
- Full Documentation - API reference and guides
- Tutorial Notebook - Interactive introduction to idc-index
Resources
- IDC Portal - Browse IDC data in your web browser
- IDC Forum - Community discussions and support
- idc-claude-skill - Claude AI skill for querying IDC with natural language
- SlicerIDCBrowser - 3D Slicer extension using idc-index
- s5cmd - The high-performance S3 client powering downloads
Citation
If idc-index helps your research, please cite:
Fedorov, A., Longabaugh, W. J. R., Pot, D., Clunie, D. A., Pieper, S. D., Gibbs, D. L., Bridge, C., Herrmann, M. D., Homeyer, A., Lewis, R., Aerts, H. J. W., Krishnaswamy, D., Thiriveedhi, V. K., Ciausu, C., Schacherer, D. P., Bontempi, D., Pihl, T., Wagner, U., Farahani, K., Kim, E. & Kikinis, R. National Cancer Institute Imaging Data Commons: Toward Transparency, Reproducibility, and Scalability in Imaging Artificial Intelligence. RadioGraphics (2023). https://doi.org/10.1148/rg.230180
Acknowledgment
This software is maintained by the IDC team, which has been funded in whole or in part with Federal funds from the NCI, NIH, under task order no. HHSN26110071 under contract no. HHSN261201500003I.