
A PDF to Dicom Converter


pdf2dcm


PDF to DICOM Converter

Convert PDFs into standards-compliant DICOM files for PACS, radiology, and healthcare interoperability workflows.

Features

  • Convert PDFs to Encapsulated DICOM or RGB Secondary Capture DICOM
  • Preserve patient/study metadata from template DICOMs
  • Simple Python API built on pydicom
  • Compatible with PACS workflows

SETUP

Python Package Setup

The Python package is available on PyPI and can be installed with pip:

pip install pdf2dcm

To check the setup, print the version number of the pdf2dcm package:

python -c 'import pdf2dcm; print(pdf2dcm.__version__)'
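The same check can be made from a script without importing the package. The sketch below uses the standard library's importlib.metadata; the helper name installed_version is illustrative and not part of pdf2dcm.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pdf2dcm") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "pdf2dcm is not installed")
```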

Poppler Setup

Poppler is used to create the DICOM RGB Secondary Capture output. To check whether it is already installed, run pdftoppm -h in your terminal. If it is missing, install it in one of the following ways:

Conda

conda install -c conda-forge poppler

Ubuntu

sudo apt-get install poppler-utils

macOS

brew install poppler
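If you would rather check for Poppler from Python than from the terminal, a minimal sketch follows. It only looks for the pdftoppm executable on PATH using shutil.which; the helper name find_pdftoppm is illustrative and not part of pdf2dcm.

```python
import shutil
from typing import Optional


def find_pdftoppm() -> Optional[str]:
    """Return the full path of the pdftoppm executable, or None if it is not on PATH."""
    return shutil.which("pdftoppm")


if __name__ == "__main__":
    location = find_pdftoppm()
    if location is None:
        print("Poppler not found: install it before using Pdf2RgbSC")
    else:
        print(f"Poppler found at {location}")
```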

PDF to Encapsulated DCM

Stores the original PDF directly inside a DICOM object. This is useful for:

  • Radiology, pathology, or other structured clinical documents
  • PACS archival workflows

Usage

from pdf2dcm import Pdf2EncapsDCM

converter = Pdf2EncapsDCM()
converted_dcm = converter.run(
    path_pdf='tests/test_data/test_file.pdf',
    path_template_dcm='tests/test_data/CT_small.dcm',
    suffix=".dcm",
)
print(converted_dcm)
# [ 'tests/test_data/test_file.dcm' ]

Parameters of converter.run:

  • path_pdf (str): Path of the PDF to encapsulate.
  • path_template_dcm (str, optional): Template DICOM used for metadata inheritance.
  • suffix (str, optional): Suffix of the output DICOM files. Defaults to ".dcm".

Returns:

  • List[Path]: Paths of the stored encapsulated DICOM files.
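To confirm that a converted file really carries the original document, the PDF can be read back out with pydicom. This is a sketch under assumptions: it relies on the standard Encapsulated PDF attributes (SOPClassUID and EncapsulatedDocument) being present in the output, and the helper names extract_pdf and looks_like_pdf are illustrative, not part of pdf2dcm.

```python
from pathlib import Path

# Standard UID of the Encapsulated PDF Storage SOP class.
ENCAPSULATED_PDF_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.104.1"


def looks_like_pdf(data: bytes) -> bool:
    """Check for the '%PDF-' signature that starts a PDF file."""
    return data.startswith(b"%PDF-")


def extract_pdf(path_dcm: str, path_out: str) -> Path:
    """Write the PDF embedded in an Encapsulated PDF DICOM file to path_out."""
    import pydicom  # imported here so looks_like_pdf works without pydicom

    ds = pydicom.dcmread(path_dcm)
    if ds.SOPClassUID != ENCAPSULATED_PDF_SOP_CLASS_UID:
        raise ValueError(f"{path_dcm} is not an Encapsulated PDF object")

    document = bytes(ds.EncapsulatedDocument)
    if not looks_like_pdf(document):
        raise ValueError(f"{path_dcm} does not contain PDF data")

    out = Path(path_out)
    out.write_bytes(document)
    return out
```

DICOM pads odd-length values with one NUL byte, so the extracted file can be one byte longer than the original PDF.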

PDF to RGB Secondary Capture DCM

Renders PDF pages as RGB images and stores them as Secondary Capture DICOM instances. Useful when:

  • Encapsulated PDFs are unsupported
  • Image-based viewing is preferred
  • Legacy PACS compatibility is required

Usage

from pdf2dcm import Pdf2RgbSC

converter = Pdf2RgbSC()
converted_dcm = converter.run(
    path_pdf='tests/test_data/test_file.pdf',
    path_template_dcm='tests/test_data/CT_small.dcm',
    suffix=".dcm",
)
print(converted_dcm)
# [ 'tests/test_data/test_file_0.dcm', 'tests/test_data/test_file_1.dcm' ]

Parameters of converter.run:

  • path_pdf (str): Path of the PDF to convert.
  • path_template_dcm (str, optional): Template DICOM used for metadata inheritance.
  • suffix (str, optional): Suffix of the output DICOM files. Defaults to ".dcm".

Returns:

  • List[Path]: Paths of the stored secondary capture DICOM files, one per PDF page.
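Both converters expose the same run signature, so a folder of PDFs can be processed with a small loop. The sketch below is not part of pdf2dcm: convert_folder is a hypothetical helper, and it assumes only the path_pdf and path_template_dcm keyword arguments documented above.

```python
from pathlib import Path
from typing import Dict, List, Optional


def convert_folder(
    converter, pdf_dir: str, path_template_dcm: Optional[str] = None
) -> Dict[str, List]:
    """Run converter.run on every *.pdf in pdf_dir and collect the outputs."""
    results = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        kwargs = {"path_pdf": str(pdf)}
        # Pass the template only when one was given, so the library default
        # applies otherwise.
        if path_template_dcm is not None:
            kwargs["path_template_dcm"] = path_template_dcm
        results[str(pdf)] = converter.run(**kwargs)
    return results
```

Calling convert_folder(Pdf2RgbSC(), "reports") would then return a mapping from each PDF path to its list of output files, where "reports" is a placeholder directory name.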

Notes

  • Output DICOM filenames are derived from the input PDF filename.
  • If no template is provided, no repersonalisation takes place.
  • To produce DICOM files without a suffix, pass suffix="" to converter.run().
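The naming rule can be sketched in a few lines of pathlib. The pattern here is inferred from the example outputs shown in the usage sections (test_file.dcm for the encapsulated case, test_file_0.dcm and test_file_1.dcm for the per-page case) and is an assumption, not taken from the library source; both function names are illustrative.

```python
from pathlib import Path
from typing import List


def expected_encapsulated_path(path_pdf: str, suffix: str = ".dcm") -> Path:
    """Swap the .pdf extension for the given suffix (may be empty)."""
    pdf = Path(path_pdf)
    return pdf.with_name(pdf.stem + suffix)


def expected_rgb_paths(path_pdf: str, n_pages: int, suffix: str = ".dcm") -> List[Path]:
    """One output per page, numbered from 0, next to the input PDF."""
    pdf = Path(path_pdf)
    return [pdf.with_name(f"{pdf.stem}_{page}{suffix}") for page in range(n_pages)]


print(expected_encapsulated_path("tests/test_data/test_file.pdf", suffix=""))
print(expected_rgb_paths("tests/test_data/test_file.pdf", n_pages=2))
```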

Metadata Inheritance

Metadata can optionally be copied from a template DICOM file to preserve patient and study context. The fields inherited by default are:

  • PatientName
  • PatientID
  • PatientSex
  • StudyInstanceUID

SeriesInstanceUID and SOPInstanceUID were inherited in earlier versions but are no longer copied, because reusing them violates the DICOM standard.
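One way to confirm that inheritance worked is to compare the chosen fields between the template and a converted file. The helper below is a sketch, not part of pdf2dcm: it accepts any two objects with attribute access, such as datasets returned by pydicom.dcmread, and compares values as strings.

```python
from typing import Dict, Iterable, Tuple


def inherited_field_mismatches(
    template, converted, fields: Iterable[str]
) -> Dict[str, Tuple[str, str]]:
    """Return {field: (template_value, converted_value)} for fields that differ."""
    mismatches = {}
    for field in fields:
        # A missing attribute is treated as an empty string.
        expected = str(getattr(template, field, ""))
        actual = str(getattr(converted, field, ""))
        if expected != actual:
            mismatches[field] = (expected, actual)
    return mismatches
```

With pydicom installed, passing pydicom.dcmread(...) results for the template and the converted file, plus the list of inherited fields, should return an empty dict when every field was copied.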

To choose which fields are repersonalised, pass repersonalisation_fields to Pdf2EncapsDCM() or Pdf2RgbSC().

Example:

fields = [
    "PatientName",
    "PatientID",
    "PatientSex",
    "StudyInstanceUID",
    "AccessionNumber"
]
converter = Pdf2RgbSC(repersonalisation_fields=fields)

Note: this list replaces the default fields rather than extending them.
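Because the custom list replaces the defaults, adding a field means restating the defaults as well. A minimal sketch of building such a list follows; DEFAULT_FIELDS mirrors the patient and study fields named in this README and should be checked against your installed version, and extend_fields is a hypothetical helper.

```python
# Defaults as described in this README (an assumption; verify for your version).
DEFAULT_FIELDS = ["PatientName", "PatientID", "PatientSex", "StudyInstanceUID"]


def extend_fields(extra, defaults=DEFAULT_FIELDS):
    """Return the defaults followed by any extra keywords, without duplicates."""
    return list(dict.fromkeys([*defaults, *extra]))


fields = extend_fields(["AccessionNumber", "PatientID"])
print(fields)
# ['PatientName', 'PatientID', 'PatientSex', 'StudyInstanceUID', 'AccessionNumber']
```

The resulting list can then be passed as repersonalisation_fields=fields to either converter.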
