Convert IDML files to DocBook.

idml2docbook

This Python package converts IDML (InDesign Markup Language) files to DocBook 5.2.

More importantly, because DocBook is supported by Pandoc, this tool effectively enables IDML to be converted into dozens of other formats (Markdown, DOCX, EPUB, ODT, AsciiDoc, etc.). In practice, idml2docbook acts as a custom reader of IDML files for Pandoc. It is a bridge between InDesign and the Pandoc ecosystem.

flowchart LR
    subgraph S1["idml2docbook"]
        IDML[IDML] --> DOCBOOK[DocBook]
    end

    subgraph S2["Pandoc"]
        DOCBOOK --> MD[Markdown]
        DOCBOOK --> DOCX[DOCX]
        DOCBOOK --> ADOC[AsciiDoc]
        DOCBOOK --> ODT[ODT]
        DOCBOOK --> EPUB[EPUB]
        DOCBOOK --> ETC[etc.]
    end

Installation

First, create a virtual environment. Then download and install this package with pip:

pip install idml2docbook

The package is now installed, but the environment still needs to be configured. idml2docbook is essentially a wrapper around idml2xml-frontend: it takes that tool's Hub XML output and converts it to DocBook, so it depends on external software. The following is required:

  • Python >= 3.x
  • Java >= 1.7
  • bash >= 5.x (macOS ships with version 3.2 by default; a more recent version can be installed with brew)
  • git (needed to install idml2xml-frontend)
  • idml2xml-frontend

The following command checks whether those dependencies are installed. It also installs idml2xml-frontend and generates a sample .env file if none is found in your folder:

idml2docbook-install-dependencies
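
As a rough illustration of what such a check involves, here is a minimal sketch that only tests whether the executables from the list above can be found on PATH. It is not the implementation of idml2docbook-install-dependencies, and it does not verify version numbers; the names `REQUIRED_TOOLS` and `find_missing_tools` are hypothetical.

```python
import shutil

# Executables named in the requirements list above (assumed to be on PATH).
REQUIRED_TOOLS = ("java", "bash", "git")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("java, bash and git were all found on PATH.")
```

Running the bundled command remains the recommended route, since it also installs idml2xml-frontend.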

If you already have a .env file in your project, you will need to manually add the path to idml2xml-frontend to it:

IDML2HUBXML_SCRIPT_FOLDER="/path/to/idml2xml-frontend"
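
To confirm that the variable is set and points at a real folder, something like the following sketch may help. It uses a deliberately simple KEY=value parser written for this example, which may differ from how idml2docbook itself reads the .env file; `read_env_file` and `script_folder_is_configured` are hypothetical helper names.

```python
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    values = {}
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def script_folder_is_configured(path=".env"):
    """True if IDML2HUBXML_SCRIPT_FOLDER names an existing directory."""
    folder = read_env_file(path).get("IDML2HUBXML_SCRIPT_FOLDER")
    return bool(folder) and Path(folder).is_dir()
```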

For large IDML files, it may be necessary to increase the Java heap size, for example to 2048m or 4096m.

Usage

Command-line

Convert an IDML file:

idml2docbook file.idml

Options are also available. They are documented in the command-line tool as well (see the help with -h/--help).

  • -x, --idml2hubxml-file
    Treats the input file as a Hub XML file.
    Useful for saving processing time if idml2xml-frontend has already been run on the source IDML file.

  • -o, --output <file>
    Name to assign to the output file.
    By default, output is sent to standard output (stdout).

  • -t, --typography
    Applies French typographic refinements (thin spaces, non-breaking spaces, etc.).

  • -l, --thin-spaces
    Use only thin spaces for typography refinement.
    Should be used together with --typography.

  • -b, --linebreaks
    Do not replace <br> tags with spaces.

  • -f, --media <path>
    Path to the folder containing media files.
    Default: Links.

  • -r, --raster <extension>
    Extension to use when replacing that of raster images.
    Example: jpg.

  • -v, --vector <extension>
    Extension to use when replacing that of vector images.
    Example: svg.

  • -i, --idml2hubxml-output <path>
    Path to the output from Transpect’s idml2hubxml converter.
    Default: idml2hubxml.

  • -s, --idml2hubxml-script <path>
    Path to the script of Transpect’s idml2xml-frontend converter.

  • --version
    Displays the version of idml2docbook and exits the program.
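
When the tool is driven from a script, the options above can be assembled into an argument list. The sketch below covers only a subset of the documented flags; `build_command` is a hypothetical helper, and raising an error when --thin-spaces is given without --typography is a choice made for this example, not documented behavior of the tool.

```python
def build_command(idml_path, output=None, typography=False, thin_spaces=False,
                  raster=None, vector=None, media=None):
    """Build an argv list for the idml2docbook CLI from keyword options."""
    cmd = ["idml2docbook", idml_path]
    if output:
        cmd += ["--output", output]
    if typography:
        cmd.append("--typography")
    if thin_spaces:
        # The option list says --thin-spaces should accompany --typography.
        if not typography:
            raise ValueError("thin_spaces should be used together with typography")
        cmd.append("--thin-spaces")
    if raster:
        cmd += ["--raster", raster]
    if vector:
        cmd += ["--vector", vector]
    if media:
        cmd += ["--media", media]
    return cmd


# Possible usage, assuming idml2docbook is installed and configured:
# import subprocess
# subprocess.run(build_command("file.idml", output="file.xml"), check=True)
```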

In addition to idml2docbook, a second command is available through the CLI: idml2docbook-utils. It takes a Hub XML file as input, along with options for extracting style data in various formats:

  • --to-css
    Generates a CSS file that contains the paragraph and character styles attribute/value pairs extracted from the original IDML input file.

  • --to-ods
    Generates an ODS file based on the paragraph and character styles of the original IDML input file.

IDML custom reader for Pandoc

Simple command to use this package with Pandoc:

pandoc -f docbook -t markdown <(idml2docbook input.idml)

Though, it is possible to do crazy stuff as well 🤪

pandoc -f docbook \
       -t markdown_phpextra \
       --wrap=none \
       -o output/output.md \
       <(idml2docbook input.idml \
                --typography \
                --thin-spaces \
                --raster jpg \
                --vector svg \
                --media images)

InDesign paragraph and character styles are converted into DocBook as role attributes. Pandoc's DocBook reader supports role attributes in version 3.9 (February 2026) and higher. To convert role attributes into Pandoc classes, use the roles-to-classes.lua filter:

pandoc -f docbook -t markdown --lua-filter=roles-to-classes.lua <(idml2docbook input.idml)
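
The `<( ... )` process substitution used in these commands is a shell feature and is not available everywhere. As one possible alternative, the same pipeline can be driven from Python by capturing the DocBook output and passing it to Pandoc on standard input. This is a sketch under the assumption that both executables are on PATH and exchange UTF-8 text; `pandoc_command` and `idml_to` are hypothetical helper names.

```python
import subprocess


def pandoc_command(to="markdown", lua_filter=None, extra_args=()):
    """Build the pandoc argv for reading DocBook from standard input."""
    cmd = ["pandoc", "-f", "docbook", "-t", to]
    if lua_filter:
        cmd.append(f"--lua-filter={lua_filter}")
    cmd.extend(extra_args)
    return cmd


def idml_to(idml_path, to="markdown", lua_filter=None, extra_args=()):
    """Convert an IDML file to another format via idml2docbook and pandoc."""
    # Step 1: idml2docbook writes DocBook to stdout by default.
    docbook = subprocess.run(
        ["idml2docbook", idml_path],
        check=True, capture_output=True, encoding="utf-8",
    ).stdout
    # Step 2: pandoc reads from stdin when no input file is given.
    result = subprocess.run(
        pandoc_command(to, lua_filter, extra_args),
        input=docbook, check=True, capture_output=True, encoding="utf-8",
    )
    return result.stdout


# Possible usage:
# print(idml_to("input.idml", lua_filter="roles-to-classes.lua"))
```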

Python scripts

Sample script to use the API:

from idml2docbook.core import idml2docbook

file = "input.idml"

# All options are optional.
options = {
    'typography': True,
    'thin_spaces': True,
    'linebreaks': True,
    'ignore_overrides': True,
    'raster': "jpg",
    'vector': "svg",
    'media': "images"
}

output = idml2docbook(file, **options)
print(output)
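
Building on the sample above, a whole folder of IDML files could be converted in one pass. The following is a minimal sketch, assuming that the converter returns the DocBook document as a string (as the `print(output)` call suggests); `convert_folder` is a hypothetical helper, and the converter is passed in as an argument so that the function stays independent of the package.

```python
from pathlib import Path


def convert_folder(source_dir, target_dir, convert, **options):
    """Convert every .idml file in source_dir, writing one .xml file each.

    `convert` is any callable taking (path, **options) and returning the
    DocBook document as a string.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for idml_file in sorted(Path(source_dir).glob("*.idml")):
        docbook = convert(str(idml_file), **options)
        out_file = target / (idml_file.stem + ".xml")
        out_file.write_text(docbook, encoding="utf-8")
        written.append(out_file)
    return written


# Possible usage, assuming the package is installed and configured:
# from idml2docbook.core import idml2docbook
# convert_folder("sources", "docbook", idml2docbook, typography=True)
```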

It is also possible to output the paragraph and character styles as CSS by extracting them from the resulting Hub XML file:

from idml2docbook.idml2hubxml import idml2hubxml
from idml2docbook.map import generate_css

file = "input.xml"

hubxml = idml2hubxml(file, read_output_file=True)

output = generate_css(hubxml)
print(output)
