Utilities for converting rich text markup files to XLIFF format
This package provides a set of utilities for converting rich-text markup files to OASIS XLIFF translatable format. Currently only HTML <=> XLIFF 1.2 conversion is supported.
Python XLIFF Converter can be installed using pip:
pip install py-xliff-converter
HTML => XLIFF
This command will create <myfile>.xlf file that can be translated using most common online and offline CAT tools: Trados, memoQ, Transifex etc.
from xliff_converter.html_parser import convert_html ... with open(html_filename, 'r', encoding='utf-8') as fo: html = fo.read() xliff = convert_html(html, html_filename) ...
The convert_html(...) function returns translatable XLIFF document as bytes string encoded in UTF-8.
XLIFF => HTML
This command will create <myfile>_<lang>.html file containing translated content of the source HTML file. <lang> is the language code of a target language.
from xliff_converter.html_rebuilder import rebuild_html ... with open(xliff_filename, 'r', encoding='utf-8') as fo: xliff = fo.read() filename, html = rebuild_html(xliff)
The rebuild_html(...) function returns a tuple (named tuple) containing the name of a translated HTML file and its contents as str.
- Currently Python XLIFF Converter supports only English as a source language.
- Translatable text is segmented by sentences using NLTK sentence tokenizer.
- The HTML converter accepts partial HTML markup, e.g. <body> tag contents and even plain text.
- <br> tags are treated as translation segment delimiters.
- <pre><code>...</code></pre> blocks are ignored.
- More file formats.
- XLIFF 2.0 support.