Utilities for converting rich text markup files to XLIFF format
This package provides a set of utilities for converting rich-text markup files to the OASIS XLIFF translation format. Currently only HTML <=> XLIFF 1.2 conversion is supported.
Installation
Python XLIFF Converter can be installed using pip:
pip install py-xliff-converter
Usage
HTML => XLIFF
Command line:
html2xliff <myfile>.html
This command creates a <myfile>.xlf file that can be translated with most common online and offline CAT tools, such as Trados, memoQ and Transifex.
API:
from xliff_converter.html_parser import convert_html
...
# Read the source HTML as text.
with open(html_filename, 'r', encoding='utf-8') as fo:
    html = fo.read()
# Convert it; the result is the XLIFF document as UTF-8 bytes.
xliff = convert_html(html, html_filename)
...
The convert_html(...) function returns a translatable XLIFF document as a UTF-8 encoded bytes object.
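Because the result is a bytes object, it can be written to disk in binary mode or passed straight to an XML parser. The sketch below is only an illustration: it uses a hand-written XLIFF 1.2 sample in place of real converter output (whose exact structure may differ), and list_source_segments is a hypothetical helper, not part of this package.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the XLIFF 1.2 specification.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hand-written sample standing in for the bytes returned by convert_html(...).
# Assumption: the converter's real output may differ in detail.
SAMPLE_XLIFF = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">\n'
    '  <file original="page.html" source-language="en" datatype="html">\n'
    '    <body>\n'
    '      <trans-unit id="1"><source>Hello world.</source></trans-unit>\n'
    '      <trans-unit id="2"><source>Second sentence.</source></trans-unit>\n'
    '    </body>\n'
    '  </file>\n'
    '</xliff>\n'
).encode("utf-8")


def list_source_segments(xliff_bytes):
    """Return the text of every <source> element inside a <trans-unit>."""
    root = ET.fromstring(xliff_bytes)
    sources = root.iterfind(".//x:trans-unit/x:source", XLIFF_NS)
    # itertext() also collects text nested inside inline markup elements.
    return ["".join(src.itertext()) for src in sources]


segments = list_source_segments(SAMPLE_XLIFF)
```

A check like this can be handy for confirming how a document was segmented before sending the .xlf file to a CAT tool.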
XLIFF => HTML
Command line:
xliff2html <myfile>.xlf
This command creates a <myfile>_<lang>.html file containing the translated content of the source HTML file, where <lang> is the language code of the target language.
API:
from xliff_converter.html_rebuilder import rebuild_html
...
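# Read the translated XLIFF file as text.
with open(xliff_filename, 'r', encoding='utf-8') as fo:
    xliff = fo.read()
# Rebuild the HTML; the result holds the output file name and the translated markup.
filename, html = rebuild_html(xliff)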
The rebuild_html(...) function returns a named tuple containing the name of the translated HTML file and its contents as a str.
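A minimal sketch of saving that result, assuming the returned name is a plain file name and that UTF-8 is the desired output encoding. save_translated_html is a hypothetical helper, not part of this package, and the sample values stand in for a real rebuild_html(...) result.

```python
import tempfile
from pathlib import Path


def save_translated_html(filename, html, out_dir="."):
    """Write translated HTML (a str) to out_dir and return the resulting path."""
    # Keep only the final path component so the file always lands in out_dir.
    out_path = Path(out_dir) / Path(filename).name
    out_path.write_text(html, encoding="utf-8")
    return out_path


# Sample values standing in for: filename, html = rebuild_html(xliff)
with tempfile.TemporaryDirectory() as tmp:
    saved = save_translated_html("page_de.html", "<p>Hallo Welt.</p>", tmp)
    saved_name = saved.name
    round_trip = saved.read_text(encoding="utf-8")
```

Writing with an explicit encoding avoids depending on the platform default, which matters for translated text that contains non-ASCII characters.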
Notes
Currently, Python XLIFF Converter supports only English as the source language.
Translatable text is segmented into sentences using the NLTK sentence tokenizer.
The HTML converter accepts partial HTML markup, e.g. the contents of a <body> tag, and even plain text.
<br> tags are treated as translation segment delimiters.
<pre><code>...</code></pre> blocks are ignored.
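The segmentation rules above can be illustrated with a rough sketch. This is not the package's implementation: a naive regular expression stands in for the NLTK sentence tokenizer so that the example has no dependencies, and both the segment function and the patterns are assumptions made for illustration only.

```python
import re

# Matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)

# Naive stand-in for the NLTK sentence tokenizer (assumption): split on
# whitespace that follows ., ! or ? and precedes an uppercase letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(fragment):
    """Split a text fragment into segments at <br> tags, then into sentences."""
    segments = []
    for chunk in BR_TAG.split(fragment):
        for sentence in SENTENCE_END.split(chunk.strip()):
            if sentence:  # skip empty pieces, e.g. after a trailing <br>
                segments.append(sentence)
    return segments


result = segment("First line.<br>Second line. Third sentence here.<br/>")
```

A real sentence tokenizer handles abbreviations and other edge cases that this regular expression does not, so actual segment boundaries may differ.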
To do
More file formats.
XLIFF 2.0 support.