Convert Quark Xpress Tags to XML
Project description
Convert Quark Xpress tagged text (Tags, xtags) to XML.
This module is intended as a pre-processing step in the conversion of Quark Xpress tagged text to (semantic) HTML; as such it does not attempt to convert every single tag to XML, but only those that are relevant to the production of semantic, HTML5-compliant HTML.
This means that paragraph and character style sheet definitions are ignored; we only care about the applied style sheet names. It also means that we import character attributes like <i> and <b>, but ignore tags related only to print typesetting (tracking, kerning, baseline shift, etc.) by default, though they can be turned on.
The module doesn't actually produce XML but an element tree, in case you'd like to do further processing on the tree itself before serialising it with lxml.etree.tostring(). The serialised XML can then be turned into HTML with e.g. BeautifulSoup for postprocessing (for example, mapping Quark paragraph style sheets to CSS classes, character tags and style sheets to semantic HTML tags, rolling up indented quotes into <blockquote>, etc.).
Outputs UTF-8.
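As a rough illustration of the kind of tree-level postprocessing described above, here is a minimal sketch that maps paragraph style sheet names to CSS classes. The element name `paragraph`, the attribute name `style`, the sample XML, and the `STYLE_TO_CLASS` mapping are all assumptions made for this example; the tree actually produced by this module may be shaped differently. The standard library's `xml.etree.ElementTree` stands in for lxml here so that the snippet runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# Hypothetical serialised output; the real element and attribute names
# produced by to_xml() may differ.
SAMPLE_XML = (
    '<article>'
    '<paragraph style="Body Text">Plain text with <i>emphasis</i>.</paragraph>'
    '<paragraph style="Indented Quote">A quoted line.</paragraph>'
    '</article>'
)

# Hypothetical mapping from Quark paragraph style sheet names to CSS classes.
STYLE_TO_CLASS = {
    "Body Text": "body",
    "Indented Quote": "quote",
}


def styles_to_classes(root):
    """Rename each <paragraph> to <p> and swap its style name for a CSS class."""
    # Materialise the matches first so renaming tags cannot disturb iteration.
    for para in list(root.iter("paragraph")):
        style = para.attrib.pop("style", None)
        para.tag = "p"
        if style in STYLE_TO_CLASS:
            para.set("class", STYLE_TO_CLASS[style])
    return root


root = styles_to_classes(ET.fromstring(SAMPLE_XML))
html_fragment = ET.tostring(root, encoding="unicode")
print(html_fragment)
```

Style names missing from the mapping simply lose their `style` attribute and get no class, which keeps unknown print-only styles out of the HTML.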
Usage:
>>> from quark_tagged_text import get_encoding, to_xml
>>> from lxml.etree import tostring
>>> encoding = get_encoding(<source file>)
>>> with open(<source file>, encoding=encoding) as tagged_text:
...     element_tree = to_xml(tagged_text)
>>> serialised_xml = tostring(element_tree, encoding='utf-8')
Hashes for Quark_Xpress_Tags_to_xml-1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 72c593f21ef08222737b1df3506b8ed92b4db3655c0eeedf08988089436846fd
MD5 | 79c7464f20cf3bd1671ebfe02e761c55
BLAKE2b-256 | a4af95fc908bf5ff50a37d3ebdf927fac03401d330fba851620f5e545c028051