XDXF to HTML conversion

Description

This is a Python module, written in modern C++, that converts XDXF-formatted dictionary text into HTML.

It has no dependencies other than a C++11-compliant compiler. To minimise overhead, the parser performs no error checking.

Installation

pip install xdxf2html

Building

python3 setup.py build

Usage

>>> import xdxf2html
>>> xdxf2html.convert('''<k>Liverpool</k>
... <blockquote><dtrn>a large city and port in north-west England, on the River Mersey. It first became important during the <kref>Industrial Revolution</kref>, producing and exporting cotton goods. It was also a major port for the slave trade, receiving profits from the sale of slaves in America. In the 20th century the city became famous as the home of the <kref>Beatles</kref> and for Liverpool and Everton football clubs. Among its many famous buildings are the Royal Liver Building with its two towers, the Anglican and Roman Catholic cathedrals, and the <kref>Walker Art Gallery</kref>.</dtrn> <rref>portlpool.jpg</rref></blockquote>
... <blockquote>See also <kref>Mersey beat</kref>.</blockquote>''', 'test_dict')
'<h3 class="headword">Liverpool</h3><div class="xdxf-definition" style="margin-left: 0em;">a large city and port in north-west England, on the River Mersey. It first became important during the <a href="/api/lookup/test_dict/Industrial Revolution">Industrial Revolution</a>, producing and exporting cotton goods. It was also a major port for the slave trade, receiving profits from the sale of slaves in America. In the 20th century the city became famous as the home of the <a href="/api/lookup/test_dict/Beatles">Beatles</a>and for Liverpool and Everton football clubs. Among its many famous buildings are the Royal Liver Building with its two towers, the Anglican and Roman Catholic cathedrals, and the <a href="/api/lookup/test_dict/Walker Art Gallery">Walker Art Gallery</a>.<img src="/api/cache/test_dict/portlpool.jpg" alt="portlpool.jpg"/></div><div class="xdxf-definition" style="margin-left: 0em;">See also <a href="/api/lookup/test_dict/Mersey beat">Mersey beat</a>.</div>'

The module exposes a single function, convert, which takes two arguments: the XDXF text and the name of the dictionary. It returns the HTML as a string.
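
As the usage example shows, the returned string is an HTML fragment (a headword followed by definition blocks), not a complete document. A minimal sketch of embedding such a fragment in a standalone page follows; wrap_fragment is a hypothetical helper and not part of xdxf2html.

```python
from html import escape


def wrap_fragment(fragment, title):
    """Embed an HTML fragment in a minimal standalone page (hypothetical helper)."""
    # The title is plain text, so it is escaped; the fragment is already HTML
    # and is inserted unchanged.
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>{fragment}</body></html>"
    )
```

In practice the fragment argument would be the value returned by xdxf2html.convert(text, name).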

Appendix: a hopefully complete listing of XDXF tags, both standard and non-standard

This section only covers tags found within the dictionary 'body', i.e. inside <lexicon>.

Representational tags

  • <b>, <i>, <u>, <sub>, <sup>, <tt>, <br>: Can be translated directly into HTML.
  • <c>: Colour, indicated by the attribute c. Converted to a <span> with the style color: c. The default colour is darkgreen.
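
The <c> rule above can be sketched in a few lines of Python. This is only an illustration of the mapping, not the module's C++ implementation: convert_colour and the regular expression are assumptions, and the exact style string the module emits may differ.

```python
import re

DEFAULT_COLOUR = "darkgreen"  # default named in the listing above

# Matches an opening <c> tag with an optional c="..." attribute.
# It does not match other tags that merely start with "c", such as <co>.
_C_OPEN = re.compile(r'<c(?:\s+c="([^"]*)")?\s*>')


def convert_colour(xdxf):
    """Rewrite <c c="..."> ... </c> as a coloured <span> (illustrative only)."""
    def open_span(match):
        colour = match.group(1) or DEFAULT_COLOUR
        return f'<span style="color: {colour}">'

    return _C_OPEN.sub(open_span, xdxf).replace("</c>", "</span>")
```

For instance, convert_colour('<c c="red">x</c>') yields a span styled with color: red, and a bare <c> falls back to darkgreen.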

<div> like tags

  • <ar>: Article, ignored in this project.
  • <def>: Definition, converted to a <div> with the class xdxf-definition. If the attribute cmt is present, write it as a <span> with the class comment inside the <div>; if the attribute freq is present, write it as a <span> with the class frequency inside the <div>.
  • <ex>: Example, converted to a <div> with the class example and the style margin-left: 1em; color: grey;.
  • <co>: Comment, converted to a <div> with the class comment.
  • <sr>: 'Semantic relations', converted to a <div> with the class semantic-relations.
  • <etm>: Etymology, converted to a <div> with the style color: grey;.
  • <blockquote>: Not in the specification, but I've seen it. Converted to a <p>.
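
To make the <def> rule concrete, here is a hedged Python sketch. render_def is a hypothetical name, and placing the comment and frequency spans before the definition body is an assumption: the listing only says they go inside the <div>. The table covers a subset of the other <div> like tags, using the classes and styles listed above.

```python
from html import escape

# Opening tags for some other <div> like tags, taken from the listing above.
DIV_LIKE_OPEN = {
    "ex": '<div class="example" style="margin-left: 1em; color: grey;">',
    "co": '<div class="comment">',
    "sr": '<div class="semantic-relations">',
    "etm": '<div style="color: grey;">',
}


def render_def(body_html, cmt=None, freq=None):
    """Render a <def> element; the span order is an assumption."""
    parts = ['<div class="xdxf-definition">']
    if cmt is not None:
        parts.append(f'<span class="comment">{escape(cmt)}</span>')
    if freq is not None:
        parts.append(f'<span class="frequency">{escape(freq)}</span>')
    parts.append(body_html)  # already-converted inner HTML
    parts.append("</div>")
    return "".join(parts)
```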

<span> like tags

  • <k>: Keyword, namely the headword, converted to an <h3> with the class headword.
  • <opt>: Optional part of the keyword, converted to a <span> with the class optional.
  • <deftext>: Definition text, ignored in this project.
  • <gr>: Grammatical information, converted to a <span> with the style font-style: italic; color: darkgreen;.
  • <pos>, <tense>: Ignored.
  • <tr>: Transcription, converted to a <span> with the class transcription.
  • <kref>: Keyword reference, converted to an <a> with the href attribute properly set (in this project, to /api/lookup/name of dictionary/keyword). If the attribute type is set, prepend the value of the attribute to the keyword with a colon and a space.
  • <iref>: External reference, can be converted directly to an <a>.
  • <dtrn>: Definition translation, in practice used as an equivalent of <def>, so it is ignored in this project.
  • <abbr>, <abr>: Abbreviation, converted to a <span> with the class abbreviation and the style color: darkgreen; font-style: italic;.
  • <ex_orig>, <ex_trn>: Example original and translation, converted to a <span> with the class example-original and example-translation respectively.
  • <exm>, <prv>, <oth>: Used inside <ex>, ignored in this project.
  • <mrkd>: Marked text, converted to a <span> with the style background-color: yellow;.
  • <nu>: Not-used, not explained in the specification. Ignored in this project.
  • <di>: 'Don't index', ignored in this project.
  • <syn>, <ant>, <hpr>, <hpn>, <par>, <spv>, <mer>, <hol>, <ent>, <rel>, <phr>: Phrasemes (don't know really what they are). <syn> and <ant> are converted to a <span> with the class synonym and antonym respectively, and the tag name, followed by a colon and a space, is prepended to the text content. The rest are ignored.
  • <categ>: Category, ignored in this project.
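
The <kref>, <syn> and <ant> rules can be illustrated with a small Python sketch. The function names are hypothetical and the regular expression stands in for the module's real parser. The href is built by plain concatenation, as in the usage example output; the type attribute of <kref> is not handled here, and the "syn: " prefix format is one reading of the rule above.

```python
import re

_KREF = re.compile(r"<kref>(.*?)</kref>", re.DOTALL)
_RELATION_CLASSES = {"syn": "synonym", "ant": "antonym"}


def render_kref(keyword, dict_name):
    """Build the lookup link for one keyword reference."""
    return f'<a href="/api/lookup/{dict_name}/{keyword}">{keyword}</a>'


def convert_krefs(xdxf, dict_name):
    """Replace every attribute-free <kref> element with a lookup link."""
    return _KREF.sub(lambda m: render_kref(m.group(1), dict_name), xdxf)


def render_relation(tag, text):
    """Render <syn> or <ant> with the tag name prepended (assumed format)."""
    return f'<span class="{_RELATION_CLASSES[tag]}">{tag}: {text}</span>'
```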

Media files

Always in an <rref> tag. The standard prescribes that the filename should be specified in the lctn attribute, but in practice, the filename is just the text child of the tag. Converted to <img>, <audio>, <video> or <a> depending on the file extension.
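
The dispatch on file extension might look like the following Python sketch. The extension sets, the render_rref name and the <audio>, <video> and <a> markup are assumptions; only the <img> form and the /api/cache/ path are taken from the usage example output.

```python
import os

# Assumed extension sets; the module's actual lists may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg"}
AUDIO_EXTS = {".mp3", ".ogg", ".wav"}
VIDEO_EXTS = {".mp4", ".webm"}


def render_rref(filename, dict_name):
    """Pick an HTML element for a media file based on its extension."""
    src = f"/api/cache/{dict_name}/{filename}"
    ext = os.path.splitext(filename)[1].lower()
    if ext in IMAGE_EXTS:
        return f'<img src="{src}" alt="{filename}"/>'
    if ext in AUDIO_EXTS:
        return f'<audio controls src="{src}"></audio>'
    if ext in VIDEO_EXTS:
        return f'<video controls src="{src}"></video>'
    # Anything else becomes a plain download link.
    return f'<a href="{src}">{filename}</a>'
```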

Download files

Source Distribution

  • xdxf2html-0.1.0.tar.gz (21.7 kB): Source

Built Distributions

  • xdxf2html-0.1.0-cp312-cp312-win_amd64.whl (67.7 kB): CPython 3.12, Windows x86-64
  • xdxf2html-0.1.0-cp312-cp312-win32.whl (58.6 kB): CPython 3.12, Windows x86
  • xdxf2html-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • xdxf2html-0.1.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.12, manylinux: glibc 2.17+ ARM64
  • xdxf2html-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (74.6 kB): CPython 3.12, macOS 11.0+ ARM64
  • xdxf2html-0.1.0-cp312-cp312-macosx_10_9_x86_64.whl (78.4 kB): CPython 3.12, macOS 10.9+ x86-64
  • xdxf2html-0.1.0-cp311-cp311-win_amd64.whl (67.7 kB): CPython 3.11, Windows x86-64
  • xdxf2html-0.1.0-cp311-cp311-win32.whl (58.6 kB): CPython 3.11, Windows x86
  • xdxf2html-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • xdxf2html-0.1.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • xdxf2html-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (74.6 kB): CPython 3.11, macOS 11.0+ ARM64
  • xdxf2html-0.1.0-cp311-cp311-macosx_10_9_x86_64.whl (78.4 kB): CPython 3.11, macOS 10.9+ x86-64
  • xdxf2html-0.1.0-cp310-cp310-win_amd64.whl (67.1 kB): CPython 3.10, Windows x86-64
  • xdxf2html-0.1.0-cp310-cp310-win32.whl (58.6 kB): CPython 3.10, Windows x86
  • xdxf2html-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • xdxf2html-0.1.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • xdxf2html-0.1.0-cp310-cp310-macosx_11_0_arm64.whl (74.6 kB): CPython 3.10, macOS 11.0+ ARM64
  • xdxf2html-0.1.0-cp310-cp310-macosx_10_9_x86_64.whl (78.4 kB): CPython 3.10, macOS 10.9+ x86-64
  • xdxf2html-0.1.0-cp39-cp39-win_amd64.whl (67.7 kB): CPython 3.9, Windows x86-64
  • xdxf2html-0.1.0-cp39-cp39-win32.whl (58.6 kB): CPython 3.9, Windows x86
  • xdxf2html-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • xdxf2html-0.1.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • xdxf2html-0.1.0-cp39-cp39-macosx_11_0_arm64.whl (74.6 kB): CPython 3.9, macOS 11.0+ ARM64
  • xdxf2html-0.1.0-cp39-cp39-macosx_10_9_x86_64.whl (78.3 kB): CPython 3.9, macOS 10.9+ x86-64
  • xdxf2html-0.1.0-cp38-cp38-win_amd64.whl (67.7 kB): CPython 3.8, Windows x86-64
  • xdxf2html-0.1.0-cp38-cp38-win32.whl (58.6 kB): CPython 3.8, Windows x86
  • xdxf2html-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • xdxf2html-0.1.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.8, manylinux: glibc 2.17+ ARM64
  • xdxf2html-0.1.0-cp38-cp38-macosx_11_0_arm64.whl (74.6 kB): CPython 3.8, macOS 11.0+ ARM64
  • xdxf2html-0.1.0-cp38-cp38-macosx_10_9_x86_64.whl (78.3 kB): CPython 3.8, macOS 10.9+ x86-64
  • xdxf2html-0.1.0-cp37-cp37m-win_amd64.whl (67.7 kB): CPython 3.7m, Windows x86-64
  • xdxf2html-0.1.0-cp37-cp37m-win32.whl (58.6 kB): CPython 3.7m, Windows x86
  • xdxf2html-0.1.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • xdxf2html-0.1.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.1 MB): CPython 3.7m, manylinux: glibc 2.17+ ARM64
  • xdxf2html-0.1.0-cp37-cp37m-macosx_10_9_x86_64.whl (78.3 kB): CPython 3.7m, macOS 10.9+ x86-64
