Python text markup and conversion
pyth3 - Python text markup and conversion
Pyth is intended to make it easy to convert marked-up text between different common formats. This is a (rather incomplete so far) port of pyth 0.6.0 to Python 3.
Marked-up text means text which has:
- Paragraphs
- Headings
- Bold, italic, and underlined text
- Hyperlinks
- Bullet lists
- Simple tables
- Very little else
Formats with (widely varying) degrees of support:
- Plain text
- XHTML
- RTF (Rich Text Format)
- PDF (output only)
Design principles/goals
- Ignore unsupported information in input formats (e.g. page layout)
- Ignore font issues -- output in a single font.
- Ignore specific text sizes, but maintain italics, boldface, subscript/superscript
- Have no dependencies unless they are written in Python and actually work
- Make it easy to add support for new formats, by using an architecture based on plugins and adapters.
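The plugin idea in the last goal can be illustrated with a minimal, self-contained sketch. This is not pyth's actual code: the classes Run, Paragraph, Document, Writer, the WRITERS registry, and the register and convert helpers are all hypothetical names invented for illustration. It only shows the general shape of a format-neutral document model with one writer plugin per output format.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List, Type


@dataclass
class Run:
    """A stretch of text with a small set of supported properties."""
    text: str
    bold: bool = False
    italic: bool = False


@dataclass
class Paragraph:
    runs: List[Run] = field(default_factory=list)


@dataclass
class Document:
    paragraphs: List[Paragraph] = field(default_factory=list)


class Writer:
    """Hypothetical base class: each output format subclasses this."""

    def write(self, document: Document) -> str:
        raise NotImplementedError


# Hypothetical registry mapping a format name to its writer plugin.
WRITERS: Dict[str, Type[Writer]] = {}


def register(name):
    """Class decorator that adds a writer plugin to the registry."""
    def decorator(cls):
        WRITERS[name] = cls
        return cls
    return decorator


@register("plaintext")
class PlainTextWriter(Writer):
    def write(self, document):
        # Formatting is dropped; only the paragraph structure survives.
        return "\n\n".join(
            "".join(run.text for run in p.runs) for p in document.paragraphs
        )


@register("xhtml")
class XhtmlWriter(Writer):
    def write(self, document):
        out = []
        for p in document.paragraphs:
            parts = []
            for run in p.runs:
                text = escape(run.text)
                if run.italic:
                    text = f"<em>{text}</em>"
                if run.bold:
                    text = f"<strong>{text}</strong>"
                parts.append(text)
            out.append("<p>" + "".join(parts) + "</p>")
        return "\n".join(out)


def convert(document: Document, fmt: str) -> str:
    """Look up the writer plugin for `fmt` and render the document."""
    return WRITERS[fmt]().write(document)


doc = Document([Paragraph([Run("Hello, "), Run("world", bold=True)])])
print(convert(doc, "plaintext"))
print(convert(doc, "xhtml"))
```

Under this design, supporting a new output format means adding one registered class; the document model and the other writers stay untouched.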
Examples
See the examples directory.
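As a rough sketch of what a conversion might look like, the function below reads an RTF file and returns XHTML. The class names Rtf15Reader and XHTMLWriter are taken from pyth 0.6.0; that the Python 3 port wants the input opened in binary mode, and that write() returns a file-like object, are assumptions that should be checked against the examples directory. The helper name rtf_to_xhtml is hypothetical.

```python
def rtf_to_xhtml(rtf_path):
    """Convert an RTF file to an XHTML string via pyth's reader and writer."""
    # Imported lazily so this helper can be defined even where pyth3 is absent.
    from pyth.plugins.rtf15.reader import Rtf15Reader
    from pyth.plugins.xhtml.writer import XHTMLWriter

    # Assumption: the Python 3 port expects the RTF file in binary mode.
    with open(rtf_path, "rb") as source:
        document = Rtf15Reader.read(source)

    # Assumption: write() returns a file-like object, as in pyth 0.6.0.
    return XHTMLWriter.write(document, pretty=True).read()
```

Calling rtf_to_xhtml with the path of an RTF file should then yield the converted markup, within the limits listed under Limitations.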
Python 3 migration
The code was originally written for Python 2. It has been partially(!) upgraded to Python 3 compatibility (starting via 'modernize'). This does not mean it will actually work!
The following modules have been debugged and now appear to work correctly:
- pyth.plugins.rtf15.reader
- pyth.plugins.xhtml.writer
- pyth.plugins.plaintext.writer
Everything else is unknown or definitely broken on Python 3; even many of the tests fail. See directory py3migration for a bit more detail. (If you find something is broken on Python 2 that worked before, please either fix it or simply stick to pyth version 0.6.0.)
Limitations
pyth.plugins.rtf15.reader:
- bulleted or enumerated items will be returned as plain paragraphs (no indentation, no bullets).
- cannot cope with Symbol font correctly:
- from MS Word: lower-coderange characters (greek mostly) work
- from MS Word: higher-coderange characters are missing, because Word encodes them in a horribly complicated manner not supported by pyth currently
- from Wordpad: lower- and higher-coderange characters come out in the wrong encoding (ANSI, I think)
pyth.plugins.xhtml.writer:
- very limited functionality
pyth.plugins.plaintext.writer:
- very very limited functionality
Others:
- will not work on Python 3 without some porting love-and-care
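To make the Symbol font limitation of pyth.plugins.rtf15.reader more concrete: in the Symbol font, ordinary byte values stand for Greek letters, so decoding them as ANSI text yields Latin letters instead. The sketch below is not pyth's implementation. The table SYMBOL_TO_UNICODE and the function decode_symbol are hypothetical, the table covers only a few lower-range entries, and the cp1252 fallback is an assumption made for illustration.

```python
# Partial table: byte value in the Symbol font -> intended Unicode character.
# Only a handful of well-known lower-range entries are listed here.
SYMBOL_TO_UNICODE = {
    0x61: "\u03b1",  # 'a' -> alpha
    0x62: "\u03b2",  # 'b' -> beta
    0x67: "\u03b3",  # 'g' -> gamma
    0x64: "\u03b4",  # 'd' -> delta
    0x70: "\u03c0",  # 'p' -> pi
}


def decode_symbol(data: bytes) -> str:
    """Decode Symbol-font bytes, falling back to cp1252 for unmapped values."""
    return "".join(
        SYMBOL_TO_UNICODE.get(b) or bytes([b]).decode("cp1252", errors="replace")
        for b in data
    )


raw = b"abg"
print(raw.decode("cp1252"))  # naive ANSI decoding gives Latin letters
print(decode_symbol(raw))    # Symbol-aware decoding gives Greek letters
```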
Tests
Don't try to run them all; it's frustrating. A good way to run the least frustrating subset of them is:
py.test -v test_readrtf15.py
It is normal that most others will fail on Python 3.
test_readrtf15.py generates test cases dynamically based on existing input files in tests/rtfs and existing reference output files in tests/rtf-as-html.
Empty or missing output files show where functionality is missing, which nicely points to possible places to jump in if you want to help.
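The pairing of input files with reference outputs can be sketched as follows. This is not the code in test_readrtf15.py: the helper collect_cases, the ".html" suffix, and the demo file names are assumptions made for illustration. A reference of None marks an empty or missing output file.

```python
import tempfile
from pathlib import Path


def collect_cases(input_dir, reference_dir, suffix=".html"):
    """Pair each *.rtf input with its reference output.

    Returns (name, rtf_path, reference_path_or_None) tuples; None marks a
    missing or empty reference file, i.e. functionality not yet covered.
    """
    cases = []
    for rtf in sorted(Path(input_dir).glob("*.rtf")):
        ref = Path(reference_dir) / (rtf.stem + suffix)
        has_ref = ref.is_file() and ref.stat().st_size > 0
        cases.append((rtf.stem, rtf, ref if has_ref else None))
    return cases


# Demo on a throwaway directory tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    rtfs = Path(tmp, "rtfs")
    refs = Path(tmp, "rtf-as-html")
    rtfs.mkdir()
    refs.mkdir()
    (rtfs / "bold.rtf").write_text("{\\rtf1 \\b bold\\b0}")
    (rtfs / "table.rtf").write_text("{\\rtf1 table}")
    (refs / "bold.html").write_text("<p><strong>bold</strong></p>")
    (refs / "table.html").write_text("")  # empty: not yet supported
    demo = [(name, ref is not None) for name, _, ref in collect_cases(rtfs, refs)]

print(demo)
```

A list like this could be handed to pytest.mark.parametrize so that each input file becomes its own test case, with the None entries skipped or reported as open work.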
Dependencies
Only the two most important dependencies are actually declared in setup.py, because the others are large, yet are required only in pyth components not yet ported to Python 3. The undeclared ones are:
- reportlab for PDFWriter
- docutils for LatexWriter
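Since these two packages are not installed automatically, a caller may want to check for them before using the corresponding writer. The following is a minimal sketch using only the standard library; the names OPTIONAL_DEPENDENCIES and missing_optional_dependencies are hypothetical and not part of pyth.

```python
from importlib.util import find_spec

# Optional dependencies named above, mapped to the component that needs them.
OPTIONAL_DEPENDENCIES = {
    "reportlab": "PDFWriter",
    "docutils": "LatexWriter",
}


def missing_optional_dependencies():
    """Return {module: component} for optional modules that cannot be found."""
    return {
        module: component
        for module, component in OPTIONAL_DEPENDENCIES.items()
        if find_spec(module) is None
    }


for module, component in missing_optional_dependencies().items():
    print(f"{component} is unavailable: install '{module}' to use it")
```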