pandadoc: lightweight pandoc wrapper
An extremely lightweight pandoc wrapper for Python 3.8+.
Its features:
- Supports conversion between all formats that pandoc supports: Markdown, HTML, LaTeX, Word, EPUB, PDF (output only), and more.
- Output to raw bytes (binary formats, e.g. PDF), to str objects (text formats, e.g. Markdown), or to a file (any format).
- pandoc errors are raised as informative Python exceptions.
- The full flexibility of the pandoc command-line tool, with the same syntax (see the pandoc manual for details).
Getting Started Guide
Installation
First, install pandoc and make sure the pandoc executable is on your PATH.
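To confirm this step from Python, here is a minimal sketch using only the standard library: shutil.which looks pandoc up on PATH the same way a shell would, and running pandoc --version confirms the executable actually starts. The function names find_pandoc and pandoc_version are hypothetical helpers, not part of pandadoc.

```python
import shutil
import subprocess
from typing import Optional


def find_pandoc() -> Optional[str]:
    """Return the full path of the pandoc executable on PATH, or None if absent."""
    return shutil.which("pandoc")


def pandoc_version() -> Optional[str]:
    """Return the first line of `pandoc --version`, or None if pandoc is missing."""
    path = find_pandoc()
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    # The first line holds the program name and version number.
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    print(pandoc_version() or "pandoc not found on PATH")
```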
Then install pandadoc from PyPI:
$ python -m pip install pandadoc
That’s it.
Usage
Convert a webpage to Markdown and store it as a Python str:
>>> import pandadoc
>>> input_url = "https://example.com/"
>>> example_md = pandadoc.call_pandoc(
... options=["-t", "markdown"], files=[input_url]
... )
>>> print(example_md)
<div>
# Example Domain
This domain is for use in illustrative examples in documents.
...
Now convert the markdown to RTF, and write it to a file:
>>> rtf_output_file = "example.rtf"
>>> pandadoc.call_pandoc(
... options=["-f", "markdown", "-t", "rtf", "-o", rtf_output_file],
... input_text=example_md,
... )
''
Notice that call_pandoc returns an empty string ('') when the output is written to a file with -o. The resulting output file contains:
{\pard \ql \f0 \sa180 \li0 \fi0 \outlinelevel0 \b \fs36 Example Domain\par} {\pard \ql \f0 \sa180 \li0 \fi0 This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\par} {\pard \ql \f0 \sa180 \li0 \fi0 {\field{\*\fldinst{HYPERLINK "https://www.iana.org/domains/example"}}{\fldrslt{\ul More information... }}} \par}
Convert this RTF document to PDF, using xelatex with a custom main font (Palatino), and store the result as raw bytes:
>>> raw_pdf = pandadoc.call_pandoc(
... options=["-f", "rtf", "-t", "pdf", "--pdf-engine", "xelatex", "-V", "mainfont=Palatino"],
... files=[rtf_output_file],
... decode=False,
... )
Note that PDF output requires a PDF engine (e.g. pdflatex, xelatex, latexmk) to be installed; this example uses xelatex.
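Because the PDF conversion fails when the requested engine is missing, one option is to look for an installed engine before building the options list. A minimal sketch, assuming the three engine names below are the ones you care about; first_available_engine and pdf_options are hypothetical helpers. The -V KEY=VALUE form is pandoc's syntax for setting a template variable, and mainfont is honoured by the xelatex and lualatex engines.

```python
import shutil
from typing import List, Optional, Sequence


def first_available_engine(
    candidates: Sequence[str] = ("xelatex", "lualatex", "pdflatex"),
) -> Optional[str]:
    """Return the first engine in candidates found on PATH, or None."""
    for engine in candidates:
        if shutil.which(engine) is not None:
            return engine
    return None


def pdf_options(engine: str, mainfont: Optional[str] = None) -> List[str]:
    """Build pandoc options for PDF output with the given engine."""
    options = ["-t", "pdf", "--pdf-engine", engine]
    if mainfont is not None:
        # -V sets a template variable; mainfont is used by xelatex and lualatex.
        options += ["-V", f"mainfont={mainfont}"]
    return options


if __name__ == "__main__":
    engine = first_available_engine()
    if engine is None:
        print("No PDF engine found on PATH")
    else:
        print(pdf_options(engine, mainfont="Palatino"))
```

The returned list can be combined with an input-format option such as ["-f", "rtf"] and passed as the options argument of call_pandoc.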
Now you can send those raw bytes over a network, or write them to a file:
>>> with open("example.pdf", "wb") as f:
... f.write(raw_pdf)
...
>>> # Finished
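Before writing or sending the bytes, a cheap sanity check is to confirm they start with the %PDF- header that opens a well-formed PDF file. A minimal sketch; save_pdf is a hypothetical helper, not part of pandadoc.

```python
from pathlib import Path


def save_pdf(raw_pdf: bytes, path: str) -> int:
    """Write PDF bytes to path after a quick header check; return bytes written."""
    # A well-formed PDF begins with a "%PDF-" header line followed by the version.
    if not raw_pdf.startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    # Path.write_bytes opens the file in binary mode, writes, and closes it.
    return Path(path).write_bytes(raw_pdf)
```

With the raw_pdf value from the example above, save_pdf(raw_pdf, "example.pdf") replaces the explicit with-block.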
You can find more pandoc usage examples on the pandoc website.
Exceptions
If pandoc exits with an error, an appropriate exception is raised (based on the exit code):
>>> pandadoc.call_pandoc(
... options=["-f", "markdown", "-t", "zzz"], # non-existent format
... input_text=example_md,
... )
Traceback (most recent call last):
...
pandadoc.exceptions.PandocUnknownWriterError: Unknown output format zzz
>>> isinstance(pandadoc.exceptions.PandocUnknownWriterError(), pandadoc.PandocError)
True
You can find a full list of exceptions in the pandadoc.exceptions module.
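The exit-code-based dispatch described above can be sketched as a lookup table from pandoc's documented exit codes to exception subclasses, with a common base class as the fallback. The class names here are hypothetical stand-ins, not pandadoc's own classes (those live in pandadoc.exceptions), and only three codes from the "Exit codes" section of the pandoc manual are mapped.

```python
from typing import Dict, Type


class PandocError(Exception):
    """Hypothetical base class: every pandoc failure in this sketch derives from it."""


class UnknownReaderError(PandocError):
    """pandoc exit code 21: unknown input format."""


class UnknownWriterError(PandocError):
    """pandoc exit code 22: unknown output format."""


class PDFProgramNotFoundError(PandocError):
    """pandoc exit code 47: the requested PDF engine could not be found."""


# Exit codes taken from the "Exit codes" section of the pandoc manual.
EXIT_CODE_ERRORS: Dict[int, Type[PandocError]] = {
    21: UnknownReaderError,
    22: UnknownWriterError,
    47: PDFProgramNotFoundError,
}


def raise_for_exit_code(returncode: int, stderr: str) -> None:
    """Return quietly on success; otherwise raise the most specific known subclass."""
    if returncode == 0:
        return
    # Unmapped codes fall back to the generic base class.
    error_class = EXIT_CODE_ERRORS.get(returncode, PandocError)
    raise error_class(stderr.strip())


if __name__ == "__main__":
    try:
        raise_for_exit_code(22, "Unknown output format zzz\n")
    except PandocError as err:  # the base class also catches every subclass
        print(f"{type(err).__name__}: {err}")  # UnknownWriterError: Unknown output format zzz
```

Deriving every specific error from one base class is what lets callers either handle a single case or catch all pandoc failures at once.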
Explanation
The pandoc command-line tool works like this:
pandoc [OPTIONS] [FILES]
In addition to the OPTIONS (documented in the pandoc manual), you can provide either one or more input FILES, or input text via stdin.
The call_pandoc function of pandadoc works in a similar way:
- The options argument is a list of pandoc options, e.g. ["-f", "markdown", "-t", "html"].
- The files argument is a list of file paths (or absolute URIs), e.g. ["path/to/file.md", "https://www.fsf.org"].
- The input_text argument is passed to pandoc as text input, e.g. "# Simple Doc\n\nA simple markdown document\n".
- The timeout argument controls whether the pandoc process times out, and the decode argument controls whether the result is decoded to a str (True by default) or returned as raw bytes.
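The mapping from these arguments onto the pandoc command line can be sketched with subprocess.run. This is not pandadoc's source: run_pandoc, build_command and PandocCallError are hypothetical names, and the sketch assumes UTF-8 for both stdin and stdout (pandoc's default encoding).

```python
import subprocess
from typing import List, Optional, Sequence, Union


class PandocCallError(RuntimeError):
    """Hypothetical error for this sketch, raised when pandoc exits non-zero."""

    def __init__(self, returncode: int, stderr: str) -> None:
        super().__init__(f"pandoc exited with code {returncode}: {stderr.strip()}")
        self.returncode = returncode


def build_command(options: Sequence[str] = (), files: Sequence[str] = ()) -> List[str]:
    """Mirror the CLI shape: pandoc [OPTIONS] [FILES]."""
    return ["pandoc", *options, *files]


def run_pandoc(
    options: Sequence[str] = (),
    files: Sequence[str] = (),
    input_text: Optional[str] = None,
    timeout: Optional[float] = None,
    decode: bool = True,
) -> Union[str, bytes]:
    """Run pandoc once and return its stdout as str (decode=True) or bytes."""
    # input_text goes to stdin; an empty stdin stops pandoc waiting on the terminal.
    stdin_bytes = (input_text or "").encode("utf-8")
    result = subprocess.run(
        build_command(options, files),
        input=stdin_bytes,
        capture_output=True,
        timeout=timeout,  # None waits indefinitely; otherwise TimeoutExpired is raised
        check=False,
    )
    if result.returncode != 0:
        raise PandocCallError(result.returncode, result.stderr.decode("utf-8", "replace"))
    # Binary formats such as PDF should be requested with decode=False.
    return result.stdout.decode("utf-8") if decode else result.stdout
```

For example, run_pandoc(["-f", "markdown", "-t", "html"], input_text="# Hello") should return HTML containing an h1 heading. Keeping command construction in its own function makes that part testable without pandoc installed.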
Bugs/Requests
Please use the GitHub issue tracker to submit bugs or request features.
Feedback is always appreciated.
License
Distributed under the MIT license.
Download files
File details
Details for the file pandadoc-0.1.0.tar.gz.
File metadata
- Download URL: pandadoc-0.1.0.tar.gz
- Upload date:
- Size: 17.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | c81cb3d24ad0d123c1e10b63367a692e7a08311f8ce0a0fa89267d2a9e63d635
MD5 | 99093b292f95d659ff24cc1ba0a04c9b
BLAKE2b-256 | f9f42db90d9f2718270db03d612689127981cb0165e76ab4802b44c6d8e70374
File details
Details for the file pandadoc-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pandadoc-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1f51083eeb6ce7ee858084f8d98bece2254b295023a380e3c9e5ef1f670429e5
MD5 | cf7fd2d0bd2c0f5ef2282d07e08ace15
BLAKE2b-256 | e57f733697ddc093b3185efd6ed3e440a2bc911b7fb13ab3a85522633e80f363