pandadoc: lightweight pandoc wrapper

An extremely lightweight pandoc wrapper for Python 3.8+.

Its features:

  • Supports conversion between all formats that pandoc supports: markdown, HTML, LaTeX, Word, EPUB, PDF (output only), and more.

  • Output to raw bytes (binary formats - e.g. PDF), to str objects (text formats - e.g. markdown), or to file (any format).

  • pandoc errors are raised as (informative) exceptions.

  • Full flexibility of the pandoc command-line tool, and the same syntax. (See the pandoc manual for more information.)

Getting Started Guide

Installation

First, ensure pandoc is on your PATH. (In other words, install pandoc and add it to your PATH.)

Then install pandadoc from PyPI:

$ python -m pip install pandadoc

That’s it.
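Since pandadoc relies on the pandoc executable being discoverable, a quick check along the following lines may help confirm the setup. This is only a sketch using the standard library; the helper name first_available is hypothetical and is not part of pandadoc.

```python
import shutil
from typing import Iterable, Optional


def first_available(candidates: Iterable[str]) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None.

    Hypothetical helper for illustration; not part of pandadoc.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    pandoc_path = first_available(["pandoc"])
    if pandoc_path is None:
        print("pandoc was not found on PATH; install it before using pandadoc.")
    else:
        print(f"pandoc found at {pandoc_path}")
```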

Usage

Convert a webpage to markdown, and store it as a Python str:

>>> import pandadoc
>>> input_url = "https://example.com/"
>>> example_md = pandadoc.call_pandoc(
...    options=["-t", "markdown"], files=[input_url]
... )
>>> print(example_md)
<div>

# Example Domain

This domain is for use in illustrative examples in documents.
...

Now convert the markdown to RTF, and write it to a file:

>>> rtf_output_file = "example.rtf"
>>> pandadoc.call_pandoc(
...     options=["-f", "markdown", "-t", "rtf", "-o", rtf_output_file],
...     input_text=example_md,
... )
''

Notice that call_pandoc returns an empty string '' when a file output is used. Looking at the output file:

{\pard \ql \f0 \sa180 \li0 \fi0 \outlinelevel0 \b \fs36 Example Domain\par}
{\pard \ql \f0 \sa180 \li0 \fi0 This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\par}
{\pard \ql \f0 \sa180 \li0 \fi0 {\field{\*\fldinst{HYPERLINK "https://www.iana.org/domains/example"}}{\fldrslt{\ul
More information...
}}}
\par}

Convert this RTF document to PDF, using xelatex with a custom main font, and store the result as raw bytes:

>>> raw_pdf = pandadoc.call_pandoc(
...     options=[
...         "-f", "rtf", "-t", "pdf",
...         "--pdf-engine", "xelatex",
...         "--variable", "mainfont=Palatino",
...     ],
...     files=[rtf_output_file],
...     decode=False,
... )

Note that PDF conversion requires a “PDF engine” (e.g. pdflatex, xelatex, or latexmk) to be installed.

Now you can send those raw bytes over a network, or write them to a file:

>>> with open("example.pdf", "wb") as f:
...     f.write(raw_pdf)
...
>>> # Finished

You can find more pandoc examples on the pandoc website.

Exceptions

If pandoc exits with an error, an appropriate exception is raised (based on the exit code):

>>> pandadoc.call_pandoc(
...     options=["-f", "markdown", "-t", "zzz"], # non-existent format
...     input_text=example_md,
... )
Traceback (most recent call last):
...
pandadoc.exceptions.PandocUnknownWriterError: Unknown output format zzz
>>> isinstance(pandadoc.exceptions.PandocUnknownWriterError(), pandadoc.PandocError)
True

You can find a full list of exceptions in the pandadoc.exceptions module.
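To illustrate the idea of choosing an exception from the exit code, here is a minimal sketch. It does not reproduce pandadoc's actual implementation: the classes below are local stand-ins defined for this example, and the function error_for_exit_code is hypothetical. The exit codes shown (21, 22, 43) are the ones listed in the pandoc manual for these error types.

```python
class PandocError(Exception):
    """Local stand-in base class for this sketch (not imported from pandadoc)."""


class PandocUnknownReaderError(PandocError):
    """Stand-in for an unknown input format error."""


class PandocUnknownWriterError(PandocError):
    """Stand-in for an unknown output format error."""


class PandocPDFError(PandocError):
    """Stand-in for a PDF generation error."""


# A small subset of pandoc's documented exit codes.
EXIT_CODE_TO_ERROR = {
    21: PandocUnknownReaderError,
    22: PandocUnknownWriterError,
    43: PandocPDFError,
}


def error_for_exit_code(exit_code: int, message: str) -> PandocError:
    """Build an exception for a non-zero exit code (hypothetical helper).

    Codes missing from the table fall back to the generic base class.
    """
    error_class = EXIT_CODE_TO_ERROR.get(exit_code, PandocError)
    return error_class(message)
```

Because every specific class derives from the common base, a caller can catch the base class to handle any pandoc failure, or a subclass to handle one particular failure.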

Explanation

The pandoc command-line tool works like this:

pandoc [OPTIONS] [FILES]

In addition to the OPTIONS (documented in the pandoc manual), you can provide either some FILES, or some input text (via stdin).

The call_pandoc function of pandadoc works in a similar way:

  • The options argument contains a list of pandoc options. E.g. ["-f", "markdown", "-t", "html"].

  • The files argument is a list of file paths (or absolute URIs). E.g. ["path/to/file.md", "https://www.fsf.org"]

  • The input_text argument is used as text input to pandoc. E.g. "# Simple Doc\n\nA simple markdown document\n".

The timeout argument controls whether the pandoc process times out, and the decode argument controls whether the result is decoded to a str (True by default) or returned as raw bytes.
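The mapping from these arguments onto the pandoc command line can be sketched with the standard library. This is an assumption-laden illustration, not pandadoc's source code: the names build_pandoc_command and run_pandoc_sketch are hypothetical, and the sketch assumes the timeout is passed to subprocess.run in seconds and that text is encoded as UTF-8.

```python
import subprocess
from typing import List, Optional, Sequence, Union


def build_pandoc_command(options: Sequence[str], files: Sequence[str] = ()) -> List[str]:
    """Assemble the argument list for `pandoc [OPTIONS] [FILES]`."""
    return ["pandoc", *options, *files]


def run_pandoc_sketch(
    options: Sequence[str],
    files: Sequence[str] = (),
    input_text: Optional[str] = None,
    timeout: Optional[float] = None,
    decode: bool = True,
) -> Union[str, bytes]:
    """Run pandoc and return its stdout (hypothetical, simplified wrapper)."""
    completed = subprocess.run(
        build_pandoc_command(options, files),
        # Input text, if any, is sent to pandoc via stdin as UTF-8 bytes.
        input=None if input_text is None else input_text.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # assumption: seconds, as in subprocess.run
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    # Text formats can be decoded to str; binary formats stay as bytes.
    return completed.stdout.decode("utf-8") if decode else completed.stdout
```

Passing the options as a list, rather than as a single shell string, avoids shell quoting issues and mirrors the options=[...] style used in the examples above.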

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request features.

Feedback is always appreciated.

License

Distributed under the MIT license.
