Thin wrapper for pandoc.

pypandoc

Pypandoc provides a thin wrapper for pandoc, a universal document converter.

Installation

Pypandoc uses pandoc, so it needs an available installation of pandoc. Pypandoc provides two packages, "pypandoc" and "pypandoc_binary", and the second one includes pandoc out of the box. The two packages are otherwise identical: the only difference is that one bundles pandoc and the other doesn't.

If pandoc is also installed on the system (i.e. pandoc is in the PATH), pypandoc uses whichever version has the higher version number. If both versions are the same, it uses the already installed one. See Specifying the location of pandoc binaries for more.
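The selection rule above can be illustrated with a small, self-contained sketch. It is not pypandoc's code: `parse_version` and `choose_pandoc` are hypothetical helpers, and versions are assumed to be purely numeric dotted strings such as "3.6" or "3.1.11".

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '3.1.11' into (3, 1, 11)."""
    return tuple(int(part) for part in version.split("."))


def choose_pandoc(installed: str | None, bundled: str | None) -> str | None:
    """Decide between a pandoc already on the PATH and a bundled copy.

    Hypothetical sketch of the rule described above: the higher version wins,
    and on a tie the already installed pandoc is preferred. Returns
    'installed', 'bundled', or None if neither is available.
    """
    if installed is None:
        return "bundled" if bundled is not None else None
    if bundled is None:
        return "installed"
    if parse_version(bundled) > parse_version(installed):
        return "bundled"
    return "installed"  # installed is newer, or both versions are equal


print(choose_pandoc("3.1.11", "3.6"))  # bundled
print(choose_pandoc("3.6", "3.6"))     # installed
```

Comparing tuples of integers rather than raw strings keeps "3.1.11" correctly below "3.6".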

To use pandoc filters, you must have the relevant filters installed on your machine.

Installing via pip

If you want to install pandoc yourself or are on an unsupported platform, install the "pypandoc" package and install pandoc manually:

pip install pypandoc

If you want pandoc included out of the box, use the pypandoc_binary package, which is identical to the "pypandoc" package but includes pandoc:

pip install pypandoc_binary

Prebuilt wheels for Windows and Mac OS X

If you use Linux and have your own wheelhouse, you can build a wheel that includes pandoc with uv build --wheel binary/. Be aware that this works only on 64-bit Intel systems, because the pandoc binary is only downloaded from the official releases.

Installing via conda

Pypandoc is included in conda-forge. The conda packages will also install the pandoc package, so pandoc is available in the installation.

Install via conda install -c conda-forge pypandoc.

You can also add the channel to your conda config via conda config --add channels conda-forge. This makes it possible to use conda install pypandoc directly and also lets you update via conda update pypandoc.

Installing pandoc

If you don't already have pandoc on your system and haven't installed the pypandoc_binary package (which includes pandoc), you need to install pandoc yourself.

Installing pandoc via pypandoc

Installing via pypandoc is possible on Windows, Mac OS X or Linux (Intel-based, 64-bit):

pip install pypandoc

from pypandoc.pandoc_download import download_pandoc
# see the documentation for how to customize the installation path,
# but be aware that you then need to add that path to the `PATH`
download_pandoc()

The default install location is included in the search path for pandoc, so you don't need to add it to the PATH.

By default, the latest pandoc version is installed. If you want to specify your own version, say 1.19.1, use download_pandoc(version='1.19.1') instead.

You can also use pypandoc's built-in CLI to download pandoc:

# install latest pandoc to default path
pypandoc download

# Download a specific version
pypandoc download --version 3.6

Installing pandoc manually

Installing pandoc manually via your system's own mechanisms is also possible, and makes pandoc available on many more platforms:

  • Ubuntu/Debian: sudo apt-get install pandoc
  • Fedora/Red Hat: sudo yum install pandoc
  • Arch: sudo pacman -S pandoc
  • Mac OS X with Homebrew: brew install pandoc (and brew install --cask mactex if you need LaTeX for PDF output)
  • Machine with Haskell: cabal install pandoc-cli
  • Windows: an installer is available from the official pandoc releases
  • FreeBSD with pkg: pkg install hs-pandoc
  • Or see "Installing pandoc" in the pandoc documentation

Be aware that not all install mechanisms put pandoc in the PATH, so you either have to add it to the PATH yourself or set PYPANDOC_PANDOC to the full path of the pandoc binary. See the next section for more information.

Specifying the location of pandoc binaries

You can point to a specific pandoc binary by setting the environment variable PYPANDOC_PANDOC to its full path (PYPANDOC_PANDOC=/home/x/whatever/pandoc or PYPANDOC_PANDOC=c:\pandoc\pandoc.exe). If this environment variable is set, it is the only place where pypandoc looks for pandoc.

In certain cases, e.g. when pandoc is installed but a web server running as its own user cannot find the binary, it is useful to specify the location at runtime:

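import os
# setdefault only sets the variable if it isn't already defined;
# run this before calling any pypandoc function
os.environ.setdefault('PYPANDOC_PANDOC', '/home/x/whatever/pandoc')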
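A minimal sketch of the lookup order described in this section, using only the standard library. `find_pandoc` is a hypothetical helper, not part of pypandoc; unlike the real library it does not also search pypandoc's default download location, and it simply returns None when nothing is found.

```python
from __future__ import annotations

import os
import shutil


def find_pandoc() -> str | None:
    """Hypothetical lookup that honours PYPANDOC_PANDOC.

    If the variable is set, it is the only location considered (no PATH
    fallback); otherwise the PATH is searched for a `pandoc` executable.
    """
    explicit = os.environ.get("PYPANDOC_PANDOC")
    if explicit:
        return explicit if os.path.isfile(explicit) else None
    return shutil.which("pandoc")


print(find_pandoc())
```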

Usage

There are two basic ways to use pypandoc: with input files or with input strings.

import pypandoc

# With an input file: it will infer the input format from the filename
output = pypandoc.convert_file('somefile.md', 'rst')

# ...but you can override the format via the `format` argument:
output = pypandoc.convert_file('somefile.txt', 'rst', format='md')

# alternatively you could just pass some string. In this case you need to
# define the input format:
output = pypandoc.convert_text('# some title', 'rst', format='md')
# output == 'some title\r\n==========\r\n\r\n'
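Inferring the input format from the filename amounts to a lookup from file extension to pandoc format name, with an explicit format taking precedence. The sketch below is hypothetical: `infer_format` is not a pypandoc function, and its table is a small illustrative subset rather than the mapping pypandoc actually uses.

```python
from __future__ import annotations

from pathlib import Path

# Illustrative subset of an extension -> pandoc input format table.
EXTENSION_TO_FORMAT = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".rst": "rst",
    ".html": "html",
    ".htm": "html",
    ".tex": "latex",
    ".docx": "docx",
}


def infer_format(filename: str, explicit: str | None = None) -> str:
    """Return the explicit format if given, else guess from the extension."""
    if explicit:
        return explicit  # an explicit format always wins, like `format=`
    suffix = Path(filename).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"cannot infer input format for {filename!r}") from None


print(infer_format("somefile.md"))                # markdown
print(infer_format("somefile.txt", explicit="md"))  # md
```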

convert_text expects the input to be a unicode string or UTF-8 encoded bytes. convert_* always returns a unicode string.
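That input contract can be sketched as a small normalisation step. `to_text` is a hypothetical helper assumed for illustration; it decodes bytes strictly as UTF-8, so invalid input fails loudly instead of being silently altered.

```python
from __future__ import annotations


def to_text(source: str | bytes) -> str:
    """Return `source` as str, decoding bytes as UTF-8."""
    if isinstance(source, bytes):
        # Invalid UTF-8 raises UnicodeDecodeError rather than mangling text.
        return source.decode("utf-8")
    if isinstance(source, str):
        return source
    raise TypeError(f"expected str or bytes, got {type(source).__name__}")


print(to_text("# Überschrift".encode("utf-8")))  # # Überschrift
```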

It's also possible to directly let pandoc write the output to a file. This is the only way to convert to some output formats (e.g. odt, docx, epub, epub3, pdf). In that case convert_*() will return an empty string.

import pypandoc

output = pypandoc.convert_file('somefile.md', 'docx', outputfile="somefile.docx")
assert output == ""
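Because file-only formats need outputfile, a caller might validate the combination before invoking pandoc. `check_output_target` is a hypothetical helper, and its set contains only the formats named above, so treat it as an example rather than a complete list.

```python
from __future__ import annotations

import re

# Formats this section names as requiring `outputfile` (not exhaustive).
FILE_ONLY_FORMATS = {"odt", "docx", "epub", "epub3", "pdf"}


def check_output_target(to: str, outputfile: str | None) -> None:
    """Raise ValueError if a file-only format is requested without outputfile."""
    # Drop pandoc extension modifiers such as "+styles" or "-smart".
    base = re.split(r"[+-]", to, maxsplit=1)[0]
    if base in FILE_ONLY_FORMATS and outputfile is None:
        raise ValueError(f"output format {to!r} needs outputfile=...")


check_output_target("rst", None)              # fine: returned as a string
check_output_target("docx", "somefile.docx")  # fine: written to a file
```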

It's also possible to specify multiple input files to pandoc, either as absolute paths, relative paths or file patterns.

import pypandoc

# convert all markdown files in a chapters/ subdirectory.
pypandoc.convert_file('chapters/*.md', 'docx', outputfile="somefile.docx")

# convert all markdown files in the book1 and book2 directories.
pypandoc.convert_file(['book1/*.md', 'book2/*.md'], 'docx', outputfile="somefile.docx")

# convert the front matter from another drive, and all markdown files in the book2 directory.
pypandoc.convert_file(['D:/book_front.md', 'book2/*.md'], 'docx', outputfile="somefile.docx")
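Expanding a mix of plain paths and patterns can be approximated with the standard glob module. `expand_inputs` is a hypothetical helper; it assumes each entry should match at least one file and sorts the matches of each entry, echoing (but not reproducing) pypandoc's option to control whether files are sorted before being passed to pandoc.

```python
from __future__ import annotations

import glob


def expand_inputs(patterns: str | list[str], sort_files: bool = True) -> list[str]:
    """Expand one path/pattern or a list of them into concrete file names.

    Each entry goes through glob.glob, so existing plain paths are kept
    and patterns such as 'chapters/*.md' are expanded. Matches of each
    entry are sorted unless sort_files=False; entry order is preserved.
    """
    if isinstance(patterns, str):
        patterns = [patterns]
    files: list[str] = []
    for pattern in patterns:
        matches = glob.glob(pattern)
        if not matches:
            raise FileNotFoundError(f"no files match {pattern!r}")
        files.extend(sorted(matches) if sort_files else matches)
    return files
```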

pathlib.Path objects are also supported.

import pypandoc
from pathlib import Path

# single file
source = Path('somefile.md')
target = source.with_suffix('.docx')
pypandoc.convert_file(source, 'docx', outputfile=target)

# convert all markdown files in a chapters/ subdirectory.
pypandoc.convert_file(Path('chapters').glob('*.md'), 'docx', outputfile="somefile.docx")

# convert all markdown files in the book1 and book2 directories.
pypandoc.convert_file([*Path('book1').glob('*.md'), *Path('book2').glob('*.md')], 'docx', outputfile="somefile.docx")
# pathlib globs must be unpacked if they are inside lists.
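The unpacking is needed because Path.glob returns a lazy iterator, not a list, so placing it directly inside a list yields a list containing that iterator instead of the paths themselves. This standalone demonstration uses a temporary directory and made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    book1 = Path(tmp, "book1")
    book1.mkdir()
    for name in ("ch1.md", "ch2.md"):
        (book1 / name).touch()

    # Without unpacking: a one-element list holding a lazy iterator.
    wrong = [book1.glob("*.md")]
    # With unpacking: the iterator is consumed into individual Path objects.
    right = [*book1.glob("*.md")]

    print(isinstance(wrong[0], Path))     # False
    print(sorted(p.name for p in right))  # ['ch1.md', 'ch2.md']
```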

In addition to format, you can pass extra_args, a list of additional command-line options that is passed on to pandoc. This gives easy access to pandoc's many options.

import pypandoc

output = pypandoc.convert_text(
    '<h1>Primary Heading</h1>',
    'md', format='html',
    extra_args=['--markdown-headings=atx'])  # replaces the deprecated --atx-headers
# output == '# Primary Heading\r\n'
output = pypandoc.convert_text(
    '# Primary Heading',
    'html', format='md',
    extra_args=['--shift-heading-level-by=1'])  # replaces the deprecated --base-header-level=2
# output == '<h2 id="primary-heading">Primary Heading</h2>\r\n'
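Conceptually, each item of extra_args becomes one argument on the pandoc command line. The hypothetical `build_pandoc_command` below illustrates that idea with real pandoc options (--from, --to, --output); the exact command pypandoc assembles internally may differ.

```python
from __future__ import annotations

from collections.abc import Sequence


def build_pandoc_command(
    to: str,
    fmt: str,
    extra_args: Sequence[str] = (),
    outputfile: str | None = None,
    pandoc: str = "pandoc",
) -> list[str]:
    """Assemble a pandoc argv list; every list item is one argument."""
    if isinstance(extra_args, str):
        raise TypeError("extra_args must be a list of strings, not a string")
    cmd = [pandoc, f"--from={fmt}", f"--to={to}"]
    cmd.extend(extra_args)  # forwarded unchanged, in order
    if outputfile is not None:
        cmd.append(f"--output={outputfile}")
    return cmd


print(build_pandoc_command("html", "markdown", ["--shift-heading-level-by=1"]))
# ['pandoc', '--from=markdown', '--to=html', '--shift-heading-level-by=1']
```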

pypandoc also makes it easy to apply pandoc filters:

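import pypandoc

filename = 'somefile.md'
# each filter must be installed and available on the PATH
filters = ['pandoc-citeproc']
# '--smart' was removed in pandoc 2.0; smart typography is now the
# 'smart' extension, which is enabled by default for markdown input
pdoc_args = ['--mathjax']
output = pypandoc.convert_file(filename,
                               to='html5',
                               format='md',
                               extra_args=pdoc_args,
                               filters=filters)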

Please pass any filters in as a list and not as a string.

Please refer to pandoc -h and the official documentation for further details.
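Each filter ultimately becomes a pandoc command-line option. The sketch below is a hypothetical translation that assumes .lua files map to pandoc's --lua-filter option and everything else to --filter; it also shows why a plain string is a problem, since iterating over 'pandoc-citeproc' yields single characters.

```python
from __future__ import annotations

from collections.abc import Iterable


def filter_args(filters: Iterable[str]) -> list[str]:
    """Translate filter names or paths into pandoc options (hypothetical)."""
    if isinstance(filters, str):
        # A bare string would be iterated character by character.
        raise TypeError("pass filters as a list, e.g. ['pandoc-citeproc']")
    args = []
    for f in filters:
        option = "--lua-filter" if f.endswith(".lua") else "--filter"
        args.append(f"{option}={f}")
    return args


print(filter_args(["pandoc-citeproc", "wordcount.lua"]))
# ['--filter=pandoc-citeproc', '--lua-filter=wordcount.lua']
```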

Dealing with Formatting Arguments

Pandoc supports custom formatting through the -V parameter. To use it through pypandoc, use code such as this:

output = pypandoc.convert_file('demo.md', 'pdf', outputfile='demo.pdf',
  extra_args=['-V', 'geometry:margin=1.5cm'])

Note: -V and its argument must be separate items in the list, as shown above, or the option won't work. This is because subprocess.Popen passes each list item to pandoc as one argument, without splitting it on spaces.
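To see the difference, the sketch below starts a tiny Python child process that stands in for pandoc (an assumption made so the example runs anywhere) and reports the arguments it received. `received_args` is a hypothetical helper.

```python
from __future__ import annotations

import subprocess
import sys

# A stand-in program that prints each argument it receives on its own line.
ECHO_ARGS = "import sys; print('\\n'.join(sys.argv[1:]))"


def received_args(args: list[str]) -> list[str]:
    """Return the argument list a child process actually receives."""
    result = subprocess.run(
        [sys.executable, "-c", ECHO_ARGS, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


print(received_args(["-V", "geometry:margin=1.5cm"]))  # ['-V', 'geometry:margin=1.5cm']
print(received_args(["-V geometry:margin=1.5cm"]))     # ['-V geometry:margin=1.5cm']
```

Joining an option and its value with = in a single item, as in '--variable=geometry:margin=1.5cm', also works, because pandoc parses that as one long option.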

PDF and LaTeX Support with TinyTeX

Converting to PDF requires a LaTeX engine (like pdflatex, xelatex, or lualatex) to be installed on your system. pypandoc integrates with pytinytex to make this seamless: no manual LaTeX setup is required.
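Before converting to PDF, you may want to check which of these engines are reachable on the PATH. `available_latex_engines` is a hypothetical helper; it only inspects the current PATH, so it will not see a TinyTeX installation that pypandoc adds to the PATH later.

```python
from __future__ import annotations

import shutil

# The LaTeX engines mentioned above, in no particular order of preference.
LATEX_ENGINES = ("pdflatex", "xelatex", "lualatex")


def available_latex_engines() -> dict[str, str]:
    """Map each LaTeX engine found on the PATH to its full path."""
    found: dict[str, str] = {}
    for engine in LATEX_ENGINES:
        path = shutil.which(engine)
        if path is not None:
            found[engine] = path
    return found


engines = available_latex_engines()
if not engines:
    print("No LaTeX engine on the PATH; install one (e.g. via TinyTeX) for PDF output.")
else:
    print(", ".join(sorted(engines)))
```

If more than one engine is present, pandoc's --pdf-engine option selects one explicitly, e.g. extra_args=['--pdf-engine=xelatex'].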

Quick Start

pip install pypandoc[tinytex]

Then download TinyTeX (a lightweight LaTeX distribution) once:

pytinytex download

That's it. PDF conversion just works:

import pypandoc

pypandoc.convert_file('document.md', 'pdf', outputfile='document.pdf')

Automatic Package Installation

When converting to PDF or LaTeX, pypandoc will automatically:

  1. Add TinyTeX to the system PATH so pandoc can find the LaTeX engines
  2. If compilation fails due to a missing LaTeX package (e.g. booktabs.sty), install it via tlmgr and retry the conversion (up to 3 attempts)

This means you can start with a minimal TinyTeX installation and let pypandoc install only the LaTeX packages your documents actually need, on the fly.
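The retry behaviour can be sketched as a small loop. This is an illustration under stated assumptions, not pypandoc's implementation: `compile_fn` and `install_fn` are hypothetical stand-ins for running pandoc and for tlmgr, "up to 3 attempts" is read as three compilations in total, and the regular expression assumes LaTeX's usual "File `booktabs.sty' not found" wording.

```python
from __future__ import annotations

import re
from collections.abc import Callable

# LaTeX usually reports e.g.: ! LaTeX Error: File `booktabs.sty' not found.
MISSING_STY = re.compile(r"File `([^']+\.sty)' not found")


def convert_with_retries(
    compile_fn: Callable[[], str | None],
    install_fn: Callable[[str], None],
    max_attempts: int = 3,
) -> int:
    """Run compile_fn, installing missing .sty files between attempts.

    Hypothetical sketch: compile_fn returns None on success or the LaTeX
    log on failure; install_fn receives a missing file name such as
    'booktabs.sty'. Returns the number of attempts that were needed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        log = compile_fn()
        if log is None:
            return attempt
        match = MISSING_STY.search(log)
        # Give up if the error is not a missing package or attempts are used up.
        if match is None or attempt == max_attempts:
            raise RuntimeError(f"LaTeX compilation failed:\n{log}")
        install_fn(match.group(1))
    raise AssertionError("unreachable")
```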

Logging Messages

Pypandoc logs messages using the Python logging library. By default, it will send messages to the console, including any messages generated by Pandoc. If desired, this behaviour can be changed by adding handlers to the pypandoc logger before calling any functions. For example, to mute all logging add a null handler:

import logging
logging.getLogger('pypandoc').addHandler(logging.NullHandler())
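Rather than muting pypandoc entirely, you can route its messages anywhere with a standard logging handler. The sketch uses only the logging module; `ListHandler` is a hypothetical class, and the warning is emitted by hand purely to show where pypandoc's messages would end up.

```python
from __future__ import annotations

import logging


class ListHandler(logging.Handler):
    """Collect formatted log records in memory (e.g. for tests or a UI)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(self.format(record))


logger = logging.getLogger("pypandoc")
handler = ListHandler()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emitted manually for demonstration only.
logger.warning("example pandoc warning")
print(handler.messages)  # ['WARNING: example pandoc warning']
```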

Getting Pandoc Version

It can sometimes be useful to check which pandoc version is available on your system, or which pandoc binary pypandoc uses. For that, pypandoc provides the following utility functions:

import pypandoc

print(pypandoc.get_pandoc_version())
print(pypandoc.get_pandoc_path())
print(pypandoc.get_pandoc_formats())  # (input formats, output formats)
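When using the reported version for a compatibility check, compare it numerically rather than as text, because "3.10" sorts before "3.6" as a string. The sketch assumes get_pandoc_version() returns a dotted numeric string; `version_tuple` and `version_at_least` are hypothetical helpers.

```python
from __future__ import annotations


def version_tuple(version: str) -> tuple[int, ...]:
    """Convert a dotted version such as '3.1.11.1' into (3, 1, 11, 1)."""
    return tuple(int(part) for part in version.split("."))


def version_at_least(version: str, minimum: str) -> bool:
    """True if `version` is the same as or newer than `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


print("3.10" >= "3.6")                  # False: text comparison misleads
print(version_at_least("3.10", "3.6"))  # True
```

For example, version_at_least(pypandoc.get_pandoc_version(), "3.0") could gate a code path that relies on newer pandoc behaviour.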

Command-Line Usage

Pypandoc includes a CLI that can be invoked via python -m pypandoc or, if installed with pip, the pypandoc command.

# Show pypandoc, pandoc and (if installed) pytinytex versions
pypandoc version

# Pass arguments through to the pandoc binary
pypandoc pandoc input.md -o output.html

# Download pandoc
pypandoc download
pypandoc download --version 3.6
pypandoc download --target /usr/local/bin

Related

  • pydocverter is a client for a service called Docverter, which offers pandoc as a service (plus some extra goodies).
  • See pyandoc for an alternative implementation of a pandoc wrapper from Kenneth Reitz. This one hasn't been active in a while though.
  • See panflute which provides convert_text similar to pypandoc's. Its focus is on writing and running pandoc filters though.

Contributing

Contributions are welcome. When opening a PR, please keep the following guidelines in mind:

  1. Before implementing, please open an issue for discussion.
  2. Make sure you have tests for the new logic.
  3. Make sure your code passes flake8 pypandoc/*.py tests/
  4. Add yourself to the contributors list in README.md. If you are already listed, update the description of your contributions.

Note that for citeproc tests to pass you'll need to have pandoc-citeproc installed. If you installed a prebuilt wheel or conda package, it is already included.

Contributors

  • Jessica Tegner - New maintainer as of 1 July 2021
  • Valentin Haenel - String conversion fix
  • Daniel Sanchez - Automatic parsing of input/output formats
  • Thomas G. - Python 3 support
  • Ben Jao Ming - Fail gracefully if pandoc is missing
  • Ross Crawford-d'Heureuse - Encode input in UTF-8 and add Django example
  • Michael Chow - Decode output in UTF-8
  • Janusz Skonieczny - Support Windows newlines and allow encoding to be specified.
  • gabeos - Fix help parsing
  • Marc Abramowitz - Make setup.py fail hard if pandoc is missing, Travis, Dockerfile, PyPI badge, Tox, PEP-8, improved documentation
  • Daniel L. - Add extra_args example to README
  • Amy Guy - Exception handling for unicode errors
  • Florian Eßer - Allow Markdown extensions in output format
  • Philipp Wendler - Allow Markdown extensions in input format
  • Jan Katins - Handling output to a file, Travis to work on newer version of pandoc, return code checking, get_pandoc_version. Helped to fix the Travis build, new convert_* API. Former maintainer of pypandoc
  • Aaron Gonzales - Added better filter handling
  • David Lukes - Enabled input from non-plain-text files and made sure tests clean up template files correctly if they fail
  • valholl - Set up licensing information correctly and include examples to distribution version
  • Cyrille Rossant - Fixed bug by trimming out stars in the list of pandoc formats. Helped to fix the Travis build.
  • Paul Osborne - Don't require pandoc to install pypandoc.
  • Felix Yan - Added installation instructions for Arch Linux.
  • Kolen Cheung - Implement _get_pandoc_urls for installing arbitrary version as well as the latest version of pandoc. Minor: README, Travis, setup.py.
  • Rebecca Heineman - Added scanning code for finding pandoc in Windows
  • Andrew Barraford - Download destination.
  • Jesse Widner & Dominic Thorn - Add support for lua filters
  • Alex Kneisel - Added pathlib.Path support to convert_file.
  • Juho Vepsäläinen - Creator and former maintainer of pypandoc
  • Connor - Updated Dockerfile to Python 3.9 image and added docker compose file
  • Colin Bull - Added ability to control whether files are sorted before being passed to pandoc process.
  • Kurt McKee - Project infrastructure improvements

License

Pypandoc is available under the MIT license. See LICENSE for more details. Pandoc itself is available under the GPL2 license.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • pypandoc_binary-1.17-py3-none-win_amd64.whl (40.9 MB): Python 3, Windows x86-64
  • pypandoc_binary-1.17-py3-none-musllinux_1_2_x86_64.whl (34.5 MB): Python 3, musllinux (musl 1.2+) x86-64
  • pypandoc_binary-1.17-py3-none-musllinux_1_2_aarch64.whl (36.9 MB): Python 3, musllinux (musl 1.2+) ARM64
  • pypandoc_binary-1.17-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl (36.9 MB): Python 3, manylinux (glibc 2.17+ and glibc 2.28+) ARM64
  • pypandoc_binary-1.17-py3-none-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl (34.5 MB): Python 3, manylinux (glibc 2.28+ and glibc 2.5+) x86-64
  • pypandoc_binary-1.17-py3-none-macosx_11_0_arm64.whl (25.6 MB): Python 3, macOS 11.0+ ARM64
  • pypandoc_binary-1.17-py3-none-macosx_10_9_x86_64.whl (25.6 MB): Python 3, macOS 10.9+ x86-64

File details

Details for the file pypandoc_binary-1.17-py3-none-win_amd64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 76fae066cd2d7e78fb97f0ec8e9e36f437b07187b689b0b415ca18216f8f898a
MD5 44f7aac2cf34632dae099483e566a659
BLAKE2b-256 c6b9f47b77ba75ed5d47ec85fcc2ecfbf7f78e3a73347f3a09836634d930de98

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-win_amd64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pypandoc_binary-1.17-py3-none-musllinux_1_2_x86_64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 f6e6d3e4cfafbe23189a08db3d41f8def260bacd6e7e382bceadab7ba1f17da6
MD5 8b33310a0dd35f6c91d55b2f91b886c6
BLAKE2b-256 0d2d6a51cd4e54bdf132c19416801077c34bd40ba182e85d843360d36ae03a2d

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-musllinux_1_2_x86_64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc

File details

Details for the file pypandoc_binary-1.17-py3-none-musllinux_1_2_aarch64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 2f439dcd211183bb3460253ca4511101df6e1acf4a01f45f5617e1fa2ad24279
MD5 825e98d28c8a54ed5e3d040d34c4f3e0
BLAKE2b-256 3b31a5a867159c4080e5d368f4a53540a727501a2f31affc297dc8e0fced96a7

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-musllinux_1_2_aarch64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc

File details

Details for the file pypandoc_binary-1.17-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 9ada156cb980cd54fd6534231788e668c00dbb591cbd24f0be0bd86812eb8788
MD5 d6ef6ed5016db7479b9f212214e44a5f
BLAKE2b-256 8d7f1e5612b52900ebe590862dabeadf546f739b27527dcd8bfd632f8adac1be

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc

File details

Details for the file pypandoc_binary-1.17-py3-none-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl
Algorithm Hash digest
SHA256 d6b620b21c9374e3e48aabd518492bf0776b148442ee28816f6aaf52da3d4387
MD5 8c637a78cda472499e9e0a2f67a52a77
BLAKE2b-256 f427ac1078239aae14b94c51975b7f46ad8e099e47d7ae26c175a5486b1c0099

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc

File details

Details for the file pypandoc_binary-1.17-py3-none-macosx_11_0_arm64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fcfd28f347ed998dda28823fc6bc24f9310e7fdf3ddceaf925bf0563a100ab5b
MD5 a57a7edec51b7776c839392c160a4a93
BLAKE2b-256 15588fd107c68522957868c1e785fbea7595608df118e440e424d189668294df

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-macosx_11_0_arm64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc

File details

Details for the file pypandoc_binary-1.17-py3-none-macosx_10_9_x86_64.whl.

File hashes

Hashes for pypandoc_binary-1.17-py3-none-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 734726dc618ef276343e272e1a6b4567e59c2ef9ef41d5533042deac3b0531f1
MD5 dc6a4e170c584e857f44acd3f0b9e16c
BLAKE2b-256 8085681a54111f0948821a5cf87ce30a88bb0a3f6848af5112c912abac4a2b77

Provenance

The following attestation bundles were made for pypandoc_binary-1.17-py3-none-macosx_10_9_x86_64.whl:

Publisher: ci.yaml on JessicaTegner/pypandoc