
PDF generation in python using wkhtmltopdf suitable for heroku

PDF generation in python using wkhtmltopdf.

The wkhtmltopdf binary is precompiled and included in the package, which makes pydf easier to use. In particular, this means pydf works on Heroku.

Currently uses wkhtmltopdf 0.12.5 for Ubuntu 18.04 (bionic). Requires Python 3.6+.

If you’re not on Linux amd64: pydf comes bundled with a wkhtmltopdf binary which only works on Linux amd64 architectures. On another OS or architecture your mileage may vary; you will likely need to supply your own wkhtmltopdf binary and point pydf to it by setting the WKHTMLTOPDF_PATH environment variable.
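
As a minimal sketch of pointing pydf at your own binary, the snippet below looks for wkhtmltopdf on the PATH and exports WKHTMLTOPDF_PATH. The helper name point_pydf_at_binary is hypothetical and not part of pydf. The sketch also assumes the variable should be set before pydf is imported, which is the cautious ordering but is not confirmed here.

```python
import os
import shutil


def point_pydf_at_binary(binary_path=None):
    """Set WKHTMLTOPDF_PATH to a supplied or discovered wkhtmltopdf binary.

    Hypothetical helper for illustration, not part of pydf.
    """
    # fall back to whatever wkhtmltopdf is found on the PATH, if any
    path = binary_path or shutil.which('wkhtmltopdf')
    if path is None:
        raise FileNotFoundError('no wkhtmltopdf binary found; install one or pass binary_path')
    os.environ['WKHTMLTOPDF_PATH'] = path
    return path


# Assumed usage: call the helper first, then import pydf.
# point_pydf_at_binary('/usr/local/bin/wkhtmltopdf')
# import pydf
```

Setting the variable in the shell or in your deployment config works just as well; the Python version is only convenient when the path has to be discovered at runtime.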

Install

pip install python-pdf

For Python 2, use pip install python-pdf==0.30.0.

Basic Usage

import pydf
pdf = pydf.generate_pdf('<h1>this is html</h1>')
with open('test_doc.pdf', 'wb') as f:
    f.write(pdf)

Async Usage

Generating lots of documents with wkhtmltopdf can be slow, as wkhtmltopdf can only generate one document per process. To get round this, pydf uses Python 3’s asyncio create_subprocess_exec to generate multiple pdfs at the same time, so the time taken to spin up processes doesn’t slow you down.

import asyncio
from pathlib import Path
from pydf import AsyncPydf

async def generate_async():
    apydf = AsyncPydf()

    async def gen(i):
        # each call runs wkhtmltopdf in its own subprocess
        pdf_content = await apydf.generate_pdf('<h1>this is html</h1>')
        Path(f'output_{i:03}.pdf').write_bytes(pdf_content)

    # schedule all 50 generations and wait for them together
    coros = [gen(i) for i in range(50)]
    await asyncio.gather(*coros)

loop = asyncio.get_event_loop()
loop.run_until_complete(generate_async())

See benchmarks/run.py for a full example.

Locally generating an entire invoice goes from 0.372s/pdf to 0.035s/pdf with the async model.

Docker

pydf is available as a docker image with a very simple http API for generating pdfs.

Simply POST (or GET with data, if possible) your HTML data to /generate.pdf.

Arguments can be passed using http headers; any header starting pdf- or pdf_ will have that prefix removed, be converted to lower case and passed to wkhtmltopdf.

For example:

docker run --rm -p 8000:80 -d samuelcolvin/pydf
curl -d '<h1>this is html</h1>' -H "pdf-orientation: landscape" http://localhost:8000/generate.pdf > created.pdf
open "created.pdf"
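
The header rule described above can be sketched in a few lines of Python. This is an illustration of the stated behaviour, not the image's actual code: the function name headers_to_options is hypothetical, and matching the prefix case-insensitively is an assumption based on HTTP header names being case-insensitive.

```python
def headers_to_options(headers):
    """Pick out pdf- / pdf_ headers and return them as wkhtmltopdf options.

    Hypothetical sketch of the rule described above.
    """
    options = {}
    for name, value in headers.items():
        lowered = name.lower()
        for prefix in ('pdf-', 'pdf_'):
            if lowered.startswith(prefix):
                # drop the prefix, keep the rest of the lower-cased name
                options[lowered[len(prefix):]] = value
                break
    return options


print(headers_to_options({'Pdf-Orientation': 'landscape', 'Content-Type': 'text/html'}))
# prints {'orientation': 'landscape'}
```

Headers without the prefix, such as Content-Type here, are simply ignored.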

In docker compose:

services:
  pdf:
    image: samuelcolvin/pydf

Other services can then generate PDFs by making requests to pdf/generate.pdf. Pretty cool.
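
From another service in the same compose project, such a request might look like the sketch below, using only the standard library. It assumes the service is reachable as http://pdf on port 80 (the container port used in the docker run example above); build_pdf_request and fetch_pdf are hypothetical helper names.

```python
import urllib.request


def build_pdf_request(html, base_url='http://pdf', **options):
    """Build a POST request for /generate.pdf (hypothetical helper).

    Keyword options are sent as pdf- prefixed headers, as described above.
    """
    headers = {f'pdf-{name}': str(value) for name, value in options.items()}
    return urllib.request.Request(
        f'{base_url}/generate.pdf',
        data=html.encode('utf-8'),
        headers=headers,
        method='POST',
    )


def fetch_pdf(request):
    """Send the request and return the response body as bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()


# Assumed usage from inside the compose network:
# pdf_bytes = fetch_pdf(build_pdf_request('<h1>this is html</h1>', orientation='landscape'))
```

Building the request separately from sending it keeps the header logic easy to check without a running container.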

API

generate_pdf(source, [**kwargs])

Generate a pdf from either a url or an html string.

Apart from source, all other arguments are passed straight to wkhtmltopdf.

For details on extra arguments, see the output of get_help() and get_extended_help().

All arguments, whether specified or caught with extra_kwargs, are converted to command line args with '--' + original_name.replace('_', '-').

Arguments which are True are passed with no value, eg. just --quiet; False and None arguments are omitted; everything else is passed with str(value).
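
The conversion rules above can be sketched as follows. This is an illustration of the described behaviour rather than pydf's own implementation, and to_cli_args is a hypothetical name.

```python
def to_cli_args(**kwargs):
    """Turn keyword arguments into wkhtmltopdf-style command line args.

    Hypothetical sketch of the rules described above.
    """
    args = []
    for name, value in kwargs.items():
        # False and None are left out entirely
        if value is None or value is False:
            continue
        flag = '--' + name.replace('_', '-')
        if value is True:
            # True becomes a bare flag with no value
            args.append(flag)
        else:
            args.extend([flag, str(value)])
    return args


print(to_cli_args(quiet=True, grayscale=False, margin_top='10mm', image_dpi=600))
# prints ['--quiet', '--margin-top', '10mm', '--image-dpi', '600']
```

The identity checks (is True, is False) are deliberate: a numeric 0 is passed through as '0' instead of being dropped.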

Arguments:

  • source: html string to generate pdf from or url to get

  • quiet: bool

  • grayscale: bool

  • lowquality: bool

  • margin_bottom: string eg. 10mm

  • margin_left: string eg. 10mm

  • margin_right: string eg. 10mm

  • margin_top: string eg. 10mm

  • orientation: Portrait or Landscape

  • page_height: string eg. 10mm

  • page_width: string eg. 10mm

  • page_size: string: A4, Letter, etc.

  • image_dpi: int default 600

  • image_quality: int default 94

  • extra_kwargs: any exotic extra options for wkhtmltopdf

Returns the pdf content as bytes.
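
A short sketch of passing several of these arguments together is shown below. The wrapper write_landscape_pdf and its generate_pdf parameter are hypothetical; the parameter exists only so the sketch can be tried without pydf installed, and by default it falls back to pydf.generate_pdf.

```python
from pathlib import Path


def write_landscape_pdf(html, out_path, generate_pdf=None):
    """Render html to an A4 landscape pdf and write it to out_path.

    Hypothetical wrapper; generate_pdf defaults to pydf.generate_pdf.
    """
    if generate_pdf is None:
        import pydf  # imported lazily so the sketch runs without pydf when a stand-in is given
        generate_pdf = pydf.generate_pdf
    pdf = generate_pdf(
        html,
        page_size='A4',
        orientation='Landscape',
        margin_top='10mm',
        margin_bottom='10mm',
    )
    path = Path(out_path)
    # written as bytes, matching the 'wb' mode used in Basic Usage
    path.write_bytes(pdf)
    return path


# Assumed usage with pydf installed:
# write_landscape_pdf('<h1>this is html</h1>', 'landscape.pdf')
```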

get_version()

Get version of pydf and wkhtmltopdf binary

get_help()

Get the help string from the wkhtmltopdf binary, using the -h command line option.

get_extended_help()

Get the extended help string from the wkhtmltopdf binary, using the -H command line option.

execute_wk(*args)

Low level function to call wkhtmltopdf; arguments are appended to the wkhtmltopdf binary and passed to subprocess with no processing.
