PDF generation in python using wkhtmltopdf suitable for heroku
PDF generation in python using wkhtmltopdf.
The wkhtmltopdf binary is precompiled and included in the package, which makes pydf easier to use; in particular, this means pydf works on Heroku.
pydf currently bundles wkhtmltopdf 0.12.6.1 r3 for Ubuntu 22.04 (jammy) and requires Python 3.6+.
If you’re not on Linux amd64: pydf comes bundled with a wkhtmltopdf binary which will only work on Linux amd64 architectures. If you’re on another OS or architecture your mileage may vary; it is likely that you’ll need to supply your own wkhtmltopdf binary and point pydf towards it by setting the WKHTMLTOPDF_PATH environment variable.
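As a rough illustration of pointing pydf at your own binary, the sketch below sets WKHTMLTOPDF_PATH from Python. `configure_wkhtmltopdf` is a hypothetical helper, not part of pydf, and it assumes a wkhtmltopdf binary is either passed in explicitly or discoverable on your PATH. Exactly when pydf reads the variable is not stated here, so calling the helper before importing pydf is the cautious choice.

```python
import os
import shutil
from typing import Optional


def configure_wkhtmltopdf(explicit_path: Optional[str] = None) -> Optional[str]:
    """Set WKHTMLTOPDF_PATH to a wkhtmltopdf binary, if one can be found.

    Hypothetical helper for illustration; it is not provided by pydf.
    """
    # Prefer an explicitly supplied path, otherwise search the PATH.
    path = explicit_path or shutil.which("wkhtmltopdf")
    if path is not None:
        os.environ["WKHTMLTOPDF_PATH"] = path
    # None means no binary was found and the environment was left untouched.
    return path
```

Setting the variable in your shell or deployment configuration works just as well; the helper is only convenient when the path is decided at runtime.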
Install
pip install python-pdf
Basic Usage
import pydf
pdf = pydf.generate_pdf('<h1>this is html</h1>')
with open('test_doc.pdf', 'wb') as f:
    f.write(pdf)
Async Usage
Generating lots of documents with wkhtmltopdf can be slow, as wkhtmltopdf can only generate one document per process. To get round this, pydf uses Python 3’s asyncio create_subprocess_exec to generate multiple PDFs at the same time, so the time taken to spin up processes doesn’t slow you down.
import asyncio
from pathlib import Path
from pydf import AsyncPydf

async def generate_async():
    apydf = AsyncPydf()

    async def gen(i):
        # generate one PDF and write it to its own numbered file
        pdf_content = await apydf.generate_pdf('<h1>this is html</h1>')
        Path(f'output_{i:03}.pdf').write_bytes(pdf_content)

    # run all 50 generations concurrently
    coros = [gen(i) for i in range(50)]
    await asyncio.gather(*coros)

loop = asyncio.get_event_loop()
loop.run_until_complete(generate_async())
See benchmarks/run.py for a full example.
Locally generating an entire invoice goes from 0.372s/pdf to 0.035s/pdf with the async model.
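To show why the async model helps, here is a minimal sketch of the underlying mechanism: several subprocesses started through asyncio's create_subprocess_exec and awaited together. It is not pydf's implementation. The Python interpreter stands in for the wkhtmltopdf binary, and `run_once` and `run_many` are hypothetical names used only for this example.

```python
import asyncio
import sys


async def run_once(i: int) -> bytes:
    # Stand-in command: the Python interpreter plays the role of wkhtmltopdf.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", f"print('document {i}')",
        stdout=asyncio.subprocess.PIPE,
    )
    # communicate() waits for the process to exit and collects its stdout.
    stdout, _ = await proc.communicate()
    return stdout


async def run_many(count: int) -> list:
    # gather() schedules every coroutine concurrently, so the subprocesses
    # overlap instead of running one after another; results keep input order.
    return await asyncio.gather(*(run_once(i) for i in range(count)))
```

Calling `asyncio.run(run_many(3))` (Python 3.7+) returns one bytes object per subprocess.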
Docker
pydf is available as a docker image with a very simple HTTP API for generating PDFs.
Simply POST (or GET with data, if possible) your HTML data to /generate.pdf.
Arguments can be passed using HTTP headers; any header starting with pdf- or pdf_ will have that prefix removed, be converted to lower case and passed to wkhtmltopdf.
For example:
docker run --rm -p 8000:80 -d samuelcolvin/pydf
curl -d '<h1>this is html</h1>' -H "pdf-orientation: landscape" http://localhost:8000/generate.pdf > created.pdf
open "created.pdf"
In docker compose:
services:
  pdf:
    image: samuelcolvin/pydf
Other services can then generate PDFs by making requests to pdf/generate.pdf. Pretty cool.
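For a service written in Python, the request could look something like the sketch below, which uses only the standard library. `build_pdf_request` and `fetch_pdf` are hypothetical helpers, and the default base URL assumes the `docker run` example above (port 8000 on localhost); inside docker compose you would pass something like `http://pdf` instead.

```python
import urllib.request


def build_pdf_request(html: str, base_url: str = "http://localhost:8000",
                      **options) -> urllib.request.Request:
    # Each option becomes a "pdf-" prefixed header,
    # e.g. orientation="landscape" -> "pdf-orientation: landscape".
    headers = {f"pdf-{name}": str(value) for name, value in options.items()}
    return urllib.request.Request(
        f"{base_url}/generate.pdf",
        data=html.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def fetch_pdf(request: urllib.request.Request) -> bytes:
    # Sends the request; this needs the pydf container to be running.
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the container up, `fetch_pdf(build_pdf_request('<h1>this is html</h1>', orientation='landscape'))` should return the PDF bytes, mirroring the curl command above.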
API
generate_pdf(source, [**kwargs])
Generate a PDF from either a URL or an HTML string.
After the source argument, all other arguments are passed straight to wkhtmltopdf.
For details on extra arguments, see the output of get_help() and get_extended_help().
All arguments, whether specified or caught with extra_kwargs, are converted to command line args with '--' + original_name.replace('_', '-').
Arguments which are True are passed with no value, e.g. just --quiet; False and None arguments are omitted; everything else is passed with str(value).
Arguments:
source: html string to generate pdf from or url to get
quiet: bool
grayscale: bool
lowquality: bool
margin_bottom: string eg. 10mm
margin_left: string eg. 10mm
margin_right: string eg. 10mm
margin_top: string eg. 10mm
orientation: Portrait or Landscape
page_height: string eg. 10mm
page_width: string eg. 10mm
page_size: string: A4, Letter, etc.
image_dpi: int default 600
image_quality: int default 94
extra_kwargs: any exotic extra options for wkhtmltopdf
Returns the PDF content as bytes.
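The conversion rule for keyword arguments described above can be sketched in a few lines. This is an illustration of the documented rule, not pydf's actual code: `to_cli_args` is a hypothetical name, and passing the flag and its value as two separate list items is an assumption.

```python
def to_cli_args(**kwargs) -> list:
    """Convert keyword arguments to wkhtmltopdf-style command line args."""
    args = []
    for name, value in kwargs.items():
        # False and None arguments are left out entirely.
        if value is None or value is False:
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            # True arguments are passed as a bare flag, e.g. --quiet.
            args.append(flag)
        else:
            # Everything else is passed with str(value).
            args.extend([flag, str(value)])
    return args
```

For example, `to_cli_args(quiet=True, grayscale=False, margin_top='10mm', image_dpi=600)` gives `['--quiet', '--margin-top', '10mm', '--image-dpi', '600']`.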
get_version()
Get version of pydf and wkhtmltopdf binary
get_help()
Get the help string from the wkhtmltopdf binary, using the -h command line option.
get_extended_help()
Get the extended help string from the wkhtmltopdf binary, using the -H command line option.
execute_wk(*args)
Low level function to call wkhtmltopdf; arguments are appended to the wkhtmltopdf binary and passed to subprocess with no processing.
Heroku
If you are deploying onto Heroku, you will need to install a couple of dependencies before wkhtmltopdf will work.
Add the Heroku buildpack https://buildpack-registry.s3.amazonaws.com/buildpacks/heroku-community/apt.tgz
Then create an Aptfile in your root directory with the dependencies: