PdfItDown - Convert Everything to PDF
PdfItDown is a Python package that relies on markitdown by Microsoft, markdown_pdf, and img2pdf. Visit us on our documentation website!
Applicability
PdfItDown is applicable to the following file formats:
- Markdown
- PowerPoint
- Word
- Excel
- HTML
- Text-based formats (CSV, XML, JSON)
- ZIP files (iterates over contents)
- Image files (PNG, JPG)
How well each format is supported depends on the specific reader you are using, so evaluate it for your own setup.
How does it work?
PdfItDown works in a very simple way:
- From markdown to PDF (default)
graph LR
2(Input File) --> 3[Markdown content]
3[Markdown content] --> 4[markdown-pdf]
4[markdown-pdf] --> 5(PDF file)
- From image to PDF (default)
graph LR
2(Input File) --> 3[Bytes]
3[Bytes] --> 4[img2pdf]
4[img2pdf] --> 5(PDF file)
- From other text-based file formats or unstructured file formats to PDF (default)
graph LR
2(Input File) --> 3[MarkitDown]
3[MarkitDown] --> 4[Markdown content]
4[Markdown content] --> 5[markdown-pdf]
5[markdown-pdf] --> 6(PDF file)
- Using a custom conversion callback
graph LR
2(Input File) --> 3[Conversion Callback]
3[Conversion Callback] --> 4(PDF file)
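The four flows above can be read as a simple dispatch on the input file. The following is a minimal sketch of that reading, not PdfItDown's actual implementation: the function name `choose_pipeline` and the two extension sets are hypothetical and exist only for illustration.

```python
from pathlib import Path

# Hypothetical extension sets, chosen for illustration only.
MARKDOWN_SUFFIXES = {".md", ".markdown"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def choose_pipeline(input_file: str, has_callback: bool = False) -> list[str]:
    """Return the ordered stages an input file would pass through."""
    if has_callback:
        # A custom conversion callback replaces the default flows entirely.
        return ["Conversion Callback", "PDF file"]
    suffix = Path(input_file).suffix.lower()
    if suffix in MARKDOWN_SUFFIXES:
        return ["Markdown content", "markdown-pdf", "PDF file"]
    if suffix in IMAGE_SUFFIXES:
        return ["Bytes", "img2pdf", "PDF file"]
    # Every other format is first turned into Markdown by MarkItDown.
    return ["MarkItDown", "Markdown content", "markdown-pdf", "PDF file"]


for name in ["README.md", "logo.png", "users.xlsx"]:
    print(name, "->", " -> ".join(choose_pipeline(name)))
```

The stage names mirror the node labels in the graphs, so each returned list corresponds to one of the diagrams.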
Installation and Usage
To install PdfItDown, just run:
pip install pdfitdown
You can now use the command line tool:
Usage: pdfitdown [OPTIONS]

  Convert (almost) everything to PDF

Options:
  -i, --inputfile TEXT   Path to the input file(s) that need to be converted
                         to PDF. Can be used multiple times.
  -o, --outputfile TEXT  Path to the output PDF file(s). If more than one
                         input file is provided, you should provide an equal
                         number of output files.
  -t, --title TEXT       Title to include in the PDF metadata. Default: 'File
                         Converted with PdfItDown'. If more than one file is
                         provided, it will be ignored.
  -d, --directory TEXT   Directory whose files you want to bulk-convert to
                         PDF. If `--inputfile` is also provided, this option
                         will be ignored. Defaults to None.
  --help                 Show this message and exit.
For example:
pdfitdown -i README.md -o README.pdf -t "README"
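The help text describes a few interactions between options: `--inputfile` takes precedence over `--directory`, the title is ignored when more than one file is given, and multiple inputs should come with an equal number of outputs. The sketch below restates those rules in plain Python. It is an illustration only: `ConversionPlan` and `plan_conversion` are hypothetical names, and raising an error on a count mismatch or dropping the title in directory mode are assumptions, not documented CLI behavior.

```python
from __future__ import annotations

from dataclasses import dataclass

# Default title quoted in the help text.
DEFAULT_TITLE = "File Converted with PdfItDown"


@dataclass
class ConversionPlan:
    """Hypothetical summary of what a given set of options would do."""

    mode: str  # "files" or "directory"
    inputs: list[str]
    outputs: list[str] | None  # None means output paths are inferred
    title: str | None
    directory: str | None


def plan_conversion(
    inputfiles: list[str],
    outputfiles: list[str] | None = None,
    title: str | None = None,
    directory: str | None = None,
) -> ConversionPlan:
    outputs = list(outputfiles or [])
    if inputfiles:
        # --inputfile wins: --directory is ignored when both are given.
        if outputs and len(outputs) != len(inputfiles):
            # Assumption: treat a count mismatch as an error.
            raise ValueError("provide one output file per input file")
        # The title only applies when a single file is converted.
        effective_title = (title or DEFAULT_TITLE) if len(inputfiles) == 1 else None
        return ConversionPlan("files", list(inputfiles), outputs or None, effective_title, None)
    if directory is not None:
        # Assumption: no title is applied in directory mode.
        return ConversionPlan("directory", [], None, None, directory)
    raise ValueError("provide at least one input file or a directory")


print(plan_conversion(["README.md"], ["README.pdf"], title="README"))
```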
Or you can use it inside your Python scripts:
from pdfitdown.pdfconversion import Converter
converter = Converter()
converter.convert(file_path = "business_growth.md", output_path = "business_growth.pdf", title="Business Growth for Q3 in 2024")
converter.convert(file_path = "logo.png", output_path = "logo.pdf")
converter.convert(file_path = "users.xlsx", output_path = "users.pdf")
You can also convert multiple files at once:
- In the CLI:
# with custom output paths
pdfitdown -i test0.png -i test1.md -o testoutput0.pdf -o testoutput1.pdf
# with inferred output paths
pdfitdown -i test0.png -i test1.csv
- In the Python API:
from pdfitdown.pdfconversion import Converter
converter = Converter()
# with custom output paths
converter.multiple_convert(file_paths = ["business_growth.md", "logo.png"], output_paths = ["business_growth.pdf", "logo.pdf"])
# with inferred output paths
converter.multiple_convert(file_paths = ["business_growth.md", "logo.png"])
You can bulk-convert all the files in a directory:
- In the CLI:
pdfitdown -d tests/data/testdir
- In the Python API:
from pdfitdown.pdfconversion import Converter
converter = Converter()
output_paths = converter.convert_directory(directory_path = "tests/data/testdir")
print(output_paths)
In the Python API you can also define a custom callback for the conversion. In this example, we use Google Gemini to summarize a file and save the summary as a PDF:
from pathlib import Path
from pdfitdown.pdfconversion import Converter
from markdown_pdf import MarkdownPdf, Section
from google import genai
client = genai.Client()
def conversion_callback(input_file: str, output_file: str, title: str | None = None, overwrite: bool = True) -> str:
    # Upload the input file and ask Gemini for a summary
    uploaded_file = client.files.upload(file=Path(input_file))
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=["Give me a summary of this file.", uploaded_file],
    )
    content = response.text
    # Render the summary as a PDF with markdown_pdf
    pdf = MarkdownPdf(toc_level=0)
    pdf.add_section(Section(content))
    pdf.meta["title"] = title or "Summary by Gemini"
    pdf.save(output_file)
    return output_file
converter = Converter(conversion_callback=conversion_callback)
converter.convert(file_path = "business_growth.md", output_path = "business_growth.pdf", title="Business Growth for Q3 in 2024")
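The callback above accepts an `overwrite` argument but never looks at it. If you want a callback to refuse to replace an existing file, one option is to wrap it in a small guard. This is a sketch under an assumption: it reads `overwrite=False` as "do not replace an existing output file", which may differ from how PdfItDown itself interprets the flag. The decorator name `respect_overwrite` and the stand-in callback are hypothetical.

```python
from __future__ import annotations

import functools
import tempfile
from pathlib import Path
from typing import Callable


def respect_overwrite(callback: Callable[..., str]) -> Callable[..., str]:
    """Wrap a conversion callback so it refuses to replace existing files."""

    @functools.wraps(callback)
    def wrapper(
        input_file: str,
        output_file: str,
        title: str | None = None,
        overwrite: bool = True,
    ) -> str:
        if not overwrite and Path(output_file).exists():
            raise FileExistsError(f"{output_file} already exists and overwrite is False")
        return callback(input_file, output_file, title, overwrite)

    return wrapper


@respect_overwrite
def placeholder_callback(
    input_file: str,
    output_file: str,
    title: str | None = None,
    overwrite: bool = True,
) -> str:
    # Stand-in for a real conversion: writes placeholder bytes, not a valid PDF.
    Path(output_file).write_bytes(b"placeholder")
    return output_file


with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "summary.pdf")
    placeholder_callback("notes.md", target)  # first call creates the file
    try:
        placeholder_callback("notes.md", target, overwrite=False)
    except FileExistsError as exc:
        print("refused:", exc)
```

The wrapper keeps the same four-argument signature as the callback in the example, so a wrapped function can be passed wherever the unwrapped one was.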
The Python API also lets you mount PdfItDown's conversion features into a backend server built with Starlette or a Starlette-compatible framework (such as FastAPI):
from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import PlainTextResponse
from starlette.routing import Route
from pdfitdown.pdfconversion import Converter
from pdfitdown.server import mount
async def hello_world(request: Request) -> PlainTextResponse:
    return PlainTextResponse(content="hello world!")

# Starlette expects a list of routes
routes = [Route("/helloworld", hello_world)]
app = Starlette(routes=routes)
app = mount(app, converter=Converter(), path="/conversions/pdf", name="pdfitdown")
You can now send files to the /conversions/pdf endpoint with POST requests and receive the converted PDF in the response body:
import httpx
with open("file.txt", "rb") as f:
    content = f.read()
files = {"file_upload": ("file.txt", content, "text/plain")}
with httpx.Client() as client:
    response = client.post("http://localhost:80/conversions/pdf", files=files)
    assert response.status_code == 200
with open("file.pdf", "wb") as f:
    f.write(response.content)
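Before writing the response body to disk, it can be worth checking that it actually looks like a PDF, since a misrouted request could return an error page with the same status handling on the client side. The sketch below relies on the fact that well-formed PDF files begin with the bytes `%PDF-`. The helper name `save_pdf` is hypothetical and not part of PdfItDown.

```python
import tempfile
from pathlib import Path

# Well-formed PDF files start with this header.
PDF_MAGIC = b"%PDF-"


def save_pdf(content: bytes, destination: str) -> Path:
    """Write content to destination only if it starts with the PDF header."""
    if not content.startswith(PDF_MAGIC):
        raise ValueError("response body does not look like a PDF")
    path = Path(destination)
    path.write_bytes(content)
    return path


# Demo with stand-in bytes instead of a real server response.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_pdf(b"%PDF-1.7\nstand-in body", str(Path(tmp) / "file.pdf"))
    print(saved.name)
```

In the client example, this would replace the final `open(...)` block with a call such as `save_pdf(response.content, "file.pdf")`.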
Contributing
Contributions are always welcome!
You can find the contribution guidelines in CONTRIBUTING.md.
License and Funding
This project is open-source and is provided under an MIT License.
If you found it useful, please consider funding it.