PdfItDown - Convert Everything to PDF

PdfItDown is a Python package that relies on markitdown by Microsoft, markdown_pdf, and img2pdf. Visit our documentation website for more details.

Applicability

PdfItDown can convert the following file formats:

  • Markdown
  • PowerPoint
  • Word
  • Excel
  • HTML
  • Text-based formats (CSV, XML, JSON)
  • ZIP files (iterates over contents)
  • Image files (PNG, JPG)

Support for each format also depends on the specific reader you are using, so evaluate it case by case.

How does it work?

PdfItDown converts files along one of four simple paths:

  • From markdown to PDF (default)
graph LR
2(Input File) --> 3[Markdown content]
3[Markdown content] --> 4[markdown-pdf]
4[markdown-pdf] --> 5(PDF file)
  • From image to PDF (default)
graph LR
2(Input File) --> 3[Bytes]
3[Bytes] --> 4[img2pdf]
4[img2pdf] --> 5(PDF file)
  • From other text-based or unstructured file formats to PDF (default)
graph LR
2(Input File) -->  3[MarkitDown]
3[MarkitDown] -->  4[Markdown content]
4[Markdown content] --> 5[markdown-pdf]
5[markdown-pdf] --> 6(PDF file)
  • Using a custom conversion callback
graph LR
2(Input File) -->  3[Conversion Callback]
3[Conversion Callback] --> 4(PDF file)
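The four paths above amount to a dispatch on the input type. As a rough illustration (an assumption about the routing, not PdfItDown's actual code), a converter could pick a path from the file extension like this; `choose_pipeline` and the extension sets are hypothetical:

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical extension sets, loosely based on the formats listed under "Applicability"
MARKDOWN_EXTENSIONS = {".md", ".markdown"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def choose_pipeline(file_path: str, callback: Optional[Callable[..., str]] = None) -> str:
    """Name the conversion path a file would take (illustrative, not PdfItDown's code)."""
    if callback is not None:
        # Per the last diagram, a custom callback replaces the built-in paths entirely
        return "callback"
    suffix = Path(file_path).suffix.lower()
    if suffix in MARKDOWN_EXTENSIONS:
        return "markdown-pdf"  # Markdown content -> markdown-pdf -> PDF
    if suffix in IMAGE_EXTENSIONS:
        return "img2pdf"  # raw bytes -> img2pdf -> PDF
    # Everything else: MarkItDown -> Markdown content -> markdown-pdf -> PDF
    return "markitdown+markdown-pdf"
```

For example, `choose_pipeline("users.xlsx")` returns `"markitdown+markdown-pdf"`, while `choose_pipeline("logo.PNG")` returns `"img2pdf"` because the extension check is case-insensitive.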

Installation and Usage

To install PdfItDown, just run:

pip install pdfitdown

You can now use the command line tool:

Usage: pdfitdown [OPTIONS]

  Convert (almost) everything to PDF

Options:
  -i, --inputfile TEXT   Path to the input file(s) that need to be converted
                         to PDF. Can be used multiple times.
  -o, --outputfile TEXT  Path to the output PDF file(s). If more than one
                         input file is provided, you should provide an equal
                         number of output files.
  -t, --title TEXT       Title to include in the PDF metadata. Default: 'File
                         Converted with PdfItDown'. If more than one file is
                         provided, it will be ignored.
  -d, --directory TEXT   Directory whose files you want to bulk-convert to
                         PDF. If `--inputfile` is also provided, this option
                         will be ignored. Defaults to None.
  --help                 Show this message and exit.
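The help text implies a few precedence rules between the options. The sketch below encodes them in a hypothetical `resolve_options` helper; it is not the CLI's real implementation, and it simply drops an ignored title or directory (the real tool may, for instance, fall back to the default title instead). An empty output list is accepted, since output paths can also be inferred.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConversionPlan:
    inputs: List[str]
    outputs: List[str] = field(default_factory=list)
    title: Optional[str] = None
    directory: Optional[str] = None


def resolve_options(inputs: List[str], outputs: List[str], title: Optional[str], directory: Optional[str]) -> ConversionPlan:
    """Apply the option rules described in the help text (hypothetical helper)."""
    if not inputs and directory is None:
        raise ValueError("provide at least one --inputfile or a --directory")
    if inputs:
        # --directory is ignored when --inputfile is also provided
        directory = None
    if len(inputs) > 1:
        # --title is ignored when more than one file is provided
        title = None
    if outputs and len(outputs) != len(inputs):
        # one --outputfile per --inputfile
        raise ValueError(f"got {len(inputs)} input file(s) but {len(outputs)} output file(s)")
    return ConversionPlan(list(inputs), list(outputs), title, directory)
```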

For example:

pdfitdown -i README.md -o README.pdf -t "README"

You can also use it in your Python scripts:

from pdfitdown.pdfconversion import Converter

converter = Converter()
# Markdown -> PDF, with a custom title in the PDF metadata
converter.convert(file_path="business_growth.md", output_path="business_growth.pdf", title="Business Growth for Q3 in 2024")
# Image -> PDF
converter.convert(file_path="logo.png", output_path="logo.pdf")
# Excel -> Markdown (via markitdown) -> PDF
converter.convert(file_path="users.xlsx", output_path="users.pdf")

You can also convert multiple files at once:

  • In the CLI:
# with custom output paths
pdfitdown -i test0.png -i test1.md -o testoutput0.pdf -o testoutput1.pdf
# with inferred output paths
pdfitdown -i test0.png -i test1.csv
  • In the Python API:
from pdfitdown.pdfconversion import Converter

converter = Converter()
# with custom output paths
converter.multiple_convert(file_paths=["business_growth.md", "logo.png"], output_paths=["business_growth.pdf", "logo.pdf"])
# with inferred output paths
converter.multiple_convert(file_paths=["business_growth.md", "logo.png"])
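When output paths are omitted, as in the "inferred output paths" examples above, PdfItDown derives them itself. The exact naming rule is not documented here; this sketch assumes the natural one (same directory and stem, `.pdf` extension), and `infer_output_paths` is a hypothetical helper rather than part of the library.

```python
from pathlib import Path
from typing import List


def infer_output_paths(file_paths: List[str]) -> List[str]:
    """Derive one output path per input (hypothetical helper).

    Assumed rule: keep the input's directory and stem and replace
    its last extension with ".pdf".
    """
    return [str(Path(p).with_suffix(".pdf")) for p in file_paths]
```

With this rule, `infer_output_paths(["business_growth.md", "logo.png"])` gives `["business_growth.pdf", "logo.pdf"]`. Note that `with_suffix` only swaps the last extension, so `archive.tar.gz` would map to `archive.tar.pdf`.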

You can bulk-convert all the files in a directory:

  • In the CLI:
pdfitdown -d tests/data/testdir
  • In the Python API:
from pdfitdown.pdfconversion import Converter

converter = Converter()
output_paths = converter.convert_directory(directory_path="tests/data/testdir")
print(output_paths)
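`convert_directory` returns the paths of the PDFs it produced. As a rough mental model (an assumption, not the library's code), bulk conversion amounts to listing the files in the directory and converting each one. The sketch below covers only the listing step; `list_convertible_files` and its extension filter are hypothetical, and it does not recurse into subdirectories.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical filter built from the formats listed under "Applicability"
DEFAULT_EXTENSIONS = {
    ".md", ".pptx", ".docx", ".xlsx", ".html", ".csv",
    ".xml", ".json", ".txt", ".zip", ".png", ".jpg", ".jpeg",
}


def list_convertible_files(directory: str, extensions: Iterable[str] = DEFAULT_EXTENSIONS) -> List[str]:
    """Return the files directly inside `directory` with a wanted extension, sorted by path."""
    wanted = {ext.lower() for ext in extensions}
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(directory)
    # Skip subdirectories and files whose extension is not in the filter
    return sorted(str(p) for p in root.iterdir() if p.is_file() and p.suffix.lower() in wanted)
```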

In the Python API you can also define a custom conversion callback. In this example, Google Gemini summarizes a file and the summary is saved as a PDF:

from pathlib import Path

from google import genai
from markdown_pdf import MarkdownPdf, Section
from pdfitdown.pdfconversion import Converter

# genai.Client() reads the API key from the environment (e.g. GEMINI_API_KEY)
client = genai.Client()

def conversion_callback(input_file: str, output_file: str, title: str | None = None, overwrite: bool = True) -> str:
    # Upload the input file so that Gemini can read it
    uploaded_file = client.files.upload(file=Path(input_file))
    # Ask Gemini for a summary of the uploaded file
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=["Give me a summary of this file.", uploaded_file],
    )
    content = response.text
    # Render the summary (typically Markdown) as a PDF with markdown-pdf
    pdf = MarkdownPdf(toc_level=0)
    pdf.add_section(Section(content))
    pdf.meta["title"] = title or "Summary by Gemini"
    pdf.save(output_file)
    # Return the path of the generated PDF
    return output_file

converter = Converter(conversion_callback=conversion_callback)
converter.convert(file_path="business_growth.md", output_path="business_growth.pdf", title="Business Growth for Q3 in 2024")
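The callback receives the input path, the output path, an optional title and an `overwrite` flag, and returns the path of the PDF it wrote. To show that contract without an external service, here is a minimal, standard-library-only callback that writes the first lines of a plain-text file onto a single PDF page. It is an illustrative sketch: the hand-built PDF handles ASCII text on one page only, and treating `overwrite=False` as "refuse to replace an existing file" is an assumption about what the flag means.

```python
import os
from typing import Optional

MAX_LINES = 40  # keep everything on a single A4 page


def _pdf_escape(text: str) -> str:
    # Backslashes and parentheses are special inside PDF literal strings
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_to_pdf_callback(input_file: str, output_file: str, title: Optional[str] = None, overwrite: bool = True) -> str:
    """Write the first lines of a text file onto one PDF page and return output_file."""
    if not overwrite and os.path.exists(output_file):
        # Assumed meaning of overwrite=False: never replace an existing file
        raise FileExistsError(output_file)
    with open(input_file, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()[:MAX_LINES]

    # Content stream: Helvetica 11pt, 14pt leading, starting near the top-left corner
    ops = ["BT", "/F1 11 Tf", "14 TL", "50 800 Td"]
    ops += [f"({_pdf_escape(line)}) Tj T*" for line in lines]
    ops.append("ET")
    # ASCII only: any other character is written as "?"
    stream = "\n".join(ops).encode("ascii", errors="replace")
    info_title = _pdf_escape(title or "Converted with a custom callback").encode("ascii", errors="replace")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Title (" + info_title + b") >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # the xref table needs the byte offset of each "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # fixed-width 20-byte entries
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 6 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start

    with open(output_file, "wb") as f:
        f.write(bytes(out))
    return output_file
```

Since its signature mirrors the Gemini example, it should plug in the same way: `Converter(conversion_callback=text_to_pdf_callback)`.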

The Python API also lets you mount PdfItDown's conversion features into a backend server built with Starlette or a Starlette-compatible framework (such as FastAPI):

from starlette.applications import Starlette
from starlette.requests import Request
from starlette.responses import PlainTextResponse
from starlette.routing import Route
from pdfitdown.pdfconversion import Converter
from pdfitdown.server import mount

async def hello_world(request: Request) -> PlainTextResponse:
    return PlainTextResponse(content="hello world!")

# Starlette expects a list of routes
routes = [Route("/helloworld", hello_world)]
app = Starlette(routes=routes)

# Add the PdfItDown conversion endpoint at /conversions/pdf
app = mount(app, converter=Converter(), path="/conversions/pdf", name="pdfitdown")
# Serve the app with an ASGI server, e.g. `uvicorn main:app --port 80` (assuming this code is in main.py)

You can now send files to the /conversions/pdf endpoint with POST requests and receive the converted PDF in the response body:

import httpx

# Read the file to convert
with open("file.txt", "rb") as f:
    content = f.read()

# Multipart form field: (filename, content, MIME type)
files = {"file_upload": ("file.txt", content, "text/plain")}

with httpx.Client() as client:
    response = client.post("http://localhost:80/conversions/pdf", files=files)

    assert response.status_code == 200
    # The response body contains the converted PDF
    with open("file.pdf", "wb") as f:
        f.write(response.content)
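If you would rather not depend on httpx, the same request can be made with the standard library. This sketch builds the `multipart/form-data` body by hand, reusing the `file_upload` field name from the httpx example (whether the endpoint accepts other field names is not covered here). `build_multipart` and `convert_remote` are hypothetical helpers, and the URL assumes the server above is listening on port 80.

```python
import mimetypes
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, content: bytes, content_type: str) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex  # random token, very unlikely to occur in the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def convert_remote(url: str, path: str, out_path: str) -> None:
    """POST `path` to a PdfItDown endpoint and save the PDF returned in the response body."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        body, content_type = build_multipart("file_upload", os.path.basename(path), f.read(), mime)
    request = urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Usage: `convert_remote("http://localhost:80/conversions/pdf", "file.txt", "file.pdf")`.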

Contributing

Contributions are always welcome!

You can find the contribution guidelines in CONTRIBUTING.md.

License and Funding

This project is open source and released under the MIT License.

If you find it useful, please consider funding it.

Download files

Source Distribution

pdfitdown-2.0.3.tar.gz (9.5 kB)

Built Distribution

pdfitdown-2.0.3-py3-none-any.whl (13.2 kB)

File details

Details for the file pdfitdown-2.0.3.tar.gz.

File metadata

  • Download URL: pdfitdown-2.0.3.tar.gz
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.21 (CI, Ubuntu 24.04)

File hashes

Hashes for pdfitdown-2.0.3.tar.gz
Algorithm Hash digest
SHA256 95d599f59b566db807cc49e36dca1170c533da5913dea182bbbd8b3c9444316f
MD5 070dcca95620a424d17d2ac79d144735
BLAKE2b-256 554d4c51530c976007f25e919fd5f05f8cb1b844cdd43af9ae3e1545f96b5f33

File details

Details for the file pdfitdown-2.0.3-py3-none-any.whl.

File metadata

  • Download URL: pdfitdown-2.0.3-py3-none-any.whl
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.21 (CI, Ubuntu 24.04)

File hashes

Hashes for pdfitdown-2.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 3fa1d841ada4a5eb753cc968433f98ff15618901e553d162f77c4deb1ed8cec8
MD5 ae1a8c181e79e6fa9a2f0d5cc9647e8c
BLAKE2b-256 5ebd57a5d09993aa2a69b39e6f250c9f4d6084a7c2b3cf5e83ec5711a660f58b