Open-source Python library for converting PDF to DOCX, HTML, and Markdown.

Project description

pdf2any

PDF to DOCX, HTML, and Markdown converter — extract text, tables, and images from PDFs.

Features

  • Convert PDF to DOCX (Word documents with full formatting)
  • Convert PDF to HTML (preserves layout, tables and images)
  • Convert PDF to Markdown (clean, readable text with tables)
  • Preserve document structure: paragraphs, tables, images, text styling
  • Extract tables from PDFs
  • Multi-processing support for large documents
  • Command-line interface and Python API

Installation

pip install pdf2any

Quick Start

Command Line

# Convert PDF to DOCX
pdf2any convert input.pdf output.docx

# Convert PDF to HTML
pdf2any convert-html input.pdf output.html

# Convert PDF to Markdown (no page breaks)
pdf2any convert-md input.pdf output.md --nopage_break

# Convert specific pages
pdf2any convert input.pdf output.docx --pages=1,3,5

Python API

from pdf2any import Converter

# Convert to DOCX
cv = Converter("input.pdf")
cv.convert("output.docx")

# Convert to HTML (no page breaks)
cv.convert_html("output.html", page_break=False)

# Convert to Markdown
cv.convert_md("output.md", page_break=False)

# Extract tables
tables = cv.extract_tables()

# Close the converter once all conversions are done
cv.close()
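
When one PDF needs to go to several formats, the calls above can be wrapped so the file is closed even if a conversion fails. This is a minimal sketch, not part of the library: `convert_all` and its `converter_factory` parameter are hypothetical names, and it assumes only the `Converter` methods shown above.

```python
from pathlib import Path


def convert_all(pdf_path, out_dir, converter_factory=None):
    """Convert one PDF to DOCX, HTML and Markdown; return the output paths.

    Hypothetical helper, not part of pdf2any.
    """
    if converter_factory is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        from pdf2any import Converter as converter_factory

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(pdf_path).stem
    outputs = {
        "docx": out_dir / f"{stem}.docx",
        "html": out_dir / f"{stem}.html",
        "md": out_dir / f"{stem}.md",
    }

    cv = converter_factory(str(pdf_path))
    try:
        cv.convert(str(outputs["docx"]))
        cv.convert_html(str(outputs["html"]), page_break=False)
        cv.convert_md(str(outputs["md"]), page_break=False)
    finally:
        # Release the PDF even when one of the conversions raises.
        cv.close()
    return outputs
```

Calling `convert_all("input.pdf", "out")` would write `out/input.docx`, `out/input.html` and `out/input.md`, assuming the conversions succeed.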

Key Options

Option                  Description                              Default
--pages                 Specific pages to convert (e.g. 1,3,5)   All
--nopage_break          Remove page separators in output         False
--remove_header_footer  Remove headers and footers               False
--multi_processing      Enable parallel processing               False
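
When driving the CLI from a script, the options above can be assembled into an argument list. This is a rough sketch: `build_command` is a hypothetical helper, the subcommand names come from the Command Line examples, and it assumes every flag is accepted by every subcommand, which the table does not confirm.

```python
import shlex

# Subcommands as shown in the Command Line examples.
SUBCOMMANDS = {"docx": "convert", "html": "convert-html", "md": "convert-md"}


def build_command(src, dst, fmt="docx", pages=None, page_break=True,
                  remove_header_footer=False, multi_processing=False):
    """Return the pdf2any argument list for one conversion (hypothetical helper)."""
    args = ["pdf2any", SUBCOMMANDS[fmt], src, dst]
    if pages:
        # --pages takes a comma-separated list, e.g. --pages=1,3,5
        args.append("--pages=" + ",".join(str(p) for p in pages))
    if not page_break:
        args.append("--nopage_break")
    if remove_header_footer:
        args.append("--remove_header_footer")
    if multi_processing:
        args.append("--multi_processing")
    return args


cmd = build_command("input.pdf", "output.md", fmt="md",
                    pages=[1, 3, 5], page_break=False)
print(shlex.join(cmd))
# prints: pdf2any convert-md input.pdf output.md --pages=1,3,5 --nopage_break
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.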

Documentation

License

MIT License — see LICENSE for details.

Download files

Source Distribution

pdf2any-0.5.13.tar.gz (18.3 MB)

Uploaded Source

Built Distribution

pdf2any-0.5.13-py3-none-any.whl (129.5 kB)

Uploaded Python 3

File details

Details for the file pdf2any-0.5.13.tar.gz.

File metadata

  • Download URL: pdf2any-0.5.13.tar.gz
  • Upload date:
  • Size: 18.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for pdf2any-0.5.13.tar.gz
Algorithm    Hash digest
SHA256       a5486e061d7db8711dbf35748129c1dc031c47c4a9ca6c8534b8ff616716f24f
MD5          37cf6fe79a0d67c7bcff4778db031429
BLAKE2b-256  224d2a40ea6c71d78507ecd5633acdf5620d838ddef5bfe2ac3d14221532b875
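
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: `sha256_of` and `verify` are hypothetical helper names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest of pdf2any-0.5.13.tar.gz as listed above.
EXPECTED_SHA256 = "a5486e061d7db8711dbf35748129c1dc031c47c4a9ca6c8534b8ff616716f24f"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("pdf2any-0.5.13.tar.gz")` should return `True` for an intact download of the source distribution.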

File details

Details for the file pdf2any-0.5.13-py3-none-any.whl.

File metadata

  • Download URL: pdf2any-0.5.13-py3-none-any.whl
  • Upload date:
  • Size: 129.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for pdf2any-0.5.13-py3-none-any.whl
Algorithm    Hash digest
SHA256       858ab001681440f42ed604b11043d9b563fbe95e8a2168fa0eab9458b33254f0
MD5          6a51ac24414b215ca16fb662eac72d81
BLAKE2b-256  ead6e5c0ecd7190f221bd3b14b1a992222b5e24dca922b5affa3b1e258a45c9c
