Open source Python library for converting PDF to DOCX, HTML, and Markdown.
pdf2any
PDF to DOCX, HTML, and Markdown converter — extract text, tables, and images from PDFs.
Features
- Convert PDF to DOCX (Word documents with full formatting)
- Convert PDF to HTML (preserves layout, tables and images)
- Convert PDF to Markdown (clean, readable text with tables)
- Preserve document structure: paragraphs, tables, images, text styling
- Extract tables from PDFs
- Multi-processing support for large documents
- Command-line and Python API interfaces
Installation
pip install pdf2any
Quick Start
Command Line
# Convert PDF to DOCX
pdf2any convert input.pdf output.docx
# Convert PDF to HTML
pdf2any convert-html input.pdf output.html
# Convert PDF to Markdown (no page breaks)
pdf2any convert-md input.pdf output.md --nopage_break
# Convert specific pages
pdf2any convert input.pdf output.docx --pages=1,3,5
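When the CLI is driven from a script, the subcommand can be picked from the output file extension. The sketch below only builds the argument list. The subcommand names come from the examples above, while the extension-to-subcommand mapping and the helper name `build_argv` are assumptions made for illustration, not part of pdf2any.

```python
from pathlib import Path

# Subcommand names as shown in the Command Line examples above.
# The mapping from file suffix to subcommand is an assumption of this sketch.
SUBCOMMANDS = {
    ".docx": "convert",
    ".html": "convert-html",
    ".md": "convert-md",
}


def build_argv(pdf_path, out_path):
    """Return the argument list for one conversion (hypothetical helper)."""
    suffix = Path(out_path).suffix.lower()
    if suffix not in SUBCOMMANDS:
        raise ValueError(f"unsupported output format: {suffix!r}")
    return ["pdf2any", SUBCOMMANDS[suffix], str(pdf_path), str(out_path)]


# Possible usage, assuming the pdf2any executable is on PATH:
#   import subprocess
#   subprocess.run(build_argv("input.pdf", "output.md"), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.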
Python API
from pdf2any import Converter
# Convert to DOCX
cv = Converter("input.pdf")
cv.convert("output.docx")
# Convert to HTML (no page breaks)
cv.convert_html("output.html", page_break=False)
# Convert to Markdown
cv.convert_md("output.md", page_break=False)
# Extract tables
tables = cv.extract_tables()
cv.close()
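If a conversion raises, the `cv.close()` call above is never reached. One way to guarantee cleanup is `contextlib.closing` from the standard library, which works with any object that has a `close()` method. This is a minimal sketch: `convert_all` and its `converter_factory` parameter are hypothetical names, and it relies only on the `Converter` methods shown above.

```python
from contextlib import closing


def convert_all(converter_factory, pdf_path, stem):
    """Write stem.docx, stem.html and stem.md, closing the converter on exit.

    converter_factory is any callable returning an object with the
    convert / convert_html / convert_md / close methods shown above.
    """
    docx_path = f"{stem}.docx"
    html_path = f"{stem}.html"
    md_path = f"{stem}.md"
    # closing() calls cv.close() when the block exits, even after an exception.
    with closing(converter_factory(pdf_path)) as cv:
        cv.convert(docx_path)
        cv.convert_html(html_path, page_break=False)
        cv.convert_md(md_path, page_break=False)
    return [docx_path, html_path, md_path]


# Possible usage, assuming pdf2any is installed:
#   from pdf2any import Converter
#   convert_all(Converter, "input.pdf", "output")
```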
Key Options
| Option | Description | Default |
|---|---|---|
| --pages | Specific pages to convert (e.g. 1,3,5) | All |
| --nopage_break | Remove page separators in output | False |
| --remove_header_footer | Remove headers and footers | False |
| --multi_processing | Enable parallel processing | False |
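The options in the table can be modelled as a small settings object that renders itself into command-line flags. This is a sketch under assumptions: the `Options` class is hypothetical, the defaults mirror the table, and writing `--remove_header_footer` and `--multi_processing` as bare flags is inferred from how `--nopage_break` appears in the Quick Start rather than confirmed by the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Options:
    """Hypothetical container mirroring the Key Options table."""

    pages: Optional[Sequence[int]] = None  # None means all pages (the default)
    nopage_break: bool = False
    remove_header_footer: bool = False
    multi_processing: bool = False

    def to_flags(self) -> List[str]:
        """Render only the non-default options as CLI flags."""
        flags = []
        if self.pages:
            # Same form as the Quick Start example: --pages=1,3,5
            flags.append("--pages=" + ",".join(str(p) for p in self.pages))
        for name in ("nopage_break", "remove_header_footer", "multi_processing"):
            if getattr(self, name):
                # Assumption: boolean options are passed as bare flags.
                flags.append(f"--{name}")
        return flags


flags = Options(pages=[1, 3, 5], nopage_break=True).to_flags()
print(flags)
```

Options left at their defaults produce no flags, so `Options().to_flags()` is an empty list.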
Documentation
License
MIT License — see LICENSE for details.
Download files
Source Distribution
pdf2any-0.5.13.tar.gz (18.3 MB)
Built Distribution
pdf2any-0.5.13-py3-none-any.whl (129.5 kB)
File details
Details for the file pdf2any-0.5.13.tar.gz.
File metadata
- Download URL: pdf2any-0.5.13.tar.gz
- Upload date:
- Size: 18.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a5486e061d7db8711dbf35748129c1dc031c47c4a9ca6c8534b8ff616716f24f |
| MD5 | 37cf6fe79a0d67c7bcff4778db031429 |
| BLAKE2b-256 | 224d2a40ea6c71d78507ecd5633acdf5620d838ddef5bfe2ac3d14221532b875 |
File details
Details for the file pdf2any-0.5.13-py3-none-any.whl.
File metadata
- Download URL: pdf2any-0.5.13-py3-none-any.whl
- Upload date:
- Size: 129.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 858ab001681440f42ed604b11043d9b563fbe95e8a2168fa0eab9458b33254f0 |
| MD5 | 6a51ac24414b215ca16fb662eac72d81 |
| BLAKE2b-256 | ead6e5c0ecd7190f221bd3b14b1a992222b5e24dca922b5affa3b1e258a45c9c |
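A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The sketch below copies the two digests from the tables; the helper names `sha256_of` and `verify_download` are illustrative and not part of pdf2any.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the File hashes tables above.
EXPECTED_SHA256 = {
    "pdf2any-0.5.13.tar.gz":
        "a5486e061d7db8711dbf35748129c1dc031c47c4a9ca6c8534b8ff616716f24f",
    "pdf2any-0.5.13-py3-none-any.whl":
        "858ab001681440f42ed604b11043d9b563fbe95e8a2168fa0eab9458b33254f0",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the digest listed for its name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no known digest for {name}")
    return sha256_of(path) == expected[name].lower()


# Possible usage after downloading the sdist into the current directory:
#   verify_download("pdf2any-0.5.13.tar.gz")
```

pip can perform the same check at install time when a requirements file pins the digest with `--hash=sha256:...`.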