Convert PDFs to Markdown or plain text via font-aware heuristics

pdfto

Convert PDF files to Markdown or plain text from the command line.

Install

pip install pdfto

Usage

# Convert to Markdown (output alongside input)
pdfto --markdown example.pdf

# Convert to Markdown with explicit output path
pdfto --markdown example.pdf -o output.md

# Convert to plain text
pdfto --text example.pdf -o output.txt

# Write to stdout
pdfto --markdown example.pdf -o -

# Batch: multiple files
pdfto --markdown a.pdf b.pdf -o ./converted/

# Batch: entire directory (recursive)
pdfto --markdown ./docs/ -o ./out/
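
For scripted use, the same commands can be driven from Python. The following is a minimal sketch and not part of pdfto: `build_command` and `convert` are hypothetical helper names, only the flags listed in this README are used, and it assumes `pdfto` is on `PATH` and signals failure with a non-zero exit status.

```python
import subprocess


def build_command(inputs, output=None, fmt="markdown", force=False, quiet=False):
    """Assemble a pdfto argument list from the flags documented in this README."""
    if fmt not in ("markdown", "text"):
        raise ValueError("fmt must be 'markdown' or 'text'")
    cmd = ["pdfto", f"--{fmt}"]
    cmd += [str(p) for p in inputs]      # one or more PDF files or directories
    if output is not None:
        cmd += ["-o", str(output)]       # file, directory, or "-" for stdout
    if force:
        cmd.append("--force")            # overwrite existing output files
    if quiet:
        cmd.append("--quiet")            # suppress progress messages
    return cmd


def convert(inputs, **kwargs):
    """Run pdfto; raises CalledProcessError if the exit status is non-zero (assumed)."""
    return subprocess.run(build_command(inputs, **kwargs), check=True)
```

Calling `convert(["a.pdf", "b.pdf"], output="./converted/")` would run the batch example above. Building the argument list separately keeps the command easy to inspect or log before anything is executed.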

Options

Flag             Description
--markdown       Convert to Markdown (.md)
--text           Convert to plain text (.txt)
-o / --output    Output file, directory, or - for stdout
--force          Overwrite existing output files
--quiet          Suppress progress messages
--version        Show version and exit

How it works

pdfto uses PyMuPDF to extract font metadata for each text span: the font size plus the bold, italic, and monospace flags. Heading levels are assigned from the ratio of a span's font size to the document's body text size. No ML model or GPU is required.
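
The approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not pdfto's actual code: the function names, the ratio thresholds, and the choice of "size covering the most characters" as the body size are all hypothetical. The only PyMuPDF calls used are `fitz.open` and `page.get_text("dict")`, and the flag bits follow PyMuPDF's documented span flags.

```python
from collections import Counter

# PyMuPDF span flag bits (italic = 2, monospaced = 8, bold = 16).
ITALIC, MONOSPACE, BOLD = 1 << 1, 1 << 3, 1 << 4

# Hypothetical thresholds, not taken from pdfto: (minimum size ratio, heading level).
HEADING_THRESHOLDS = [(1.8, 1), (1.4, 2), (1.15, 3)]


def iter_spans(pdf_path):
    """Yield PyMuPDF span dicts (keys include "text", "size", "flags")."""
    import fitz  # PyMuPDF; imported lazily so the helpers below run without it

    with fitz.open(pdf_path) as doc:
        for page in doc:
            for block in page.get_text("dict")["blocks"]:
                for line in block.get("lines", []):  # image blocks have no lines
                    yield from line["spans"]


def body_size(spans):
    """Estimate the body text size as the font size covering the most characters."""
    weight = Counter()
    for span in spans:
        weight[round(span["size"], 1)] += len(span["text"].strip())
    return weight.most_common(1)[0][0]


def heading_level(size, body):
    """Return a heading level 1-3 from the size ratio, or 0 for body text."""
    ratio = size / body
    for minimum, level in HEADING_THRESHOLDS:
        if ratio >= minimum:
            return level
    return 0


def to_markdown(span, body):
    """Render one span as a Markdown fragment using its size and flags."""
    text = span["text"].strip()
    level = heading_level(span["size"], body)
    if level:
        return "#" * level + " " + text
    if span["flags"] & MONOSPACE:
        return f"`{text}`"
    if span["flags"] & BOLD:
        text = f"**{text}**"
    if span["flags"] & ITALIC:
        text = f"*{text}*"
    return text
```

With a body size of 12.0, a 24.0-point span reading "Title" has a ratio of 2.0 and renders as `# Title`. Weighting the body-size estimate by character count rather than span count keeps a document with many short headings from skewing the estimate.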

Download files

Source distribution: pdfto-0.1.0.tar.gz (4.2 MB)
Built distribution: pdfto-0.1.0-py3-none-any.whl (7.1 kB, Python 3)
