Formatting-preserving PDF-to-DOCX converter that fixes bullet lists, hyperlinks, CJK fonts, and scanned PDFs

Project description

pdf2docx-healer

Formatting-preserving PDF-to-DOCX converter that fixes:

  • Bullet and numbered lists (proper Word list styles)
  • Hyperlinks (extracted from PDF annotations)
  • Fallback for CJK and unavailable fonts
  • Scanned/image-based PDFs (via OCR)

Usage

Python:

from docx_healer import heal

heal("input.pdf", "output.docx")

Command line:

pdf2docx-heal input.pdf -o output.docx
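
To convert a whole folder rather than a single file, the one-file call can be wrapped in a small loop. The following is a minimal sketch, not part of the package: `convert_folder` is a hypothetical helper, and it only assumes that the converter is a callable taking an input path and an output path as strings, as `heal` does in the usage above. The return value of `heal` and its behaviour on failure are not documented here, so the sketch ignores the return value and lets any exception propagate.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(src_dir: str, dst_dir: str,
                   convert: Callable[[str, str], object]) -> List[Path]:
    """Run `convert(input_path, output_path)` for every *.pdf in src_dir.

    `convert` is passed in rather than imported, so this helper makes no
    assumption about the package beyond the two-argument call shape.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

    written: List[Path] = []
    for pdf in sorted(src.glob("*.pdf")):   # non-recursive, deterministic order
        target = dst / (pdf.stem + ".docx")
        convert(str(pdf), str(target))      # paths passed as plain strings
        written.append(target)
    return written
```

With the package installed, this would presumably be called as `convert_folder("pdfs", "docx", heal)` after `from docx_healer import heal`.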

Download files

Source Distribution

pdf2docx_healer-0.1.2.tar.gz (19.5 kB)

Uploaded Source

Built Distribution

pdf2docx_healer-0.1.2-py3-none-any.whl (22.6 kB)

Uploaded Python 3

File details

Details for the file pdf2docx_healer-0.1.2.tar.gz.

File metadata

  • Download URL: pdf2docx_healer-0.1.2.tar.gz
  • Size: 19.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pdf2docx_healer-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       5ad2027229af2abec35e38005cadca2e86c6d04048eeca3f3c039a6d4c874971
MD5          72ad1e08fcfaf31590e764227f25370d
BLAKE2b-256  3f4fe7a66b2a15dc1ad77568d333f35ad3c46ded4af61531e2722da5a9d17562
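
A downloaded file can be checked against the published SHA256 digest with the Python standard library. The following is a minimal sketch: `file_sha256` and `matches_digest` are hypothetical helper names chosen for illustration, and the path of the downloaded file is whatever you saved it as.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    return file_sha256(path) == expected_hex.strip().lower()
```

For example, `matches_digest("pdf2docx_healer-0.1.2.tar.gz", "<SHA256 from the table above>")` should return `True` for an intact download. pip can perform the same check at install time via its hash-checking mode (`--require-hashes` with a requirements file).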

Provenance

The following attestation bundles were made for pdf2docx_healer-0.1.2.tar.gz:

Publisher: publish.yml on krockxz/pdf2docx-healer

File details

Details for the file pdf2docx_healer-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pdf2docx_healer-0.1.2-py3-none-any.whl
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pdf2docx_healer-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       1075a22323de3ed6bbbd0317c1c301e4e71e47a932072b1ecc640d9b1552cb2a
MD5          10f9e6a9ab5167ce4b6aa088ecb4f703
BLAKE2b-256  704dd3e48e9b6d905c33162ad92c3374cab464e51fa3427f9cbe977da00eb052

Provenance

The following attestation bundles were made for pdf2docx_healer-0.1.2-py3-none-any.whl:

Publisher: publish.yml on krockxz/pdf2docx-healer
