pdfembed

CLI/TUI tool to run OCR locally and overlay a searchable text layer on PDFs. By default, a Textual-based TUI launches; use --cli for the classic CLI.

License: BSD-3-Clause (see LICENSE).

Quickstart

  • TUI (default):
    python -m pdfembed.cli (or simply pdfembed)

  • CLI:
    python -m pdfembed.cli --cli --file sample.pdf --dpi 300
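
For scripted use, the CLI quickstart command can also be launched from Python. The following is a minimal sketch, not part of pdfembed itself: the helper names build_command and run_ocr are hypothetical, and it uses only the flags shown in the quickstart.

```python
import subprocess
import sys
from typing import List


def build_command(pdf_path: str, dpi: int = 300) -> List[str]:
    """Return the argv for the CLI quickstart invocation (hypothetical helper)."""
    return [
        sys.executable, "-m", "pdfembed.cli",
        "--cli",
        "--file", pdf_path,
        "--dpi", str(dpi),
    ]


def run_ocr(pdf_path: str, dpi: int = 300) -> int:
    """Run pdfembed on one PDF and return the process exit code.

    Requires pdfembed to be installed in the current interpreter's environment.
    """
    completed = subprocess.run(build_command(pdf_path, dpi), check=False)
    return completed.returncode
```

Calling run_ocr("sample.pdf") would start the same job as the CLI line above. The meaning of specific exit codes is not documented here, so the sketch returns the code without interpreting it.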

TUI Controls

  • f: select PDF file(s) (opens a file dialog; multiple selection allowed)
  • o: select output folder (opens a folder dialog; defaults to the first PDF's directory)
  • v: toggle overlay visibility (debug)
  • s: start OCR
  • q: quit
  • DPI is fixed at the default value in the TUI; to use a different DPI, run the CLI with --dpi.

While OCR is running, a "Processing... please wait" indicator is shown and other keys are ignored until completion.
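
The key handling described above can be pictured as a small dispatcher with a busy flag. This is only an illustrative sketch in plain Python, not pdfembed's actual Textual code: the class KeyGate and the action names are hypothetical.

```python
from typing import Optional


class KeyGate:
    """Map TUI keys to actions and ignore all keys while OCR is running."""

    # Key-to-action table taken from the controls list; action names are made up.
    ACTIONS = {
        "f": "select_files",
        "o": "select_output",
        "v": "toggle_visible",
        "s": "start_ocr",
        "q": "quit",
    }

    def __init__(self) -> None:
        self.busy = False  # True between starting OCR and its completion

    def handle(self, key: str) -> Optional[str]:
        """Return the action for a key, or None if the key is ignored."""
        if self.busy:
            return None  # "Processing... please wait": input is dropped
        action = self.ACTIONS.get(key)
        if action == "start_ocr":
            self.busy = True
        return action

    def finish(self) -> None:
        """Mark OCR as complete so keys are accepted again."""
        self.busy = False
```

With this model, pressing q during a run does nothing until finish() has been called.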

CLI Options (key ones)

  • --file <pdf1> [pdf2 ...] or --dir <folder>: input PDFs
  • --output <dir>: output directory (default: input location)
  • --dpi <int>: render DPI (default 300)
  • --visible: make overlay text visible (debug)
  • --font <path>: TTF font for overlay text
  • --log-level <LEVEL>: logging level (INFO by default)
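
To show how these options combine, here is a hedged sketch that assembles an argument list from the flags listed above. The OcrJob class is hypothetical and not part of pdfembed. It also assumes that --file and --dir are alternatives (the list says "or") and that --cli is passed alongside them, as in the quickstart.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class OcrJob:
    """Hypothetical container for the documented CLI options."""

    files: Sequence[str] = ()
    directory: Optional[str] = None
    output: Optional[str] = None
    dpi: int = 300            # documented default
    visible: bool = False     # debug: make overlay text visible
    font: Optional[str] = None
    log_level: str = "INFO"   # documented default

    def to_args(self) -> List[str]:
        """Build the argument list that follows `python -m pdfembed.cli`."""
        # Assumption: exactly one input source is given.
        if bool(self.files) == bool(self.directory):
            raise ValueError("provide either files or a directory, not both")
        args = ["--cli"]
        if self.files:
            args += ["--file", *self.files]
        else:
            args += ["--dir", str(self.directory)]
        if self.output:
            args += ["--output", self.output]
        args += ["--dpi", str(self.dpi)]
        if self.visible:
            args.append("--visible")
        if self.font:
            args += ["--font", self.font]
        args += ["--log-level", self.log_level]
        return args
```

For example, OcrJob(files=["a.pdf", "b.pdf"], output="out", visible=True).to_args() yields the arguments for a two-file debug run written to the out directory.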

Dependencies

  • Textual (TUI)
  • tkinter (file dialogs, stdlib)
  • onnxocr / pypdfium2 / pypdf / reportlab / opencv-python / numpy
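
A quick way to check whether these dependencies are importable is sketched below. The import names are assumptions based on common usage (for example, opencv-python is imported as cv2, and onnxocr is assumed to import under its own name). Although tkinter is part of the standard library, some Linux distributions ship it as a separate system package.

```python
from importlib.util import find_spec
from typing import Dict, List

# Import name -> label from the dependency list (import names are assumptions).
REQUIRED_MODULES: Dict[str, str] = {
    "textual": "Textual",
    "tkinter": "tkinter",
    "onnxocr": "onnxocr",
    "pypdfium2": "pypdfium2",
    "pypdf": "pypdf",
    "reportlab": "reportlab",
    "cv2": "opencv-python",
    "numpy": "numpy",
}


def missing_modules(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the labels of modules that cannot be found, without importing them."""
    return [label for name, label in modules.items() if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```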

Download files

Release 0.1.4. Both files were uploaded via uv/0.8.4, without Trusted Publishing.

Source Distribution

  • File: pdfembed-0.1.4.tar.gz
  • Size: 4.2 MB
  • Tags: Source
  • SHA256: 19e1a6f9fda883e2b6e0092b8376b475fffa66184764d1c4622537b925096503
  • MD5: 351001d52656e88345709304fee74bf1
  • BLAKE2b-256: 7b97286046b666c05f45d9cbd8e6c01126e9c36387d5e6c6327c40278b9b85cb

Built Distribution

  • File: pdfembed-0.1.4-py3-none-any.whl
  • Size: 4.2 MB
  • Tags: Python 3
  • SHA256: dd63211252d72224d5af92ecbf39174c28f071e8efcdf35ffb4e2190a0d9a90f
  • MD5: 3770b4f3175b97ed00336f10cabb9b49
  • BLAKE2b-256: a7740fb678e969a3873e454552afc4b6b735a62d5def4982a665a08de8a6d6e3