
Diselorya's PDF-to-Markdown tool, designed especially for Obsidian.

Project description

pdf2md

Convert PDFs to Markdown and TXT, especially for Obsidian.

  • Extract content:
    • Convert text-based PDFs.
    • Convert image-based (scanned) PDFs using OCR.
      • Higher OCR recognition accuracy.
    • Save images and insert them into the Markdown the Obsidian way.
  • Fix sentences broken across lines (handles most cases, but not all).
    • Needs further tuning on more sample PDFs.
    • AI-assisted recognition of sentence breaks.
  • Add headings:
    • Convert PDF bookmarks to headings.
    • Use page numbers as headings for image-based PDFs.
      • Fetch the first sentence of each page for its page-number heading.
      • Compare page headers, the table of contents, and page numbers to identify heading levels.
  • Filename handling:
    • Fix characters that are not allowed in filenames.
    • Replace characters in filenames that conflict with Obsidian.
  • Character encoding handling:
    • Normalize characters that look identical but use different Unicode code points, which text-to-speech (TTS) engines cannot read.
  • Batch conversion.
  • Table of contents: Convert the table of contents to Obsidian style (of limited value).
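The image-saving feature above inserts pictures "the Obsidian way". As a minimal sketch, assuming that means Obsidian's `![[...]]` embed syntax, the helpers below name a saved page image and build the embed line for it. The naming scheme and both function names (`image_filename`, `obsidian_embed`) are hypothetical, not the package's API.

```python
from __future__ import annotations

from pathlib import Path


def image_filename(pdf_stem: str, page: int, index: int, ext: str = "png") -> str:
    """Hypothetical naming scheme: <pdf name>-p<zero-padded page>-<n>.<ext>."""
    return f"{pdf_stem}-p{page:03d}-{index}.{ext}"


def obsidian_embed(image_path: str | Path, width: int | None = None) -> str:
    """Return an Obsidian embed such as ![[report-p007-1.png]].

    Only the bare filename is used; Obsidian can resolve it when the name is
    unique in the vault. An optional width uses the ![[name|300]] form.
    """
    name = Path(image_path).name
    return f"![[{name}|{width}]]" if width else f"![[{name}]]"
```

For example, `obsidian_embed("attachments/" + image_filename("report", 7, 1))` yields `![[report-p007-1.png]]`.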
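Rejoining lines that a PDF broke mid-sentence can be approximated with a simple rule: merge a line into the previous one unless the previous line ends with terminal punctuation. This is a rule-based sketch only; it does not reproduce the AI-assisted recognition mentioned above, and the function name `join_broken_lines` and the punctuation set are assumptions.

```python
import re

# Assumed set of characters that end a sentence: Latin and CJK terminal
# punctuation plus closing quotes.
_SENTENCE_END = tuple('.!?"\'”’。！？」』')
# Rough CJK check: CJK punctuation, kana, ideographs, fullwidth forms.
_CJK = re.compile("[\u3000-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef]")


def join_broken_lines(lines: list[str]) -> list[str]:
    """Merge extracted PDF lines that were split mid-sentence into paragraphs."""
    paragraphs: list[str] = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:  # a blank line is treated as a paragraph break
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith(_SENTENCE_END):
            paragraphs.append(current)  # previous line looks complete
            current = line
        elif current.endswith("-") and line[0].islower():
            current = current[:-1] + line  # rejoin a hyphenated word
        elif _CJK.match(current[-1]) or _CJK.match(line[0]):
            current += line  # CJK text needs no space between lines
        else:
            current += " " + line
    if current:
        paragraphs.append(current)
    return paragraphs
```

A sentence that happens to end exactly at a line break will still start a new paragraph here, which is one reason heuristics like this fix most broken sentences but not all of them.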
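For filename handling, one plausible approach is to replace the characters Windows rejects in filenames (`< > : " / \ | ? *`) together with characters that interfere with Obsidian links (`#`, `^`, `[`, `]`). The exact character sets and the `sanitize_filename` name are assumptions for illustration; reserved Windows names such as `CON` are not handled.

```python
# Assumed set: characters Windows rejects in filenames plus characters that
# interfere with Obsidian links.
_FORBIDDEN = '<>:"/\\|?*#^[]'


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace characters unsafe for Windows or Obsidian in a filename."""
    table = {ord(ch): replacement for ch in _FORBIDDEN}
    table.update({code: None for code in range(32)})  # drop control characters
    cleaned = name.translate(table)
    # Windows also rejects names that end with a dot or a space.
    cleaned = cleaned.rstrip(". ")
    return cleaned or "untitled"
```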
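Characters that look identical but use different code points, such as the Kangxi radical `⼀` (U+2F00) in place of the ideograph `一` (U+4E00), or the ligature `ﬁ` in place of `fi`, can be folded with Unicode NFKC normalization from Python's standard `unicodedata` module. This is one possible approach, not necessarily what the package does. NFKC would also turn fullwidth CJK punctuation such as `，` into ASCII, so the hypothetical `keep_fullwidth` flag leaves the Halfwidth and Fullwidth Forms block (U+FF00 to U+FFEF) untouched, at the cost of also keeping fullwidth Latin letters as they are.

```python
import re
import unicodedata

# Runs of characters in the Halfwidth and Fullwidth Forms block.
_FULLWIDTH_RUN = re.compile("([\uff00-\uffef]+)")


def normalize_text(text: str, keep_fullwidth: bool = True) -> str:
    """Fold compatibility variants to their standard characters via NFKC."""
    if not keep_fullwidth:
        return unicodedata.normalize("NFKC", text)
    # re.split with a capturing group places the fullwidth runs at odd
    # indices; keep those as-is and normalize everything in between.
    parts = _FULLWIDTH_RUN.split(text)
    return "".join(
        part if i % 2 else unicodedata.normalize("NFKC", part)
        for i, part in enumerate(parts)
    )
```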



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

diseloryapdf2md-0.0.1.tar.gz (7.5 kB)

Uploaded Source

Built Distribution

diseloryapdf2md-0.0.1-py3-none-any.whl (7.3 kB)

Uploaded Python 3
