Diselorya's PDF-to-Markdown tool, designed especially for Obsidian.

Project description

pdf2md

Convert PDF to Markdown and TXT, especially for Obsidian.

  • Get content:
    • Convert text-based PDFs.
    • Convert image-based (scanned) PDFs with OCR.
      • Higher OCR recognition accuracy.
    • Save images and embed them in the Markdown using Obsidian syntax.
  • Fix broken sentences. (Most cases, but not 100%.)
    • Needs tuning against more samples.
    • AI-assisted detection of sentence breaks.
  • Add headings:
    • Convert PDF bookmarks to headings.
    • Use page numbers as headings for image-based PDFs.
      • Fetch the first sentence of each page for its page-number heading.
      • Compare the headers, table of contents, and page numbers to identify heading levels.
  • Filename handling:
    • Fix unsupported characters in filenames.
    • Replace characters in filenames that conflict with Obsidian.
  • Character encoding handling:
    • Normalize characters that look identical but have different Unicode code points, which TTS cannot read.
  • Batch conversion.
  • Table of contents: Replace the PDF's table of contents with an Obsidian-style one. (Of little significance.)
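
Several of the features listed above are small text transformations, and a rough sketch may make them easier to picture. The code below is not taken from pdf2md: every function name, the set of replaced filename characters, the choice of NFKC normalization, and the line-joining heuristic are assumptions made for illustration. Only the `![[...]]` embed form is Obsidian's own syntax.

```python
import re
import unicodedata

# Characters rejected by common filesystems, plus a few that Obsidian uses in
# its link syntax. This set is an assumption, not the list pdf2md uses.
_UNSAFE = re.compile(r'[\\/:*?"<>|#^\[\]]')

# Line endings treated as "the sentence is finished" (assumed heuristic).
_SENTENCE_END = (".", "!", "?", ":", ";", "\u3002", "\uff01", "\uff1f")


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace unsafe characters and trim leading/trailing spaces and dots."""
    cleaned = _UNSAFE.sub(replacement, name).strip(" .")
    return cleaned or "untitled"


def normalize_text(text: str) -> str:
    """Fold compatibility forms (ligatures, full-width letters) with NFKC."""
    return unicodedata.normalize("NFKC", text)


def join_broken_lines(text: str) -> str:
    """Merge a line into the previous one when that line did not end a sentence.

    Blank lines are kept as paragraph breaks. A trailing hyphen is treated as
    a word split across lines, which will also (wrongly) join real hyphenated
    compounds; this is one reason such a fix cannot be 100% reliable.
    """
    out: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
        elif out and out[-1] and not out[-1].endswith(_SENTENCE_END):
            if out[-1].endswith("-"):
                out[-1] = out[-1][:-1] + line
            else:
                out[-1] += " " + line
        else:
            out.append(line)
    return "\n".join(out)


def obsidian_embed(image_name: str) -> str:
    """Return an Obsidian embed link for an image saved next to the note."""
    return f"![[{image_name}]]"


if __name__ == "__main__":
    print(sanitize_filename('Chapter 1: "Intro"?.pdf'))
    print(normalize_text("\ufb01le \uff21\uff22\uff23"))
    print(join_broken_lines("A sen-\ntence split\nover lines.\n\nNext one."))
    print(obsidian_embed("page-001.png"))
```

A rule-based joiner like this handles the common cases only, which is consistent with the "most but not 100%" caveat in the list.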

Download files

Source Distribution

diseloryapdf2md-0.0.1.tar.gz (7.5 kB view details)

Uploaded Source

Built Distribution

diseloryapdf2md-0.0.1-py3-none-any.whl (7.3 kB view details)

Uploaded Python 3

File details

Details for the file diseloryapdf2md-0.0.1.tar.gz.

File metadata

  • Download URL: diseloryapdf2md-0.0.1.tar.gz
  • Upload date:
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for diseloryapdf2md-0.0.1.tar.gz
Algorithm Hash digest
SHA256 44bef8f2eec0ef0ea555776830fdcd0873ff35ca7b4d8b4d91df2e19c2e38cb5
MD5 8991e1c9d15f2ab1e7fc085f50c1ef58
BLAKE2b-256 b5d4faa72ee8902199a3b598a6de43deb53d6a1390f4c06cbb85314126466246

File details

Details for the file diseloryapdf2md-0.0.1-py3-none-any.whl.

File hashes

Hashes for diseloryapdf2md-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 07fee261eb23a1f7b860918da6f2975c0cfd06596b5a33afe12ce4aea1373629
MD5 0a757d4e7792008e8d06ab8802a95b7b
BLAKE2b-256 76de1cdbb9cb64f647c52d3c266398bda26c1dcf14f12f13133303fa721bd86f
