Diselorya's PDF-to-Markdown tool, designed especially for Obsidian.
Project description
pdf2md
Convert PDF files to Markdown and TXT, especially for Obsidian.
- Get content:
  - Convert text-based PDFs.
  - Convert image-based (scanned) PDFs via OCR.
    - Higher OCR recognition accuracy.
  - Save images and insert them into the Markdown using Obsidian's embed syntax.
  - Fix broken sentences (most, but not 100%).
    - Needs tuning against more samples.
    - AI-assisted recognition of sentence breaks.
- Add headings:
  - Convert PDF bookmarks to headings.
  - Use page numbers as headings for image-based PDFs.
    - Fetch the first sentence of each page for the page-number heading.
  - Compare the page headers, catalog (table of contents), and page numbers to identify heading levels.
- Filename handling:
  - Fix unsupported characters in filenames.
  - Replace characters in filenames that conflict with Obsidian.
- Character encoding handling:
  - Normalize characters that look identical but use different Unicode code points, which TTS engines cannot read.
- Batch conversion.
- Catalog: Replace the catalog (table of contents) with an Obsidian-style equivalent. (Of minor importance.)
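To make a few of the features above concrete, here is a minimal Python sketch of filename cleaning, Unicode normalization, broken-sentence joining, and Obsidian-style image embedding. It is not the package's actual API: every function name, the choice of NFKC normalization, the set of characters treated as conflicting with Obsidian, and the line-joining heuristic are assumptions made for illustration only.

```python
import unicodedata

# Assumption: characters rejected by common file systems (notably Windows).
_FS_UNSAFE = '<>:"/\\|?*'
# Assumption: characters that interfere with Obsidian wikilinks.
_OBSIDIAN_UNSAFE = "#^[]"
# Punctuation treated as the end of a sentence (Latin and CJK forms).
_SENTENCE_END = tuple(".!?:;。！？：；")


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace unsupported or Obsidian-conflicting characters (hypothetical helper)."""
    bad = set(_FS_UNSAFE + _OBSIDIAN_UNSAFE)
    cleaned = "".join(
        replacement if (ch in bad or ord(ch) < 32) else ch for ch in name
    )
    # Leading/trailing spaces and dots are problematic on some platforms.
    return cleaned.strip(" .") or "untitled"


def normalize_text(text: str) -> str:
    """Fold compatibility variants (ligatures, full-width forms) into plain characters."""
    return unicodedata.normalize("NFKC", text)


def join_broken_lines(text: str) -> str:
    """Merge lines that were split mid-sentence by PDF extraction.

    Heuristic only: a line that does not end in sentence punctuation is
    joined with the next one. A trailing hyphen is assumed to be a
    line-break hyphen, so genuine hyphenated compounds may be merged wrongly.
    """
    out, current = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:  # blank line marks a paragraph boundary
            if current:
                out.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith("-"):
            current = current[:-1] + line
        else:
            current = current + " " + line
        if current.endswith(_SENTENCE_END):
            out.append(current)
            current = ""
    if current:
        out.append(current)
    return "\n".join(out)


def obsidian_embed(image_name: str) -> str:
    """Return an Obsidian embed for an image stored in the vault."""
    return f"![[{image_name}]]"
```

For example, `sanitize_filename("a/b:c#d.md")` yields `a_b_c_d.md`, and `obsidian_embed("page-3.png")` yields `![[page-3.png]]`. A punctuation-based rule like the one in `join_broken_lines` will miss some cases, which is consistent with the "most, but not 100%" caveat above.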
Download files
Source Distribution
diseloryapdf2md-0.0.1.tar.gz (7.5 kB)
Built Distribution
Hashes for diseloryapdf2md-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 07fee261eb23a1f7b860918da6f2975c0cfd06596b5a33afe12ce4aea1373629
MD5 | 0a757d4e7792008e8d06ab8802a95b7b
BLAKE2b-256 | 76de1cdbb9cb64f647c52d3c266398bda26c1dcf14f12f13133303fa721bd86f