Diselorya's PDF-to-Markdown tool, designed especially for Obsidian.
Project description
pdf2md
Convert PDF files to Markdown and TXT, especially for use with Obsidian.
- Get content:
  - Convert text PDFs.
  - Convert image-based PDFs using OCR.
  - Higher OCR recognition accuracy.
  - Save pictures and insert them into the Markdown in Obsidian style.
  - Fix broken sentences (most, but not 100%).
  - Needs optimization based on more samples.
  - AI-assisted recognition of sentence breaks.
- Add headings:
  - Convert PDF bookmarks to headings.
  - Use page numbers as headings for image-based PDFs.
  - Fetch the first sentence of each page for page-number headings.
  - Compare the page headers, table of contents, and page numbers to identify heading levels.
- Filename handling:
  - Fix unsupported characters in filenames.
  - Replace characters in filenames that conflict with Obsidian.
- Character encoding problem handling:
  - Normalize characters that look the same but have different Unicode code points, which TTS engines cannot read.
- Batch conversion.
- Catalog (table of contents): Replace it with an Obsidian-style one (of little significance).
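The filename, encoding, image-embed, and sentence-repair items above can be illustrated with a minimal Python sketch. This is not the package's actual API, which is not documented here: every function name below is hypothetical, the set of replaced filename characters is an assumption, `![[...]]` is assumed to be the "Obsidian way" of embedding an image, and the sentence joiner is a plain punctuation heuristic rather than the tool's own method.

```python
import re
import unicodedata

# Assumed set: characters invalid in Windows filenames plus characters
# that interfere with Obsidian link syntax. Not taken from the package.
_UNSAFE = re.compile(r'[\\/:*?"<>|#^\[\]]')

# Punctuation treated as the end of a sentence (assumption).
_SENTENCE_END = (".", "!", "?", ":", "。", "！", "？")


def normalize_text(text: str) -> str:
    """Fold compatibility variants (ligatures, fullwidth letters) via NFKC."""
    return unicodedata.normalize("NFKC", text)


def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace unsafe characters; fall back to 'untitled' if nothing is left."""
    cleaned = _UNSAFE.sub(replacement, normalize_text(name))
    return cleaned.strip(" .") or "untitled"


def obsidian_embed(image_name: str) -> str:
    """Return an Obsidian-style embed for a saved image (assumed syntax)."""
    return f"![[{image_name}]]"


def join_broken_lines(lines: list[str]) -> list[str]:
    """Merge consecutive lines until one ends with sentence-ending punctuation."""
    sentences: list[str] = []
    buffer = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        buffer = f"{buffer} {line}" if buffer else line
        if buffer.endswith(_SENTENCE_END):
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)  # keep a trailing unfinished fragment
    return sentences


if __name__ == "__main__":
    print(safe_filename("Chapter 1: What? [draft]"))
    print(obsidian_embed("page_3.png"))
    print(join_broken_lines(["This sentence was", "split across lines."]))
```

A punctuation heuristic like this will miss some cases, such as abbreviations or headings without a final period, which is consistent with the "most, but not 100%" caveat in the feature list.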
Download files
Source Distribution
diseloryapdf2md-0.0.1.tar.gz (7.5 kB)
Built Distribution
diseloryapdf2md-0.0.1-py3-none-any.whl (7.3 kB)
File details
Details for the file diseloryapdf2md-0.0.1.tar.gz.
File metadata
- Download URL: diseloryapdf2md-0.0.1.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 44bef8f2eec0ef0ea555776830fdcd0873ff35ca7b4d8b4d91df2e19c2e38cb5
MD5 | 8991e1c9d15f2ab1e7fc085f50c1ef58
BLAKE2b-256 | b5d4faa72ee8902199a3b598a6de43deb53d6a1390f4c06cbb85314126466246
File details
Details for the file diseloryapdf2md-0.0.1-py3-none-any.whl.
File metadata
- Download URL: diseloryapdf2md-0.0.1-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 07fee261eb23a1f7b860918da6f2975c0cfd06596b5a33afe12ce4aea1373629
MD5 | 0a757d4e7792008e8d06ab8802a95b7b
BLAKE2b-256 | 76de1cdbb9cb64f647c52d3c266398bda26c1dcf14f12f13133303fa721bd86f