Third-party plugin for the MarkItDown library. It converts a PDF to Markdown relying purely on an LLM's capabilities.
Project description
markitdown-advanced-pdf-llm-plugin
Overview
markitdown-advanced-pdf-llm-plugin is a plugin for the MarkItDown library. It is built to extract knowledge from complex, multi-modal PDF documents that are not text-heavy. LLM output quality tends to drop on large multi-modal documents, and the plugin addresses this by using higher-intelligence large language models (LLMs) to interpret these documents and extract their knowledge.
Why MarkItDown
MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. Markdown is extremely close to plain text, with minimal markup or formatting, but still provides a way to represent important document structure. Mainstream LLMs, such as OpenAI's GPT-4o, natively "speak" Markdown, and often incorporate Markdown into their responses unprompted. This suggests that they have been trained on vast amounts of Markdown-formatted text, and understand it well. As a side benefit, Markdown conventions are also highly token-efficient.
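To make the token-efficiency point concrete, the sketch below compares the same small table written as HTML and as Markdown. It is only an illustration: it uses character count and a rough "about 4 characters per token" heuristic as a stand-in for a real tokenizer, and both that heuristic and the `rough_token_count` helper are assumptions, not part of MarkItDown.

```python
# The same 2-column table expressed in HTML and in Markdown.
html_table = (
    "<table><thead><tr><th>Model</th><th>Input</th></tr></thead>"
    "<tbody><tr><td>GPT-4o</td><td>text, image</td></tr></tbody></table>"
)
markdown_table = (
    "| Model | Input |\n"
    "|---|---|\n"
    "| GPT-4o | text, image |"
)


def rough_token_count(text: str) -> int:
    """Hypothetical helper: approximate tokens as ~4 characters each.

    This is a crude rule of thumb for English text, not a real tokenizer.
    """
    return max(1, len(text) // 4)


for label, text in (("HTML", html_table), ("Markdown", markdown_table)):
    print(f"{label:8} chars={len(text):3} approx_tokens={rough_token_count(text)}")
```

Exact counts depend on the model's tokenizer, but the markup overhead of HTML tags versus Markdown pipes shows up under any reasonable measure.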
Why markitdown-advanced-pdf-llm-plugin
- Token efficiency: When multi-modal documents are used in RAG, text-only input consumes fewer tokens than multi-modal input.
- RAG output quality: The quality of LLM output degrades as the number of input tokens grows. Passing several pages of a multi-modal document at once can produce poorer LLM summaries than passing several pages of text.
- Latency: Text-only input has lower latency than multi-modal input.
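The trade-offs above suggest a pattern: pay the multi-modal cost once, one page at a time, and keep only the resulting text for downstream RAG. The sketch below shows that pattern under stated assumptions. `Page`, `convert_pages`, `PROMPT` and the `llm_call` callable are all hypothetical names invented for illustration. This is not the plugin's actual API or implementation, and page rendering is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Page:
    number: int       # 1-based page index
    image_png: bytes  # rendered page image (rendering step not shown)


# Hypothetical LLM interface: (page image, prompt) -> Markdown text.
PageToMarkdown = Callable[[bytes, str], str]

# Hypothetical prompt; the plugin's real prompt is not documented here.
PROMPT = (
    "Extract all knowledge from this PDF page as Markdown. "
    "Describe charts, diagrams and images in text; reproduce tables as Markdown tables."
)


def convert_pages(pages: Iterable[Page], llm_call: PageToMarkdown) -> str:
    """Send one page per LLM call so each multi-modal request stays small,
    then join the per-page Markdown into a single text document."""
    parts: List[str] = []
    for page in pages:
        markdown = llm_call(page.image_png, PROMPT).strip()
        parts.append(f"<!-- page {page.number} -->\n{markdown}")
    return "\n\n".join(parts)


# Usage with a stand-in function in place of a real multi-modal LLM client.
def fake_llm(image: bytes, prompt: str) -> str:
    return f"Page with {len(image)} bytes of image data"


doc = convert_pages([Page(1, b"abc"), Page(2, b"defgh")], fake_llm)
print(doc)
```

Keeping each request to a single page bounds the input size per call, which is the point the "RAG output quality" bullet makes. The joined Markdown can then be chunked and embedded like any other text.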
Example page from a document where the plugin is beneficial
Download files
File details
Details for the file markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz.
File metadata
- Download URL: markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz
- Upload date:
- Size: 6.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ecfcf8dac09c8b7c065f52577072f6cfcda129f2713bc664a024d75d6003962 |
| MD5 | a74a1637c41731bdf5caada1291aaf49 |
| BLAKE2b-256 | ccc34cbb4624557f8518a0717b95bc402c7d357e1994e7709d5b8d1f2986035f |
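A downloaded file can be checked against the published SHA256 digest with the standard library alone. The sketch below assumes the archive was saved under its original file name in the current directory. `sha256_of` and `verify` are hypothetical helper names chosen for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against the published digest (case-insensitive hex)."""
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    # Assumed local file name: the source distribution listed above.
    archive = Path("markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz")
    expected = "6ecfcf8dac09c8b7c065f52577072f6cfcda129f2713bc664a024d75d6003962"
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

For installs through pip, the same digest can instead be pinned in a requirements file with `--hash=sha256:...` and enforced with `pip install --require-hashes`.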
File details
Details for the file markitdown_advanced_pdf_llm_plugin-0.1.0-py3-none-any.whl.
File metadata
- Download URL: markitdown_advanced_pdf_llm_plugin-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 633a736bee2d9b943d808cef5171422167079f0fe6fb393957d1bc1d3ead737a |
| MD5 | ac141e31439a729ad26ad2952c06262a |
| BLAKE2b-256 | 67a1c3c1ac9fb251cf797d9043fc2759cb39e2a6f0ab8e82f1ed9496edfea7f9 |