Third-party plugin for the MarkItDown library. It converts PDFs to Markdown relying purely on an LLM's capabilities.

Project description

markitdown-advanced-pdf-llm-plugin

Overview

markitdown-advanced-pdf-llm-plugin is a plugin for the MarkItDown library, built to extract knowledge from complex multi-modal PDF documents that are not text-heavy. LLM output quality tends to drop on large multi-modal documents; the plugin addresses this by using a more capable Large Language Model (LLM) to interpret these documents and extract their content.
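
A minimal usage sketch, assuming the package is installed with `pip install markitdown-advanced-pdf-llm-plugin` and that it registers itself through MarkItDown's standard plugin mechanism. `MarkItDown(enable_plugins=True)`, `convert()`, and `text_content` come from MarkItDown itself. The helper names and the `report.pdf` path are hypothetical, and any LLM client or model settings this plugin may require are not described on this page, so they are left out.

```python
from pathlib import Path


def build_converter():
    """Create a MarkItDown instance with third-party plugins enabled."""
    # Imported lazily so this module loads even without markitdown installed.
    from markitdown import MarkItDown

    # enable_plugins=True asks MarkItDown to load installed plugins.
    # ASSUMPTION: this plugin is picked up that way; any LLM configuration
    # it needs is undocumented here and therefore not shown.
    return MarkItDown(enable_plugins=True)


def pdf_to_markdown(converter, pdf_path):
    """Convert one PDF with the given converter and return Markdown text."""
    result = converter.convert(str(Path(pdf_path)))
    return result.text_content


if __name__ == "__main__":
    # "report.pdf" is a placeholder path used only for illustration.
    print(pdf_to_markdown(build_converter(), "report.pdf"))
```

Passing the converter in as an argument keeps the conversion step independent of how the MarkItDown instance is configured.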

Why MarkItDown

MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. Markdown is extremely close to plain text, with minimal markup or formatting, but still provides a way to represent important document structure. Mainstream LLMs, such as OpenAI's GPT-4o, natively "speak" Markdown, and often incorporate Markdown into their responses unprompted. This suggests that they have been trained on vast amounts of Markdown-formatted text, and understand it well. As a side benefit, Markdown conventions are also highly token-efficient.

Why markitdown-advanced-pdf-llm-plugin

  • Token efficiency: When a multi-modal document is used in RAG, text-only input consumes fewer tokens than multi-modal input.
  • RAG output quality: The quality of an LLM's output degrades as the number of input tokens grows. Passing several pages of a multi-modal document at once can lead to poorer summarization than passing several pages of text.
  • Latency: Text-only input has lower latency than multi-modal input.

Example page from a document where the plugin is beneficial

[Image: Screenshot 2025-05-04 184336]

Download files

Source Distribution

markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz (6.8 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz.

File hashes

Hashes for markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6ecfcf8dac09c8b7c065f52577072f6cfcda129f2713bc664a024d75d6003962
MD5 a74a1637c41731bdf5caada1291aaf49
BLAKE2b-256 ccc34cbb4624557f8518a0717b95bc402c7d357e1994e7709d5b8d1f2986035f
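
A downloaded archive can be checked against the SHA256 digest listed above. The following is a small sketch using only the Python standard library; the function names and the local file path are hypothetical, and the expected digest is the one published above for the source distribution.

```python
import hashlib
import hmac

# SHA256 digest published above for
# markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz
EXPECTED_SHA256 = (
    "6ecfcf8dac09c8b7c065f52577072f6cfcda129f2713bc664a024d75d6003962"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals the expected digest."""
    return hmac.compare_digest(sha256_of_file(path), expected.lower())


if __name__ == "__main__":
    # Hypothetical local path to the downloaded archive.
    archive = "markitdown_advanced_pdf_llm_plugin-0.1.0.tar.gz"
    print("OK" if matches_published_hash(archive) else "MISMATCH")
```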

File details

Details for the file markitdown_advanced_pdf_llm_plugin-0.1.0-py3-none-any.whl.

File hashes

Hashes for markitdown_advanced_pdf_llm_plugin-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 633a736bee2d9b943d808cef5171422167079f0fe6fb393957d1bc1d3ead737a
MD5 ac141e31439a729ad26ad2952c06262a
BLAKE2b-256 67a1c3c1ac9fb251cf797d9043fc2759cb39e2a6f0ab8e82f1ed9496edfea7f9
