Parse complex files (PDF, DOCX, PPTX) for LLM consumption

Project description

MegaParse - Your Mega Parser for every type of document

MegaParse is a versatile parser for common document types: plain text, PDFs, PowerPoint presentations, and Word documents. Its main goal is to lose no information during parsing.

Key Features 🎯

  • Versatile Parser: MegaParse handles various types of documents with ease.
  • No Information Loss: Parsing aims to preserve all of the information in the source document.
  • Fast and Efficient: Designed with speed and efficiency at its core.
  • Wide File Compatibility: Supports text, PDF, PowerPoint presentations, Excel, CSV, and Word documents.
  • Open Source: Freedom is beautiful, and so is MegaParse. Open source and free to use.

Support

  • Files: ✅ PDF ✅ PowerPoint ✅ Word
  • Content: ✅ Tables ✅ TOC ✅ Headers ✅ Footers ✅ Images


Installation

pip install megaparse

Usage

  1. Create an account on Llama Cloud and get your API key.

  2. Create a new file in the root directory of the project and name it .env.

  3. Add the following line to the .env file and replace llx-your_api_key with your actual API key.

LLAMA_CLOUD_API_KEY=llx-your_api_key

  4. Now you can use the following code to convert a PDF to Markdown and save it to a file.

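from megaparse import MegaParse

# Point the parser at the PDF to convert.
megaparse = MegaParse(file_path="./test.pdf")

# Convert the document to Markdown and print the result.
content = megaparse.convert()
print(content)

# Save the Markdown to a file.
megaparse.save_md(content, "./test.md")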
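
The usage steps above rely on LLAMA_CLOUD_API_KEY being available when the parser runs. The README does not say whether MegaParse reads the .env file on its own, so the following is a minimal sketch that loads the file explicitly and fails early if the key is missing. The helper names load_env_file, require_api_key, and convert_pdf are hypothetical and not part of MegaParse; only MegaParse(file_path=...), convert(), and save_md() come from the example above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ; return what was read."""
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        loaded[key] = value
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
    return loaded


def require_api_key(name="LLAMA_CLOUD_API_KEY"):
    """Return the API key, or raise if it is missing or still the placeholder."""
    key = os.environ.get(name, "")
    if not key or key == "llx-your_api_key":
        raise RuntimeError(f"{name} is not set; add it to .env or export it")
    return key


def convert_pdf(pdf_path, md_path):
    """Convert a PDF to Markdown using only the calls shown in the usage example."""
    # Imported lazily so the helpers above work without megaparse installed.
    from megaparse import MegaParse

    parser = MegaParse(file_path=pdf_path)
    content = parser.convert()
    parser.save_md(content, md_path)
    return content
```

A script would call load_env_file(), then require_api_key(), then convert_pdf("./test.pdf", "./test.md"). Checking the key first turns a missing or placeholder value into a clear local error before any parsing starts.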

Next Steps

  • Add Unstructured Parser Support
  • Improve Table Parsing
  • Improve Image Parsing and description
  • Add TOC for Docx
  • Add Hyperlinks for Docx
  • Order Headers for Docx to Markdown


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

megaparse-0.0.3.tar.gz (10.2 kB)

Uploaded Source

Built Distribution

megaparse-0.0.3-py3-none-any.whl (11.3 kB)

Uploaded Python 3

File details

Details for the file megaparse-0.0.3.tar.gz.

File metadata

  • Download URL: megaparse-0.0.3.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for megaparse-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       e2229d83df90f6b1131f40ac9b9f5c87ef16f8dba557952bb5fbb49331dddd16
MD5          41a0b9ba36710b2667e59d1b7fe47d60
BLAKE2b-256  5f04d486fc73a1d4e4b84ff2f171f5a892a8e9e47172374b44980078b381e6c7
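
The digests listed above can be used to check a downloaded file before installing it. As a minimal sketch using only Python's standard hashlib module, the following computes a file's SHA256 digest and compares it with the value published for megaparse-0.0.3.tar.gz. The function names sha256_of and matches_expected are illustrative, and the path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for megaparse-0.0.3.tar.gz.
EXPECTED_SHA256 = "e2229d83df90f6b1131f40ac9b9f5c87ef16f8dba557952bb5fbb49331dddd16"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()
```

Calling matches_expected with the path of the downloaded archive returns True only when the file's digest equals the listed SHA256 value.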

File details

Details for the file megaparse-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: megaparse-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 11.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.9.19

File hashes

Hashes for megaparse-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       cd5bb3f17b41a496ec0aba29ff32d59d97d201be00125c76c59e896b72f7c744
MD5          5f86a598384989e3fff71f7186d18246
BLAKE2b-256  a12a2e36b1effadb5f45f021fce3093403e579155690a4a16acd5c3989c4620a
