Parse complex files (PDF, Docx, PPTX) for LLM consumption
Project description
MegaParse - Your Mega Parser for every type of document
MegaParse is a powerful and versatile parser that can handle various types of documents with ease. Whether you're dealing with text, PDFs, Powerpoint presentations, or Word documents, MegaParse has got you covered. It focuses on avoiding information loss during parsing.
Key Features 🎯
- Versatile Parser: Handles various types of documents with ease.
- No Information Loss: Focuses on avoiding information loss during parsing.
- Fast and Efficient: Designed with speed and efficiency at its core.
- Wide File Compatibility: Supports text, PDF, Powerpoint presentations, Excel, CSV, and Word documents.
- Open Source: Freedom is beautiful, and so is MegaParse. Open source and free to use.
Support
- Files: ✅ PDF ✅ Powerpoint ✅ Word
- Content: ✅ Tables ✅ TOC ✅ Headers ✅ Footers ✅ Images
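As a rough illustration of the "Files" checklist above, the sketch below checks a path against the three listed formats before it is handed to a parser. The extension mapping and the `describe_support` helper are assumptions made for this example; they are not part of MegaParse, and the library may accept other extensions.

```python
from pathlib import Path

# Assumption: extensions inferred from the "Files" checklist (PDF, Powerpoint, Word).
# This mapping is illustrative and is not taken from the MegaParse source.
SUPPORTED_SUFFIXES = {
    ".pdf": "PDF",
    ".pptx": "Powerpoint",
    ".docx": "Word",
}


def describe_support(file_path):
    """Return the checklist name for a file's format, or None if it is not listed."""
    suffix = Path(file_path).suffix.lower()
    return SUPPORTED_SUFFIXES.get(suffix)


if __name__ == "__main__":
    for candidate in ["./test.pdf", "slides.PPTX", "notes.txt"]:
        print(candidate, "->", describe_support(candidate))
```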
Example
Installation
pip install megaparse
Usage
- Create an account on Llama Cloud and get your API key.
- Create a new file in the root directory of the project and name it .env.
- Add the following line to the .env file and replace llx-your_api_key with your actual API key.
LLAMA_CLOUD_API_KEY=llx-your_api_key
- Now you can use the following code to convert a PDF to Markdown and save it to a file.
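from megaparse import MegaParse

# Point the parser at the input file
megaparse = MegaParse(file_path="./test.pdf")
# Convert the document to Markdown
content = megaparse.convert()
print(content)
# Write the Markdown to disk
megaparse.save_md(content, "./test.md")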
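The steps above rely on LLAMA_CLOUD_API_KEY being available from the .env file. This page does not say whether MegaParse loads that file itself, so the following is a minimal standard-library sketch for loading it and checking the key before conversion. The helpers `load_env_file` and `require_api_key` are hypothetical names introduced here, and the parser only handles simple KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read simple KEY=VALUE lines and export them to os.environ.

    Variables already set in the environment are left unchanged.
    Returns a dict of the pairs found in the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if key:
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded


def require_api_key(name="LLAMA_CLOUD_API_KEY"):
    """Return the API key, or raise if it is missing or still the placeholder."""
    key = os.environ.get(name, "")
    if not key or key == "llx-your_api_key":
        raise RuntimeError(f"{name} is not set; add your real key to the .env file")
    return key
```

Calling `load_env_file()` and then `require_api_key()` before constructing `MegaParse` gives an early, readable error when the placeholder key was never replaced.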
Next Steps
- Add Unstructured Parser Support
- Improve Table Parsing
- Improve Image Parsing and description
- Add TOC for Docx
- Add Hyperlinks for Docx
- Order Headers for Docx to Markdown
Download files
Source Distribution
Built Distribution
File details
Details for the file megaparse-0.0.3.tar.gz.
File metadata
- Download URL: megaparse-0.0.3.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | e2229d83df90f6b1131f40ac9b9f5c87ef16f8dba557952bb5fbb49331dddd16
MD5 | 41a0b9ba36710b2667e59d1b7fe47d60
BLAKE2b-256 | 5f04d486fc73a1d4e4b84ff2f171f5a892a8e9e47172374b44980078b381e6c7
File details
Details for the file megaparse-0.0.3-py3-none-any.whl.
File metadata
- Download URL: megaparse-0.0.3-py3-none-any.whl
- Upload date:
- Size: 11.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.9.19
File hashes
Algorithm | Hash digest
---|---
SHA256 | cd5bb3f17b41a496ec0aba29ff32d59d97d201be00125c76c59e896b72f7c744
MD5 | 5f86a598384989e3fff71f7186d18246
BLAKE2b-256 | a12a2e36b1effadb5f45f021fce3093403e579155690a4a16acd5c3989c4620a
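The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a small sketch using Python's standard `hashlib` module. The function names are chosen for this example, and the local file path is whatever location you saved the download to.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("megaparse-0.0.3.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download of the source distribution, assuming the file sits in the current directory under that name.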