MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, HTML, images) to Markdown text. Features include OCR support, automatic format detection, and URL/file stream handling.

MagicConvert: The Ultimate File-to-Markdown Conversion Library

MagicConvert is a powerful and user-friendly Python library designed to convert various file formats into Markdown. Whether you're dealing with documents, images, web content, or spreadsheets, MagicConvert makes the process effortless. Equipped with built-in OCR (Optical Character Recognition), it can even extract text from images, making it an essential tool for developers, researchers, and anyone working with Markdown workflows. It’s especially helpful for LLM (Large Language Model) integrations!

✨ Why Choose MagicConvert?

MagicConvert is your go-to tool for file-to-Markdown conversion. Here’s what makes it special:

  1. Supports Multiple File Formats: Convert documents, images, spreadsheets, web pages, and more into Markdown.
  2. OCR Integration: Extract text from scanned images and documents using Tesseract OCR.
  3. Convert Web Content: Quickly transform URLs or HTML files into clean, readable Markdown.
  4. Markdown for AI & LLMs: Simplify content preparation for AI models using structured Markdown.
  5. Simple & Efficient: An intuitive API that makes file conversion a breeze.

🚀 Installation

Getting started is easy! Install MagicConvert using pip:

pip install MagicConvert

Note: For OCR functionality, make sure you have Tesseract OCR installed on your system.
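Before converting images, it can help to confirm that Tesseract is reachable. The snippet below is a minimal sketch using only the Python standard library. It assumes the Tesseract executable is named `tesseract` and is expected on your PATH; how MagicConvert itself locates Tesseract is not stated here, and the helper name `tesseract_available` is purely illustrative.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if an executable named `tesseract` is found on PATH.

    Assumption: the OCR backend is discovered through PATH under this name.
    """
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found on PATH.")
    else:
        print("Tesseract not found on PATH; install it before converting images.")
```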

PyPI link: MagicConvert on PyPI


📚 Getting Started

1. Import and Initialize

Begin by importing MagicConvert and initializing the converter:

from MagicConvert import MagicConvert

converter = MagicConvert()

2. Convert Files to Markdown

MagicConvert supports various file types. Here are some examples:

Convert Word Documents

result = converter.magic("document.docx")
print(result.get_text)

Convert PowerPoint Presentations

result = converter.magic("presentation.pptx")
print(result.get_text)

Convert PDFs

result = converter.magic("document.pdf")
print(result.get_text)

Convert Images (OCR)

result = converter.magic("image.png")
print(result.get_text)

Convert Web Content (URLs)

result = converter.magic("https://example.com")
print(result.get_text)

Convert Plain Text Files

result = converter.magic("example.txt")
print(result.get_text)

Convert HTML Files

result = converter.convert_local("webpage.html")
print(result.get_text)

Convert Excel Files

result = converter.convert_local("spreadsheet.xlsx")
print(result.get_text)

Convert CSV Files

result = converter.convert_local("data.csv")
print(result.get_text)
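
Save the Result to a Markdown File

The examples above print the converted text. A common next step is writing it to a `.md` file, which could be sketched as follows. The helpers `result_text` and `save_markdown` are hypothetical names and not part of MagicConvert; the sketch relies only on `converter.magic(...)` and `result.get_text` as used above. Because the examples read `get_text` without parentheses, the helper also tolerates it being a method, in case that differs in your installed version.

```python
from pathlib import Path


def result_text(result) -> str:
    """Read the Markdown text from a conversion result.

    Assumption: `get_text` is an attribute, as in the examples above.
    If it turns out to be a method, call it instead.
    """
    value = result.get_text
    return value() if callable(value) else value


def save_markdown(converter, source: str, out_path: str) -> Path:
    """Convert `source` with `converter.magic` and write the text to `out_path`."""
    result = converter.magic(source)
    target = Path(out_path)
    target.write_text(result_text(result), encoding="utf-8")
    return target
```

With the converter from step 1, a call would look like `save_markdown(converter, "document.docx", "document.md")`.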

📂 Supported File Formats

MagicConvert supports a wide range of file formats, making it a versatile tool for various needs:

Document Formats

  • Word Documents: .docx
  • PDF Files: .pdf
  • PowerPoint Presentations: .pptx
  • Excel Spreadsheets: .xlsx
  • CSV Files: .csv

Web Formats

  • HTML Files: .html, .htm
  • URLs: http://, https://

Image Formats

  • JPEG: .jpg, .jpeg
  • PNG: .png
  • TIFF: .tiff
  • BMP: .bmp

Text Formats

  • Plain Text: .txt
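
To screen inputs before handing them to the converter, the list above can be mirrored in a small check. This is a sketch only: `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names, the set is copied by hand from the list above rather than read from the library, and MagicConvert's own format detection may accept or reject inputs differently.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above (not queried from the library).
SUPPORTED_EXTENSIONS = {
    ".docx", ".pdf", ".pptx", ".xlsx", ".csv",  # document formats
    ".html", ".htm",                            # web formats
    ".jpg", ".jpeg", ".png", ".tiff", ".bmp",   # image formats
    ".txt",                                     # text formats
}


def is_supported(source: str) -> bool:
    """Return True if `source` is an http(s) URL or has a listed file extension."""
    if source.lower().startswith(("http://", "https://")):
        return True
    return Path(source).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("report.PDF")` and `is_supported("https://example.com")` both return `True`, while `is_supported("song.mp3")` returns `False`.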

📅 Future Work

MagicConvert is constantly evolving. Here are some features planned for the future:

  1. Audio-to-Text Markdown: Convert audio files (e.g., .mp3, .wav) into Markdown by transcribing them with speech recognition.
  2. Video Subtitles to Markdown: Extract captions or subtitles from video files and convert them into Markdown.
  3. Advanced Formatting Options: Customizable Markdown output with styles like tables, headers, and inline code.
  4. Multi-language OCR Support: Enhanced text recognition for multiple languages.
  5. Cloud Integration: Save converted Markdown directly to cloud platforms like Google Drive, Dropbox, etc.
  6. Batch Conversion: Process multiple files simultaneously for large-scale projects.

Want to contribute ideas? Let us know!


👨‍💻 Contributing

MagicConvert is developed by Muhammad Noman, a student at Iqra University. Contributions, feedback, and bug reports are always welcome!

Here’s how you can get in touch or contribute:

If you enjoy using MagicConvert, feel free to ⭐️ the repository on GitHub and share it with others!


📃 License

MagicConvert is open-source and licensed under the MIT License. You are free to use, modify, and distribute the library as per the license terms.


💡 Summary

MagicConvert is the ultimate tool for converting files into Markdown, whether you’re preparing content for AI models, creating documentation, or simply working with Markdown-based workflows. Its ease of use, wide format support, and robust features make it an indispensable tool for developers, researchers, and content creators.

Try MagicConvert today and unlock the power of seamless file-to-Markdown conversion! 🚀
