
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, HTML, Images) to markdown text. Features include OCR support, automatic format detection, and URL/file stream handling.


MagicConvert: The Ultimate File-to-Markdown Conversion Library

MagicConvert is a powerful and user-friendly Python library designed to convert various file formats into Markdown. Whether you're dealing with documents, images, web content, or spreadsheets, MagicConvert makes the process effortless. Equipped with built-in OCR (Optical Character Recognition), it can even extract text from images, making it an essential tool for developers, researchers, and anyone working with Markdown workflows. It’s especially helpful for LLM (Large Language Model) integrations!


✨ Why Choose MagicConvert?

MagicConvert is your go-to tool for file-to-Markdown conversion. Here’s what makes it special:

  1. Supports Multiple File Formats: Convert documents, images, spreadsheets, web pages, and more into Markdown.
  2. OCR Integration: Extract text from scanned images and documents using Tesseract OCR.
  3. Convert Web Content: Quickly transform URLs or HTML files into clean, readable Markdown.
  4. Markdown for AI & LLMs: Simplify content preparation for AI models using structured Markdown.
  5. Simple & Efficient: An intuitive API that makes file conversion a breeze.

🚀 Installation

Getting started is easy! Install MagicConvert using pip:

pip install MagicConvert

Note: For OCR functionality, make sure you have Tesseract OCR installed on your system.
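A quick way to confirm the prerequisite before converting images is to look for the Tesseract executable. The sketch below assumes the binary is named `tesseract` and is discoverable on your PATH; how MagicConvert itself locates Tesseract is not stated here, so treat this as a sanity check rather than a guarantee. The helper name `tesseract_available` is made up for illustration.

```python
import shutil


def tesseract_available() -> bool:
    """Return True if a `tesseract` executable is found on PATH."""
    # shutil.which returns the full path to the executable, or None.
    return shutil.which("tesseract") is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found on PATH.")
    else:
        print("Tesseract not found on PATH; install it before converting images.")
```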

PyPI link: MagicConvert on PyPI


📚 Getting Started

1. Import and Initialize

Begin by importing MagicConvert and initializing the converter:

from MagicConvert import MagicConvert

converter = MagicConvert()

2. Convert Files to Markdown

MagicConvert supports various file types. Here are some examples:

Convert Word Documents

result = converter.magic("document.docx")
print(result.text_content)

Convert PowerPoint Presentations

result = converter.magic("presentation.pptx")
print(result.text_content)

Convert PDFs

result = converter.magic("document.pdf")
print(result.text_content)

Convert Images (OCR)

result = converter.magic("image.png")
print(result.text_content)

Convert Web Content (URLs)

result = converter.magic("https://example.com")
print(result.text_content)

Convert Plain Text Files

result = converter.magic("example.txt")
print(result.text_content)

Convert HTML Files

result = converter.magic("webpage.html")
print(result.text_content)

Convert Excel Files

result = converter.magic("spreadsheet.xlsx")
print(result.text_content)

Convert CSV Files

result = converter.magic("data.csv")
print(result.text_content)
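Every example above follows the same pattern: call `magic()` and read `text_content`. If you want the Markdown on disk rather than printed, a small wrapper may help. This is a minimal sketch for local files; `save_as_markdown` and its parameters are hypothetical names, not part of MagicConvert, and the converter is passed in so the helper only relies on the `magic()` call and `text_content` attribute shown above.

```python
from pathlib import Path


def save_as_markdown(converter, source: str, out_dir: str = ".") -> Path:
    """Convert a local file and write the result to <out_dir>/<name>.md."""
    # Relies only on the documented call and attribute.
    result = converter.magic(source)
    stem = Path(source).stem or "output"
    out_path = Path(out_dir) / f"{stem}.md"
    out_path.write_text(result.text_content, encoding="utf-8")
    return out_path


# Usage, with a converter created as in "Import and Initialize":
#   from MagicConvert import MagicConvert
#   save_as_markdown(MagicConvert(), "document.docx")
```

Passing the converter as an argument also makes the helper easy to exercise with a stand-in object when MagicConvert is not installed.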

📂 Supported File Formats

MagicConvert supports a wide range of file formats, making it a versatile tool for various needs:

Document Formats

  • Word Documents: .docx
  • PDF Files: .pdf
  • PowerPoint Presentations: .pptx
  • Excel Spreadsheets: .xlsx
  • CSV Files: .csv

Web Formats

  • HTML Files: .html, .htm
  • URLs: http://, https://

Image Formats

  • JPEG: .jpg, .jpeg
  • PNG: .png
  • TIFF: .tiff
  • BMP: .bmp

Text Formats

  • Plain Text: .txt
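If you are filtering inputs before handing them to the converter, the list above can be mirrored as a simple pre-check. This sketch only encodes the extensions and URL schemes listed in this section; `is_supported` is a hypothetical helper, and MagicConvert's own format detection may work differently (for example, without relying on file extensions).

```python
from pathlib import Path

# Extensions copied from the "Supported File Formats" list above.
SUPPORTED_EXTENSIONS = {
    ".docx", ".pdf", ".pptx", ".xlsx", ".csv",   # documents
    ".html", ".htm",                             # web
    ".jpg", ".jpeg", ".png", ".tiff", ".bmp",    # images
    ".txt",                                      # text
}
URL_PREFIXES = ("http://", "https://")


def is_supported(source: str) -> bool:
    """Return True if `source` is a URL or has a listed extension."""
    if source.lower().startswith(URL_PREFIXES):
        return True
    return Path(source).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive, so `report.PDF` passes, while an unlisted extension such as `.mp3` does not.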

📅 Future Work

MagicConvert is constantly evolving. Here are some features planned for the future:

  1. Audio-to-Text Markdown: Convert audio files (e.g., .mp3, .wav) into Markdown by transcribing them with speech recognition.
  2. Video Subtitles to Markdown: Extract captions or subtitles from video files and convert them into Markdown.
  3. Advanced Formatting Options: Customizable Markdown output with styles like tables, headers, and inline code.
  4. Multi-language OCR Support: Enhanced text recognition for multiple languages.
  5. Cloud Integration: Save converted Markdown directly to cloud platforms like Google Drive, Dropbox, etc.
  6. Batch Conversion: Process multiple files simultaneously for large-scale projects.
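Until built-in batch conversion is available, a plain loop over the existing `magic()` call can cover simple cases. The following is a sequential sketch, not the planned feature: `convert_folder` and its parameters are hypothetical, and it assumes a failed conversion raises an exception, which is not confirmed by this page.

```python
from pathlib import Path


def convert_folder(converter, folder: str, pattern: str = "*.pdf"):
    """Convert each file in `folder` matching `pattern`, one at a time.

    Returns (results, errors): two dicts keyed by file name.
    """
    results, errors = {}, {}
    for path in sorted(Path(folder).glob(pattern)):
        try:
            results[path.name] = converter.magic(str(path)).text_content
        except Exception as exc:  # assumption: failures surface as exceptions
            errors[path.name] = str(exc)
    return results, errors


# Usage:
#   from MagicConvert import MagicConvert
#   results, errors = convert_folder(MagicConvert(), "reports", "*.docx")
```

Collecting errors separately lets one unreadable file be reported without stopping the rest of the run.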

Want to contribute ideas? Let us know!


👨‍💻 Contributing

MagicConvert is developed by Muhammad Noman, a student at Iqra University. Contributions, feedback, and bug reports are always welcome!


If you enjoy using MagicConvert, feel free to ⭐️ the repository on GitHub and share it with others!


📃 License

MagicConvert is open-source and licensed under the MIT License. You are free to use, modify, and distribute the library as per the license terms.


💡 Summary

MagicConvert is the ultimate tool for converting files into Markdown, whether you’re preparing content for AI models, creating documentation, or simply working with Markdown-based workflows. Its ease of use, wide format support, and robust features make it an indispensable tool for developers, researchers, and content creators.

Try MagicConvert today and unlock the power of seamless file-to-Markdown conversion! 🚀
