A lightweight toolbox to manipulate documents

👋 join us on Discord and WeChat

Install

Prerequisites: Python 3.10

Install Dependencies

Linux/macOS

apt-get/yum/brew install libreoffice  # use the package manager for your system

Windows

Install LibreOffice, then append "install_dir\LibreOffice\program" to the PATH environment variable.
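
To confirm that the PATH setup above took effect, a quick check from Python can help. This is a minimal sketch: it assumes the converter locates LibreOffice through PATH under the launcher name "soffice" (or the "libreoffice" wrapper that some Linux packages install), which this README does not state explicitly. The function name find_libreoffice is illustrative and not part of Magic-Doc.

```python
import shutil
from typing import Optional


def find_libreoffice() -> Optional[str]:
    """Return the path of a LibreOffice launcher found on PATH, or None."""
    # Assumption: "soffice" is the launcher shipped in LibreOffice's program
    # directory; "libreoffice" is a wrapper present on some Linux installs.
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_libreoffice()
    print(location or "LibreOffice was not found on PATH")
```

If this prints the "not found" message, re-check the PATH entry and open a new terminal before installing Magic-Doc.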

Install Magic-Doc

pip install fairy-doc[cpu] # cpu version
or
pip install fairy-doc[gpu] # gpu version

Introduction

Magic-Doc is a lightweight open-source tool that converts multiple file types (PPT/PPTX/DOC/DOCX/PDF) to Markdown. It supports both local files and files stored in S3.

Example

# For a local file
from magic_doc.docconv import DocConverter

converter = DocConverter(s3_config=None)
markdown_content, time_cost = converter.convert("some_doc.pptx", conv_timeout=300)

# For a remote file stored in AWS S3
from magic_doc.docconv import DocConverter, S3Config

# Replace ${ak}, ${sk}, and ${endpoint} with your access key, secret key, and S3 endpoint.
s3_config = S3Config(ak='${ak}', sk='${sk}', endpoint='${endpoint}')
converter = DocConverter(s3_config=s3_config)
markdown_content, time_cost = converter.convert("s3://some_bucket/some_doc.pptx", conv_timeout=300)
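
The example above converts one file at a time. For several files, a small wrapper along these lines may be useful. It is only a sketch: convert_many and its parameters are hypothetical names, not part of Magic-Doc, and the only thing it relies on is the convert(path, conv_timeout=...) call returning (markdown_content, time_cost) as shown above.

```python
from pathlib import Path
from typing import Dict, Iterable


def convert_many(converter, sources: Iterable[str], out_dir: str,
                 conv_timeout: int = 300) -> Dict[str, float]:
    """Convert each source and write <name>.md into out_dir.

    `converter` is any object whose convert(path, conv_timeout=...) returns
    (markdown_content, time_cost), as DocConverter does in the example above.
    Returns a mapping from each source path to its reported time cost.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    timings: Dict[str, float] = {}
    for src in sources:
        markdown_content, time_cost = converter.convert(src, conv_timeout=conv_timeout)
        # Take the last path component so "s3://bucket/key.pptx" and local
        # paths are handled the same way, then drop the file extension.
        stem = Path(src.rsplit("/", 1)[-1]).stem
        (target / f"{stem}.md").write_text(markdown_content, encoding="utf-8")
        timings[src] = time_cost
    return timings


# Hypothetical usage, assuming Magic-Doc is installed:
#   from magic_doc.docconv import DocConverter
#   convert_many(DocConverter(s3_config=None), ["some_doc.pptx"], "out")
```

Two sources with the same file name would overwrite each other's output in this sketch, so add a disambiguating prefix if that can happen in your data.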

Performance

Environment: AMD EPYC 7742 64-Core Processor, NVIDIA A100, CentOS 7

File Type       Speed (pages/s)
PDF (digital)   347
PDF (OCR)       2.7
PPT             20
PPTX            149
DOC             600
DOCX            1482
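
As a rough guide to what these figures mean for a batch job, the throughput numbers can be turned into a time estimate. The sketch below copies the speeds from the table above; the dictionary keys and the function name are illustrative only, and real throughput will depend on hardware and document content.

```python
# Throughput in pages per second, copied from the performance table.
PAGES_PER_SECOND = {
    "pdf_digital": 347.0,
    "pdf_ocr": 2.7,
    "ppt": 20.0,
    "pptx": 149.0,
    "doc": 600.0,
    "docx": 1482.0,
}


def estimate_seconds(file_type: str, pages: int) -> float:
    """Estimate conversion time in seconds for `pages` pages of one type."""
    return pages / PAGES_PER_SECOND[file_type]


if __name__ == "__main__":
    for kind in ("pdf_digital", "pdf_ocr"):
        seconds = estimate_seconds(kind, 10_000)
        print(f"{kind}: about {seconds:.0f} s for 10,000 pages")
```

The gap between digital and OCR PDFs is the main thing to plan around: at the listed speeds, scanned documents take over a hundred times longer per page.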

Acknowledgments

🖊️ Citation

@misc{2024magic-doc,
    title={Magic-Doc: A Toolkit that Converts Multiple File Types to Markdown},
    author={Magic-Doc Contributors},
    howpublished = {\url{https://github.com/InternLM/magic-doc}},
    year={2024}
}

License

This project is released under the Apache 2.0 license.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

fairy_doc-0.1.44-py3-none-any.whl (797.8 kB view details)

Uploaded Python 3

File details

Details for the file fairy_doc-0.1.44-py3-none-any.whl.

File metadata

  • Download URL: fairy_doc-0.1.44-py3-none-any.whl
  • Upload date:
  • Size: 797.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for fairy_doc-0.1.44-py3-none-any.whl
Algorithm Hash digest
SHA256 a85fe33ccf69825919df03a203604a3157a6d2ac9f1ea445cc36b2befd8f2a81
MD5 81c363b7058874237ca2e389827024d6
BLAKE2b-256 e61008d7b7e3ad8ee69f3cd8fc5d90bd85dab6042debd8d6c7a3df7db61ec678
