A lightweight toolbox to manipulate documents
Install
Install Dependencies
linux/osx
apt-get/yum/brew install libreoffice
windows
Install LibreOffice.
Append "install_dir\LibreOffice\program" to the PATH environment variable.
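After installing, it can help to confirm that the LibreOffice executable is reachable from the command line. The following is a minimal sketch, assuming the executable is named `soffice` or `libreoffice` on your platform; `find_libreoffice` is a hypothetical helper for illustration and is not part of Magic-Doc.

```python
import shutil
from typing import Optional, Sequence


def find_libreoffice(candidates: Sequence[str] = ("soffice", "libreoffice")) -> Optional[str]:
    """Return the full path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)  # searches the directories listed in PATH
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_libreoffice()
    if location:
        print(f"LibreOffice found at {location}")
    else:
        print("LibreOffice not found on PATH; revisit the dependency steps")
```

If the check reports nothing on Windows, the PATH entry is the most likely culprit; a new terminal session is usually needed before a PATH change takes effect.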
Install Magic-Doc
git clone https://github.com/magicpdf/Magic-Doc (#TODO)
cd Magic-Doc
pip install -r requirements.txt
python setup.py install
Introduction
Magic-Doc is a lightweight open-source tool that converts multiple file types (PPT/PPTX/DOC/DOCX/PDF) to markdown. It supports both local files and files stored on S3.
Example
from magic_doc.docconv import DocConverter, S3Config
s3_config = S3Config(ak='${ak}', sk='${sk}', endpoint='${endpoint}')
converter = DocConverter(s3_config=s3_config)
markdown_content, time_cost = converter("some_doc.pptx", "/tmp/convert_progress.txt", conv_timeout=300)
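The call above returns the markdown as a string along with a time cost, so saving the result is left to the caller. The sketch below shows one way to wrap that step. `convert_and_save` is a hypothetical helper, not part of Magic-Doc, and it assumes only what the example shows: a callable taking a source path, a progress-file path, and `conv_timeout`, and returning a two-element result.

```python
from pathlib import Path
from typing import Callable, Tuple


def convert_and_save(
    converter: Callable[..., Tuple[str, float]],
    source: str,
    output_path: str,
    progress_path: str,
    timeout: int = 300,
) -> float:
    """Run `converter` on `source`, write the markdown to `output_path`,
    and return the time cost reported by the converter."""
    markdown, time_cost = converter(source, progress_path, conv_timeout=timeout)
    out = Path(output_path)
    # Create the destination directory if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown, encoding="utf-8")
    return time_cost
```

Passing the converter in as an argument keeps the helper independent of how the `DocConverter` was configured (local files or S3), and makes it easy to exercise with a stand-in function.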
Performance
File Type | Speed (pages/s) |
---|---|
PDF (digital) | 347 |
PDF (OCR) | 2.7 |
PPT | 20 |
PPTX | 149 |
DOC | 600 |
DOCX | 1482 |
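These figures can be turned into a rough time budget for a batch job. The sketch below simply divides a page count by the throughput listed in the table; `estimate_seconds` is a hypothetical helper, and the result is only a back-of-the-envelope estimate, since actual speed will depend on hardware and document content.

```python
# Throughput figures copied from the table above, in pages per second.
PAGES_PER_SECOND = {
    "PDF (digital)": 347,
    "PDF (OCR)": 2.7,
    "PPT": 20,
    "PPTX": 149,
    "DOC": 600,
    "DOCX": 1482,
}


def estimate_seconds(file_type: str, pages: int) -> float:
    """Estimate conversion time as page count divided by listed throughput."""
    return pages / PAGES_PER_SECOND[file_type]


if __name__ == "__main__":
    for file_type in PAGES_PER_SECOND:
        seconds = estimate_seconds(file_type, 1000)
        print(f"{file_type}: about {seconds:.1f} s per 1000 pages")
```

The gap between digital and OCR PDFs is the figure to watch: at the listed rates, scanned documents take over a hundred times longer per page, which may influence the `conv_timeout` you choose.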
License
This project is released under the Apache 2.0 license.