A lightweight toolbox to manipulate documents
Install
Install Dependencies
linux/osx
apt-get/yum/brew install libreoffice
windows
install LibreOffice
append "install_dir\LibreOffice\program" to the PATH environment variable
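To confirm the dependency is reachable before installing Magic-Doc, a quick check like the one below may help. It is only a sketch: it assumes LibreOffice is found through PATH (as the Windows step suggests) and that the launcher is named `soffice` or `libreoffice`. The helper name `find_libreoffice` is made up for this illustration and is not part of Magic-Doc.

```python
import shutil


def find_libreoffice():
    """Return the path of the first LibreOffice launcher found on PATH, or None."""
    # Assumption: "soffice" is the launcher in LibreOffice's program directory;
    # some Linux packages also install a "libreoffice" wrapper.
    for name in ("soffice", "libreoffice"):
        path = shutil.which(name)
        if path is not None:
            return path
    return None


launcher = find_libreoffice()
if launcher is None:
    print("LibreOffice not found on PATH; install it or extend PATH first.")
else:
    print(f"LibreOffice launcher found at: {launcher}")
```

If the check prints that nothing was found on Windows, the PATH entry from the step above is the first thing to revisit.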
Install Magic-Doc
git clone https://github.com/magicpdf/Magic-Doc (#TODO)
cd Magic-Doc
pip install -r requirements.txt
python setup.py install
Introduction
Magic-Doc is a lightweight open-source tool that converts multiple file types (PPT/PPTX/DOC/DOCX/PDF) to Markdown. It supports both local files and files stored on S3.
Example
from magic_doc.docconv import DocConverter, S3Config
s3_config = S3Config(ak='${ak}', sk='${sk}', endpoint='${endpoint}')
converter = DocConverter(s3_config=s3_config)
markdown_content, time_cost = converter("some_doc.pptx", "/tmp/convert_progress.txt", conv_timeout=300)
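The example above returns the Markdown as a string. A small wrapper that saves it to disk might look like the following sketch. It relies only on the call shape shown in the example (source path, progress file, `conv_timeout`) and the `(markdown, time_cost)` return pair. The helper `convert_to_file` and the stand-in `fake_converter` are hypothetical names used so the snippet runs without Magic-Doc installed; in real use you would pass a `DocConverter` instance instead.

```python
import tempfile
from pathlib import Path


def convert_to_file(converter, src, out_dir, timeout=300):
    """Convert one document and write the resulting Markdown into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src).stem
    progress_file = out_dir / f"{stem}.progress.txt"
    # Same call shape as the README example: (source, progress file, conv_timeout).
    markdown, time_cost = converter(str(src), str(progress_file), conv_timeout=timeout)
    md_path = out_dir / f"{stem}.md"
    md_path.write_text(markdown, encoding="utf-8")
    return md_path, time_cost


def fake_converter(src, progress_path, conv_timeout=300):
    """Stand-in for DocConverter, for illustration only."""
    return f"# Converted from {src}\n", 0.0


# Demo in a throwaway directory so nothing is left behind in the working tree.
demo_dir = tempfile.mkdtemp()
md_path, cost = convert_to_file(fake_converter, "some_doc.pptx", demo_dir)
print(md_path.name, cost)
```

Passing the converter in as an argument keeps the helper independent of how the converter is configured (local files or S3).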
Performance
File Type | Speed (pages/s) |
---|---|
PDF (digital) | 347 |
PDF (OCR) | 2.7 |
PPT | 20 |
PPTX | 149 |
DOC | 600 |
DOCX | 1482 |
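These figures can be turned into a rough time estimate for a batch. The sketch below simply divides a page count by the throughput listed in the table. It assumes the speeds scale linearly and ignores startup overhead, so treat the result as an order-of-magnitude guide; `estimate_seconds` is a made-up helper, not part of Magic-Doc.

```python
# Throughput figures copied from the table above, in pages per second.
PAGES_PER_SECOND = {
    "PDF (digital)": 347,
    "PDF (OCR)": 2.7,
    "PPT": 20,
    "PPTX": 149,
    "DOC": 600,
    "DOCX": 1482,
}


def estimate_seconds(file_type, pages):
    """Estimate conversion time in seconds, assuming linear scaling."""
    return pages / PAGES_PER_SECOND[file_type]


# A 100-page PPT deck at 20 pages/s.
print(estimate_seconds("PPT", 100))  # 5.0
```

The gap between digital and OCR PDFs is the main thing to plan around: at the listed speeds, the same page count takes over a hundred times longer once OCR is involved.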
All Thanks To Our Contributors:
License
This project is released under the Apache 2.0 license.