Universal document converter with 250+ format combinations. Multi-page PDF to image conversion with page selection. OCR-powered image to Markdown for high-accuracy text extraction. Enhanced image support with PNG/JPEG/JPG/HEIC cross-conversions and PDF↔image conversions. Features PowerPoint to Obsidian Markdown with image extraction, format preservation, and navigation links. Supports PDF, DOCX, PPTX, MD, TEX, CSV, XLSX, TXT, HEIC, JPG, PNG, HTML, RTF, ODT.
Docuvert
Docuvert is a command-line tool that converts documents between the supported formats listed below, selecting the conversion from the input and output file extensions.
Installation
Option 1: Install from PyPI (Recommended)
pip install docuvert
After installation, the docuvert command will be globally available in your PATH:
docuvert --version
docuvert input.pdf output.docx
Option 2: Development Setup
- Clone the repository:
git clone https://github.com/your-repo/docuvert.git
cd docuvert
- Install in development mode:
pip install -e .
Or use the setup script for local development:
./setup.sh
Usage
Docuvert converts files based on their extensions. The syntax is simple:
docuvert <input_file_path> <output_file_path>
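As an illustration of extension-based selection, the sketch below derives a converter key such as pdf2docx from the two paths. The helper names extension_pair and converter_key are hypothetical, and the `<src>2<dst>` pattern is only inferred from the converter names listed under Supported Conversions; Docuvert's actual internals may differ.

```python
from pathlib import Path


def extension_pair(input_path: str, output_path: str) -> tuple[str, str]:
    """Return the lowercase (source, target) extensions without the leading dot."""
    src = Path(input_path).suffix.lower().lstrip(".")
    dst = Path(output_path).suffix.lower().lstrip(".")
    if not src or not dst:
        raise ValueError("both paths need a file extension")
    return src, dst


def converter_key(input_path: str, output_path: str) -> str:
    """Hypothetical helper: build a '<src>2<dst>' key such as 'pdf2docx'."""
    src, dst = extension_pair(input_path, output_path)
    return f"{src}2{dst}"


print(converter_key("document.pdf", "document.docx"))  # pdf2docx
print(converter_key("Notes.MD", "notes.pdf"))          # md2pdf
```

Lowercasing the suffix means Notes.MD and notes.md resolve to the same key; whether Docuvert itself treats extensions case-insensitively is an assumption here.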
Basic Commands:
# Convert files
docuvert input.pdf output.docx
# Check version
docuvert --version
# Show detailed info (formats, examples, installation)
docuvert --info
# Show help
docuvert --help
Examples:
- Convert PDF to DOCX:
docuvert document.pdf document.docx
- Convert Markdown to PDF:
docuvert notes.md notes.pdf
- Convert PowerPoint to Obsidian Markdown (NEW!):
docuvert presentation.pptx notes.md
- Convert Legacy PowerPoint with automatic conversion:
docuvert lecture.ppt lecture.md
- Convert DOCX to Markdown:
docuvert report.docx report.md
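Because each conversion is a single `docuvert <input> <output>` call, converting a whole folder can be scripted. The following is a minimal sketch that relies only on that documented syntax; build_command and convert_folder are hypothetical helpers, not part of Docuvert, and the sketch assumes the docuvert command is on your PATH. The README does not say whether Docuvert reports failures through its exit code, so the check=True handling is also an assumption.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, target_ext: str) -> list[str]:
    """Build 'docuvert <input> <output>' with the output placed next to the input."""
    dst = src.with_suffix("." + target_ext.lstrip("."))
    return ["docuvert", str(src), str(dst)]


def convert_folder(folder: str, source_ext: str, target_ext: str,
                   dry_run: bool = True) -> list[list[str]]:
    """Collect one command per matching file; run them unless dry_run is set."""
    pattern = "*." + source_ext.lstrip(".")
    commands = [build_command(p, target_ext)
                for p in sorted(Path(folder).glob(pattern))]
    if not dry_run:
        for cmd in commands:
            # Raises CalledProcessError if the command exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in convert_folder(".", "docx", "md"):
        print(" ".join(cmd))
```

The dry-run default lets you review the generated commands before anything is written to disk.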
Supported Conversions
Docuvert supports 200+ format combinations with intelligent conversion routing. Key features include:
🎯 PowerPoint Conversions (NEW!)
- PPTX/PPT to Obsidian Markdown (pptx2md, ppt2md) - Featured Converter
  - ✅ Automatic image extraction and embedding
  - ✅ Format preservation (bold, italic, colors)
  - ✅ Obsidian-specific features (YAML frontmatter, internal links, callouts)
  - ✅ Slide navigation with Previous/Next links
  - ✅ Table of contents generation
  - ✅ Legacy .ppt support via LibreOffice conversion
- PPTX to PDF (pptx2pdf)
- PPTX to HTML (pptx2html)
- PPTX to Plain Text (pptx2txt)
- Markdown to PPTX (md2pptx)
📄 Document Conversions
- PDF to DOCX (pdf2docx)
- PDF to Markdown (pdf2md)
- PDF to LaTeX (pdf2tex)
- PDF to Plain Text (pdf2txt)
- PDF to CSV (pdf2csv)
- PDF to XLSX (pdf2xlsx)
- DOCX to PDF (docx2pdf)
- DOCX to Markdown (docx2md)
- DOCX to LaTeX (docx2tex)
- DOCX to Plain Text (docx2txt)
- DOCX to CSV (docx2csv)
- DOCX to XLSX (docx2xlsx)
- Markdown to PDF (md2pdf)
- Markdown to DOCX (md2docx)
- Markdown to LaTeX (md2tex)
- Markdown to Plain Text (md2txt)
- Markdown to CSV (md2csv)
- Markdown to XLSX (md2xlsx)
- LaTeX to PDF (tex2pdf)
- LaTeX to DOCX (tex2docx)
- LaTeX to Markdown (tex2md)
- LaTeX to Plain Text (tex2txt)
- LaTeX to CSV (tex2csv)
- LaTeX to XLSX (tex2xlsx)
- Plain Text to PDF (txt2pdf)
- Plain Text to DOCX (txt2docx)
- Plain Text to Markdown (txt2md)
- Plain Text to LaTeX (txt2tex)
- Plain Text to CSV (txt2csv)
- Plain Text to XLSX (txt2xlsx)
- CSV to PDF (csv2pdf)
- CSV to DOCX (csv2docx)
- CSV to Markdown (csv2md)
- CSV to LaTeX (csv2tex)
- CSV to Plain Text (csv2txt)
- CSV to XLSX (csv2xlsx)
- XLSX to PDF (xlsx2pdf)
- XLSX to DOCX (xlsx2docx)
- XLSX to Markdown (xlsx2md)
- XLSX to LaTeX (xlsx2tex)
- XLSX to Plain Text (xlsx2txt)
- XLSX to CSV (xlsx2csv)
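The document conversions above cover every ordered pair of seven formats. As a quick check, this sketch regenerates the converter names with itertools.permutations; the FORMATS list and the converter_names helper are illustrative and are not taken from Docuvert's source code.

```python
from itertools import permutations

# Extensions of the seven document formats in the list above.
FORMATS = ["pdf", "docx", "md", "tex", "txt", "csv", "xlsx"]


def converter_names(formats):
    """Return '<src>2<dst>' for every ordered pair of distinct formats."""
    return [f"{src}2{dst}" for src, dst in permutations(formats, 2)]


names = converter_names(FORMATS)
print(len(names))   # 42, i.e. 7 * 6 ordered pairs
print(names[:3])    # ['pdf2docx', 'pdf2md', 'pdf2tex']
```

The PowerPoint, image, and legacy-format conversions are separate from these 42 pairs.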
🔄 Legacy Format Support
Docuvert automatically handles legacy Microsoft Office formats:
📝 Legacy Word (.doc) Support
- Automatic conversion: .doc files are automatically converted to .docx format before processing
- All format combinations supported: use any .doc to <format> conversion just as you would with .docx
- Examples:
docuvert old-document.doc new-document.pdf
docuvert report.doc report.md
docuvert legacy.doc modern.docx
📊 Legacy Excel (.xls) Support
- Automatic conversion: .xls files are automatically converted to .xlsx format before processing
- All format combinations supported: use any .xls to <format> conversion just as you would with .xlsx
- Examples:
docuvert old-spreadsheet.xls new-spreadsheet.pdf
docuvert data.xls data.csv
docuvert legacy.xls modern.xlsx
📋 Requirements for Legacy Formats
- LibreOffice: Recommended for best conversion quality
  - Install: https://www.libreoffice.org/download/
  - Supports both .doc and .xls formats
- Pandoc: Alternative for .doc conversion
  - Install: https://pandoc.org/installing.html
- xlrd: Python library for .xls reading (automatically installed)
🔧 Conversion Process
- Docuvert detects legacy format (.doc or .xls)
- Creates temporary modern format file (.docx or .xlsx)
- Processes conversion using existing converters
- Cleans up temporary files automatically
- Returns final converted output
No additional configuration needed - just use legacy files like modern formats!
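The conversion process above can be sketched as follows. This is not Docuvert's actual implementation: it assumes LibreOffice's soffice executable is on your PATH and uses its --headless, --convert-to, and --outdir options, and the names LEGACY_TO_MODERN, modern_suffix, soffice_command, and convert_legacy are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# Legacy extension -> modern counterpart, as described in the steps above.
LEGACY_TO_MODERN = {".doc": ".docx", ".xls": ".xlsx"}


def modern_suffix(path: str):
    """Return the modern suffix for a legacy file, or None if no pre-conversion is needed."""
    return LEGACY_TO_MODERN.get(Path(path).suffix.lower())


def soffice_command(src: str, out_dir: str) -> list[str]:
    """Build a headless LibreOffice call that writes the modern file into out_dir."""
    target = modern_suffix(src)
    if target is None:
        raise ValueError(f"{src} is not a legacy .doc/.xls file")
    return ["soffice", "--headless", "--convert-to", target.lstrip("."),
            "--outdir", out_dir, src]


def convert_legacy(src: str, dst: str) -> None:
    """Pre-convert into a temporary directory, pass the result to docuvert, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:  # directory is deleted when the block exits
        subprocess.run(soffice_command(src, tmp), check=True)
        modern = Path(tmp) / (Path(src).stem + modern_suffix(src))
        subprocess.run(["docuvert", str(modern), dst], check=True)
```

Using a temporary directory as a context manager mirrors the automatic cleanup step: the intermediate .docx or .xlsx file is removed even if the second command fails.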
Contributing
See instructions.md for details on project organization and how to add new converters.
File details
Details for the file docuvert-1.3.4.tar.gz.
File metadata
- Download URL: docuvert-1.3.4.tar.gz
- Upload date:
- Size: 122.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 56e10030bb79d1ed90d667a0a5a732d2ad0091be68ad11a39eee04be4221249b |
| MD5 | 4a6003beece14abb84c4f9e0f6aeae02 |
| BLAKE2b-256 | 445a80241a462eb9f30b1531b1e83c5812753ba4507834add315777aa61889e2 |
File details
Details for the file docuvert-1.3.4-py3-none-any.whl.
File metadata
- Download URL: docuvert-1.3.4-py3-none-any.whl
- Upload date:
- Size: 169.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0e2f5accc1f0e6601cb8a622f54b92ed5aacc3f0a26b4e992edb337b6459268 |
| MD5 | 3fe6aa48ab433d822941bc6ebb0caeaa |
| BLAKE2b-256 | 5acf5703c70395733b133fd0b5762d390a6619960389d79e07c37aa405c80bfb |