Lightweight universal text/ebook/document format converter with CLI and API
ConverText
Lightweight universal text/document/ebook converter. Self-contained, Python-based CLI tool with native format parsers.
Convert between all major text, document and ebook formats with a single terminal command or through a Python API. Get editable .txt, .md or HTML from PDF or ebook formats, or make ebooks, PDFs, HTML and more from text documents. Batch convert multiple files and send them anywhere in the file system or to your ereader automatically. Script the conversion of whole folder structures with different settings per folder.
Supported Formats
Bidirectional (Read & Write): PDF, DOCX, RTF, TXT, Markdown, HTML, EPUB, MOBI, FB2
Read Only: DOC, ODT, AZW, AZW3
Features
- 🚀 Fast & Lightweight - Self-contained Python package (~25MB)
- 📝 Formatting Preservation - Maintains bold, italic, tables, lists, colors across formats
- ⚙️ Highly Configurable - YAML config with priority merging
- 🎯 Simple and Scriptable CLI & API - Intuitive command-line interface and built-in Python functions
- 🔍 Metadata Preservation - Keeps author, title, and document properties
Installation
pip install convertext
Quick Start
Command Line
# Convert PDF to EPUB
convertext book.pdf --format epub
# Convert Markdown to HTML and EPUB
convertext document.md --format html,epub
# Batch convert all Word docs to Markdown
convertext *.docx --format md
# Convert PDF to MOBI
convertext book.pdf --format mobi
# See all supported formats
convertext --list-formats
Python / Jupyter
import convertext
# Simple conversion
convertext.convert('book.pdf', 'epub')
# With options
convertext.convert('document.md', 'html', output='./out/', overwrite=True)
# Keep intermediate files (for debugging multi-hop)
convertext.convert('book.pdf', 'mobi', keep_intermediate=True)
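For scripted batch jobs, the same call can be wrapped in a loop. The following is a minimal sketch that assumes only the `convertext.convert(path, format, output=..., overwrite=...)` call shown above. `batch_convert` and its `convert` parameter are hypothetical names introduced for illustration, and nothing is assumed about what `convertext.convert` returns.

```python
from pathlib import Path
from typing import Callable, List, Optional


def batch_convert(folder: str, pattern: str, fmt: str, output: str,
                  convert: Optional[Callable] = None) -> List[Path]:
    """Convert every file in `folder` matching `pattern` to `fmt`.

    `convert` defaults to convertext.convert. Passing another callable
    (a hypothetical hook, not part of ConverText) allows a dry run.
    """
    if convert is None:
        import convertext  # imported lazily so a dry run needs no install
        convert = convertext.convert

    handled = []
    for path in sorted(Path(folder).glob(pattern)):
        # Same keyword arguments as the "With options" example above.
        convert(str(path), fmt, output=output, overwrite=True)
        handled.append(path)
    return handled
```

Calling `batch_convert("docs", "*.md", "html", output="./out/")` would then convert each Markdown file in `docs` in sorted order.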
Usage Examples
Single File Conversion
# PDF to text
convertext document.pdf --format txt
# Markdown to HTML or PDF
convertext README.md --format html
convertext README.md --format pdf
# DOCX to Markdown
convertext report.docx --format md
# Any format to PDF
convertext story.txt --format pdf
convertext article.html --format pdf
convertext notes.md --format pdf
# Create Word documents from any format
convertext article.md --format docx
convertext notes.txt --format docx
# Text to EPUB (creates an ebook)
convertext story.txt --format epub
Multiple Output Formats
# Convert to multiple formats at once
convertext book.md --format html,epub,txt
# Output to specific directory
convertext document.pdf --format txt --output ~/Documents/converted/
Batch Conversion
# Convert all Markdown files to HTML
convertext *.md --format html
# Convert multiple specific files
convertext chapter1.md chapter2.md chapter3.md --format epub
# Use with find for recursive conversion
find . -name "*.pdf" -exec convertext {} --format txt \;
Advanced Options
# Overwrite existing files
convertext document.pdf --format txt --overwrite
# Verbose output with progress
convertext *.md --format html --verbose
# Use custom config file
convertext book.md --format epub --config my-config.yaml
Working with Ebooks
# Create EPUB from Markdown (with chapters)
convertext book.md --format epub
# Convert EPUB to Kindle format
convertext ebook.epub --format mobi
# Convert any document to multiple ebook formats
convertext document.pdf --format epub,mobi,fb2 --verbose
# Convert EPUB to text for reading
convertext ebook.epub --format txt
# Extract EPUB to HTML
convertext ebook.epub --format html
Multi-Hop Conversion
ConverText automatically finds a conversion path when no direct conversion between two formats is available:
# PDF → EPUB: Automatically converts via PDF → TXT → EPUB (2 hops)
convertext book.pdf --format epub --verbose
# Output: ✓ book.pdf → book.epub (PDF → TXT → EPUB, 2 hops)
# PDF → MOBI: Automatically converts via PDF → TXT → MOBI (2 hops)
convertext book.pdf --format mobi --verbose
# Output: ✓ book.pdf → book.mobi (PDF → TXT → MOBI, 2 hops)
# Keep intermediate files for debugging
convertext book.pdf --format epub --keep-intermediate
# Creates: book_intermediate.txt, book.epub
How it works: ConverText uses breadth-first search (BFS) to find the shortest conversion chain (at most 3 hops). Intermediate files are cleaned up automatically unless --keep-intermediate is specified.
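The pathfinding idea can be sketched in a few lines. This is an illustration of BFS with a hop limit, not ConverText's actual code: the `DIRECT` table and the `find_path` function are hypothetical, and the real set of direct conversions is whatever `convertext --list-formats` reports.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical direct-conversion table, for illustration only.
DIRECT: Dict[str, Set[str]] = {
    "pdf": {"txt"},
    "txt": {"epub", "mobi", "html"},
    "epub": {"txt", "html"},
    "html": {"txt"},
}


def find_path(source: str, target: str, max_hops: int = 3) -> Optional[List[str]]:
    """Return the shortest chain of formats from source to target, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_hops:  # hop budget used up on this branch
            continue
        for nxt in sorted(DIRECT.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_path("pdf", "epub"))  # ['pdf', 'txt', 'epub']
```

Because BFS explores all one-hop chains before any two-hop chain, the first path that reaches the target is also a shortest one.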
Format Matrix
Run convertext --list-formats to see all direct conversions. Multi-hop enables any-to-any conversion between compatible formats.
Configuration
ConverText supports flexible configuration through YAML files. You can set global defaults or create directory-specific configurations that automatically apply when converting files from those locations.
How Configuration Works
When you convert a file, ConverText searches for configuration in this order (highest priority first):
- CLI arguments - Flags you pass directly (e.g., --output ~/Books/)
- Directory config - convertext.yaml in the file's directory or any parent directory
- User config - ~/.convertext/config.yaml (your global defaults)
- Built-in defaults - Sensible defaults built into ConverText
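One way to picture priority merging is as a layered dictionary merge. The sketch below assumes that nested keys are merged individually rather than whole sections being replaced; that behavior, and the names `deep_merge` and `resolve`, are assumptions made for illustration and are not taken from ConverText's source.

```python
from typing import Any, Dict


def deep_merge(low: Dict[str, Any], high: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where values in `high` override those in `low`."""
    merged = dict(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers given from lowest to highest priority."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


defaults = {"output": {"directory": None, "overwrite": False},
            "documents": {"encoding": "utf-8"}}
user_cfg = {"output": {"directory": "~/Documents/converted"}}
dir_cfg = {"output": {"overwrite": True}}
cli_args = {"output": {"directory": "~/Books/"}}

# Lowest priority first, so the CLI layer wins any conflict.
settings = resolve(defaults, user_cfg, dir_cfg, cli_args)
```

Here `settings["output"]["directory"]` ends up as the CLI value, `overwrite` comes from the directory config, and `documents.encoding` falls through from the defaults.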
Directory-Based Configuration
Place a convertext.yaml file in any directory to configure conversions for files in that directory and its subdirectories. The configuration is automatically discovered - ConverText searches from the file's location up through parent directories.
Example directory structure:
~/Documents/books/
├── convertext.yaml          # Config for all books
├── fiction/
│   ├── convertext.yaml      # Override for fiction
│   └── novel.pdf
└── technical/
    └── manual.pdf           # Uses ~/Documents/books/convertext.yaml
When converting fiction/novel.pdf, ConverText uses fiction/convertext.yaml.
When converting technical/manual.pdf, ConverText uses books/convertext.yaml (inherited).
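The upward search can be sketched with pathlib. This is a minimal illustration of the lookup described above, not ConverText's implementation; `find_directory_configs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List


def find_directory_configs(file_path: str,
                           name: str = "convertext.yaml") -> List[Path]:
    """List config files from the file's directory upward, nearest first."""
    start = Path(file_path).resolve().parent
    found = []
    for folder in [start, *start.parents]:
        candidate = folder / name
        if candidate.is_file():
            found.append(candidate)
    return found
```

For `fiction/novel.pdf` in the tree above, the first entry would be `fiction/convertext.yaml` and the second `books/convertext.yaml`; for `technical/manual.pdf`, the first entry would be `books/convertext.yaml`.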
Creating Configuration Files
Initialize global config:
convertext --init-config
Create directory config:
# Copy example file
cp convertext.yaml.example convertext.yaml
# Or create from scratch
cat > convertext.yaml << EOF
output:
  directory: ~/Documents/converted
  overwrite: false
documents:
  encoding: utf-8
EOF
Configuration Example
See convertext.yaml.example for all available options. Here's a common configuration:
# Output settings
output:
  directory: ~/Documents/converted
  filename_pattern: "{name}.{ext}"
  overwrite: false
# Document settings
documents:
  encoding: utf-8
Key Configuration Options
| Key | Default | Description |
|---|---|---|
| output.directory | null | Output directory (null = source dir) |
| output.filename_pattern | {name}.{ext} | Output filename pattern |
| output.overwrite | false | Overwrite existing files |
| documents.encoding | utf-8 | Text file encoding |
CLI Reference
Usage: convertext [OPTIONS] [FILES]...

  ConverText - Lightweight universal text converter.

Options:
  -f, --format TEXT     Output format(s), comma-separated
  -o, --output PATH     Output directory
  -c, --config PATH     Custom config file
  --overwrite           Overwrite existing files
  --list-formats        List all supported formats
  --init-config         Initialize user config file
  --version             Show version
  -v, --verbose         Verbose output (shows conversion hops)
  --keep-intermediate   Keep intermediate files in multi-hop conversions
  --help                Show help message
Use Cases
1. Documentation Workflow
# Write docs in Markdown, publish as HTML and PDF
convertext docs/*.md --format html
convertext docs/*.md --format pdf
# Generate EPUB documentation
convertext manual.md --format epub
2. Ebook Management
# Convert ebooks to text for reading on e-readers
convertext library/*.epub --format txt --output ~/ereader/
# Create EPUB from your writing
convertext novel.md --format epub
3. Archive Conversion
# Convert old Word documents to Markdown for version control
convertext archive/*.docx --format md --output ./converted/
# Extract text from PDFs
convertext reports/*.pdf --format txt
4. Blog Publishing
# Convert Markdown posts to HTML
convertext posts/*.md --format html --output ./public/
# Create downloadable EPUB versions
convertext posts/*.md --format epub --output ./public/downloads/
5. Research & Note-Taking
# Convert research PDFs to Markdown for notes
convertext papers/*.pdf --format md
# Create EPUB from notes for mobile reading
convertext notes/*.md --format epub
Architecture
ConverText uses an intermediate Document format for conversions:
Input Format → Document (internal) → Output Format
This allows any-to-any conversions without N² converter implementations.
Key Components
- BaseConverter: Abstract base for all format converters
- Document: Intermediate representation (metadata, content blocks, images)
- ConverterRegistry: Routes source→target format conversions with BFS pathfinding
- ConversionEngine: Orchestrates conversions and multi-hop chaining
- Config: Manages configuration with priority merging
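The benefit of the intermediate format can be shown with a toy model: each format needs one reader and one writer, so N input formats and M output formats need N + M functions instead of N × M. The classes and functions below are simplified stand-ins invented for this sketch; they do not reproduce ConverText's `Document` or `BaseConverter` interfaces.

```python
import html
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Simplified stand-in for the intermediate representation."""
    metadata: Dict[str, str] = field(default_factory=dict)
    blocks: List[str] = field(default_factory=list)


def read_txt(text: str) -> Document:
    # Blank lines separate paragraphs.
    return Document(blocks=[p.strip() for p in text.split("\n\n") if p.strip()])


def read_md(text: str) -> Document:
    doc = read_txt(text)
    # Toy rule for the sketch: a leading "# Heading" becomes the title.
    if doc.blocks and doc.blocks[0].startswith("# "):
        doc.metadata["title"] = doc.blocks.pop(0)[2:]
    return doc


def write_txt(doc: Document) -> str:
    return "\n\n".join(doc.blocks)


def write_html(doc: Document) -> str:
    title = html.escape(doc.metadata.get("title", ""))
    body = "".join(f"<p>{html.escape(b)}</p>" for b in doc.blocks)
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>"


# One entry per format on each side; any reader pairs with any writer.
READERS: Dict[str, Callable[[str], Document]] = {"txt": read_txt, "md": read_md}
WRITERS: Dict[str, Callable[[Document], str]] = {"txt": write_txt, "html": write_html}


def convert_text(text: str, source: str, target: str) -> str:
    """Input format -> Document -> output format."""
    return WRITERS[target](READERS[source](text))
```

Adding a new output format to this toy registry means writing one writer, after which every existing reader can target it.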
Native Implementations
ConverText implements lightweight native Python parsers for ebook formats:
- EPUB: Native Python reader/writer using zipfile + lxml
  - Reads: Parses OPF metadata and spine order
  - Writes: Generates EPUB 3 structure (container.xml, OPF, NCX, XHTML)
- MOBI: Native Python reader/writer using PalmDB format
  - Reads: PalmDB parser with PalmDOC decompression
  - Writes: PalmDB structure with optimized PalmDOC compression
- ODT: Native Python reader using zipfile + lxml
- FB2: Native Python reader/writer using lxml XML parser
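As an illustration of what reading OPF metadata and spine order involves, the sketch below opens an EPUB as a zip archive, follows META-INF/container.xml to the OPF package file, and returns the metadata and reading order. It uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra dependencies, and `read_opf` is a hypothetical function name, not ConverText's reader.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_opf(epub_file) -> Tuple[Dict[str, str], List[str]]:
    """Return (metadata, spine hrefs in reading order) from an EPUB."""
    with zipfile.ZipFile(epub_file) as zf:
        # META-INF/container.xml points at the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    metadata = {}
    for key in ("title", "creator", "language"):
        node = package.find(f"opf:metadata/dc:{key}", NS)
        if node is not None and node.text:
            metadata[key] = node.text.strip()

    # The manifest maps ids to files; the spine lists ids in reading order.
    # hrefs are relative to the directory that contains the OPF file.
    manifest = {item.attrib["id"]: item.attrib["href"]
                for item in package.findall("opf:manifest/opf:item", NS)}
    spine = [manifest[ref.attrib["idref"]]
             for ref in package.findall("opf:spine/opf:itemref", NS)]
    return metadata, spine
```

`read_opf` accepts a file path or a file-like object, which is what `zipfile.ZipFile` itself accepts.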
Development
Setup
git clone https://github.com/danielcorsano/convertext.git
cd convertext
poetry install
Run Tests
pytest
pytest -v # Verbose
pytest --cov # With coverage
Code Quality
black . # Format code
ruff check convertext/ # Lint
mypy convertext/ # Type check
Manual Testing
convertext --help
convertext test.md --format html --verbose
Related Projects
Want to listen to your text files instead of reading them? Try audiobook-reader - converts text, ebooks, and documents into natural-sounding audiobooks.
💝 Support This Project
If you find this tool helpful, please consider sponsoring the project. I created and maintain this software alone as a public service, and donations help me improve it and develop requested features. If I get $99 of donations, I will use it to pay for the Apple developer program so I can make iOS versions of all my open source apps.
Your support makes a real difference in keeping this project active and growing. Thank you!
License
MIT License - see LICENSE file for details.