
Lightweight universal text/ebook/document format converter with CLI and API


ConverText

Requires Python 3.9+.

Lightweight universal text/document/ebook converter: a self-contained, Python-based CLI tool with native format parsers.

Convert between all major text, document and ebook formats with a single terminal command or through a Python API. Get editable .txt, .md or HTML from PDF and ebook files, or create ebooks, PDFs, HTML and more from text documents. Batch-convert multiple files and send the output anywhere in the file system or straight to your e-reader. Script conversions of whole folder structures, with different settings per folder.

Supported Formats

Bidirectional (Read & Write): PDF, DOCX, RTF, TXT, Markdown, HTML, EPUB, MOBI, FB2

Read Only: DOC, ODT, AZW, AZW3

Features

  • 🚀 Fast & Lightweight - Self-contained Python package (~25MB)
  • 📝 Formatting Preservation - Maintains bold, italic, tables, lists, colors across formats
  • ⚙️ Highly Configurable - YAML config with priority merging
  • 🎯 Simple and Scriptable CLI & API - Intuitive command-line interface and built-in Python functions
  • 🔍 Metadata Preservation - Keeps author, title, and document properties

Installation

pip install convertext

Quick Start

Command Line

# Convert PDF to EPUB
convertext book.pdf --format epub

# Convert Markdown to HTML and EPUB
convertext document.md --format html,epub

# Batch convert all Word docs to Markdown
convertext *.docx --format md

# Convert PDF to MOBI
convertext book.pdf --format mobi

# See all supported formats
convertext --list-formats

Python / Jupyter

import convertext

# Simple conversion
convertext.convert('book.pdf', 'epub')

# With options
convertext.convert('document.md', 'html', output='./out/', overwrite=True)

# Keep intermediate files (for debugging multi-hop)
convertext.convert('book.pdf', 'mobi', keep_intermediate=True)
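Converting a whole folder structure from Python can be built on the single convert() call shown above. The sketch below assumes convert() accepts a file path string and an output directory (as in the output='./out/' example); plan_jobs and convert_tree are hypothetical helpers, not part of the convertext package, and mirroring the source layout under the output folder is a choice made here, not documented ConverText behavior.

```python
from __future__ import annotations

from pathlib import Path


def plan_jobs(src_root: Path, out_root: Path, pattern: str = "*.md") -> list[tuple[Path, Path]]:
    """Pair each matching file under src_root with an output folder mirroring its subfolder."""
    jobs = []
    for src in sorted(src_root.rglob(pattern)):
        if src.is_file():
            relative_dir = src.parent.relative_to(src_root)
            jobs.append((src, out_root / relative_dir))
    return jobs


def convert_tree(src_root: str, out_root: str, target_format: str, pattern: str = "*.md") -> None:
    """Convert every matching file, recreating the source folder layout under out_root."""
    import convertext  # imported lazily so plan_jobs works without the package installed

    for src, out_dir in plan_jobs(Path(src_root), Path(out_root), pattern):
        out_dir.mkdir(parents=True, exist_ok=True)
        # Same call shape as the examples above: convert(file, format, output=..., overwrite=...)
        convertext.convert(str(src), target_format, output=str(out_dir), overwrite=False)


# Example (hypothetical folder names):
# convert_tree("notes", "notes_html", "html")
```

For per-folder settings without any Python, a convertext.yaml in each folder (see Configuration) is the documented route.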

Usage Examples

Single File Conversion

# PDF to text
convertext document.pdf --format txt

# Markdown to HTML or PDF
convertext README.md --format html
convertext README.md --format pdf

# DOCX to Markdown
convertext report.docx --format md

# Any format to PDF
convertext story.txt --format pdf
convertext article.html --format pdf
convertext notes.md --format pdf

# Create Word documents from any format
convertext article.md --format docx
convertext notes.txt --format docx

# Text to EPUB (creates an ebook)
convertext story.txt --format epub

Multiple Output Formats

# Convert to multiple formats at once
convertext book.md --format html,epub,txt

# Output to specific directory
convertext document.pdf --format txt --output ~/Documents/converted/

Batch Conversion

# Convert all Markdown files to HTML
convertext *.md --format html

# Convert multiple specific files
convertext chapter1.md chapter2.md chapter3.md --format epub

# Use with find for recursive conversion
find . -name "*.pdf" -exec convertext {} --format txt \;

Advanced Options

# Overwrite existing files
convertext document.pdf --format txt --overwrite

# Verbose output with progress
convertext *.md --format html --verbose

# Use custom config file
convertext book.md --format epub --config my-config.yaml

Working with Ebooks

# Create EPUB from Markdown (with chapters)
convertext book.md --format epub

# Convert EPUB to Kindle format
convertext ebook.epub --format mobi

# Convert any document to multiple ebook formats
convertext document.pdf --format epub,mobi,fb2 --verbose

# Convert EPUB to text for reading
convertext ebook.epub --format txt

# Extract EPUB to HTML
convertext ebook.epub --format html

Multi-Hop Conversion

When no direct converter exists between two formats, ConverText automatically finds a conversion path through intermediate formats:

# PDF → EPUB: Automatically converts via PDF → TXT → EPUB (2 hops)
convertext book.pdf --format epub --verbose
# Output: ✓ book.pdf → book.epub (PDF → TXT → EPUB, 2 hops)

# PDF → MOBI: Automatically converts via PDF → TXT → MOBI (2 hops)
convertext book.pdf --format mobi --verbose
# Output: ✓ book.pdf → book.mobi (PDF → TXT → MOBI, 2 hops)

# Keep intermediate files for debugging
convertext book.pdf --format epub --keep-intermediate
# Creates: book_intermediate.txt, book.epub

How it works: Uses BFS pathfinding to find the shortest conversion chain (max 3 hops). Intermediate files are automatically cleaned up unless --keep-intermediate is specified.
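The shortest-chain search can be sketched as a breadth-first search over a graph whose nodes are formats and whose edges are direct converters. The DIRECT edge table below is a made-up example, not ConverText's real converter list (run convertext --list-formats for that), and find_path is an illustrative helper rather than ConverText's API.

```python
from __future__ import annotations

from collections import deque

# Hypothetical direct-conversion graph: source format -> formats it can write directly.
DIRECT = {
    "pdf": {"txt", "html"},
    "txt": {"epub", "mobi", "md", "pdf"},
    "md": {"html", "epub"},
    "epub": {"txt", "html"},
}


def find_path(src: str, dst: str, graph: dict[str, set[str]], max_hops: int = 3) -> list[str] | None:
    """Return the shortest format chain from src to dst, or None if none fits in max_hops."""
    if src == dst:
        return [src]
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:  # this chain already uses the whole hop budget
            continue
        for nxt in sorted(graph.get(path[-1], ())):  # sorted for deterministic results
            if nxt in seen:
                continue
            if nxt == dst:
                return path + [nxt]  # BFS: the first chain to reach dst is a shortest one
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


print(find_path("pdf", "epub", DIRECT))  # ['pdf', 'txt', 'epub']
```

Because BFS explores all 1-hop chains before any 2-hop chain, the first chain that reaches the target is guaranteed to be a shortest one, and the max_hops check caps the search the way the 3-hop limit above does.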

Format Matrix

Run convertext --list-formats to see all direct conversions. Multi-hop conversion extends this to any-to-any conversion between compatible formats; the read-only formats (DOC, ODT, AZW, AZW3) can only be used as sources.

Configuration

ConverText supports flexible configuration through YAML files. You can set global defaults or create directory-specific configurations that automatically apply when converting files from those locations.

How Configuration Works

When you convert a file, ConverText merges settings from the sources below. If a setting is defined in more than one place, the higher-priority source wins (highest priority first):

  1. CLI arguments - Flags you pass directly (e.g., --output ~/Books/)
  2. Directory config - convertext.yaml in the file's directory or any parent directory
  3. User config - ~/.convertext/config.yaml (your global defaults)
  4. Built-in defaults - Sensible defaults built into ConverText
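Priority merging of this kind is commonly done as a recursive dictionary merge, applied from the lowest-priority layer up. The sketch below is a minimal illustration under that assumption: deep_merge and resolve_config are hypothetical helpers, ConverText's Config class may merge differently (for example in how it treats lists or unset CLI flags), the built-in layer mirrors the Key Configuration Options table, and the other layers hold made-up values.

```python
from __future__ import annotations

from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers given from lowest to highest priority."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


builtin = {  # mirrors the defaults in the Key Configuration Options table
    "output": {"directory": None, "filename_pattern": "{name}.{ext}", "overwrite": False},
    "documents": {"encoding": "utf-8"},
}
user = {"output": {"directory": "~/Documents/converted"}}  # ~/.convertext/config.yaml
directory = {"output": {"overwrite": True}}               # nearest convertext.yaml
cli = {"output": {"directory": "~/Books/"}}               # only flags actually passed

config = resolve_config(builtin, user, directory, cli)
print(config["output"])
# {'directory': '~/Books/', 'filename_pattern': '{name}.{ext}', 'overwrite': True}
```

Merging nested sections key by key (rather than replacing a whole section) is what lets a directory config change only overwrite while still inheriting the output directory from a lower layer.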

Directory-Based Configuration

Place a convertext.yaml file in any directory to configure conversions for files in that directory and its subdirectories. The configuration is automatically discovered - ConverText searches from the file's location up through parent directories.

Example directory structure:

~/Documents/books/
├── convertext.yaml          # Config for all books
├── fiction/
│   ├── convertext.yaml      # Override for fiction
│   └── novel.pdf
└── technical/
    └── manual.pdf           # Uses ~/Documents/books/convertext.yaml

When converting fiction/novel.pdf, ConverText uses fiction/convertext.yaml. When converting technical/manual.pdf, ConverText uses books/convertext.yaml (inherited).
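The upward search can be sketched with pathlib by walking from the file's directory through its parents and stopping at the first convertext.yaml. find_directory_config is a hypothetical name; the sketch returns only the nearest file, which matches the fiction/technical example above. Whether ConverText also layers a parent directory's convertext.yaml underneath a nearer one is not stated here.

```python
from __future__ import annotations

from pathlib import Path

CONFIG_NAME = "convertext.yaml"


def find_directory_config(source_file: Path) -> Path | None:
    """Walk from the file's directory up to the filesystem root; return the nearest config."""
    start = source_file.resolve().parent
    for directory in (start, *start.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    import tempfile

    # Recreate the example tree in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        books = Path(tmp) / "books"
        (books / "fiction").mkdir(parents=True)
        (books / "technical").mkdir()
        (books / CONFIG_NAME).write_text("output:\n  overwrite: false\n")
        (books / "fiction" / CONFIG_NAME).write_text("output:\n  overwrite: true\n")
        print(find_directory_config(books / "fiction" / "novel.pdf"))     # .../books/fiction/convertext.yaml
        print(find_directory_config(books / "technical" / "manual.pdf"))  # .../books/convertext.yaml
```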

Creating Configuration Files

Initialize global config:

convertext --init-config

Create directory config:

# Copy example file
cp convertext.yaml.example convertext.yaml

# Or create from scratch
cat > convertext.yaml << EOF
output:
  directory: ~/Documents/converted
  overwrite: false
documents:
  encoding: utf-8
EOF

Configuration Example

See convertext.yaml.example for all available options. Here's a common configuration:

# Output settings
output:
  directory: ~/Documents/converted
  filename_pattern: "{name}.{ext}"
  overwrite: false

# Document settings
documents:
  encoding: utf-8

Key Configuration Options

Key                       Default        Description
output.directory          null           Output directory (null = same directory as the source file)
output.filename_pattern   {name}.{ext}   Output filename pattern
output.overwrite          false          Overwrite existing files
documents.encoding        utf-8          Text file encoding
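To illustrate how the three output settings could combine, here is a small sketch. It assumes {name} expands to the source file's stem and {ext} to the target format's extension (other placeholders, if any, are not covered), and that overwrite: false means an existing target is left untouched; output_path and should_write are hypothetical helpers, and whether ConverText skips, warns, or errors on an existing file is not specified here.

```python
from __future__ import annotations

from pathlib import Path


def output_path(source: Path, target_ext: str, directory: str | None = None,
                filename_pattern: str = "{name}.{ext}") -> Path:
    """Hypothetical helper: resolve where a converted file would be written."""
    # output.directory = null means "next to the source file"
    base = source.parent if directory is None else Path(directory).expanduser()
    # Assumed placeholders: {name} = source file stem, {ext} = target format extension
    return base / filename_pattern.format(name=source.stem, ext=target_ext)


def should_write(target: Path, overwrite: bool = False) -> bool:
    """With output.overwrite = false, an existing target file is left alone."""
    return overwrite or not target.exists()


# POSIX paths shown in the comments
print(output_path(Path("docs/report.docx"), "md"))                                   # docs/report.md
print(output_path(Path("docs/report.docx"), "md", "out", "{name}_converted.{ext}"))  # out/report_converted.md
```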

CLI Reference

Usage: convertext [OPTIONS] [FILES]...

  ConverText - Lightweight universal text converter.

Options:
  -f, --format TEXT            Output format(s), comma-separated
  -o, --output PATH            Output directory
  -c, --config PATH            Custom config file
  --overwrite                  Overwrite existing files
  --list-formats               List all supported formats
  --init-config                Initialize user config file
  --version                    Show version
  -v, --verbose                Verbose output (shows conversion hops)
  --keep-intermediate          Keep intermediate files in multi-hop conversions
  --help                       Show help message

Use Cases

1. Documentation Workflow

# Write docs in Markdown, publish as HTML and PDF
convertext docs/*.md --format html
convertext docs/*.md --format pdf

# Generate EPUB documentation
convertext manual.md --format epub

2. Ebook Management

# Convert ebooks to text for reading on e-readers
convertext library/*.epub --format txt --output ~/ereader/

# Create EPUB from your writing
convertext novel.md --format epub

3. Archive Conversion

# Convert old Word documents to Markdown for version control
convertext archive/*.docx --format md --output ./converted/

# Extract text from PDFs
convertext reports/*.pdf --format txt

4. Blog Publishing

# Convert Markdown posts to HTML
convertext posts/*.md --format html --output ./public/

# Create downloadable EPUB versions
convertext posts/*.md --format epub --output ./public/downloads/

5. Research & Note-Taking

# Convert research PDFs to Markdown for notes
convertext papers/*.pdf --format md

# Create EPUB from notes for mobile reading
convertext notes/*.md --format epub

Architecture

ConverText uses an intermediate Document format for conversions:

Input Format → Document (internal) → Output Format

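Each format only needs a reader into and/or a writer out of the Document model, so N formats require roughly 2N components instead of one dedicated converter for each of the roughly N² source/target pairs.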
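To make the N² versus 2N comparison concrete, here is a back-of-the-envelope count using the format lists from Supported Formats (9 bidirectional, 4 read-only). It assumes one reader per readable format and one writer per writable format, which is an idealization; some conversions in practice run through multi-hop chains instead.

```python
# Format lists as given under "Supported Formats".
BIDIRECTIONAL = ["pdf", "docx", "rtf", "txt", "md", "html", "epub", "mobi", "fb2"]
READ_ONLY = ["doc", "odt", "azw", "azw3"]

readers = BIDIRECTIONAL + READ_ONLY  # formats that can be parsed into the Document model
writers = BIDIRECTIONAL              # formats that can be rendered from it

# One dedicated converter per (source, target) pair of different formats:
pairwise = sum(1 for s in readers for t in writers if s != t)
# One reader per input format plus one writer per output format:
hub = len(readers) + len(writers)

print(pairwise, hub)  # 108 22
```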

Key Components

  • BaseConverter: Abstract base for all format converters
  • Document: Intermediate representation (metadata, content blocks, images)
  • ConverterRegistry: Routes source→target format conversions with BFS pathfinding
  • ConversionEngine: Orchestrates conversions and multi-hop chaining
  • Config: Manages configuration with priority merging
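The reader → Document → writer flow can be illustrated with a toy version. Only the names Document and BaseConverter come from the list above; their fields, the read/write methods, the MarkdownConverter and HtmlConverter classes, the CONVERTERS dict and convert_text are all hypothetical, and real converters work on files and richer content blocks rather than short strings.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from html import escape


@dataclass
class Document:
    """Simplified stand-in for the intermediate representation."""
    metadata: dict[str, str] = field(default_factory=dict)
    blocks: list[tuple[str, str]] = field(default_factory=list)  # (kind, text)
    images: list[bytes] = field(default_factory=list)


class BaseConverter(ABC):
    @abstractmethod
    def read(self, data: str) -> Document: ...

    @abstractmethod
    def write(self, doc: Document) -> str: ...


class MarkdownConverter(BaseConverter):
    def read(self, data: str) -> Document:
        doc = Document()
        for line in data.splitlines():
            if line.startswith("# "):
                doc.blocks.append(("heading", line[2:]))
            elif line.strip():
                doc.blocks.append(("paragraph", line))
        return doc

    def write(self, doc: Document) -> str:
        return "\n\n".join(f"# {text}" if kind == "heading" else text for kind, text in doc.blocks)


class HtmlConverter(BaseConverter):
    TAGS = {"heading": "h1", "paragraph": "p"}

    def read(self, data: str) -> Document:
        raise NotImplementedError("HTML reader omitted in this sketch")

    def write(self, doc: Document) -> str:
        return "\n".join(f"<{self.TAGS[k]}>{escape(t)}</{self.TAGS[k]}>" for k, t in doc.blocks)


CONVERTERS: dict[str, BaseConverter] = {"md": MarkdownConverter(), "html": HtmlConverter()}


def convert_text(data: str, src: str, dst: str) -> str:
    """Source reader builds a Document; target writer renders it."""
    return CONVERTERS[dst].write(CONVERTERS[src].read(data))


print(convert_text("# Intro\nHello & welcome", "md", "html"))
# <h1>Intro</h1>
# <p>Hello &amp; welcome</p>
```

Adding a new output format in this design means writing one more write() method; every existing reader can feed it without further changes.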

Native Implementations

ConverText implements lightweight native Python parsers for its ebook and document formats:

  • EPUB: Native Python reader/writer using zipfile + lxml

    • Reads: Parses OPF metadata and spine order
    • Writes: Generates EPUB 3 structure (container.xml, OPF, NCX, XHTML)
  • MOBI: Native Python reader/writer using PalmDB format

    • Reads: PalmDB parser with PalmDOC decompression
    • Writes: PalmDB structure with optimized PalmDOC compression
  • ODT: Native Python reader using zipfile + lxml

  • FB2: Native Python reader/writer using lxml XML parser
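To give a feel for the MOBI reader's PalmDOC decompression step, here is an independent sketch of the publicly documented PalmDOC (LZ77-style) scheme; it is not ConverText's code. It assumes the input is a single compressed text record with the PalmDB header and any trailing record entries already removed.

```python
def palmdoc_decompress(data: bytes) -> bytes:
    """Decompress one PalmDOC-compressed text record."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        i += 1
        if c == 0x00 or 0x09 <= c <= 0x7F:
            out.append(c)  # literal byte
        elif c <= 0x08:
            out += data[i:i + c]  # 0x01-0x08: copy the next c bytes verbatim
            i += c
        elif c <= 0xBF:
            # 0x80-0xBF: two-byte back-reference; 11-bit distance, 3-bit length (+3)
            if i >= len(data):
                raise ValueError("truncated back-reference")
            pair = (c << 8) | data[i]
            i += 1
            distance = (pair >> 3) & 0x7FF
            length = (pair & 0x07) + 3
            if distance == 0 or distance > len(out):
                raise ValueError("back-reference points outside decoded data")
            for _ in range(length):  # byte by byte, so overlapping copies work
                out.append(out[-distance])
        else:
            # 0xC0-0xFF: a space followed by the ASCII character (c XOR 0x80)
            out += bytes((0x20, c ^ 0x80))
    return bytes(out)


print(palmdoc_decompress(b"ab\x80\x10"))  # b'ababa'
print(palmdoc_decompress(b"x\xe8i"))      # b'x hi'
```

Copying back-references one byte at a time means a length longer than the distance (as in b'ababa') correctly repeats the pattern, the usual LZ77 overlap case; the 0x01-0x08 escape is what lets bytes outside the literal range, such as UTF-8 continuation bytes, pass through unchanged.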

Development

Setup

git clone https://github.com/danielcorsano/convertext.git
cd convertext
poetry install

Run Tests

pytest
pytest -v                    # Verbose
pytest --cov                 # With coverage

Code Quality

black .                      # Format code
ruff check convertext/       # Lint
mypy convertext/             # Type check

Manual Testing

convertext --help
convertext test.md --format html --verbose

Related Projects

Want to listen to your text files instead of reading them? Try audiobook-reader - converts text, ebooks, and documents into natural-sounding audiobooks.

💝 Support This Project

If you find this tool helpful, please consider sponsoring the project. I created and maintain this software alone as a public service, and donations help me improve it and develop requested features. If I get $99 of donations, I will use it to pay for the Apple developer program so I can make iOS versions of all my open source apps.

Your support makes a real difference in keeping this project active and growing. Thank you!


License

MIT License - see LICENSE file for details.

Download files

Source Distribution

convertext-0.2.2.tar.gz (34.9 kB)

Uploaded Source

Built Distribution


convertext-0.2.2-py3-none-any.whl (52.4 kB)

Uploaded Python 3

File details

Details for the file convertext-0.2.2.tar.gz.

File metadata

  • Download URL: convertext-0.2.2.tar.gz
  • Size: 34.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.14.2 Darwin/25.2.0

File hashes

Hashes for convertext-0.2.2.tar.gz
Algorithm Hash digest
SHA256 c7e57090c4a0b80223254ef3b4acb0ecf46066c838f39713f8e74c9d9dd6c023
MD5 73c3ed7e94362f82d42f2474e604b9a9
BLAKE2b-256 6965d280c8c81a032ba0b0e84ff01ae102952a8a8e5cdef618d6d508a00e6174


File details

Details for the file convertext-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: convertext-0.2.2-py3-none-any.whl
  • Size: 52.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.1 CPython/3.14.2 Darwin/25.2.0

File hashes

Hashes for convertext-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 daf3b29866d0a623496651e9ab25e6a9c57f88c94143282097b44f45e0c31b62
MD5 d92e4395b3f77d8f4e3282134ee2bbc4
BLAKE2b-256 57f3136adca293cfb82cbc03656733462672c4fe0500fc4fd09d216eb386eb48

