Lightweight universal text/ebook/document format converter with CLI and API

ConverText

Requires Python 3.9+

Lightweight universal text/document/ebook converter. Self-contained, Python-based CLI tool with native format parsers.

Convert between all major text, document and ebook formats with a single terminal command or through a Python API. Get editable .txt, .md or HTML from PDF or ebook formats, or make ebooks, PDFs, HTML and more from text documents. Batch convert multiple files and send them anywhere in the file system or to your ereader automatically. Script the conversion of whole folder structures with different settings per folder.

Supported Formats

Bidirectional (Read & Write): PDF, DOCX, RTF, TXT, Markdown, HTML, EPUB, AZW3, FB2

Read Only: DOC, ODT, AZW

Features

  • 🚀 Fast & Lightweight - Self-contained Python package (~25MB)
  • 📝 Formatting Preservation - Maintains bold, italic, tables, lists, colors across formats
  • ⚙️ Highly Configurable - YAML config with priority merging
  • 🎯 Simple and Scriptable CLI & API - Intuitive command-line interface and built-in Python functions
  • 🔍 Metadata Preservation - Keeps author, title, and document properties

Installation

pip install convertext

Quick Start

Command Line

# Convert PDF to EPUB
convertext book.pdf --format epub

# Convert Markdown to HTML and EPUB
convertext document.md --format html,epub

# Batch convert all Word docs to Markdown
convertext *.docx --format md

# Convert PDF to AZW3 (Kindle)
convertext book.pdf --format azw3

# See all supported formats
convertext --list-formats

Python / Jupyter

import convertext

# Simple conversion
convertext.convert('book.pdf', 'epub')

# With options
convertext.convert('document.md', 'html', output='./out/', overwrite=True)

# Keep intermediate files (for debugging multi-hop)
convertext.convert('book.pdf', 'azw3', keep_intermediate=True)

Usage Examples

Single File Conversion

# PDF to text
convertext document.pdf --format txt

# Markdown to HTML or PDF
convertext README.md --format html
convertext README.md --format pdf

# DOCX to Markdown
convertext report.docx --format md

# Any format to PDF
convertext story.txt --format pdf
convertext article.html --format pdf
convertext notes.md --format pdf

# Create Word documents from any format
convertext article.md --format docx
convertext notes.txt --format docx

# Text to EPUB (creates an ebook)
convertext story.txt --format epub

Multiple Output Formats

# Convert to multiple formats at once
convertext book.md --format html,epub,txt

# Output to specific directory
convertext document.pdf --format txt --output ~/Documents/converted/

Batch Conversion

# Convert all Markdown files to HTML
convertext *.md --format html

# Convert multiple specific files
convertext chapter1.md chapter2.md chapter3.md --format epub

# Use with find for recursive conversion
find . -name "*.pdf" -exec convertext {} --format txt \;
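
The same recursive batch can be driven from Python. The sketch below is illustrative only: `convert_tree` is a hypothetical helper, not part of ConverText, and it takes the conversion function as a parameter so it makes no assumptions about the library beyond a callable that accepts a path and a format.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def convert_tree(
    root: str,
    pattern: str,
    fmt: str,
    convert: Callable[[str, str], object],
) -> List[Tuple[str, bool]]:
    """Call convert(path, fmt) for every file under root that matches pattern.

    Returns (path, succeeded) pairs so that one bad file does not stop the batch.
    """
    results: List[Tuple[str, bool]] = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue  # skip directories whose names happen to match
        try:
            convert(str(path), fmt)
            results.append((str(path), True))
        except Exception:
            # Broad on purpose: record the failure and keep going.
            results.append((str(path), False))
    return results
```

With the package installed, passing `convertext.convert` as the last argument (for example `convert_tree('.', '*.pdf', 'txt', convertext.convert)`) should mirror the `find` command above, given the `convert(path, format)` call shown in Quick Start.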

Advanced Options

# Overwrite existing files
convertext document.pdf --format txt --overwrite

# Verbose output with progress
convertext *.md --format html --verbose

# Use custom config file
convertext book.md --format epub --config my-config.yaml

Working with Ebooks

# Create EPUB from Markdown (with chapters)
convertext book.md --format epub

# Convert EPUB to Kindle format
convertext ebook.epub --format azw3

# Convert any document to multiple ebook formats
convertext document.pdf --format epub,azw3,fb2 --verbose

# Convert EPUB to text for reading
convertext ebook.epub --format txt

# Extract EPUB to HTML
convertext ebook.epub --format html

Multi-Hop Conversion

When no direct converter exists between two formats, ConverText automatically finds a conversion path through intermediate formats:

# PDF → EPUB: Automatically converts via PDF → TXT → EPUB (2 hops)
convertext book.pdf --format epub --verbose
# Output: ✓ book.pdf → book.epub (PDF → TXT → EPUB, 2 hops)

# PDF → AZW3: Automatically converts via PDF → TXT → AZW3 (2 hops)
convertext book.pdf --format azw3 --verbose
# Output: ✓ book.pdf → book.azw3 (PDF → TXT → AZW3, 2 hops)

# Keep intermediate files for debugging
convertext book.pdf --format epub --keep-intermediate
# Creates: book_intermediate.txt, book.epub

How it works: Uses BFS pathfinding to find the shortest conversion chain (max 3 hops). Intermediate files are automatically cleaned up unless --keep-intermediate is specified.
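
To make the pathfinding idea concrete, here is a minimal breadth-first search over a graph of direct conversions. It is a sketch, not ConverText's implementation: the `DIRECT` graph and the `find_path` function are hypothetical, and only the 3-hop limit is taken from the text above.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical graph: source format -> formats it can be converted to directly.
DIRECT: Dict[str, Set[str]] = {
    "pdf": {"txt"},
    "txt": {"epub", "azw3", "html"},
    "epub": {"txt", "html"},
}


def find_path(
    src: str,
    dst: str,
    graph: Dict[str, Set[str]],
    max_hops: int = 3,
) -> Optional[List[str]]:
    """Return the shortest chain of formats from src to dst, or None."""
    if src == dst:
        return [src]
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # this chain already uses the maximum number of hops
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt in seen:
                continue
            if nxt == dst:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


print(find_path("pdf", "epub", DIRECT))  # ['pdf', 'txt', 'epub']
```

Because BFS explores chains in order of length, the first chain that reaches the target is also a shortest one, which is why no separate cost comparison is needed.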

Format Matrix

Run convertext --list-formats to see all direct conversions. Multi-hop enables any-to-any conversion between compatible formats.

Configuration

ConverText supports flexible configuration through YAML files. You can set global defaults or create directory-specific configurations that automatically apply when converting files from those locations.

How Configuration Works

When you convert a file, ConverText searches for configuration in this order (highest priority first):

  1. CLI arguments - Flags you pass directly (e.g., --output ~/Books/)
  2. Directory config - convertext.yaml in the file's directory or any parent directory
  3. User config - ~/.convertext/config.yaml (your global defaults)
  4. Built-in defaults - Sensible defaults built into ConverText
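
The priority order above can be pictured as layered dictionaries merged from lowest to highest priority. The sketch below assumes nested sections are merged key by key rather than replaced wholesale; the text does not say which ConverText does, and `deep_merge` and `resolve_config` are hypothetical names.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which override wins, merging nested dicts key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge layers passed from lowest to highest priority."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Illustrative values only; the keys follow the configuration example in this README.
defaults = {"output": {"directory": None, "overwrite": False},
            "documents": {"encoding": "utf-8"}}
user_cfg = {"output": {"directory": "~/Documents/converted"}}
dir_cfg = {"output": {"overwrite": True}}
cli_args = {"output": {"directory": "~/Books/"}}

config = resolve_config(defaults, user_cfg, dir_cfg, cli_args)
print(config)
# {'output': {'directory': '~/Books/', 'overwrite': True}, 'documents': {'encoding': 'utf-8'}}
```

Here the CLI value for `output.directory` wins over the user config, while `output.overwrite` from the directory config survives because no higher layer sets it.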

Directory-Based Configuration

Place a convertext.yaml file in any directory to configure conversions for files in that directory and its subdirectories. The configuration is automatically discovered - ConverText searches from the file's location up through parent directories.

Example directory structure:

~/Documents/books/
├── convertext.yaml          # Config for all books
├── fiction/
│   ├── convertext.yaml      # Override for fiction
│   └── novel.pdf
└── technical/
    └── manual.pdf           # Uses ~/Documents/books/convertext.yaml

When converting fiction/novel.pdf, ConverText uses fiction/convertext.yaml. When converting technical/manual.pdf, ConverText uses books/convertext.yaml (inherited).
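
The upward search can be sketched with pathlib. This assumes the nearest convertext.yaml is the one that applies, as in the example above; `find_directory_config` is a hypothetical helper, not a ConverText function.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "convertext.yaml"  # file name taken from this README


def find_directory_config(file_path: str) -> Optional[Path]:
    """Return the nearest convertext.yaml at or above the file's directory."""
    start = Path(file_path).resolve().parent
    for directory in (start, *start.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None  # no directory config anywhere up to the filesystem root
```

For the tree above, `find_directory_config("fiction/novel.pdf")` run from `~/Documents/books/` would return the file in `fiction/`, and `technical/manual.pdf` would fall through to the one in `books/`.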

Creating Configuration Files

Initialize global config:

convertext --init-config

Create directory config:

# Copy example file
cp convertext.yaml.example convertext.yaml

# Or create from scratch
cat > convertext.yaml << EOF
output:
  directory: ~/Documents/converted
  overwrite: false
documents:
  encoding: utf-8
EOF

Configuration Example

See convertext.yaml.example for all available options. Here's a common configuration:

# Output settings
output:
  directory: ~/Documents/converted
  filename_pattern: "{name}.{ext}"
  overwrite: false

# Document settings
documents:
  encoding: utf-8

Key Configuration Options

Section.Key                      Default         Description
output.directory                 null            Output directory (null = source dir)
output.filename_pattern          {name}.{ext}    Output filename pattern
output.overwrite                 false           Overwrite existing files
documents.encoding               utf-8           Text file encoding
documents.title_from_filename    false           Use filename as document title
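
As a rough illustration of how `output.directory` and `output.filename_pattern` could combine into an output path, the sketch below expands the pattern with `str.format`. It assumes `{name}` is the source file name without its extension and `{ext}` is the target format; `output_path` is a hypothetical helper, and ConverText's actual expansion may differ.

```python
from pathlib import Path
from typing import Optional


def output_path(
    source: str,
    target_format: str,
    pattern: str = "{name}.{ext}",
    directory: Optional[str] = None,
) -> Path:
    """Build the output path for one conversion.

    A directory of None means "next to the source file", matching the
    documented default of null.
    """
    src = Path(source)
    out_dir = Path(directory).expanduser() if directory else src.parent
    return out_dir / pattern.format(name=src.stem, ext=target_format)


print(output_path("reports/q1.pdf", "txt").as_posix())  # reports/q1.txt
```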

CLI Reference

Usage: convertext [OPTIONS] [FILES]...

  ConverText - Lightweight universal text converter.

Options:
  -f, --format TEXT            Output format(s), comma-separated
  -o, --output PATH            Output directory
  -c, --config PATH            Custom config file
  --overwrite                  Overwrite existing files
  --list-formats               List all supported formats
  --init-config                Initialize user config file
  --version                    Show version
  -v, --verbose                Verbose output (shows conversion hops)
  --keep-intermediate          Keep intermediate files in multi-hop conversions
  --help                       Show help message

Use Cases

1. Documentation Workflow

# Write docs in Markdown, publish as HTML and PDF
convertext docs/*.md --format html
convertext docs/*.md --format pdf

# Generate EPUB documentation
convertext manual.md --format epub

2. Ebook Management

# Convert ebooks to text for reading on e-readers
convertext library/*.epub --format txt --output ~/ereader/

# Create EPUB from your writing
convertext novel.md --format epub

3. Archive Conversion

# Convert old Word documents to Markdown for version control
convertext archive/*.docx --format md --output ./converted/

# Extract text from PDFs
convertext reports/*.pdf --format txt

4. Blog Publishing

# Convert Markdown posts to HTML
convertext posts/*.md --format html --output ./public/

# Create downloadable EPUB versions
convertext posts/*.md --format epub --output ./public/downloads/

5. Research & Note-Taking

# Convert research PDFs to Markdown for notes
convertext papers/*.pdf --format md

# Create EPUB from notes for mobile reading
convertext notes/*.md --format epub

Architecture

ConverText uses an intermediate Document format for conversions:

Input Format → Document (internal) → Output Format

This allows any-to-any conversions without N² converter implementations.

Key Components

  • BaseConverter: Abstract base for all format converters
  • Document: Intermediate representation (metadata, content blocks, images)
  • ConverterRegistry: Routes source→target format conversions with BFS pathfinding
  • ConversionEngine: Orchestrates conversions and multi-hop chaining
  • Config: Manages configuration with priority merging
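
The value of the intermediate format is easiest to see in miniature. The sketch below is a toy under stated assumptions, not ConverText's code: `Document` here holds only metadata and paragraph strings, and the `READERS`/`WRITERS` registries and `convert_text` are hypothetical. Each format needs one reader and one writer, so adding a format means adding at most two functions rather than one converter per format pair.

```python
import html
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Toy intermediate representation: metadata plus paragraph blocks."""
    metadata: Dict[str, str] = field(default_factory=dict)
    blocks: List[str] = field(default_factory=list)


def read_txt(text: str) -> Document:
    # Treat blank-line-separated chunks as paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return Document(blocks=paragraphs)


def write_txt(doc: Document) -> str:
    return "\n\n".join(doc.blocks)


def write_html(doc: Document) -> str:
    return "\n".join(f"<p>{html.escape(block)}</p>" for block in doc.blocks)


# One reader and one writer per format, looked up by format name.
READERS: Dict[str, Callable[[str], Document]] = {"txt": read_txt}
WRITERS: Dict[str, Callable[[Document], str]] = {"txt": write_txt, "html": write_html}


def convert_text(text: str, src: str, dst: str) -> str:
    """Input format -> Document -> output format."""
    return WRITERS[dst](READERS[src](text))


print(convert_text("Hello\n\nA & B", "txt", "html"))
```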

Native Implementations

ConverText implements lightweight native Python parsers for ebook formats:

  • EPUB: Native Python reader/writer using zipfile + lxml

    • Reads: Parses OPF metadata and spine order
    • Writes: Generates EPUB 3 structure (container.xml, OPF, NCX, XHTML)
  • AZW3/KF8: Native Python reader/writer using PDB container with MOBI v8 headers

    • Reads: PDB parser with PalmDOC decompression and EXTH metadata extraction
    • Writes: PDB structure with KF8 headers, FDST, and PalmDOC compression
  • ODT: Native Python reader using zipfile + lxml

  • FB2: Native Python reader/writer using lxml XML parser
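
To illustrate the EPUB read path described above (container.xml, then OPF metadata, then spine order), here is a small reader. It is a sketch that swaps in the standard library's `xml.etree.ElementTree` for lxml so that it runs without extra dependencies; `read_epub_outline` is a hypothetical function, not ConverText's parser. The namespaces are the standard OCF, OPF and Dublin Core ones.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from posixpath import dirname, join
from typing import Dict, List, Tuple

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_outline(data: bytes) -> Tuple[Dict[str, str], List[str]]:
    """Return ({title, creator}, [content file paths in spine order])."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # container.xml points at the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    meta: Dict[str, str] = {}
    for key in ("title", "creator"):
        node = package.find(f"opf:metadata/dc:{key}", NS)
        if node is not None and node.text:
            meta[key] = node.text.strip()

    # The manifest maps item ids to hrefs; the spine lists ids in reading order.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    base = dirname(opf_path)  # hrefs are relative to the OPF file
    spine = [
        join(base, manifest[ref.attrib["idref"]])
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]
    return meta, spine
```

Reading order comes from the spine, not from the manifest or the archive's file order, which is why the manifest is only used as an id-to-path lookup.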

Development

Setup

git clone https://github.com/danielcorsano/convertext.git
cd convertext
poetry install

Run Tests

pytest
pytest -v                    # Verbose
pytest --cov                 # With coverage

Code Quality

black .                      # Format code
ruff check convertext/       # Lint
mypy convertext/             # Type check

Manual Testing

convertext --help
convertext test.md --format html --verbose

Related Projects

Want to listen to your text files instead of reading them? Try audiobook-reader - converts text, ebooks, and documents into natural-sounding audiobooks.

💝 Support This Project

If you find this tool helpful, please consider sponsoring the project. I created and maintain this software alone as a public service, and donations help me improve it and develop requested features. If I receive $99 in donations, I will use it to pay for the Apple developer program so I can make iOS versions of all my open source apps.

Your support makes a real difference in keeping this project active and growing. Thank you!

License

MIT License - see LICENSE file for details.
