Lightweight universal text/ebook/document format converter with CLI and API

ConverText

Lightweight universal text/document/ebook converter. Self-contained, Python-based CLI tool with native format parsers.

Convert between all major text, document and ebook formats with a single terminal command or through a Python API. Get editable .txt, .md or HTML from PDFs and ebooks, or make ebooks, PDFs, HTML and more from text documents. Batch convert multiple files and send them anywhere in the file system or to your ereader automatically. Script the conversion of whole folder structures with different settings per folder.

Supported Formats

Bidirectional (Read & Write): PDF, DOCX, RTF, TXT, Markdown, HTML, EPUB, AZW3, FB2

Read Only: DOC, ODT, AZW

Features

🚀 Fast & Lightweight - Self-contained Python package (~25MB)
📝 Formatting Preservation - Maintains bold, italic, tables, lists, colors across formats
⚙️ Highly Configurable - YAML config with priority merging
🎯 Simple and Scriptable CLI & API - Intuitive command-line interface and built-in Python functions
🔍 Metadata Preservation - Keeps author, title, and document properties

Installation

pip install convertext

Quick Start

Command Line

# Convert PDF to EPUB
convertext book.pdf --format epub

# Convert Markdown to HTML and EPUB
convertext document.md --format html,epub

# Batch convert all Word docs to Markdown
convertext *.docx --format md

# Convert PDF to AZW3 (Kindle)
convertext book.pdf --format azw3

# See all supported formats
convertext --list-formats

Python / Jupyter

import convertext

# Simple conversion
convertext.convert('book.pdf', 'epub')

# With options
convertext.convert('document.md', 'html', output='./out/', overwrite=True)

# Keep intermediate files (for debugging multi-hop)
convertext.convert('book.pdf', 'azw3', keep_intermediate=True)
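
For batch work from Python, the `convert` call shown above can be wrapped in a small loop. This is a minimal sketch that assumes only the `convertext.convert(path, format, **options)` call form used in the examples. `batch_convert` and its `convert_fn` parameter are hypothetical names, not part of the package, and because the exceptions `convert` may raise are not documented here, the sketch simply catches any exception per file.

```python
from typing import Callable, Iterable


def batch_convert(
    paths: Iterable[str],
    target_format: str,
    convert_fn: Callable[..., object],
    **options: object,
) -> dict[str, list[str]]:
    """Convert each path, collecting successes and failures instead of stopping."""
    results: dict[str, list[str]] = {"ok": [], "failed": []}
    for path in paths:
        try:
            # convert_fn is expected to behave like convertext.convert
            convert_fn(path, target_format, **options)
            results["ok"].append(path)
        except Exception:  # real exception types are an unknown here
            results["failed"].append(path)
    return results
```

With the package installed, a call might look like `batch_convert(["a.md", "b.md"], "html", convertext.convert, overwrite=True)`. Passing the function in keeps the helper easy to test without touching real files.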

Usage Examples

Single File Conversion

# PDF to text
convertext document.pdf --format txt

# Markdown to HTML or PDF
convertext README.md --format html
convertext README.md --format pdf

# DOCX to Markdown
convertext report.docx --format md

# Any format to PDF
convertext story.txt --format pdf
convertext article.html --format pdf
convertext notes.md --format pdf

# Create Word documents from any format
convertext article.md --format docx
convertext notes.txt --format docx

# Text to EPUB (creates an ebook)
convertext story.txt --format epub

Multiple Output Formats

# Convert to multiple formats at once
convertext book.md --format html,epub,txt

# Output to specific directory
convertext document.pdf --format txt --output ~/Documents/converted/

Batch Conversion

# Convert all Markdown files to HTML
convertext *.md --format html

# Convert multiple specific files
convertext chapter1.md chapter2.md chapter3.md --format epub

# Use with find for recursive conversion
find . -name "*.pdf" -exec convertext {} --format txt \;
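
Where `find` is not available (for example on Windows), the same recursive run can be scripted in Python. The sketch below only builds and runs the documented `convertext FILE --format FMT` command line. `build_commands` and `run_all` are hypothetical helper names, and the use of `check=True` assumes the CLI exits with a non-zero status on failure, which this README does not state.

```python
import subprocess
from pathlib import Path


def build_commands(root: str, source_ext: str, target_format: str) -> list[list[str]]:
    """Return one `convertext` argv list per matching file under root (recursive)."""
    files = sorted(Path(root).rglob(f"*.{source_ext}"))
    return [["convertext", str(f), "--format", target_format] for f in files]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; check=True raises CalledProcessError on a non-zero exit."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Calling `run_all(build_commands(".", "pdf", "txt"))` would mirror the `find` example above.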

Advanced Options

# Overwrite existing files
convertext document.pdf --format txt --overwrite

# Verbose output with progress
convertext *.md --format html --verbose

# Use custom config file
convertext book.md --format epub --config my-config.yaml

Working with Ebooks

# Create EPUB from Markdown (with chapters)
convertext book.md --format epub

# Convert EPUB to Kindle format
convertext ebook.epub --format azw3

# Convert any document to multiple ebook formats
convertext document.pdf --format epub,azw3,fb2 --verbose

# Convert EPUB to text for reading
convertext ebook.epub --format txt

# Extract EPUB to HTML
convertext ebook.epub --format html

Multi-Hop Conversion

When no direct converter exists between two formats, ConverText automatically finds a conversion path through intermediate formats:

# PDF → EPUB: Automatically converts via PDF → TXT → EPUB (2 hops)
convertext book.pdf --format epub --verbose
# Output: ✓ book.pdf → book.epub (PDF → TXT → EPUB, 2 hops)

# PDF → AZW3: Automatically converts via PDF → TXT → AZW3 (2 hops)
convertext book.pdf --format azw3 --verbose
# Output: ✓ book.pdf → book.azw3 (PDF → TXT → AZW3, 2 hops)

# Keep intermediate files for debugging
convertext book.pdf --format epub --keep-intermediate
# Creates: book_intermediate.txt, book.epub

How it works: Uses BFS pathfinding to find the shortest conversion chain (max 3 hops). Intermediate files are automatically cleaned up unless --keep-intermediate is specified.
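
The shortest-chain search described above can be illustrated with a plain breadth-first search over a graph of direct conversions. This is a sketch of the general technique, not ConverText's actual code: the `DIRECT` table is made up for illustration (run `convertext --list-formats` for the real one), and `find_path` is a hypothetical name.

```python
from collections import deque
from typing import Optional

# Hypothetical direct-conversion table, for illustration only.
DIRECT = {
    "pdf": ["txt", "md", "html"],
    "txt": ["epub", "azw3", "html"],
    "md": ["html", "epub"],
    "html": ["txt"],
}


def find_path(source: str, target: str, max_hops: int = 3) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of formats from source to target."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_hops:  # hops = number of conversions used so far
            continue
        for nxt in DIRECT.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain within the hop limit


print(find_path("pdf", "epub"))  # ['pdf', 'txt', 'epub']
```

Because BFS explores all one-hop chains before any two-hop chain, the first path that reaches the target is also a shortest one, which is why a hop limit can be enforced by simply not expanding paths that are already at the limit.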

Format Matrix

Run convertext --list-formats to see all direct conversions. Multi-hop enables any-to-any conversion between compatible formats.

Configuration

ConverText supports flexible configuration through YAML files. You can set global defaults or create directory-specific configurations that automatically apply when converting files from those locations.

How Configuration Works

When you convert a file, ConverText searches for configuration in this order (highest priority first):

1. CLI arguments - Flags you pass directly (e.g., --output ~/Books/)
2. Directory config - convertext.yaml in the file's directory or any parent directory
3. User config - ~/.convertext/config.yaml (your global defaults)
4. Built-in defaults - Sensible defaults built into ConverText
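
One common way to implement this kind of priority merging is a recursive dictionary merge applied from the lowest-priority layer to the highest. The sketch below shows that idea only. Whether ConverText merges nested sections key by key or replaces them whole is an assumption here, and `deep_merge` and `resolve_config` are hypothetical names.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values in override win; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers given from lowest to highest priority."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Example layers, lowest priority first (values are illustrative).
defaults = {"output": {"directory": None, "overwrite": False},
            "documents": {"encoding": "utf-8"}}
user = {"output": {"directory": "~/Documents/converted"}}
directory = {"output": {"overwrite": True}}
cli = {"output": {"directory": "~/Books/"}}

config = resolve_config(defaults, user, directory, cli)
print(config["output"])  # {'directory': '~/Books/', 'overwrite': True}
```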

Directory-Based Configuration

Place a convertext.yaml file in any directory to configure conversions for files in that directory and its subdirectories. The configuration is automatically discovered - ConverText searches from the file's location up through parent directories.

Example directory structure:

~/Documents/books/
├── convertext.yaml          # Config for all books
├── fiction/
│   ├── convertext.yaml      # Override for fiction
│   └── novel.pdf
└── technical/
    └── manual.pdf           # Uses ~/Documents/books/convertext.yaml

When converting fiction/novel.pdf, ConverText uses fiction/convertext.yaml. When converting technical/manual.pdf, ConverText uses books/convertext.yaml (inherited).
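
The upward search can be pictured as walking from the file's directory toward the filesystem root and stopping at the first `convertext.yaml` found. This is a sketch of that lookup under the behaviour described above. `find_directory_config` is a hypothetical name, and whether configs found further up are also merged in is not covered here.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = "convertext.yaml"


def find_directory_config(source_file: str) -> Optional[Path]:
    """Walk from the file's directory up to the root; return the nearest config file."""
    start = Path(source_file).resolve().parent
    for directory in [start, *start.parents]:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None  # no directory config; fall back to user config and defaults
```

For the tree above, `find_directory_config("fiction/novel.pdf")` would return the config inside `fiction/`, while `technical/manual.pdf` would resolve to the one in `books/`.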

Creating Configuration Files

Initialize global config:

convertext --init-config

Create directory config:

# Copy example file
cp convertext.yaml.example convertext.yaml

# Or create from scratch
cat > convertext.yaml << EOF
output:
  directory: ~/Documents/converted
  overwrite: false
documents:
  encoding: utf-8
EOF

Configuration Example

See convertext.yaml.example for all available options. Here's a common configuration:

# Output settings
output:
  directory: ~/Documents/converted
  filename_pattern: "{name}.{ext}"
  overwrite: false

# Document settings
documents:
  encoding: utf-8

Key Configuration Options

Option	Default	Description
`output.directory`	`null`	Output directory (null = source dir)
`output.filename_pattern`	`{name}.{ext}`	Output filename pattern
`output.overwrite`	`false`	Overwrite existing files
`documents.encoding`	`utf-8`	Text file encoding
`documents.title_from_filename`	`false`	Use filename as document title
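
To make the interaction of `output.directory` and `output.filename_pattern` concrete, here is a small sketch of how an output path could be derived from those two options. It assumes that `{name}` is the source filename without its extension and `{ext}` is the target extension, and that these are the only placeholders; `output_path` is a hypothetical helper, not a ConverText function.

```python
from pathlib import Path
from typing import Optional


def output_path(source: str, target_ext: str,
                directory: Optional[str] = None,
                pattern: str = "{name}.{ext}") -> Path:
    """Build an output path; a directory of None means 'next to the source file'."""
    src = Path(source)
    out_dir = Path(directory).expanduser() if directory else src.parent
    return out_dir / pattern.format(name=src.stem, ext=target_ext)


print(output_path("books/novel.pdf", "epub"))  # books/novel.epub (on POSIX systems)
```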

CLI Reference

Usage: convertext [OPTIONS] [FILES]...

  ConverText - Lightweight universal text converter.

Options:
  -f, --format TEXT            Output format(s), comma-separated
  -o, --output PATH            Output directory
  -c, --config PATH            Custom config file
  --overwrite                  Overwrite existing files
  --list-formats               List all supported formats
  --init-config                Initialize user config file
  --version                    Show version
  -v, --verbose                Verbose output (shows conversion hops)
  --keep-intermediate          Keep intermediate files in multi-hop conversions
  --help                       Show help message

Use Cases

1. Documentation Workflow

# Write docs in Markdown, publish as HTML and PDF
convertext docs/*.md --format html
convertext docs/*.md --format pdf

# Generate EPUB documentation
convertext manual.md --format epub

2. Ebook Management

# Convert ebooks to text for reading on e-readers
convertext library/*.epub --format txt --output ~/ereader/

# Create EPUB from your writing
convertext novel.md --format epub

3. Archive Conversion

# Convert old Word documents to Markdown for version control
convertext archive/*.docx --format md --output ./converted/

# Extract text from PDFs
convertext reports/*.pdf --format txt

4. Blog Publishing

# Convert Markdown posts to HTML
convertext posts/*.md --format html --output ./public/

# Create downloadable EPUB versions
convertext posts/*.md --format epub --output ./public/downloads/

5. Research & Note-Taking

# Convert research PDFs to Markdown for notes
convertext papers/*.pdf --format md

# Create EPUB from notes for mobile reading
convertext notes/*.md --format epub

Architecture

ConverText uses an intermediate Document format for conversions:

Input Format → Document (internal) → Output Format

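Because each format only needs a reader into, and/or a writer out of, the Document model, any-to-any conversion does not require a separate converter for each of the N² source/target pairs.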

Key Components

BaseConverter: Abstract base for all format converters
Document: Intermediate representation (metadata, content blocks, images)
ConverterRegistry: Routes source→target format conversions with BFS pathfinding
ConversionEngine: Orchestrates conversions and multi-hop chaining
Config: Manages configuration with priority merging
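
The shape of this design can be shown with a toy reader, writer and lookup table. The sketch is purely illustrative: the class and function names (`Reader`, `Writer`, `TxtReader`, `HtmlWriter`, `convert_text`) and the fields of this `Document` are assumptions, not the package's real classes or signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from html import escape


@dataclass
class Document:
    """Format-neutral representation shared by all readers and writers."""
    metadata: dict[str, str] = field(default_factory=dict)
    blocks: list[str] = field(default_factory=list)


class Reader(ABC):
    @abstractmethod
    def read(self, text: str) -> Document: ...


class Writer(ABC):
    @abstractmethod
    def write(self, doc: Document) -> str: ...


class TxtReader(Reader):
    def read(self, text: str) -> Document:
        # Treat blank-line-separated chunks as paragraphs.
        blocks = [p.strip() for p in text.split("\n\n") if p.strip()]
        return Document(blocks=blocks)


class HtmlWriter(Writer):
    def write(self, doc: Document) -> str:
        return "\n".join(f"<p>{escape(b)}</p>" for b in doc.blocks)


# A tiny stand-in for a registry: one reader or writer per format.
READERS: dict[str, Reader] = {"txt": TxtReader()}
WRITERS: dict[str, Writer] = {"html": HtmlWriter()}


def convert_text(text: str, source: str, target: str) -> str:
    """Input format -> Document -> output format."""
    return WRITERS[target].write(READERS[source].read(text))


print(convert_text("Hello\n\nA & B", "txt", "html"))
```

Adding a new output format in this layout means writing one `Writer`, after which every format that already has a `Reader` can be converted to it.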

Native Implementations

ConverText implements lightweight native Python parsers for ebook formats:

EPUB: Native Python reader/writer using zipfile + lxml
- Reads: Parses OPF metadata and spine order
- Writes: Generates EPUB 3 structure (container.xml, OPF, NCX, XHTML)
AZW3/KF8: Native Python reader/writer using PDB container with MOBI v8 headers
- Reads: PDB parser with PalmDOC decompression and EXTH metadata extraction
- Writes: PDB structure with KF8 headers, FDST, and PalmDOC compression
ODT: Native Python reader using zipfile + lxml
FB2: Native Python reader/writer using lxml XML parser
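
As an illustration of the EPUB reading step (OPF metadata and spine order), the sketch below opens an EPUB as a zip archive, follows `META-INF/container.xml` to the OPF package document, and lists the spine items in reading order. It uses the standard library's `xml.etree.ElementTree` in place of lxml to stay dependency-free, and `read_epub_outline` is a hypothetical name rather than ConverText's reader.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_outline(epub_file) -> dict:
    """Return the title and the spine documents (in reading order) of an EPUB."""
    with zipfile.ZipFile(epub_file) as zf:
        # META-INF/container.xml points at the OPF package document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
        opf = ET.fromstring(zf.read(opf_path))

    title = opf.findtext("opf:metadata/dc:title", default="", namespaces=NS)
    # The manifest maps item ids to hrefs; the spine lists ids in reading order.
    hrefs = {item.attrib["id"]: item.attrib["href"]
             for item in opf.findall("opf:manifest/opf:item", NS)}
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF file
    spine = [posixpath.join(base, hrefs[ref.attrib["idref"]])
             for ref in opf.findall("opf:spine/opf:itemref", NS)]
    return {"title": title, "spine": spine}
```

`read_epub_outline("ebook.epub")` accepts a path or a file-like object. A real reader would go on to parse each XHTML file in the spine, which is omitted here.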

Development

Setup

git clone https://github.com/danielcorsano/convertext.git
cd convertext
poetry install

Run Tests

pytest
pytest -v                    # Verbose
pytest --cov                 # With coverage

Code Quality

black .                      # Format code
ruff check convertext/       # Lint
mypy convertext/             # Type check

Manual Testing

convertext --help
convertext test.md --format html --verbose

Related Projects

Want to listen to your text files instead of reading them? Try audiobook-reader - converts text, ebooks, and documents into natural-sounding audiobooks.

💝 Support This Project

If you find this tool helpful, please consider sponsoring the project. I created and maintain this software alone as a public service, and donations help me improve it and develop requested features. If I receive $99 in donations, I will use it to pay for the Apple developer program so I can make iOS versions of all my open source apps.

Your support makes a real difference in keeping this project active and growing. Thank you!

License

MIT License - see LICENSE file for details.

Release history

Version	Release date
0.3.0 (this version)	Apr 3, 2026
0.2.2	Jan 28, 2026
0.2.1	Jan 13, 2026
0.2.0	Dec 26, 2025
0.1.3	Nov 23, 2025
0.1.2	Nov 23, 2025
0.1.1	Oct 13, 2025
0.1.0	Oct 13, 2025

Download files

Source Distribution

convertext-0.3.0.tar.gz (42.4 kB), uploaded Apr 3, 2026

Built Distribution

convertext-0.3.0-py3-none-any.whl (60.3 kB), uploaded Apr 3, 2026, Python 3

File details

Details for the file convertext-0.3.0.tar.gz.

File metadata

Download URL: convertext-0.3.0.tar.gz
Upload date: Apr 3, 2026
Size: 42.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.14.2 Darwin/25.4.0

File hashes

Hashes for convertext-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`34dd0e90600523967a0a8fda61a21b8d56f407a8f2c923ef96b5fddbe4e8b570`
MD5	`2d8995ed98f15db757ae1ed717d417bc`
BLAKE2b-256	`e1ffc3f4c2a1bd0ddea5bd63275f5714d0e5788c8343e6c9fa8134b7aeee7f9c`

File details

Details for the file convertext-0.3.0-py3-none-any.whl.

File metadata

Download URL: convertext-0.3.0-py3-none-any.whl
Upload date: Apr 3, 2026
Size: 60.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.2.1 CPython/3.14.2 Darwin/25.4.0

File hashes

Hashes for convertext-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`426152c8ea8ed8e1ca5b3021e64f3e217a7a6982ac4d70716d97b174328b5724`
MD5	`03cd841d7394216b87a8a20b46da11fb`
BLAKE2b-256	`742ea1a2af68993f2a1756836911c1622351a5b9dadd0e74514a626c1fcd5de1`
