CLI tool to convert web articles to PDF, EPUB, FB2, and Markdown

url_to_book

CLI tool to extract article content from a web page and save it in various formats (PDF, EPUB, FB2, Markdown).

Features

  • Extracts article text, title, and images
  • Preserves text formatting (bold, italic, links)
  • Multiple output formats: PDF, EPUB, FB2, Markdown
  • Filters out ads and tracking images
  • Supports Cyrillic text
  • Multiple font choices with Unicode/Cyrillic support (Noto, Liberation, GNU FreeFont, DejaVu)
  • Automatic font detection and fallback

Installation

From PyPI (recommended)

pip install url-to-book

From source

git clone https://github.com/RomanAverin/url_to_book.git
cd url_to_book
pip install -e .

Usage

# Basic usage (uses default font)
url-to-book https://example.com/article -o article.pdf

# With custom title
url-to-book https://example.com/article -o article.pdf --title "My Title"

# Without images
url-to-book https://example.com/article -o article.pdf --no-images

# Verbose output
url-to-book https://example.com/article -o article.pdf -v

# List available fonts
url-to-book --list-fonts

# Use specific font (sans-serif)
url-to-book https://example.com/article -o article.pdf --font noto-sans

# Use serif font
url-to-book https://example.com/article -o article.pdf --font noto-serif

# Use Liberation Sans (metrics-compatible with Arial)
url-to-book https://example.com/article -o article.pdf --font liberation-sans

# With verbose output showing which font is used
url-to-book https://example.com/article -o article.pdf -v --font noto-serif

Output Formats

The tool writes PDF, EPUB, FB2, and Markdown. Select a format with -f; PDF is the default. The formats differ in the features they support and in what they are best suited for.

Supported Formats

Use --list-formats to see available formats:

url-to-book --list-formats

Output:

Available output formats:
  * pdf (features: fonts, images, links)
  * epub (features: images, links)
  * fb2 (features: images, links)
  * md (features: images, links)
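
If a script needs to know which formats support a given feature, the listing above can be parsed. The following is a minimal sketch that assumes the output keeps exactly the layout shown; parse_formats and formats_with are hypothetical helper names, not part of url_to_book.

```python
import re

# Matches lines such as "  * pdf (features: fonts, images, links)".
# Assumption: the --list-formats output keeps the layout shown in this README.
LINE_RE = re.compile(r"\s*\*\s+(\w+)\s+\(features:\s*([^)]*)\)")


def parse_formats(output):
    """Map each format name to its list of features."""
    formats = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match:
            features = [f.strip() for f in match.group(2).split(",") if f.strip()]
            formats[match.group(1)] = features
    return formats


def formats_with(formats, feature):
    """Return the format names that list the given feature, in listing order."""
    return [name for name, features in formats.items() if feature in features]
```

The text to parse could be captured with subprocess.run(["url-to-book", "--list-formats"], capture_output=True, text=True).stdout.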

Format Comparison

| Format | Description                    | Features             | Best For                               |
|--------|--------------------------------|----------------------|----------------------------------------|
| PDF    | Portable Document Format       | fonts, images, links | Printing, universal reading, archiving |
| EPUB   | Electronic Publication         | images, links        | E-readers (Kindle, Kobo, etc.)         |
| FB2    | FictionBook 2.0 (XML)          | images, links        | Russian e-book readers                 |
| MD     | Markdown with YAML frontmatter | images, links        | Further processing, version control    |

Feature explanation:

  • fonts - Customizable font families (8 options for PDF)
  • images - Embedded image support
  • links - Clickable hyperlinks preserved

Format Usage Examples

Extract to PDF (default)

# Default format (PDF)
url-to-book https://example.com/article -o article.pdf

# Explicit format specification
url-to-book https://example.com/article -o article.pdf -f pdf

# PDF with custom font
url-to-book https://example.com/article -o article.pdf --font noto-serif

Extract to EPUB

# For e-readers
url-to-book https://example.com/article -o article.epub -f epub

# EPUB without images (smaller file)
url-to-book https://example.com/article -o article.epub -f epub --no-images

Extract to FB2

# For Russian e-book readers
url-to-book https://example.com/article -o article.fb2 -f fb2

# FB2 with at most 5 images
url-to-book https://example.com/article -o article.fb2 -f fb2 --max-images 5

Extract to Markdown

# For version control or further processing
url-to-book https://example.com/article -o article.md -f md

# Images will be saved to article_images/ directory

Converting Markdown Files

You can convert previously extracted Markdown files to other formats:

# First, extract to Markdown
url-to-book https://example.com/article -o article.md -f md

# Then convert to PDF
url-to-book article.md -o article.pdf -f pdf

# Or to EPUB
url-to-book article.md -o article.epub -f epub

# Or to FB2
url-to-book article.md -o article.fb2 -f fb2

This workflow is useful for:

  • Extracting once, converting to multiple formats
  • Editing content before final conversion
  • Version control of article content

Format-Specific Notes

PDF:

  • The only format that supports font selection (use the --font option)
  • Best for printing and archiving
  • Supports 8 font families (see "Available Fonts" section)

EPUB:

  • Standard format for most e-readers
  • Reflowable text (adapts to screen size)
  • Wide compatibility (Calibre, Apple Books, etc.)

FB2:

  • XML-based format popular in Russia
  • Good for Russian-language e-book readers
  • Images embedded as base64
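
For readers unfamiliar with FB2, base64 embedding means the image bytes live inside the XML file itself, in a binary element that the text references by id. The sketch below shows the general FB2 convention, not url_to_book's own code; make_binary is a hypothetical helper name.

```python
import base64
import xml.etree.ElementTree as ET


def make_binary(image_id, content_type, data):
    """Build an FB2 <binary> element holding base64-encoded image bytes."""
    element = ET.Element("binary", {"id": image_id, "content-type": content_type})
    element.text = base64.b64encode(data).decode("ascii")
    return element


# Example: wrap a few bytes and serialize the element to XML text.
xml_text = ET.tostring(make_binary("img1", "image/png", b"\x89PNG"), encoding="unicode")
```

Because base64 encodes every 3 bytes as 4 characters, an FB2 file with images is roughly a third larger than the images alone, which is one reason to consider --no-images or --max-images.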

Markdown:

  • Human-readable plain text
  • YAML frontmatter with metadata (title, authors, source URL)
  • Images saved to {filename}_images/ directory
  • Can be edited before converting to other formats
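
A script that post-processes the Markdown output needs to find the frontmatter and the image directory. The following is a rough sketch under stated assumptions: the frontmatter is delimited by --- lines, only flat key: value pairs are read, and key names such as title are guesses at what the tool writes. Both function names are hypothetical.

```python
from pathlib import Path


def images_dir_for(markdown_path):
    """Return the {filename}_images directory that sits next to a Markdown file."""
    path = Path(markdown_path)
    return path.with_name(f"{path.stem}_images")


def split_frontmatter(text):
    """Split Markdown text into (metadata dict, body).

    Only top-level "key: value" lines are read. Nested values and lists
    (for example a list of authors) need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            metadata = {}
            for line in lines[1:index]:
                key, separator, value = line.partition(":")
                # Skip indented lines and list items; they belong to nested values.
                if separator and key.strip() and not key.startswith((" ", "-")):
                    metadata[key.strip()] = value.strip().strip('"')
            return metadata, "\n".join(lines[index + 1:])
    # No closing delimiter found: treat the whole text as body.
    return {}, text
```

For anything beyond simple scalar values, a YAML library is the safer choice.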

Available Fonts

Note: Font selection is only available for PDF format.

The tool supports the following font families with Unicode/Cyrillic support:

  • noto-sans (Noto Sans) - Google's comprehensive sans-serif font
  • noto-serif (Noto Serif) - Google's comprehensive serif font
  • liberation-sans (Liberation Sans) - Metrics-compatible with Arial
  • liberation-serif (Liberation Serif) - Metrics-compatible with Times New Roman
  • free-sans (Free Sans) - GNU FreeFont sans-serif
  • free-serif (Free Serif) - GNU FreeFont serif
  • dejavu-sans (DejaVu Sans) - Popular Linux sans-serif font
  • dejavu-serif (DejaVu Serif) - Popular Linux serif font

The tool automatically detects which fonts are installed on your system and uses the first available one as the default. Use --list-fonts to see which fonts are available.
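
The "first available" rule can be illustrated as follows. This is only a sketch of the idea: the order in which url_to_book actually checks fonts is not documented here, so FONT_ORDER simply copies the list above, and falling back when a requested font is missing is an assumption. pick_font is a hypothetical name.

```python
# Assumption: the preference order follows the list in this README.
FONT_ORDER = [
    "noto-sans", "noto-serif",
    "liberation-sans", "liberation-serif",
    "free-sans", "free-serif",
    "dejavu-sans", "dejavu-serif",
]


def pick_font(installed, requested=None, order=FONT_ORDER):
    """Return the requested font if installed, else the first installed font in order.

    Returns None when no font from the order is installed.
    """
    if requested is not None and requested in installed:
        return requested
    for name in order:
        if name in installed:
            return name
    return None
```

For example, with only DejaVu Sans and Free Serif installed, pick_font({"dejavu-sans", "free-serif"}) selects free-serif because it appears earlier in the order.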

Development

# Install dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Run linter
uv run pylint url_to_book

License

MIT
