CLI tool to convert web articles to PDF, EPUB, FB2, and Markdown
url_to_book
CLI tool to extract article content from a web page and save it in various formats (PDF, EPUB, FB2, Markdown).
Features
- Extracts article text, title, and images
- Preserves text formatting (bold, italic, links)
- Multiple output formats: PDF, EPUB, FB2, Markdown
- Filters out ads and tracking images
- Supports Cyrillic text
- Multiple font choices with Unicode/Cyrillic support (Noto Sans, Liberation, DejaVu, Free fonts)
- Automatic font detection and fallback
Installation
From PyPI (recommended)
pip install url-to-book
From source
git clone https://github.com/RomanAverin/url_to_book.git
cd url_to_book
pip install -e .
Usage
# Basic usage (uses default font)
url-to-book https://example.com/article -o article.pdf
# With custom title
url-to-book https://example.com/article -o article.pdf --title "My Title"
# Without images
url-to-book https://example.com/article -o article.pdf --no-images
# Verbose output
url-to-book https://example.com/article -o article.pdf -v
# List available fonts
url-to-book --list-fonts
# Use specific font (sans-serif)
url-to-book https://example.com/article -o article.pdf --font noto-sans
# Use serif font
url-to-book https://example.com/article -o article.pdf --font noto-serif
# Use Liberation Sans (metrics-compatible with Arial)
url-to-book https://example.com/article -o article.pdf --font liberation-sans
# With verbose output showing which font is used
url-to-book https://example.com/article -o article.pdf -v --font noto-serif
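When the same options are applied to many articles, it can be convenient to assemble the command line from a script. The following is a minimal sketch and not part of url_to_book itself: `build_command` and `convert` are hypothetical helpers, they use only the flags shown above (`-o`, `--title`, `--font`, `--no-images`, `-v`), and they assume the `url-to-book` executable is on your `PATH`.

```python
import subprocess
from typing import List, Optional


def build_command(
    url: str,
    output: str,
    title: Optional[str] = None,
    font: Optional[str] = None,
    no_images: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Assemble a url-to-book argument list (hypothetical helper)."""
    cmd = ["url-to-book", url, "-o", output]
    if title is not None:
        cmd += ["--title", title]
    if font is not None:
        cmd += ["--font", font]
    if no_images:
        cmd.append("--no-images")
    if verbose:
        cmd.append("-v")
    return cmd


def convert(url: str, output: str, **options) -> None:
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(url, output, **options), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems, for example with titles that contain spaces.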
Output Formats
The tool supports multiple output formats. Each format has different capabilities and use cases.
Supported Formats
Use --list-formats to see available formats:
url-to-book --list-formats
Output:
Available output formats:
* pdf (features: fonts, images, links)
* epub (features: images, links)
* fb2 (features: images, links)
* md (features: images, links)
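If a script needs to know which features a format offers, the listing above can be parsed. This is only a sketch: `parse_formats` is a hypothetical helper, and it assumes the output looks exactly like the sample shown here, which may not hold for other versions of the tool.

```python
import re
from typing import Dict, List

# Sample text copied from the listing above; real output may differ.
SAMPLE = """\
Available output formats:
* pdf (features: fonts, images, links)
* epub (features: images, links)
* fb2 (features: images, links)
* md (features: images, links)
"""

# Optional bullet, format name, then "(features: a, b, c)".
LINE = re.compile(r"^\W*(\w+)\s*\(features:\s*([^)]*)\)")


def parse_formats(text: str) -> Dict[str, List[str]]:
    """Map each format name to its list of features."""
    formats: Dict[str, List[str]] = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, features = match.groups()
            formats[name] = [f.strip() for f in features.split(",") if f.strip()]
    return formats
```

In practice the text could come from `subprocess.run(["url-to-book", "--list-formats"], capture_output=True, text=True).stdout` instead of the hard-coded sample.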
Format Comparison
| Format | Description | Features | Best For |
|---|---|---|---|
| PDF | Portable Document Format | fonts, images, links | Printing, universal reading, archiving |
| EPUB | Electronic Publication | images, links | E-readers (Kindle, Kobo, etc.) |
| FB2 | FictionBook 2.0 (XML) | images, links | Russian e-book readers |
| MD | Markdown with YAML frontmatter | images, links | Further processing, version control |
Feature explanation:
- fonts - Customizable font families (8 options for PDF)
- images - Images are included in the output (embedded in the file, or saved to a separate directory for Markdown)
- links - Clickable hyperlinks preserved
Format Usage Examples
Extract to PDF (default)
# Default format (PDF)
url-to-book https://example.com/article -o article.pdf
# Explicit format specification
url-to-book https://example.com/article -o article.pdf -f pdf
# PDF with custom font
url-to-book https://example.com/article -o article.pdf --font noto-serif
Extract to EPUB
# For e-readers
url-to-book https://example.com/article -o article.epub -f epub
# EPUB without images (smaller file)
url-to-book https://example.com/article -o article.epub -f epub --no-images
Extract to FB2
# For Russian e-book readers
url-to-book https://example.com/article -o article.fb2 -f fb2
# FB2 with limited images
url-to-book https://example.com/article -o article.fb2 -f fb2 --max-images 5
Extract to Markdown
# For version control or further processing
url-to-book https://example.com/article -o article.md -f md
# Images will be saved to article_images/ directory
Converting Markdown Files
You can convert previously extracted Markdown files to other formats:
# First, extract to Markdown
url-to-book https://example.com/article -o article.md -f md
# Then convert to PDF
url-to-book article.md -o article.pdf -f pdf
# Or to EPUB
url-to-book article.md -o article.epub -f epub
# Or to FB2
url-to-book article.md -o article.fb2 -f fb2
This workflow is useful for:
- Extracting once, converting to multiple formats
- Editing content before final conversion
- Version control of article content
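The "extract once, convert many times" workflow can be sketched as a small planning function. This is an illustration under stated assumptions, not part of the tool: `conversion_plan` is a hypothetical helper that only builds the same commands shown above, using the `-o` and `-f` options and the format names `md`, `pdf`, `epub`, and `fb2`.

```python
from pathlib import Path
from typing import List


def conversion_plan(url: str, stem: str, targets: List[str]) -> List[List[str]]:
    """Return the commands for one extraction plus one conversion per target."""
    markdown = Path(f"{stem}.md")
    # Step 1: fetch the article once and store it as Markdown.
    plan = [["url-to-book", url, "-o", str(markdown), "-f", "md"]]
    # Step 2: convert the local Markdown file to each requested format.
    for fmt in targets:
        output = markdown.with_suffix(f".{fmt}")
        plan.append(["url-to-book", str(markdown), "-o", str(output), "-f", fmt])
    return plan
```

Each entry in the returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping the plan separate from execution leaves room to edit the Markdown file between the first command and the rest.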
Format-Specific Notes
PDF:
- Only format supporting font selection (use the --font option)
- Best for printing and archiving
- Supports 8 font families (see "Available Fonts" section)
EPUB:
- Standard format for most e-readers
- Reflowable text (adapts to screen size)
- Wide compatibility (Calibre, Apple Books, etc.)
FB2:
- XML-based format popular in Russia
- Good for Russian-language e-book readers
- Embedded images as base64
Markdown:
- Human-readable plain text
- YAML frontmatter with metadata (title, authors, source URL)
- Images saved to the {filename}_images/ directory
- Can be edited before converting to other formats
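For scripts that post-process the Markdown output, the sketch below separates the frontmatter from the body and works out where the images are expected to be. Both functions are hypothetical helpers. They assume the frontmatter is delimited by `---` lines (the usual YAML frontmatter convention) and that the images directory sits next to the Markdown file; the exact metadata key names are not assumed.

```python
from pathlib import Path
from typing import Tuple


def split_frontmatter(text: str) -> Tuple[str, str]:
    """Split a document into (frontmatter, body); frontmatter is '' if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return frontmatter, body
    return "", text  # opening delimiter without a closing one


def images_dir(markdown_path: str) -> Path:
    """Return the sibling '<stem>_images' directory for a Markdown file."""
    path = Path(markdown_path)
    return path.with_name(f"{path.stem}_images")
```

The frontmatter string can then be handed to a YAML parser of your choice if the individual fields are needed.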
Available Fonts
Note: Font selection is only available for PDF format.
The tool supports the following font families with Unicode/Cyrillic support:
- noto-sans (Noto Sans) - Google's comprehensive sans-serif font
- noto-serif (Noto Serif) - Google's comprehensive serif font
- liberation-sans (Liberation Sans) - Metrics-compatible with Arial
- liberation-serif (Liberation Serif) - Metrics-compatible with Times New Roman
- free-sans (Free Sans) - GNU FreeFont sans-serif
- free-serif (Free Serif) - GNU FreeFont serif
- dejavu-sans (DejaVu Sans) - Popular Linux sans-serif font
- dejavu-serif (DejaVu Serif) - Popular Linux serif font
The tool automatically detects which of these fonts are installed on your system and uses the first available one as the default. Use --list-fonts to see which fonts are available.
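The "first available font" rule can be illustrated with a short function. This is a sketch of the idea only, not the tool's actual implementation: `choose_font` is a hypothetical name, the priority order is assumed to match the order of the list above, and falling back when a requested font is missing is likewise an assumption.

```python
from typing import Iterable, List, Optional

# Font identifiers in the order they are listed above; whether the tool
# uses this exact priority order is an assumption.
FONT_ORDER: List[str] = [
    "noto-sans", "noto-serif",
    "liberation-sans", "liberation-serif",
    "free-sans", "free-serif",
    "dejavu-sans", "dejavu-serif",
]


def choose_font(available: Iterable[str], requested: Optional[str] = None) -> Optional[str]:
    """Return the requested font if available, else the first available one."""
    installed = set(available)
    if requested in installed:
        return requested
    for name in FONT_ORDER:
        if name in installed:
            return name
    return None  # none of the supported fonts is installed
```

For example, with only `dejavu-sans` and `liberation-serif` installed and no explicit request, this sketch picks `liberation-serif` because it comes first in the assumed order.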
Development
# Install dev dependencies
uv sync --extra dev
# Run tests
uv run pytest
# Run linter
uv run pylint url_to_book
License
MIT