Bulk convert images to WebP and automatically update URLs in markdown files

bulk-webp-url-replacer

Requires Python 3.9+. License: MIT.

Bulk convert images to WebP and automatically update URLs in markdown files with a custom CDN prefix.

Features

  • 🔍 Extract image URLs from markdown files (frontmatter, galleries, inline images)
  • 📥 Download images from remote URLs (parallel downloads)
  • 🖼️ Convert to optimized WebP format
  • 🔄 Replace original URLs with new CDN-prefixed paths
  • ⏭️ Skip already-processed images and excluded extensions
  • 👀 Dry-run mode to preview changes
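
The skip and replace steps above can be sketched in a few lines. This is a minimal illustration, not the package's own code: the helper names `is_excluded` and `cdn_url` are hypothetical, and the `<prefix>/<stem>.webp` naming scheme is an assumption about how new paths might be built.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Mirrors the documented --exclude-ext default (gif svg webp ico).
EXCLUDED = {"gif", "svg", "webp", "ico"}


def is_excluded(url, excluded=EXCLUDED):
    """Return True when the URL path ends in an excluded extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    return suffix in excluded


def cdn_url(url, prefix):
    """Build '<prefix>/<stem>.webp' from the original file name (assumed scheme)."""
    stem = PurePosixPath(urlparse(url).path).stem
    return f"{prefix.rstrip('/')}/{stem}.webp"
```

Parsing the URL first means a query string such as `?v=2` does not interfere with the extension check.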

Installation

pip install bulk-webp-url-replacer

Or install from source:

git clone https://github.com/HoangYell/bulk-webp-url-replacer.git
cd bulk-webp-url-replacer
pip install -e .

Usage

CLI

# Dry run - preview what would be processed
bulk-webp-url-replacer \
  --scan-dir ./content \
  --output-dir ./webp_images \
  --dry-run

# Full run with custom URL prefix
bulk-webp-url-replacer \
  --scan-dir ./content \
  --output-dir ./webp_images \
  --new-url-prefix "https://cdn.example.com/images"

# Faster with more threads
bulk-webp-url-replacer \
  --scan-dir ./content \
  --output-dir ./webp_images \
  --new-url-prefix "https://cdn.example.com/images" \
  --threads 8
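
To illustrate what a `--threads` setting typically controls, here is a hedged sketch of parallel downloads using a thread pool from the standard library. The function `download_all` and the injected `fetch` callable are hypothetical names for this example; the package's internals may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, threads=4):
    """Run fetch(url) for every URL on a thread pool; collect results and errors."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad URL should not stop the batch
                errors[url] = exc
    return results, errors
```

Passing `fetch` in as a parameter keeps the sketch testable offline; in practice it would be a function that downloads one URL and returns its bytes.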

As Python Module

python -m bulk_webp_url_replacer \
  --scan-dir ./content \
  --output-dir ./webp_images \
  --new-url-prefix "https://cdn.example.com/images"

Programmatic Usage

from bulk_webp_url_replacer import ImageETL, ImageURLExtractor

# Full ETL pipeline
etl = ImageETL(
    content_dir="./content",
    webp_dir="./webp_images",
    webp_base_url="https://cdn.example.com/images",
    quality=80,
    max_width=1200,
    exclude_extensions=["gif", "svg", "webp", "ico"],
    threads=4
)

# Dry run to preview changes
result = etl.run(dry_run=True)
print(f"Found {result.total_urls} URLs, {result.skipped} already processed")

# Full run
result = etl.run(dry_run=False)
print(f"Converted {result.converted} images, {result.failed} failed")

# Or just extract URLs without processing
extractor = ImageURLExtractor()
urls = extractor.extract_from_directory("./content")
for file_path, line_num, url in urls:
    print(f"{file_path}:{line_num} -> {url}")

Options

Option            Required  Default           Description
--scan-dir        Yes       -                 Directory to scan for files containing image URLs
--output-dir      Yes       -                 Directory to save converted WebP images
--new-url-prefix  No        -                 URL prefix to replace old image URLs
--quality         No        80                WebP quality 1-100
--max-width       No        1200              Max image width in pixels
--exclude-ext     No        gif svg webp ico  File extensions to skip
--threads         No        4                 Number of parallel download threads
--dry-run         No        -                 Preview changes without downloading or modifying files
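
For readers wrapping the tool in their own scripts, the options listed here can be mirrored with `argparse`. This is a sketch built only from the documented flags and defaults, not the package's actual parser; in particular, treating `--exclude-ext` as a space-separated list (`nargs="*"`) is an assumption based on how its default is written.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="bulk-webp-url-replacer")
    parser.add_argument("--scan-dir", required=True)
    parser.add_argument("--output-dir", required=True)
    parser.add_argument("--new-url-prefix", default=None)
    parser.add_argument("--quality", type=int, default=80)
    parser.add_argument("--max-width", type=int, default=1200)
    # Assumed to accept several space-separated extensions.
    parser.add_argument("--exclude-ext", nargs="*",
                        default=["gif", "svg", "webp", "ico"])
    parser.add_argument("--threads", type=int, default=4)
    parser.add_argument("--dry-run", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--scan-dir", "./content", "--output-dir", "./webp_images", "--dry-run"]
)
print(args.quality, args.max_width, args.threads, args.dry_run)
```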

Supported Patterns

The tool detects image URLs in:

# YAML frontmatter
---
image: "https://example.com/image.jpg"
---

# TOML frontmatter
+++
image = "https://example.com/image.jpg"
+++

# Gallery shortcodes
{{< gallery >}}
- https://example.com/photo1.jpg
- https://example.com/photo2.png
{{< /gallery >}}

# HTML img tags in shortcodes
{{< embed >}}
<img src="https://example.com/image.jpg" width="250" height="250"/>
{{< /embed >}}

# Standard markdown
![Alt text](https://example.com/image.jpg)
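
One way to cover all of the patterns above is a single regular expression that looks for image-like URLs regardless of the surrounding syntax. The sketch below is an approximation under that assumption; the names `URL_RE` and `find_image_urls` and the extension list are illustrative, and the package may use different, pattern-specific rules.

```python
import re

# Illustrative extension list; adjust to taste.
IMAGE_EXT = r"(?:jpe?g|png|gif|webp|svg|bmp|tiff?)"
# Stop at whitespace, quotes, angle brackets, and parentheses so the match
# ends cleanly inside frontmatter strings, HTML attributes, and markdown links.
URL_RE = re.compile(rf"https?://[^\s\"'<>()]+\.{IMAGE_EXT}\b", re.IGNORECASE)


def find_image_urls(text):
    """Return (line_number, url) pairs for every image-like URL in text."""
    hits = []
    for line_num, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            hits.append((line_num, match.group(0)))
    return hits
```

Because the match stops at the file extension, any query string after it is dropped from the returned URL.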

Output

After running, you'll have:

  1. WebP images in your --output-dir
  2. mapping.json tracking original → WebP conversions
  3. Updated files with new URLs
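
The layout of mapping.json is not documented here, so the following sketch assumes the simplest possible shape: a flat JSON object mapping each original URL to its new URL. Under that assumption, applying the mapping to a file's text is a plain string replacement; `apply_mapping` is a hypothetical helper name.

```python
import json


def apply_mapping(text, mapping):
    """Replace each original URL in text with its mapped URL; return (new_text, count)."""
    count = 0
    # Handle longer URLs first so a URL that is a prefix of another is not clobbered.
    for original in sorted(mapping, key=len, reverse=True):
        occurrences = text.count(original)
        if occurrences:
            text = text.replace(original, mapping[original])
            count += occurrences
    return text, count


# Assumed flat {original_url: new_url} layout, for illustration only.
mapping = json.loads(
    '{"https://example.com/image.jpg": "https://cdn.example.com/images/image.webp"}'
)
new_text, replaced = apply_mapping("![Alt text](https://example.com/image.jpg)", mapping)
print(new_text, replaced)
```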

License

MIT

