A tool to download and organize images referenced in markdown files

Markdown Image Downloader

A Python package that automatically downloads images referenced in markdown files, stores them locally in an _attachments folder, and updates the links to point at the local copies. It is particularly useful for markdown documentation, where it keeps images available even if the original URLs stop working.

Or just for Obsidian's Readwise export, which is what I originally made it for.

Previously hosted as a GitHub Gist; moved here to make maintenance and contributions (if any) easier. Also published to PyPI for convenience.

Requirements

Python 3.9+

Installation

Install directly from PyPI using pip:

pip install markdown-image-downloader

Usage

Run the package from the command line, providing the folder containing your markdown files as an argument:

markdown-image-downloader <folder_name>

Example

markdown-image-downloader ../Readwise/Articles

This will:

  1. Scan all markdown files in the ../Readwise/Articles folder
  2. Download any images referenced in the markdown files
  3. Store them in ../Readwise/Articles/_attachments
  4. Update the markdown files to reference the local copies
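The scan and rewrite steps above can be sketched with a regular expression applied to each file's text. This is a minimal illustration, not the package's actual code. Several details are assumptions: the helper names (markdown_files, find_image_urls, rewrite_links), the restriction to http(s) URLs, and the non-recursive folder scan.

```python
import re
from pathlib import Path

# Inline markdown images: ![alt text](url). Assumptions: only http(s) URLs are
# treated as remote, URLs contain no spaces, and optional "titles" are not handled.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def markdown_files(folder: str) -> list[Path]:
    """Markdown files directly inside `folder` (non-recursive, an assumption)."""
    return sorted(Path(folder).glob("*.md"))


def find_image_urls(markdown: str) -> list[str]:
    """Remote image URLs in the order they appear."""
    return [m.group(2) for m in IMAGE_PATTERN.finditer(markdown)]


def rewrite_links(markdown: str, local_names: dict[str, str]) -> str:
    """Point each downloaded image at its copy in _attachments."""

    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        name = local_names.get(url)
        if name is None:
            return match.group(0)  # not downloaded: leave the original link alone
        return f"![{alt}](_attachments/{name})"

    return IMAGE_PATTERN.sub(replace, markdown)
```

After a rewrite, the links point at _attachments/ instead of an http(s) URL. A second pass over the same file therefore finds nothing new to download, which is one simple way a rerun could avoid repeat downloads.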

Features

  • Uses custom HTTP headers to avoid download blocks
  • Downloads images from URLs referenced in markdown files
  • Creates local copies of images in an _attachments directory
  • Automatically updates links in the markdown files with new local image paths
  • Compresses large images to reduce storage space
  • Supports multithreaded concurrent downloads
  • Uses rate limiting to prevent server overload and download blocks
  • Progress bar for tracking download status
  • Keeps detailed logs of operations and errors
  • Sanitizes filenames for cross-platform compatibility
  • Supports rerunning the script without re-downloading images
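Custom HTTP headers usually mean sending a browser-like User-Agent, because some servers reject Python's default one. Below is a minimal sketch using the standard library's urllib.request. The header values and the build_request/fetch_bytes names are hypothetical. The package itself may use a different HTTP client and different headers.

```python
import urllib.request
from typing import Optional

# Hypothetical browser-like headers; Python's default User-Agent
# ("Python-urllib/3.x") is rejected by some image hosts.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept": "image/avif,image/webp,image/*,*/*;q=0.8",
}


def build_request(url: str, extra_headers: Optional[dict[str, str]] = None) -> urllib.request.Request:
    """Create a GET request carrying the default headers plus any per-site extras."""
    headers = {**DEFAULT_HEADERS, **(extra_headers or {})}
    return urllib.request.Request(url, headers=headers)


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download one image; raises on network or HTTP errors."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```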

How It Works

  1. Scanning: The script scans all .md files in the specified folder for image references.
  2. Downloading: For each image URL found:
    • Downloads the image if it's not already in _attachments
    • Compresses images larger than 500 KB while maintaining quality
    • Generates unique filenames based on content hash
  3. Organization: Creates an _attachments folder to store all images
  4. Updating: Updates markdown files to reference the local copies in _attachments
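Naming files after a hash of their content means two URLs serving the same image end up as one file in _attachments. Below is a minimal sketch using hashlib. The choice of SHA-256, the 12-character digest prefix, and the helper names are assumptions rather than the package's documented behavior. Because the hash needs the image bytes, this check happens at write time, after the download.

```python
import hashlib
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse


def content_hash_name(url: str, data: bytes, digest_chars: int = 12) -> str:
    """Name an image after a SHA-256 digest of its bytes, keeping the URL's extension."""
    # URL paths always use "/", so parse them as POSIX paths; suffix may be "".
    suffix = PurePosixPath(unquote(urlparse(url).path)).suffix.lower()
    return hashlib.sha256(data).hexdigest()[:digest_chars] + suffix


def save_once(attachments: Path, url: str, data: bytes) -> Path:
    """Write `data` into the attachments folder unless an identical file is already there."""
    attachments.mkdir(parents=True, exist_ok=True)
    target = attachments / content_hash_name(url, data)
    if not target.exists():  # same name means same content, so the write is skipped
        target.write_bytes(data)
    return target
```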

Features in Detail

Image Compression

  • Automatically compresses large images
  • Maintains reasonable quality through progressive compression
  • Converts RGBA images to RGB with white background
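Both compression ideas can be sketched without an imaging library:

- Compositing an RGBA pixel over white uses out = alpha * color + (1 - alpha) * 255 per channel.
- Progressive compression re-encodes at decreasing quality until the result fits a size budget.

In a real tool the encoder would come from an imaging library such as Pillow. Here it is passed in as a plain function so the sketch stays self-contained. The quality range, step size, and function names are hypothetical.

```python
from typing import Callable, Tuple


def flatten_on_white(pixel: Tuple[int, int, int, int]) -> Tuple[int, int, int]:
    """Composite one RGBA pixel over an opaque white background."""
    r, g, b, a = pixel
    alpha = a / 255
    # out = alpha * color + (1 - alpha) * 255, applied per channel
    return tuple(round(c * alpha + 255 * (1 - alpha)) for c in (r, g, b))


def compress_progressively(
    encode: Callable[[int], bytes],
    max_bytes: int,
    start_quality: int = 85,
    min_quality: int = 40,
    step: int = 10,
) -> Tuple[bytes, int]:
    """Re-encode at lower quality until the output fits, never going below min_quality."""
    quality = start_quality
    data = encode(quality)
    while len(data) > max_bytes and quality - step >= min_quality:
        quality -= step
        data = encode(quality)
    return data, quality
```

With Pillow, `encode` might save the flattened image as JPEG into an io.BytesIO buffer with the given quality and return the buffer's bytes. That wiring is left out to keep the sketch dependency-free.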

Filename Handling

  • Preserves original filenames
  • Sanitizes filenames for cross-platform compatibility
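Here is one way filename sanitizing for cross-platform compatibility could work:

1. Take the last path segment of the URL and decode percent-escapes.
2. Replace characters Windows forbids.
3. Avoid reserved device names such as CON.
4. Cap the length while keeping the extension.

This is a sketch only. The 150-character limit, the fallback name "image", and the helper name are assumptions.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Characters Windows forbids in filenames, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names Windows reserves regardless of extension.
_RESERVED = {"CON", "PRN", "AUX", "NUL",
             *(f"COM{i}" for i in range(1, 10)), *(f"LPT{i}" for i in range(1, 10))}


def filename_from_url(url: str, max_length: int = 150) -> str:
    """Derive a filesystem-safe filename from an image URL (hypothetical helper)."""
    # Last path segment with %-escapes decoded; the query string is ignored.
    raw = PurePosixPath(unquote(urlparse(url).path)).name
    # Replace forbidden characters; strip leading/trailing dots and spaces,
    # which Windows does not allow at the end of a name.
    name = _INVALID_CHARS.sub("_", raw).strip(" .") or "image"
    if name.split(".", 1)[0].upper() in _RESERVED:
        name = "_" + name
    suffix = PurePosixPath(name).suffix
    stem = name[: len(name) - len(suffix)]
    # Truncate the stem, keeping the extension intact.
    return stem[: max(1, max_length - len(suffix))] + suffix
```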

Concurrent Processing

  • Uses ThreadPoolExecutor for parallel downloads
  • Includes progress bar for tracking downloads
  • Implements rate limiting to prevent server overload
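Below is a minimal sketch of rate-limited parallel downloads with concurrent.futures.ThreadPoolExecutor. The rate limiter spaces out request start times by a fixed interval shared across all worker threads. Several pieces are assumptions: the interval, the worker count, and the fetch callable (which would perform the actual HTTP request). A print stands in for the progress bar.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional


class RateLimiter:
    """Let at most one request start every `min_interval` seconds, across threads."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_start = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            # Reserve the next free slot, then push the following slot back.
            slot = max(now, self._next_start)
            self._next_start = slot + self.min_interval
        # Sleep outside the lock so other threads can reserve their own slots.
        if slot > now:
            time.sleep(slot - now)


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_workers: int = 4,
    min_interval: float = 0.2,
) -> dict[str, Optional[bytes]]:
    """Fetch every URL in parallel; failed downloads map to None."""
    limiter = RateLimiter(min_interval)

    def task(url: str) -> bytes:
        limiter.wait()
        return fetch(url)

    results: dict[str, Optional[bytes]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(task, url): url for url in urls}
        for done, future in enumerate(as_completed(futures), start=1):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                results[url] = None
            # A progress bar (for example tqdm) would be advanced here instead.
            print(f"[{done}/{len(futures)}] {url}")
    return results
```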

Error Handling

  • Comprehensive logging of all operations
  • Graceful handling of download failures
  • Skips already processed images
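The skip and graceful-failure behavior could look like the following sketch. An existing file is reused without calling fetch. A failed download is logged with its traceback and reported as None instead of stopping the whole run. The function name, logger name, and return convention are hypothetical.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("image_downloader")  # hypothetical logger name


def download_to(url: str, target: Path, fetch: Callable[[str], bytes]) -> Optional[Path]:
    """Save one image, skipping existing files and logging (not raising) failures."""
    if target.exists():
        logger.info("Skipping %s: %s already exists", url, target.name)
        return target
    try:
        data = fetch(url)
    except Exception:
        # logger.exception records the message at ERROR level plus the traceback.
        logger.exception("Failed to download %s", url)
        return None
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    logger.info("Saved %s -> %s", url, target.name)
    return target
```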

Logging

The script creates detailed logs in a logs directory:

  • Location: ./logs/image_downloader.log
  • Includes timestamps, operation details, and error messages
  • New log file created for each run
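One way to get a fresh log at a fixed path on every run is a logging.FileHandler opened with mode="w", which truncates the previous file. This sketch assumes that reading of "new log file created for each run"; timestamped filenames would be another option. The logger name and format string are also assumptions.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: str = "logs") -> logging.Logger:
    """Log to <log_dir>/image_downloader.log, replacing the previous run's file."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("image_downloader")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Drop handlers from an earlier call so messages are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    # mode="w" truncates the file, so each run starts with an empty log.
    handler = logging.FileHandler(Path(log_dir) / "image_downloader.log", mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    return logger
```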

Limitations

  • Only processes image links in markdown format: ![alt](url)
  • Requires internet connection for downloading external images
  • May be rate-limited or outright blocked by some servers
  • SVG files are downloaded but not compressed
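The 500 KB size threshold from How It Works and the SVG exception above can be combined into a single check. This is a minimal sketch; treating 500 KB as 500 * 1024 bytes is an assumption, as is the function name.

```python
from pathlib import PurePath

# 500 KB threshold; reading it as 500 * 1024 bytes is an assumption.
COMPRESSION_THRESHOLD = 500 * 1024
# SVG is a vector (text) format, so pixel-based re-encoding does not apply.
NEVER_COMPRESS = {".svg"}


def should_compress(filename: str, size_bytes: int, threshold: int = COMPRESSION_THRESHOLD) -> bool:
    """Decide whether a downloaded file should go through compression."""
    if PurePath(filename).suffix.lower() in NEVER_COMPRESS:
        return False
    return size_bytes > threshold
```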

Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.

License

This project is available under the MIT License.

Download files

Source Distribution

markdown_image_downloader-0.1.7.tar.gz (10.4 kB)

Built Distribution

markdown_image_downloader-0.1.7-py3-none-any.whl (9.5 kB)

File details

Hashes for markdown_image_downloader-0.1.7.tar.gz:

  Algorithm     Hash digest
  SHA256        9d85195732d0014c08f2d913710c7d2c4d93a85e8e176c10d383c034fbee8fca
  MD5           4b1e3165b0ad8aeec96b36f1b9d141af
  BLAKE2b-256   c8e238468dd72242698444a7b18df38a1807833742a1c3fc411e4732ced08971

Hashes for markdown_image_downloader-0.1.7-py3-none-any.whl:

  Algorithm     Hash digest
  SHA256        cf5b34e846f9b9bba6d8fa2c5c477e42cef7ad35ca3f8b341bca3d920bad9a28
  MD5           ed7fea6cd91cf9bdd2fb4723c452dcd4
  BLAKE2b-256   e8285bc31ca91205d4c46ea6166e1807195bc40752b86ed4c69fe52c2903f1aa
