
Convert a Medium export ZIP into Hugo-ready Markdown page bundles with localized images.


medium2md


Convert a Medium export ZIP into clean, Hugo-ready Markdown page bundles.

medium2md is a CLI tool that transforms Medium's HTML export into properly structured Hugo content using page bundles — enabling full ownership of your content and a clean, reproducible migration from Medium to Hugo.




Why This Exists

Medium allows you to export your account data as a ZIP archive, but the raw export:

  • Contains unstructured HTML
  • Includes inconsistent metadata
  • References remote image URLs

medium2md solves this by providing:

| Feature | Description |
|---|---|
| HTML → Markdown | Converts Medium HTML posts to clean Markdown |
| Hugo front matter | Generates YAML front matter from post metadata |
| Image localization | Downloads remote images into each bundle; copies local images when present in the export |
| Canonical URL | Preserves the original Medium URL |
| Conversion summary | Prints a terminal summary of what was converted and what was skipped (file-based reports are planned) |
| Incremental re-runs (planned) | Re-run only changed posts |

This tool is designed to be deterministic, reproducible, and CI-friendly.


Features

MVP (current)

  • Convert a Medium export ZIP (reads the HTML posts under posts/ in the export)
  • Extract title and canonical URL; generate slug
  • Convert HTML to Markdown
  • Create Hugo page bundles with index.md and optional images/
  • Image localization: download remote images into the bundle; copy local images when present in the export
  • Basic slug collision handling (slug-2, slug-3, …)
  • Terminal progress and summary; per-post image count; prompt to create the output directory if it is missing
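The slug generation and collision handling listed above can be sketched as follows. This is a minimal illustration, not medium2md's code: the slug rules (accent folding, lowercasing, joining alphanumeric runs with hyphens), the "post" fallback, and the helper names `slugify` and `unique_slug` are assumptions; only the slug-2, slug-3 suffix pattern comes from the feature list.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase ASCII slug with words joined by single hyphens (assumed rules)."""
    # Fold accents ("Café" -> "Cafe") and drop any remaining non-ASCII characters.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    return slug or "post"  # hypothetical fallback for titles with no usable characters


def unique_slug(base: str, taken: set) -> str:
    """Return base, or base-2, base-3, ... for the first value not already taken."""
    candidate, n = base, 2
    while candidate in taken:
        candidate = f"{base}-{n}"
        n += 1
    taken.add(candidate)
    return candidate
```

For example, passing "My Post Title" and then "My Post: Title!" through `slugify` and `unique_slug` with a shared set yields my-post-title and then my-post-title-2.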

Planned

  • Extract date and optional metadata (tags, etc.) into front matter
  • Incremental runs via state file
  • Embed detection and shortcode conversion (YouTube, Twitter, Gist)
  • Pandoc backend option
  • Verification command
  • Theme-specific front matter mapping
  • Conversion report (e.g. JSON/file)

Installation

This project uses uv for dependency management. To install from source:

git clone https://github.com/edgarbc/medium2md.git
cd medium2md
uv sync

To install the released package from PyPI:

pip install medium2md-cli
# or with uv:
uv tool install medium2md-cli

The package is named medium2md-cli, but the installed CLI command is medium2md.


Usage

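Copy your Medium export ZIP into the input/ directory (it already exists and its contents are git-ignored), then run the converter with --out pointing at your Hugo site's content directory: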

cp ~/Downloads/medium-export.zip input/
uv run medium2md input/medium-export.zip --out ../blog/content/posts

Note: The input/ directory is tracked by git (via .gitkeep) so it exists after a fresh clone, but its contents are ignored — your ZIP files will never be accidentally committed.

Front Matter Example

Each converted post produces an index.md with Hugo-compatible YAML front matter. Current output:

---
title: "My Post Title"
draft: true
slug: "my-post-slug"
medium:
  canonical: "https://medium.com/@you/post-slug"
---

Additional keys (e.g. date, lastmod, tags) are planned.
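Front matter in this shape can be rendered without a YAML library. The sketch below is not necessarily how medium2md writes it: it assumes strings are quoted with `json.dumps`, relying on the fact that JSON string escapes are also valid in YAML double-quoted strings, and the helper names `yaml_quote` and `front_matter` are hypothetical.

```python
import json


def yaml_quote(value: str) -> str:
    """Double-quote a string; JSON string escapes are also valid YAML double-quoted escapes."""
    return json.dumps(value, ensure_ascii=False)


def front_matter(title: str, slug: str, canonical: str, draft: bool = True) -> str:
    """Render front matter with the same keys as the example above (hypothetical helper)."""
    lines = [
        "---",
        f"title: {yaml_quote(title)}",
        f"draft: {'true' if draft else 'false'}",
        f"slug: {yaml_quote(slug)}",
        "medium:",
        f"  canonical: {yaml_quote(canonical)}",  # nested under the medium: key
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Quoting every string value keeps titles containing colons, quotes, or leading special characters from being misread by Hugo's YAML parser.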


Output Structure

Each Medium post becomes a Hugo page bundle. Image links in the Markdown point into the bundle’s images/ folder (remote images are downloaded; local images from the export are copied):

content/posts/
└── my-post-slug/
    ├── index.md
    └── images/
        ├── 1.png
        ├── 2.jpg
        └── …
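The numbered file names in this layout suggest that images are assigned sequential names within each post. The sketch below shows one way to plan those names from a post's HTML before downloading or copying anything. The numbering-in-document-order rule, the extension list, the .jpg fallback, and the names `ImgCollector` and `plan_images` are assumptions; the actual download and copy steps are left out.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse

KNOWN_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg"}


class ImgCollector(HTMLParser):
    """Collect unique <img src="..."> values in document order (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via the default handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src and src not in self.srcs:
                self.srcs.append(src)


def plan_images(html: str) -> dict:
    """Map each image src to a numbered bundle path such as images/1.png."""
    parser = ImgCollector()
    parser.feed(html)
    plan = {}
    for i, src in enumerate(parser.srcs, start=1):
        ext = PurePosixPath(urlparse(src).path).suffix.lower()
        if ext not in KNOWN_EXTS:
            ext = ".jpg"  # assumption: fallback when the URL has no usable extension
        plan[src] = f"images/{i}{ext}"
    return plan
```

The resulting mapping can drive both steps: fetch or copy each source to its planned path inside the bundle, then replace each src with the planned relative path when writing index.md.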

Project Structure

medium2md/
├── medium2md/
│   ├── __init__.py
│   ├── cli.py
│   ├── pipeline.py
│   └── main.py
├── pyproject.toml
├── README.md
├── project-plan.md
└── input/
    └── medium-export.zip

Pipeline Architecture

medium2md follows a layered pipeline:

ZIP → extract → find posts → parse HTML → localize images (copy/download) → Markdown conversion → front matter + Hugo bundle write

Philosophy: Correctness first, cleverness later.
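The first pipeline stages (open the ZIP, find posts, parse HTML) can be sketched with the standard library alone. This is an illustration under assumptions, not medium2md's implementation: it assumes each post's title is in the `<title>` element and its original URL is on an `<a class="p-canonical">` link, and the names `post_members`, `find_posts`, `MetaParser`, and `extract_meta` are hypothetical. Sorting the member list is one simple way to keep runs deterministic.

```python
import zipfile
from html.parser import HTMLParser
from typing import Dict, List, Optional


def post_members(names: List[str]) -> List[str]:
    """Keep HTML files under posts/, sorted so every run sees the same order."""
    return sorted(n for n in names if n.startswith("posts/") and n.endswith(".html"))


def find_posts(zip_path: str) -> List[str]:
    """List post HTML members of a Medium export ZIP without extracting it."""
    with zipfile.ZipFile(zip_path) as zf:
        return post_members(zf.namelist())


class MetaParser(HTMLParser):
    """Grab <title> text and the href of <a class="p-canonical"> (assumed markup)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.title: Optional[str] = None
        self.canonical: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and "p-canonical" in (a.get("class") or "").split():
            self.canonical = self.canonical or a.get("href")  # keep the first match

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def extract_meta(html: str) -> Dict[str, Optional[str]]:
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip() if parser.title else None
    return {"title": title, "canonical": parser.canonical}
```

To process a post, open the ZIP with zipfile.ZipFile and pass `zf.read(name).decode("utf-8")` for each returned name to `extract_meta`.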


Development Roadmap

| Milestone | Focus | Status |
|---|---|---|
| 1: MVP | ZIP ingestion, HTML→Markdown, Hugo bundle writing, image localization | ✅ Done |
| 2: Robustness | Incremental state tracking, metadata fallback, verify command | 📋 Planned |
| 3: Polish | Embed conversion, theme config mapping, Pandoc backend, internal link rewriting | 📋 Planned |

Contributing

Contributions are welcome! To get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/my-feature)
  3. Make your changes and confirm the CLI still works (uv run medium2md --help)
  4. Open a pull request

Publishing to PyPI (maintainers)

  1. Bump version in pyproject.toml.
  2. Build: uv build (creates dist/).
  3. Install dev dependencies with uv sync --extra dev, then upload with uv run twine upload dist/* (requires a PyPI API token; use __token__ as the username).
  4. Optionally tag the release with the new version, e.g. git tag v0.1.0 && git push --tags.

License

This project is licensed under the MIT License.


Built by Edgar Bermudez and GitHub Copilot with 💖 to enable long-term content ownership and reproducible publishing workflows.

Not affiliated with Medium or any of its subsidiaries.



Download files


Source distribution: medium2md_cli-0.1.0.tar.gz (46.0 kB)

Built distribution: medium2md_cli-0.1.0-py3-none-any.whl (9.9 kB, Python 3)

File details

Details for the file medium2md_cli-0.1.0.tar.gz.

File metadata

  • Download URL: medium2md_cli-0.1.0.tar.gz
  • Size: 46.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for medium2md_cli-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | aca99d78f172575332ef800c26c8d9671cc05ecb2425b516fc3bd6dd8e3b0d2a |
| MD5 | ee1acb941aa7531121d74b090c9e9b95 |
| BLAKE2b-256 | 4fef140c718d1ef309c018a4dbac72e1f73a983bc0a8634c14677ec0975effbe |


File details

Details for the file medium2md_cli-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: medium2md_cli-0.1.0-py3-none-any.whl
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for medium2md_cli-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2213436b7585180219185efd472ea0edc27fb99329c40f86dc80ae3f6c931389 |
| MD5 | c1ae922dd23048b6d51d40e2c5939f0c |
| BLAKE2b-256 | dc26f4d3a7c4dd1217bb728702de7702952d1e20e4ce139977aa44d54e340947 |

