Convert HTML files and web pages to Markdown format

webpage2md

A command-line tool and Python package for converting HTML files and web pages to Markdown.

Features

  • Convert local HTML files to Markdown
  • Convert web pages to Markdown
  • Support for multiple input files
  • Custom output directory and filename options
  • Automatic installation of required dependencies
  • Uses Playwright for reliable web scraping
  • Uses Pandoc for high-quality HTML to Markdown conversion

Installation

pip install webpage2md

Usage

As a Python Package

from webpage2md import convert_html, convert_url

# Convert an HTML string to Markdown
html = '<h1>Hello World</h1>'
markdown = convert_html(html)

# Convert a web page to Markdown
markdown = convert_url('https://example.com')
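The functions above appear to return the Markdown as a string, so writing it to disk is left to the caller. The following is a minimal sketch of one way to do that. The helper name `save_markdown` is hypothetical and not part of webpage2md; the converter is passed in as an argument, and the sketch assumes it takes an HTML string and returns a `str`.

```python
from pathlib import Path
from typing import Callable


def save_markdown(html: str, out_path: str, convert: Callable[[str], str]) -> Path:
    """Convert an HTML string with `convert` and write the result to `out_path`.

    `convert` is any callable mapping HTML text to Markdown text. Passing
    webpage2md's convert_html here assumes it returns a str.
    """
    markdown = convert(html)
    target = Path(out_path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target


# Intended use (assumption: convert_html returns the Markdown as a str):
#   from webpage2md import convert_html
#   save_markdown("<h1>Hello World</h1>", "out/hello.md", convert_html)
```

Taking the converter as a parameter keeps the helper independent of the package, so it can be exercised with a stand-in function before Playwright and Pandoc are installed.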

Command Line Usage

Basic usage:

webpage2md example.html                  # Convert local file
webpage2md https://example.com          # Convert web page
webpage2md file1.html file2.html        # Convert multiple files

Options:

webpage2md -o output_dir/ file.html     # Specify output directory
webpage2md -n custom_name.md file.html  # Specify output filename
webpage2md --stdout file.html           # Print to stdout
webpage2md -q file.html                 # Quiet mode
webpage2md -v file.html                 # Verbose mode

For help:

webpage2md --help
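To drive the command-line tool from a script, the argument list can be assembled in Python and passed to `subprocess`. This is a rough sketch only: `build_command` and `run_batch` are hypothetical helpers, and it assumes that `-o` and `-q` can be combined with each other and with multiple inputs, which the examples above show separately but do not confirm.

```python
import subprocess
from typing import List, Optional, Sequence


def build_command(
    inputs: Sequence[str],
    output_dir: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble a webpage2md argument list from the documented options."""
    if not inputs:
        raise ValueError("at least one HTML file or URL is required")
    cmd = ["webpage2md"]
    if output_dir is not None:
        cmd += ["-o", output_dir]  # -o: output directory
    if quiet:
        cmd.append("-q")  # -q: quiet mode
    cmd.extend(inputs)  # files and URLs go last, as in the examples
    return cmd


def run_batch(inputs: Sequence[str], output_dir: str) -> int:
    """Run the CLI once for all inputs and return its exit code."""
    # Assumption: combining -o, -q and several inputs in one call is accepted.
    return subprocess.run(build_command(inputs, output_dir, quiet=True)).returncode
```

Building the list separately from running it makes it easy to print or log the exact command before executing it.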

Requirements

  • Python 3.7+
  • Playwright (automatically installed)
  • Pandoc (automatically installed)
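If a conversion fails, it can help to check whether these requirements are visible to the current Python environment. The sketch below is a diagnostic only and is not part of webpage2md; it assumes Playwright is importable as the `playwright` module and that Pandoc, if present, is reachable as `pandoc` on `PATH`. The package's automatic installer may place Pandoc elsewhere, in which case this check would report it as missing.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report which of the listed requirements this interpreter can see."""
    return {
        "python>=3.7": sys.version_info >= (3, 7),
        # Assumption: the Playwright package is importable as "playwright".
        "playwright": importlib.util.find_spec("playwright") is not None,
        # Assumption: a usable Pandoc binary would be on PATH as "pandoc".
        "pandoc": shutil.which("pandoc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```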

License

MIT License
