webpage2md
A command-line tool to convert HTML files and web pages to Markdown format.
Features
- Convert local HTML files to Markdown
- Convert web pages to Markdown
- Support for multiple input files
- Custom output directory and filename options
- Automatic installation of required dependencies
- Uses Playwright for reliable web scraping
- Uses Pandoc for high-quality HTML to Markdown conversion
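The Pandoc step in the list above can be pictured with a short sketch. This is not webpage2md's actual implementation; it only assumes that a `pandoc` executable is on the PATH. The helper names `build_pandoc_command` and `html_to_markdown` are hypothetical and do not come from the package.

```python
import shutil
import subprocess


def build_pandoc_command(source_format="html", target_format="markdown"):
    """Return the argv list for a Pandoc run that reads stdin and writes stdout."""
    return ["pandoc", "--from", source_format, "--to", target_format]


def html_to_markdown(html):
    """Convert an HTML string to Markdown by piping it through Pandoc.

    Hypothetical helper: webpage2md may call Pandoc differently.
    """
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    result = subprocess.run(
        build_pandoc_command(),
        input=html,            # HTML goes in on stdin
        capture_output=True,   # Markdown comes back on stdout
        text=True,
        check=True,            # raise CalledProcessError if Pandoc fails
    )
    return result.stdout
```

Keeping the command construction separate from the subprocess call makes the format options easy to inspect or change without running Pandoc.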
Installation
pip install webpage2md
Usage
As a Python Package
from webpage2md import convert_html, convert_url
# Convert HTML string to markdown
html = '<h1>Hello World</h1>'
markdown = convert_html(html)
# Convert webpage to markdown
markdown = convert_url('https://example.com')
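If the converted Markdown needs to end up in a file, a small wrapper along these lines may help. It is a sketch that assumes the converter takes an HTML string and returns a Markdown string, as the `convert_html` example above suggests. `save_markdown` and `convert_and_save` are hypothetical names, and the converter is passed in as an argument so the sketch does not depend on the package being installed.

```python
from pathlib import Path


def save_markdown(markdown, destination):
    """Write a Markdown string to `destination`, creating parent directories."""
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return path


def convert_and_save(html, destination, converter):
    """Run `converter` on an HTML string and save the result.

    `converter` is any callable mapping HTML text to Markdown text,
    for example webpage2md's convert_html (assumed to return a str).
    """
    return save_markdown(converter(html), destination)


# Assumed usage with the package installed:
#   from webpage2md import convert_html
#   convert_and_save("<h1>Hello World</h1>", "out/hello.md", convert_html)
```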
Command Line Usage
Basic usage:
webpage2md example.html # Convert local file
webpage2md https://example.com # Convert web page
webpage2md file1.html file2.html # Convert multiple files
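The basic commands above accept both local files and web addresses. One plausible way to tell the two apart is sketched below. This is an assumption about the approach, not the tool's confirmed logic, and `is_url` and `split_sources` are hypothetical names.

```python
from urllib.parse import urlparse


def is_url(source):
    """Return True when `source` looks like an http(s) URL rather than a file path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def split_sources(sources):
    """Partition command-line arguments into (urls, local_files), keeping order."""
    urls = [s for s in sources if is_url(s)]
    files = [s for s in sources if not is_url(s)]
    return urls, files
```

Checking the scheme explicitly means a Windows path such as `C:\pages\file.html` is still treated as a local file.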
Options:
webpage2md -o output_dir/ file.html # Specify output directory
webpage2md -n custom_name.md file.html # Specify output filename
webpage2md --stdout file.html # Print to stdout
webpage2md -q file.html # Quiet mode
webpage2md -v file.html # Verbose mode
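How `-o` and `-n` might combine into a final output path can be sketched as follows. The default naming rules here (input stem plus `.md` for files, host name plus `.md` for URLs) are assumptions made for illustration. The tool's real defaults may differ, and `output_path` is a hypothetical helper.

```python
from pathlib import Path
from urllib.parse import urlparse


def output_path(source, output_dir=None, name=None):
    """Build the Markdown output path for one input.

    output_dir mirrors the -o option and name mirrors the -n option.
    The fallback names are assumptions, not documented behavior.
    """
    if name is None:
        parsed = urlparse(source)
        if parsed.scheme in ("http", "https"):
            # Assumed default for URLs: use the host name as the file stem.
            stem = parsed.netloc.replace(":", "_") or "page"
        else:
            # Assumed default for files: reuse the input file's stem.
            stem = Path(source).stem
        name = stem + ".md"
    return Path(output_dir or ".") / name
```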
For help:
webpage2md --help
Requirements
- Python 3.7+
- Playwright (automatically installed)
- Pandoc (automatically installed)
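To see which of these requirements are already satisfied on a machine, a quick check like the one below can be run before installing. It is a sketch independent of the package, and `check_requirements` is a hypothetical helper. It only reports what it finds and installs nothing.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Report whether each listed requirement appears to be available."""
    return {
        # The interpreter running this script must be Python 3.7 or newer.
        "python": sys.version_info >= (3, 7),
        # The Playwright Python package must be importable.
        "playwright": importlib.util.find_spec("playwright") is not None,
        # A pandoc executable must be reachable on PATH.
        "pandoc": shutil.which("pandoc") is not None,
    }


if __name__ == "__main__":
    for requirement, available in check_requirements().items():
        print(f"{requirement}: {'ok' if available else 'missing'}")
```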
License
MIT License
File details
Details for the file webpage2md-1.0.0.tar.gz.
File metadata
- Download URL: webpage2md-1.0.0.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4870dd9d0fd4305d0b2fe8850752bedb1749104d05df7f52950e3415eac3d280 |
| MD5 | 134d1ec91cb36f67fd1ddb45003cd486 |
| BLAKE2b-256 | 5d336f0fa69b6a8715dbb3fbbc2aa5edf84796b2390a0e45d7790947a4000a76 |
File details
Details for the file webpage2md-1.0.0-py3-none-any.whl.
File metadata
- Download URL: webpage2md-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 138c9297991bb50473e328345043140df8ee2b7247a62c24a705b7e4436c50f2 |
| MD5 | 379d52b158b6d50cab891870df040cac |
| BLAKE2b-256 | 67f71a40b12dbe44622e18780ed748864c32a0647ef61ed1e51ba708ced563ce |