Convert an article or web page to Markdown
article-to-md
A CLI tool to extract core content from webpages or local HTML and convert it to Markdown.
Usage: article-to-md [OPTIONS] SOURCE
Convert an article or web page to Markdown.
Commands:
--help, -h: Display this message and exit.
--version: Display application version.
Parameters:
SOURCE, --source: A URL or local HTML file to process. [required]
--method: The extraction engine to use. [choices: readability, trafilatura, raw] [default: readability]
--favor: Whether to favor 'precision' or 'recall' when using trafilatura. [choices: recall, precision]
--remove-ads, --no-remove-ads: Apply EasyList cosmetic filters to remove ads before processing. [default: False]
--strip-tag, --no-strip: HTML tag to strip from the final output. Repeat this flag to remove multiple tags. Use --no-strip to disable. [default: ('img',)]
Installation
uv is recommended to install the package in a managed environment:
uv tool install article-to-md
Note: To use the readability method, Node.js (v14+) must be installed on your system. Without Node.js, the tool uses Python-based extraction.
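To see in advance which path the note above describes for your machine, you can check whether a `node` executable is on your PATH. This is a minimal sketch, not the tool's own detection logic: the function name `node_available` is hypothetical, and the check only looks for the executable without verifying the v14+ requirement.

```python
import shutil
from typing import Callable, Optional


def node_available(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if a `node` executable can be found on PATH.

    `which` is injectable so the check can be exercised without Node.js
    actually being installed. This does NOT check the Node.js version.
    """
    return which("node") is not None


if __name__ == "__main__":
    if node_available():
        print("Node.js found: the readability method can use Readability.js")
    else:
        print("Node.js not found: expect Python-based extraction")
```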
Usage
From a publicly accessible web page:
article-to-md https://example.com/article
From a local HTML file:
article-to-md /path/to/file.html
Advanced options:
- --remove-ads - Basic ad removal from the DOM using generic cosmetic filters from EasyList
- --method - Affects pre-processing of the DOM before conversion to Markdown.
  - readability (default) - Uses ReadabiliPy, which can use the original Readability.js Node package when Node is present on the system.
  - trafilatura - Uses the Trafilatura pure Python library
  - raw - Sends the full DOM to be converted
- --favor - Only used with --method trafilatura, to favor precision or recall as described in the Trafilatura documentation.
- --strip-tag - An HTML tag to be stripped from the DOM before conversion
  - This argument can be supplied multiple times
  - By default, <img> tags are stripped; use --no-strip to keep them.
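If you call the tool from a script, the options above can be assembled into an argument list. The sketch below is an illustration only: `build_command` is a hypothetical helper, not part of article-to-md, and it uses only the flags listed in the help output. Placing SOURCE before the options and rejecting `--favor` for other methods are choices made by this helper, not documented behavior of the tool.

```python
from typing import List, Optional, Sequence

METHODS = ("readability", "trafilatura", "raw")
FAVOR_CHOICES = ("recall", "precision")


def build_command(
    source: str,
    method: str = "readability",
    favor: Optional[str] = None,
    remove_ads: bool = False,
    strip_tags: Sequence[str] = ("img",),
) -> List[str]:
    """Assemble an article-to-md argument list from the documented flags."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    cmd = ["article-to-md", source, "--method", method]
    if favor is not None:
        # The README says --favor is only used with trafilatura; this
        # helper chooses to reject other combinations outright.
        if method != "trafilatura" or favor not in FAVOR_CHOICES:
            raise ValueError("--favor needs --method trafilatura and a valid choice")
        cmd += ["--favor", favor]
    if remove_ads:
        cmd.append("--remove-ads")
    if strip_tags:
        # --strip-tag is repeated once per tag.
        for tag in strip_tags:
            cmd += ["--strip-tag", tag]
    else:
        # An empty sequence means "strip nothing".
        cmd.append("--no-strip")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Where the converted Markdown ends up (for example, standard output) is not stated here, so check `article-to-md --help` before capturing output.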
Features
- Stealth Requests: Uses curl_cffi to impersonate a Chrome browser and avoid bot detection.
- Enhanced Markdown:
  - Converts <var> to italics.
  - Includes <abbr> titles in the text output.
  - Renders Markdown tables from HTML tables
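To illustrate what the `<var>` and `<abbr>` handling might look like, here is a small standard-library sketch. It is not the tool's implementation: the class and function names are hypothetical, and the exact output shape for abbreviations (the title in parentheses after the text) is an assumption made for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional


class InlineMarkdown(HTMLParser):
    """Tiny converter that handles only <var> and <abbr>.

    All other tags are dropped and their text is kept.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        self._abbr_titles: List[Optional[str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self.parts.append("*")  # open italics
        elif tag == "abbr":
            # Remember the title until the closing tag is seen.
            self._abbr_titles.append(dict(attrs).get("title"))

    def handle_endtag(self, tag):
        if tag == "var":
            self.parts.append("*")  # close italics
        elif tag == "abbr" and self._abbr_titles:
            title = self._abbr_titles.pop()
            if title:
                # Assumed format: abbreviation followed by its expansion.
                self.parts.append(f" ({title})")

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html: str) -> str:
    parser = InlineMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `to_markdown('<var>x</var>')` returns `*x*`, and an `<abbr>` without a `title` attribute passes through as plain text.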
Download files
File details
Details for the file article_to_md-0.4.1.tar.gz.
File metadata
- Download URL: article_to_md-0.4.1.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 196e62060528f4dd7330e5f2a3779c3d6184e5136d6a20e861a8be57c9f1c966 |
| MD5 | 2faa4c0a286de7dd6c2c34400856f68c |
| BLAKE2b-256 | d8dd3ce34d31efcc2326b2348d8a04fa4fac8a4b5cafec764107bee183d85be7 |
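If you download the source distribution manually, its SHA256 digest can be compared against the value listed above. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed above for article_to_md-0.4.1.tar.gz
EXPECTED_SHA256 = "196e62060528f4dd7330e5f2a3779c3d6184e5136d6a20e861a8be57c9f1c966"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a local file's digest with the published one."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published_hash("article_to_md-0.4.1.tar.gz")` returns `True` only if the local file is byte-for-byte identical to the published one.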
File details
Details for the file article_to_md-0.4.1-py3-none-any.whl.
File metadata
- Download URL: article_to_md-0.4.1-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f45e1bc06ea6f3de5dc1d1fcb31f09df6cd90501f8bc2be8e2c299b05e45f398 |
| MD5 | 7eb54d06e9cc7360791cb0c6b9ab2175 |
| BLAKE2b-256 | 2822000422b54cc85e6def74d6f9e8f03ac8cecd3b1d9cb73dde8ad825a82f34 |