
Convert an article or web page to Markdown


article-to-md

A CLI tool to extract core content from webpages or local HTML and convert it to Markdown.

Usage: article-to-md [OPTIONS] SOURCE

Convert an article or web page to Markdown.

Commands:
  --help, -h: Display this message and exit.
  --version: Display application version.

Parameters:
  SOURCE, --source: A URL or local HTML file to process. [required]
  --method: The extraction engine to use. [choices: readability, trafilatura, raw] [default: readability]
  --favor: Whether to favor 'precision' or 'recall' when using trafilatura. [choices: recall, precision]
  --remove-ads, --no-remove-ads: Apply EasyList cosmetic filters to remove ads before processing. [default: False]
  --strip-tag, --no-strip: HTML tag to strip from the final output. Repeat this flag to remove multiple tags. Use --no-strip to disable. [default: ('img',)]

Installation

uv is recommended to install the package in a managed environment:

uv tool install article-to-md

Note: The readability method uses Readability.js only when Node.js (v14+) is installed on your system. Without Node.js, the tool falls back to Python-based extraction.
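
Whether the Node.js requirement is met can be checked ahead of time. The following is a minimal sketch, assuming Node.js is exposed as `node` on the PATH and prints its version in the usual `vMAJOR.MINOR.PATCH` form. The helper names `parse_node_major` and `node_is_usable` are hypothetical, and this is not necessarily how article-to-md detects Node.js internally.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 14  # taken from the note above: Node.js v14+


def parse_node_major(version_output):
    """Return the major version from text like 'v18.17.0', or None."""
    match = re.match(r"v?(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def node_is_usable():
    """Return True if a `node` binary is on PATH and reports v14 or newer."""
    node = shutil.which("node")
    if node is None:
        return False
    try:
        result = subprocess.run(
            [node, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return False
    major = parse_node_major(result.stdout)
    return major is not None and major >= MIN_NODE_MAJOR


if __name__ == "__main__":
    print("Node.js v14+ available:", node_is_usable())
```

If the check returns False, the note above suggests the tool still runs, using Python-based extraction instead.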

Usage

From a publicly accessible web page:

article-to-md https://example.com/article

From a local HTML file:

article-to-md /path/to/file.html

Advanced options:

  • --remove-ads - Basic ad removal from the DOM using generic cosmetic filters from EasyList.
  • --method - Selects the extraction engine used to pre-process the DOM before conversion to Markdown.
    • readability (default) - Uses ReadabiliPy, which can use the original Readability.js Node package when Node.js is present on the system.
    • trafilatura - Uses Trafilatura, a pure Python library.
    • raw - Sends the full DOM to be converted.
  • --favor - Only used with --method trafilatura; tells Trafilatura whether to favor precision or recall, as described in the Trafilatura documentation.
  • --strip-tag - An HTML tag to be stripped from the DOM before conversion.
    • This argument can be supplied multiple times.
    • By default, <img> tags are stripped; use --no-strip to keep them.
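
The options above can be combined in a single call. The sketch below assembles such a command line from Python, using only the flags documented in this README. The helper names `build_command` and `convert` are hypothetical, and `convert` assumes that the tool writes the Markdown to standard output, which this page does not state explicitly.

```python
import subprocess

METHODS = ("readability", "trafilatura", "raw")


def build_command(source, method="readability", favor=None,
                  remove_ads=False, strip_tags=None, no_strip=False):
    """Build the argument list for one article-to-md invocation."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    if favor not in (None, "precision", "recall"):
        raise ValueError(f"unknown favor value: {favor!r}")

    cmd = ["article-to-md", "--method", method]
    # --favor is documented as only used with trafilatura, so it is
    # passed along only in that case.
    if favor and method == "trafilatura":
        cmd += ["--favor", favor]
    if remove_ads:
        cmd.append("--remove-ads")
    if no_strip:
        cmd.append("--no-strip")  # keep <img> tags
    for tag in strip_tags or []:
        cmd += ["--strip-tag", tag]  # the flag may be repeated
    cmd.append(source)  # SOURCE: a URL or a local HTML file
    return cmd


def convert(source, **options):
    """Run article-to-md and return stdout (assumed to hold the Markdown)."""
    result = subprocess.run(
        build_command(source, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `build_command("page.html", method="trafilatura", favor="precision", strip_tags=["img", "svg"])` yields the same arguments as typing `article-to-md --method trafilatura --favor precision --strip-tag img --strip-tag svg page.html` in a shell.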

Features

  • Stealth Requests: Uses curl_cffi to impersonate a Chrome browser and avoid bot detection.
  • Enhanced Markdown:
    • Converts <var> to italics.
    • Includes <abbr> titles in the text output.
    • Renders Markdown tables from HTML tables.
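
To illustrate the kind of output the <var> and <abbr> handling aims for, here is a small standalone sketch built on Python's standard html.parser module. It is not the tool's actual implementation. The class name `InlineSketch`, the `*...*` italic style, and the choice to append the title in parentheses after the abbreviation are all assumptions made for the example.

```python
from html.parser import HTMLParser


class InlineSketch(HTMLParser):
    """Render <var> as *italics* and append <abbr> titles in parentheses."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._abbr_titles = []  # stack of title attributes for open <abbr> tags

    def handle_starttag(self, tag, attrs):
        if tag == "var":
            self.parts.append("*")
        elif tag == "abbr":
            self._abbr_titles.append(dict(attrs).get("title"))

    def handle_endtag(self, tag):
        if tag == "var":
            self.parts.append("*")
        elif tag == "abbr" and self._abbr_titles:
            title = self._abbr_titles.pop()
            if title:
                self.parts.append(f" ({title})")

    def handle_data(self, data):
        self.parts.append(data)


def to_markdown(html):
    """Convert an inline HTML fragment using the rules sketched above."""
    parser = InlineSketch()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(to_markdown(
        'Let <var>x</var> be the '
        '<abbr title="HyperText Markup Language">HTML</abbr> size.'
    ))
```

All other tags are simply dropped and their text kept, which is enough for the illustration but far short of a full HTML to Markdown converter.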
