
Turn any web article into clean Markdown via CLI.

Project description

墨探 (omni-article-markdown)


Easily convert web articles (blogs, news, documentation, and more) into Markdown.

Introduction

墨探 was built to solve one problem: how to convert article content from many different websites into a uniform Markdown format, accurately and efficiently.

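Websites vary widely in design, and their HTML structures differ enormously. This diversity makes automated content extraction and format conversion very difficult, and building a general-purpose solution that copes with every kind of complex HTML structure is no easy task.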

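My approach is to start by adapting to specific websites and then generalize step by step. Over time this should yield a general-purpose solution that covers as many websites as possible.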
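One way to organize this "specific sites first, then generalize" strategy is a registry of per-site extractors keyed by hostname, with a generic fallback for every other site. The following is a minimal sketch under that assumption. SITE_EXTRACTORS, register, generic_extractor, extract_article, and the example.com rule are all hypothetical names and do not come from omni-article-markdown's code.

```python
"""Hypothetical sketch: site-specific extractors with a generic fallback.

None of these names come from omni-article-markdown; they only illustrate
the "specific sites first, then generalize" idea.
"""
from typing import Callable, Dict
from urllib.parse import urlparse

# An extractor takes a page's raw HTML and returns the article body.
Extractor = Callable[[str], str]

# Hypothetical registry of site-specific extractors, keyed by hostname.
SITE_EXTRACTORS: Dict[str, Extractor] = {}


def register(host: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one hostname."""
    def decorator(func: Extractor) -> Extractor:
        SITE_EXTRACTORS[host] = func
        return func
    return decorator


def generic_extractor(html: str) -> str:
    """Fallback for unknown sites; a placeholder that returns the HTML unchanged."""
    return html


@register("example.com")
def example_com_extractor(html: str) -> str:
    # Hypothetical rule: this site wraps its article in <article>...</article>.
    start = html.find("<article>")
    end = html.find("</article>")
    if start == -1 or end == -1:
        return generic_extractor(html)
    return html[start + len("<article>"):end]


def extract_article(url: str, html: str) -> str:
    """Pick the extractor for the URL's hostname, else use the generic one."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return SITE_EXTRACTORS.get(host, generic_extractor)(html)
```

With this layout, supporting a new site means registering one more function. Shared cleanup logic can then move into the generic fallback as the same patterns recur across sites.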

Features

Below are some example sites; feel free to try them yourself and check the results.

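Site Link Notes
Medium link
CSDN link
Juejin (掘金) link
WeChat Official Accounts (公众号) link
NetEase (网易) link
Jianshu (简书) link
Towards Data Science link
Quanta Magazine link
Cloudflare Blog link
Alibaba Cloud Developer Community (阿里云开发者社区) link
Microsoft technical documentation (微软技术文档) link
InfoQ link
Cnblogs (博客园) link
SegmentFault (思否) link
OSChina (开源中国) link
Forbes link
sspai (少数派) link
Yuque (语雀) link
Tencent Cloud Developer Community (腾讯云开发者社区) link
Woshipm (人人都是产品经理) link
JetBrains Blog link
Claude Docs link
Anthropic link
Meta Blog link
Android Developers Blog link
Spring Blog link
HackerNoon link
LinkedIn Blog (领英博客) link
Wallstreetcn (华尔街见闻) link
Apple Developer Documentation (苹果开发者文档) link
Baijiahao (百家号) link
Snowflake technical blog link
Zhihu Column (知乎专栏) link
Toutiao (今日头条) link
X Articles link
Feishu (飞书) link
Google for Developers link
Dropbox.Tech link
Wikipedia link
Huxiu (虎嗅网) link
Freedium link No longer available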

Installation and Upgrade

# pipx
pipx install omni-article-markdown
pipx upgrade omni-article-markdown

# pip
pip install omni-article-markdown
pip install -U omni-article-markdown

# uv
uv tool install omni-article-markdown
uv tool install omni-article-markdown --upgrade

Once installed, run:

mdcli -h

Basic Usage

Convert only

mdcli https://example.com

Save to the current directory

mdcli https://example.com -s

Save to a specified path

mdcli https://example.com -s /home/user/
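To convert several articles in one go, a script can invoke mdcli once per URL. The sketch below is hypothetical: build_command and convert_all are illustrative names. It uses only the invocation forms shown above, meaning a URL optionally followed by -s and a target directory.

```python
"""Hypothetical helper: batch-convert several URLs by calling mdcli.

Only the invocation forms documented above are used: a URL, plus an
optional -s with an optional target directory.
"""
import subprocess
from typing import List, Optional


def build_command(url: str, save: bool = False, path: Optional[str] = None) -> List[str]:
    """Return the mdcli argument list for one URL (path is used only when saving)."""
    cmd = ["mdcli", url]
    if save:
        cmd.append("-s")
        if path is not None:
            cmd.append(path)
    return cmd


def convert_all(urls: List[str], path: Optional[str] = None) -> None:
    """Run mdcli with -s for each URL, stopping at the first failure."""
    for url in urls:
        # check=True raises CalledProcessError if mdcli exits with a non-zero status.
        subprocess.run(build_command(url, save=True, path=path), check=True)
```

For example, convert_all(["https://example.com"], path="/home/user/") runs mdcli https://example.com -s /home/user/ for each URL in the list.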

Architecture

墨探 consists of three main modules:

  • Reader: reads the full content of the web page
  • Extractor: extracts the main article content and strips irrelevant data
  • Parser: converts the HTML into Markdown
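The following is a minimal, standard-library-only sketch of a Reader, Extractor, Parser pipeline, to make the division of labor concrete. It is not omni-article-markdown's implementation. read_page, extract_main, MarkdownParser, to_markdown, and convert are hypothetical names. The extractor naively keeps the <article> element, and the parser handles only headings, paragraphs, list items, links, and bold text.

```python
"""Minimal sketch of a Reader -> Extractor -> Parser pipeline (stdlib only).

NOT omni-article-markdown's implementation; all names are hypothetical and
the parser covers only a handful of tags.
"""
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.request import urlopen


def read_page(url: str) -> str:
    """Reader: fetch the whole page as text (assumes UTF-8)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_main(html: str) -> str:
    """Extractor: keep the <article> element if present (naive string search)."""
    start, end = html.find("<article"), html.rfind("</article>")
    if start != -1 and end != -1:
        return html[start:end + len("</article>")]
    return html


class MarkdownParser(HTMLParser):
    """Parser: convert a small subset of HTML to Markdown, skipping script/style."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = ("p", "li", "h1", "h2", "h3")

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []  # finished Markdown blocks
        self.buf: List[str] = []    # text of the block being built
        self.skip = 0               # nesting depth inside <script>/<style>
        self.prefix = ""            # Markdown prefix of the current block
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
        elif tag in self.BLOCKS:
            self.flush()  # close any text collected before this block
            self.prefix = "- " if tag == "li" else self.HEADINGS.get(tag, "")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.buf.append("[")
        elif tag in ("strong", "b"):
            self.buf.append("**")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skip = max(0, self.skip - 1)
        elif tag == "a":
            self.buf.append(f"]({self.href or ''})")
            self.href = None
        elif tag in ("strong", "b"):
            self.buf.append("**")
        elif tag in self.BLOCKS:
            self.flush()

    def handle_data(self, data):
        if not self.skip:
            self.buf.append(data)

    def flush(self):
        # Collapse whitespace; emit the block only if it has visible text.
        text = " ".join("".join(self.buf).split())
        if text:
            self.lines.append(self.prefix + text)
        self.buf, self.prefix = [], ""


def to_markdown(html: str) -> str:
    parser = MarkdownParser()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.lines)


def convert(url: str, reader: Callable[[str], str] = read_page) -> str:
    """Run Reader -> Extractor -> Parser; reader is injectable for offline use."""
    return to_markdown(extract_main(reader(url)))
```

Passing a custom reader, such as a function that returns saved HTML, lets the Extractor and Parser stages run offline. That ability to test each stage in isolation is one practical benefit of keeping the three stages separate.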

Contributing and Feedback

Sponsorship

If you find 墨探 helpful, you can buy my cat some canned food ❤️

https://yuzhi.tech/sponsor

License

MIT License



Download files


Source Distribution

omni_article_markdown-0.2.2.tar.gz (43.0 kB)

Uploaded Source

Built Distribution


omni_article_markdown-0.2.2-py3-none-any.whl (67.2 kB)

Uploaded Python 3

File details

Details for the file omni_article_markdown-0.2.2.tar.gz.

File metadata

  • Download URL: omni_article_markdown-0.2.2.tar.gz
  • Upload date:
  • Size: 43.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.14 (uv publish, from CI on Ubuntu 24.04 noble)

File hashes

Hashes for omni_article_markdown-0.2.2.tar.gz
Algorithm Hash digest
SHA256 d5be98778a974fc0de5047d384774e1e95b7c12f946568c12d7ca6855b8a79f3
MD5 2b8f574a44f20ae54db9aed58508450b
BLAKE2b-256 dc134dcf2406c25653ef387ed266c71a15af631106f8a9908f07b110a0fd99b1


File details

Details for the file omni_article_markdown-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: omni_article_markdown-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 67.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.14 (uv publish, from CI on Ubuntu 24.04 noble)

File hashes

Hashes for omni_article_markdown-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cab13f8ff19b919fbfd3ead3e5cebbdbe3f4d9110c0bc3404cbee7cdcbd06028
MD5 9ebb19d1f0d23104f5aa8ff09da42f55
BLAKE2b-256 5c617eeece9a231994d7b7c5738084cdd0a101d92aad5568358128fd8cd15227
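The SHA256 digests listed above can be used to check a downloaded file before installing it. Below is a minimal sketch using Python's hashlib. The file names and digests are copied from this page, while sha256_of and verify_sha256 are illustrative helper names.

```python
"""Sketch: verify a downloaded distribution against the SHA256 digests above."""
import hashlib
from pathlib import Path

# File names and digests copied from the hash tables on this page.
EXPECTED_SHA256 = {
    "omni_article_markdown-0.2.2.tar.gz":
        "d5be98778a974fc0de5047d384774e1e95b7c12f946568c12d7ca6855b8a79f3",
    "omni_article_markdown-0.2.2-py3-none-any.whl":
        "cab13f8ff19b919fbfd3ead3e5cebbdbe3f4d9110c0bc3404cbee7cdcbd06028",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, verify_sha256(Path("omni_article_markdown-0.2.2.tar.gz"), EXPECTED_SHA256["omni_article_markdown-0.2.2.tar.gz"]) should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).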

