General extractor of news pages.

Project description

GNE: A General-Purpose Body Text Extractor for News Websites

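GeneralNewsExtractor (GNE) is a general-purpose module for extracting the main body of news web pages. Given the HTML of a news page, it outputs the body text, title, author, publication time, the URLs of images in the body, and the source code of the tag that contains the body. GNE performs very well on hundreds of Chinese news websites, including Toutiao, NetEase News, Gamersky, Guancha, ifeng.com, Tencent News, ReadHub, and Sina News, where its accuracy approaches 100%.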

Usage is simple:

from gne import GeneralNewsExtractor

extractor = GeneralNewsExtractor()
html = 'page source code'  # the raw HTML of a news page
result = extractor.extract(html)
print(result)
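
The description above lists the fields GNE returns, so a small helper for inspecting a result may be useful. The following is a minimal sketch, not part of GNE itself: `extract_news` and `summarize` are hypothetical helper names, and the dictionary keys `title`, `author`, `publish_time`, `content`, and `images` are assumptions based on the field list in the description. Check the documentation for the exact keys before relying on them.

```python
from typing import Any, Dict


def extract_news(html: str) -> Dict[str, Any]:
    """Run GNE on raw HTML. Requires `pip install gne`."""
    # Imported lazily so that summarize() below works without GNE installed.
    from gne import GeneralNewsExtractor

    return GeneralNewsExtractor().extract(html)


def summarize(result: Dict[str, Any], preview_chars: int = 80) -> str:
    """Build a short, human-readable summary of an extraction result.

    The key names used here are assumptions; any missing key falls back
    to an empty value instead of raising KeyError.
    """
    content = result.get("content") or ""
    images = result.get("images") or []
    lines = [
        f"title: {result.get('title') or ''}",
        f"author: {result.get('author') or ''}",
        f"publish_time: {result.get('publish_time') or ''}",
        f"content: {content[:preview_chars]}",  # preview only, not the full body
        f"images: {len(images)}",
    ]
    return "\n".join(lines)
```

With GNE installed, `print(summarize(extract_news(html)))` would print one line per field. Reading keys with `.get` keeps the helper usable even if a page yields no author or images.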

Installation

pip install gne

Documentation

https://generalnewsextractor.readthedocs.io/

Help make GNE better

https://github.com/kingname/GeneralNewsExtractor

Download files

Source Distribution

gne-0.4.3.tar.gz (32.7 MB)

Uploaded Source

Built Distribution

gne-0.4.3-py3-none-any.whl (31.1 kB)

Uploaded Python 3

File details

Details for the file gne-0.4.3.tar.gz.

File metadata

  • Download URL: gne-0.4.3.tar.gz
  • Size: 32.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for gne-0.4.3.tar.gz
Algorithm Hash digest
SHA256 26ed77fc2b96d5e9dd28b288ebfdeaf6bf7f034a92b8f75067c64eefffd0b4a9
MD5 b0d62367596d8dd1aa25f8c05fc31a7c
BLAKE2b-256 8e41d72fc42048fafcda9e5b88a3e1648cfeb0f1b59c813856b4d6422df70c90
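
To check a downloaded file against a SHA256 digest such as the one listed above, the digest can be recomputed locally with Python's standard `hashlib` module. This is a minimal sketch: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("gne-0.4.3.tar.gz", "26ed77fc...")` with the full digest from the listing should return `True` only for an unmodified download.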

File details

Details for the file gne-0.4.3-py3-none-any.whl.

File metadata

  • Download URL: gne-0.4.3-py3-none-any.whl
  • Size: 31.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.4

File hashes

Hashes for gne-0.4.3-py3-none-any.whl
Algorithm Hash digest
SHA256 d4f53218ebc70fcc0e69f695894720d4f2c335ed1c77ad2bdf6c6cb0db4a7830
MD5 83f0776b42df6b97a1fd3cbba5aad77a
BLAKE2b-256 31f507f65c68fab99b22b5b948d2790e2fe0d7ff4f444fb650f4a14c75855b16
