Common domestic and foreign news website data collection framework
Project description
INSTRUCTION
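This project distills the page conventions of common domestic and international news websites into a set of general-purpose parsing methods. They have worked well in development practice and are simple to use.
The following fields are parsed:
1. News list page
- News URL
- News title
2. News content extraction
- Article title
- Article publication time
- Article content
- Article main image
- Article images
- Article videos
- Website name
- Website logo
- Website domain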
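As a way to picture the fields listed above, here is a minimal sketch of containers that could hold them. The class names and attribute names (`ListEntry`, `ArticleRecord`, `publish_time`, and so on) are hypothetical and chosen only for illustration; the actual keys and types returned by GeneralNewsScraper may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ListEntry:
    """One item from a news list page (hypothetical shape)."""
    url: str
    title: str


@dataclass
class ArticleRecord:
    """Fields extracted from an article detail page (hypothetical shape)."""
    title: str
    publish_time: Optional[str] = None  # kept as text; the real format is not specified
    content: str = ""
    top_image: Optional[str] = None     # main image URL
    images: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)
    site_name: Optional[str] = None
    site_logo: Optional[str] = None
    domain: Optional[str] = None


# Build a record with only some fields known; the rest fall back to defaults.
record = ArticleRecord(title="Sample headline", domain="example.com")
record.images.append("https://example.com/img/a.jpg")
print(record)
```

Using `default_factory` for the list fields gives every record its own list, so appending an image to one record does not affect another.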
USAGE
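Install the project:
pip install GeneralNewsScraper
The project supports two modes:
- url mode: pass only the url. This requires playwright to be installed, plus a browser engine installed with playwright install as prompted. The full HTML is downloaded through the browser.
- html mode: pass both the url and the html. GNS then makes no network requests; the url is used only to resolve the website logo and to join media file URLs.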
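In html mode the url serves only as a base for resolving logo and media paths found in the supplied HTML. The sketch below shows that kind of resolution with the standard library's `urljoin`. The helper name `resolve_media_urls` and the example URLs are assumptions for illustration; this is not GeneralNewsScraper's actual implementation.

```python
from urllib.parse import urljoin


def resolve_media_urls(page_url, media_paths):
    """Resolve media paths taken from the HTML against the page URL."""
    return [urljoin(page_url, path) for path in media_paths]


page_url = "https://example.com/news/article.html"
paths = [
    "/img/logo.png",                  # root-relative path
    "photo.jpg",                      # relative to the article's directory
    "//cdn.example.com/a.jpg",        # scheme-relative URL
    "https://cdn.example.com/v.mp4",  # already absolute, left unchanged
]

for resolved in resolve_media_urls(page_url, paths):
    print(resolved)
```

This is why the url is still required when the HTML is supplied directly: without a base, relative image and video paths cannot be turned into fetchable links.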
Parsing an article list page
from GeneralNewsScraper import GNS
_html = """ sample html """
# html is optional; pagination is optional
articles = GNS.article_list(url="https://www.voachinese.com/", html=_html, pagination=1)
print(articles)
Parsing an article detail page
from GeneralNewsScraper import GNS
_html = """ sample html """
# html is optional
article_info = GNS.article(url="https://www.voachinese.com/a/exiled-chinese-businessman-guo-s-trial-nears-close/7693596.html", html=_html)
print(article_info)
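Demo
For questions, contact: jinchenghz@foxmail.com
Disclaimer: This project is for learning and reference only. Do not use it for illegal purposes; you bear the consequences of any misuse.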
Download files
Source Distribution
generalnewsscraper-0.2.0.tar.gz (22.0 kB)
Built Distribution
generalnewsscraper-0.2.0-py3-none-any.whl (24.5 kB)
File details
Details for the file generalnewsscraper-0.2.0.tar.gz.
File metadata
- Download URL: generalnewsscraper-0.2.0.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e59963dde17eecd4b990a0e2cfe53818c74e93b0cf7bfb78af042ac58d548e1f |
| MD5 | ce445fa51a35687414619d73b13303f2 |
| BLAKE2b-256 | a8507171e41c927f43de8959e4a63f44cfa79d27d971b845b225949901f65043 |
File details
Details for the file generalnewsscraper-0.2.0-py3-none-any.whl.
File metadata
- Download URL: generalnewsscraper-0.2.0-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fa5450ef0fe2f8e92328887e6aa3797ec6925be2414998c45a6d7c456fc32ba |
| MD5 | 83d03b9e4895995b90027ec1113badba |
| BLAKE2b-256 | 1b0b267f3751c7c6e62f0c9ddfabeb6f2d246f7e5883d9d786230181d08715b3 |