DataHarvest
DataHarvest is a tool for searching, crawling, and cleaning data.
Data crawling & cleaning
Site | Content | URL pattern | Crawl | Clean |
---|---|---|---|---|
Baidu Baike | Entry | baike.baidu.com/item | ✅ | ✅ |
Baidu Baijiahao | Article | baijiahao.baidu.com/s | ✅ | ✅ |
Bilibili | Article | www.bilibili.com/read | ✅ | ✅ |
Tencent News | Article | new.qq.com/rain/a | ✅ | ✅ |
360doc Personal Library | Article | www.360doc.com/content | ✅ | ✅ |
360 Baike | Entry | baike.so.com/doc | ✅ | ✅ |
Sogou Baike | Entry | baike.sogou.com/v | ✅ | ✅ |
Sohu | Article | www.sohu.com/a | ✅ | ✅ |
Toutiao | Article | www.toutiao.com/article | ✅ | ✅ |
NetEase | Article | www.163.com/\w+/article/.+ | ✅ | ✅ |
WeChat Official Accounts | Article | weixin.qq.com/s | ✅ | ✅ |
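To check in advance whether a URL falls under one of the supported sites, the patterns in the table can be tried against it. The following is a minimal sketch, not DataHarvest's own matching logic: the names `URL_PATTERNS` and `match_site` and the English site labels are hypothetical, and treating each table entry as a regular expression is an assumption suggested by the NetEase pattern.

```python
import re
from typing import Optional

# Patterns copied verbatim from the table above and treated as regular
# expressions (an assumption). Unescaped dots therefore match any character.
# The dictionary name and the English labels are illustrative only.
URL_PATTERNS = {
    "Baidu Baike": r"baike.baidu.com/item",
    "Baidu Baijiahao": r"baijiahao.baidu.com/s",
    "Bilibili": r"www.bilibili.com/read",
    "Tencent News": r"new.qq.com/rain/a",
    "360doc": r"www.360doc.com/content",
    "360 Baike": r"baike.so.com/doc",
    "Sogou Baike": r"baike.sogou.com/v",
    "Sohu": r"www.sohu.com/a",
    "Toutiao": r"www.toutiao.com/article",
    "NetEase": r"www.163.com/\w+/article/.+",
    "WeChat Official Accounts": r"weixin.qq.com/s",
}


def match_site(url: str) -> Optional[str]:
    """Return the label of the first pattern found in url, or None."""
    for site, pattern in URL_PATTERNS.items():
        if re.search(pattern, url):
            return site
    return None
```

A call such as `match_site("https://baike.baidu.com/item/Python")` returns the label of the matching row, and a URL outside the table returns `None`.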
Installation and usage
pip install DataHarvest
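After installing, one way to confirm that the distribution is visible to the current interpreter is to query its metadata through the standard library. This sketch uses only `importlib.metadata`; the helper name `installed_version` is hypothetical, and it makes no assumption about DataHarvest's own API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "DataHarvest") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "DataHarvest is not installed")
```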
Download files
Source Distribution
dataharvest-0.1.8.tar.gz (7.8 kB)
Built Distribution
dataharvest-0.1.8-py3-none-any.whl (14.3 kB)
File details
Details for the file dataharvest-0.1.8.tar.gz.
File metadata
- Download URL: dataharvest-0.1.8.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0770e9b1d3f4733a9aaec896d99bef8aae75e5ed958ac86da05d70929cff015c |
MD5 | 6f56cd443ec378f450feb39644387ee5 |
BLAKE2b-256 | 1c06550b2a6fa544687728dcc7d46c56535cde0ea754399c366bd42c05926006 |
File details
Details for the file dataharvest-0.1.8-py3-none-any.whl.
File metadata
- Download URL: dataharvest-0.1.8-py3-none-any.whl
- Upload date:
- Size: 14.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 68dc030fed0639e28fd4af815d53c052ae469c967ad1ba8a80e213d0ad513cd2 |
MD5 | 8f4277bd19f9b2e32eff69f22fdf3642 |
BLAKE2b-256 | 43147e561ad13502d17df58ffae358360888c9d80706f0e29f113e6b05a7d7a8 |
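The SHA256 digests listed above can be used to check a manually downloaded file before installing it. The sketch below uses only `hashlib` from the standard library; the names `PUBLISHED_SHA256`, `sha256_of`, and `matches_published` are hypothetical, and it assumes the file has been saved locally under its published name.

```python
import hashlib

# SHA256 digests copied from the file hash tables above.
PUBLISHED_SHA256 = {
    "dataharvest-0.1.8.tar.gz":
        "0770e9b1d3f4733a9aaec896d99bef8aae75e5ed958ac86da05d70929cff015c",
    "dataharvest-0.1.8-py3-none-any.whl":
        "68dc030fed0639e28fd4af815d53c052ae469c967ad1ba8a80e213d0ad513cd2",
}


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("dataharvest-0.1.8.tar.gz", PUBLISHED_SHA256["dataharvest-0.1.8.tar.gz"])` should be true for an intact download of the source distribution. pip can also enforce digests itself through its hash-checking mode.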