DataHarvest
DataHarvest is a tool for searching, crawling, and cleaning data.
Data crawling & cleaning
Site | Content | URL pattern | Crawl | Clean |
---|---|---|---|---|
Baidu Baike | Entry | baike.baidu.com/item | ✅ | ✅ |
Zhihu | Article | zhuanlan.zhihu.com/p/ | ✅ | |
Baidu Baijiahao | Article | baijiahao.baidu.com/s | ✅ | ✅ |
360doc (360 Personal Library) | Article | www.360doc.com/content | ✅ | ✅ |
Sogou Baike | Entry | baike.sogou.com/v | ✅ | ✅ |
Sohu | Article | www.sohu.com/a | ✅ | ✅ |
NetEase | Article | www.163.com/\w+/article/.+ | ✅ | ✅ |
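The URL patterns in the table read as regular expressions, so a URL can be checked against them before deciding whether a site is supported. The following is a minimal sketch of that lookup, not DataHarvest's own API: the `URL_PATTERNS` mapping and the `match_site` function are hypothetical names introduced here for illustration, and the patterns are copied verbatim from the table (including their unescaped dots).

```python
import re
from typing import Optional

# Patterns copied verbatim from the table. The mapping and the function
# below are illustrative assumptions, not part of the DataHarvest package.
URL_PATTERNS = {
    "Baidu Baike": r"baike.baidu.com/item",
    "Zhihu": r"zhuanlan.zhihu.com/p/",
    "Baidu Baijiahao": r"baijiahao.baidu.com/s",
    "360doc": r"www.360doc.com/content",
    "Sogou Baike": r"baike.sogou.com/v",
    "Sohu": r"www.sohu.com/a",
    "NetEase": r"www.163.com/\w+/article/.+",
}


def match_site(url: str) -> Optional[str]:
    """Return the first site whose pattern occurs in the URL, else None."""
    for site, pattern in URL_PATTERNS.items():
        # re.search finds the pattern anywhere in the URL, so the scheme
        # prefix (http:// or https://) does not need to be stripped.
        if re.search(pattern, url):
            return site
    return None


if __name__ == "__main__":
    print(match_site("https://baike.baidu.com/item/Python"))
    print(match_site("https://example.com/post/1"))
```

A URL that matches none of the patterns returns `None`, which a caller could treat as "no dedicated crawler or cleaner listed for this site".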
Installation and usage
pip install DataHarvest
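To confirm that the install succeeded, the standard library's `importlib.metadata` can report the installed version of a distribution. This is a small sketch using only the standard library; the helper name `installed_version` is an assumption introduced here and is not provided by DataHarvest.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "DataHarvest") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "DataHarvest is not installed")
```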