kkrobots: Safe Crawler Guardian
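About this project
kkrobots is a tool for keeping crawlers compliant. Before crawling any URL, call the can_crawl() method of a Parse object to check whether the request is permitted by the site's robots.txt.
Usage
Usage is simple. Call can_crawl() before each crawl: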
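To make the idea of a robots.txt check concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not kkrobots' implementation, whose internals are not described here. The `is_allowed` helper, the inlined robots.txt content, and the `example.com` URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "your spider", "https://example.com/public/page"))
print(is_allowed(ROBOTS_TXT, "your spider", "https://example.com/private/page"))
```

In this sketch the first URL is allowed and the second is rejected, because its path falls under the `Disallow: /private/` rule.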
from kkrobots import Parse

if __name__ == '__main__':
    parse = Parse(
        user_agent='your spider',
        # Any URL on the target site will do
        test_url='https://xxxx.com/xxx/xxx/xxx'
    )
    can_crawl = parse.can_crawl('https://xxxx.com/xxx/xxx')
    # Run your crawler logic below
    if can_crawl:
        pass
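When crawling many URLs, the same check can gate each request. The sketch below is one possible pattern, not part of kkrobots. `fetch_allowed`, `fake_can_crawl`, and `fake_fetch` are hypothetical names, and the stand-in functions exist only so the example runs without a network connection or kkrobots installed. It assumes the check returns a truthy value for permitted URLs, as the usage above suggests.

```python
from typing import Callable, Iterable, List


def fetch_allowed(
    urls: Iterable[str],
    can_crawl: Callable[[str], bool],
    fetch: Callable[[str], str],
) -> List[str]:
    """Fetch only the URLs that can_crawl approves, skipping the rest."""
    results = []
    for url in urls:
        if not can_crawl(url):
            continue  # disallowed URL: never request it
        results.append(fetch(url))
    return results


# Stand-in for a robots.txt check (hypothetical rule: block /private/ paths).
def fake_can_crawl(url: str) -> bool:
    return "/private/" not in url


# Stand-in for a real HTTP request.
def fake_fetch(url: str) -> str:
    return f"fetched {url}"


pages = fetch_allowed(
    ["https://example.com/a", "https://example.com/private/b"],
    fake_can_crawl,
    fake_fetch,
)
print(pages)
```

With kkrobots, one would presumably pass `parse.can_crawl` in place of `fake_can_crawl`. Taking the check as a parameter keeps the crawling loop independent of whichever robots.txt checker is used.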
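About the author
WeChat official account: Python卡皮巴拉 (Python Capybara)
🌟 [Python卡皮巴拉]: your Python training handbook. The "mythical beast" of the coding world has arrived! 🌟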
File details
Details for the file kkrobots-1.0.1.tar.gz.
File metadata
- Download URL: kkrobots-1.0.1.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7afddd96714cec3c18a88749ff927ebc8a5f7eb8bb8b4f29168614374379b3b4 |
| MD5 | a81407f5ca9cec1105c6b73b6d30673d |
| BLAKE2b-256 | dbea56d7df88019a6fd22b5643305ee3565c4517874b33a69b3f27fc0d1fc545 |
File details
Details for the file kkrobots-1.0.1-py3-none-any.whl.
File metadata
- Download URL: kkrobots-1.0.1-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0f7c59436f14e3800989796816d7e636eaa1d6c998afcb05298f47014e6657fb |
| MD5 | aa3de55a83a155a0bd03d01cbaa4133e |
| BLAKE2b-256 | 25c8460d0730f334085c98a239550b6268dcd9f68bd1ffb1e3de55e0cec69e8e |