A feature-rich Python web-scraping toolkit that provides encryption and decryption, data storage, asynchronous downloading, and font parsing.
SpiderKit
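Features
- Encryption and decryption module: supports RSA, AES, DES, 3DES, and other encryption algorithms
- Data storage module: saves data in CSV, JSON, and JSONL formats
- Asynchronous downloader: high-performance asynchronous file downloads, with support for M3U8 video
- Font parsing module: automatically parses anti-scraping font files and generates character mappings
- General utilities module: provides common hash functions and utility methods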
Installation
pip install spiderkit
Quick start
Encryption and decryption
import os
from spiderkit.crypto import generate_rsa_keypair
from spiderkit.crypto import rsa_encrypt, rsa_decrypt, aes_encrypt, aes_decrypt
plaintext = "Hello Dawn!"
# RSA encryption and decryption
public_key, private_key = generate_rsa_keypair()
rsa_encrypted = rsa_encrypt(plaintext, public_key, "OAEP")
print(rsa_encrypted)
rsa_decrypted = rsa_decrypt(rsa_encrypted, private_key, "OAEP")
print(rsa_decrypted)
# AES encryption and decryption
aes_key = os.urandom(32)  # 32-byte key (AES-256)
aes_iv = os.urandom(16)   # 16-byte IV, one AES block
aes_encrypted = aes_encrypt(plaintext, aes_key, "CBC", iv=aes_iv)
print(aes_encrypted)
aes_decrypted = aes_decrypt(aes_encrypted, aes_key, "CBC", iv=aes_iv)
print(aes_decrypted)
Asynchronous download
import asyncio
from spiderkit.downloader import Downloader, M3U8Downloader
# Optional request headers (some sites use hotlink protection and require the Referer field)
headers = {
    "Referer": "https://www.example.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.98 Safari/537.36"
}
# Regular file download: maps each local save path to its source URL
downloader = Downloader(headers=headers)
file_mapping = {
    "images/image1.jpg": "https://example.com/image1.jpg",
    "images/image2.jpg": "https://example.com/image2.jpg"
}
downloader.download_files(file_mapping)
# M3U8 video download
m3u8_downloader = M3U8Downloader(headers=headers)
m3u8_downloader.download_video("https://example.com/video.m3u8", "output_video.mp4")
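For readers unfamiliar with M3U8: a media playlist is a plain-text file that lists segment URIs, which are fetched in order and joined into one video. The sketch below covers only the playlist-parsing step, using the standard library. `extract_segment_urls` and the sample playlist are hypothetical illustrations; how `M3U8Downloader` works internally is not described here.

```python
from urllib.parse import urljoin


def extract_segment_urls(playlist_text: str, playlist_url: str) -> list[str]:
    """Return absolute segment URLs from an M3U8 media playlist.

    Hypothetical helper for illustration; not part of SpiderKit.
    """
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Skip blank lines and tags/comments, which start with '#'.
        if not line or line.startswith("#"):
            continue
        # Segment URIs may be relative to the playlist's own location.
        urls.append(urljoin(playlist_url, line))
    return urls


# Made-up playlist with one relative and one absolute segment URI.
sample_playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:9.0,
seg0.ts
#EXTINF:9.0,
https://cdn.example.com/seg1.ts
#EXT-X-ENDLIST
"""

segments = extract_segment_urls(
    sample_playlist, "https://example.com/video/index.m3u8"
)
print(segments)
```

Resolving each URI against the playlist URL matters because many playlists list segments by relative path only.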
Font parsing
from spiderkit.font import parse_font_url, decrypt_text_with_font_maps
# Parse a font file from a local path or a URL
# font_maps = parse_font_url("fonts/font.woff")
font_maps = parse_font_url("https://example.com/font.woff")
# Decrypt the text
encrypted_text = "encrypted text"
decrypted_text = decrypt_text_with_font_maps(encrypted_text, font_maps)
print(decrypted_text)
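Anti-scraping fonts usually render unusual code points (often from the Unicode private-use area) as ordinary glyphs, so "decrypting" the text comes down to character substitution. The following is a minimal sketch of that idea, assuming a single flat mapping from obfuscated character to real character. The actual structure of `font_maps` returned by `parse_font_url` is not specified here, and `apply_char_map` is a hypothetical helper.

```python
def apply_char_map(text: str, char_map: dict[str, str]) -> str:
    """Replace each obfuscated character with its mapped value.

    Hypothetical helper; characters without a mapping pass through unchanged.
    """
    return "".join(char_map.get(ch, ch) for ch in text)


# Assumed example mapping: private-use code points standing in for digits.
example_map = {"\ue001": "1", "\ue002": "2", "\ue003": "3"}
obfuscated = "Price: \ue003\ue001\ue002"
print(apply_char_map(obfuscated, example_map))
```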
Hash computation
from spiderkit.utils.hash_utils import md5, sha1, sha256, sha512, sha3_256, blake2b
text = "Hello Dawn!"
# Output is hex by default
print(md5(text))
print(sha1(text))
print(sha256(text))
print(sha512(text))
# Other algorithms
print(sha3_256(text))
print(blake2b(text))
# Other output formats: binary / base64
print(md5(text, "binary"))
print(md5(text, "base64"))
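To make the three output formats concrete, here is a rough standard-library equivalent built on `hashlib`. It is a sketch under assumptions: `digest` is a hypothetical helper, the input is assumed to be encoded as UTF-8, and "binary" is assumed to mean the raw digest bytes. SpiderKit's own behavior may differ.

```python
import base64
import hashlib


def digest(text: str, algorithm: str = "md5", output: str = "hex"):
    """Hash UTF-8 text and return the digest as hex, raw bytes, or Base64.

    Hypothetical helper; assumes "binary" means the raw digest bytes.
    """
    h = hashlib.new(algorithm, text.encode("utf-8"))
    if output == "hex":
        return h.hexdigest()
    if output == "binary":
        return h.digest()
    if output == "base64":
        return base64.b64encode(h.digest()).decode("ascii")
    raise ValueError(f"unsupported output format: {output}")


text = "Hello Dawn!"
print(digest(text))                        # hex string
print(digest(text, "sha256", "base64"))    # Base64 string
print(len(digest(text, "md5", "binary")))  # length of the raw MD5 digest in bytes
```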
Data storage
from spiderkit.storage import save_data_to_file
data = [
    {"name": "张三", "age": 25},
    {"name": "李四", "age": 30}
]
# Save as CSV
save_data_to_file(data, "users", "csv")
# Save as JSON
save_data_to_file(data, "users", "json")
# Save as JSONL
save_data_to_file(data, "users", "jsonl")
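The three formats differ mainly in layout: CSV writes a header row plus one row per record, JSON writes the whole list as one document, and JSONL writes one JSON object per line. The sketch below shows this with the standard library only. `save_records` is a hypothetical stand-in, it assumes a non-empty list of dicts with identical keys, and it assumes the file extension is appended to the given name. None of this is confirmed to match `save_data_to_file`.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_records(records: list[dict], stem: str, fmt: str) -> Path:
    """Write a non-empty list of dicts to <stem>.<fmt>.

    Hypothetical helper for illustration; not SpiderKit's implementation.
    """
    path = Path(f"{stem}.{fmt}")
    if fmt == "csv":
        # newline="" lets the csv module control line endings.
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
            writer.writeheader()
            writer.writerows(records)
    elif fmt == "json":
        with path.open("w", encoding="utf-8") as f:
            json.dump(records, f, ensure_ascii=False, indent=2)
    elif fmt == "jsonl":
        # JSONL: one JSON object per line.
        with path.open("w", encoding="utf-8") as f:
            for record in records:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return path


data = [
    {"name": "张三", "age": 25},
    {"name": "李四", "age": 30},
]

# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    for fmt in ("csv", "json", "jsonl"):
        out = save_records(data, str(Path(tmp) / "users"), fmt)
        print(out.name, out.stat().st_size, "bytes")
```

Passing `ensure_ascii=False` keeps non-ASCII values such as the names above readable in the output files instead of escaping them.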
Download files
Source Distribution
spiderkit-0.1.0.tar.gz (105.6 kB)
Built Distribution
spiderkit-0.1.0-py3-none-any.whl (17.5 kB)
File details
Details for the file spiderkit-0.1.0.tar.gz.
File metadata
- Download URL: spiderkit-0.1.0.tar.gz
- Upload date:
- Size: 105.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd3163cdc45f105698d00252388f538a1c7c2d706de6815e919be54fa9cd9595 |
| MD5 | c881084a8d66a37b2f00e8eeb6af6be3 |
| BLAKE2b-256 | 7ba6a069bb7c97e29bbcc04c5f3e89cb3aaa5892e55f460c5bef850f35ff6b5d |
File details
Details for the file spiderkit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: spiderkit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 96b134fc9f5792d53f6c47244e7486efcb602ed3d7d9f9783667bddb797b0be2 |
| MD5 | 740e75f1cd0812833402b41e8f80b752 |
| BLAKE2b-256 | 87ca323eb419a5c276d82cf76b460b3df86d4583f88c82eab88619aeaa4e425f |
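The published digests can be used to check a manually downloaded file. The following is a minimal sketch that compares a local wheel against the SHA256 value listed above. The local file path is an assumption, and `sha256_of_file` is a hypothetical helper built on the standard library.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digest listed above for spiderkit-0.1.0-py3-none-any.whl.
EXPECTED_SHA256 = "96b134fc9f5792d53f6c47244e7486efcb602ed3d7d9f9783667bddb797b0be2"

# Assumed location of the downloaded wheel (current directory).
wheel = Path("spiderkit-0.1.0-py3-none-any.whl")
if wheel.exists():
    print("match" if sha256_of_file(wheel) == EXPECTED_SHA256 else "MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```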