SpiderKit

A powerful Python web-scraping toolkit that provides encryption and decryption, data storage, asynchronous downloading, and font parsing.

Features

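  • Encryption/decryption module: supports RSA, AES, DES, 3DES, and other encryption algorithms
  • Data storage module: saves data as CSV, JSON, or JSONL
  • Asynchronous downloader: high-performance asynchronous file downloads, including M3U8 video downloads
  • Font parsing module: automatically parses anti-scraping font files and generates character mappings
  • Utilities module: common hash functions and helper methods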

Installation

pip install spiderkit

Quick Start

Encryption and Decryption

import os

from spiderkit.crypto import generate_rsa_keypair
from spiderkit.crypto import rsa_encrypt, rsa_decrypt, aes_encrypt, aes_decrypt

plaintext = "Hello Dawn!"

# RSA encryption and decryption
public_key, private_key = generate_rsa_keypair()
rsa_encrypted = rsa_encrypt(plaintext, public_key, "OAEP")
print(rsa_encrypted)
rsa_decrypted = rsa_decrypt(rsa_encrypted, private_key, "OAEP")
print(rsa_decrypted)

# AES encryption and decryption (32-byte key = AES-256; the IV must be 16 bytes and identical for both calls)
aes_key = os.urandom(32)
aes_iv = os.urandom(16)
aes_encrypted = aes_encrypt(plaintext, aes_key, "CBC", iv=aes_iv)
print(aes_encrypted)
aes_decrypted = aes_decrypt(aes_encrypted, aes_key, "CBC", iv=aes_iv)
print(aes_decrypted)
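CBC is a block mode: AES always works on 16-byte blocks, so the plaintext must be padded to a multiple of 16 bytes before encryption. PKCS#7 is the most common padding scheme for this, but whether `aes_encrypt` applies PKCS#7 internally is an assumption; this page does not say. The standard-library sketch below only illustrates how PKCS#7 padding and unpadding work. `pkcs7_pad` and `pkcs7_unpad` are hypothetical helpers, not part of spiderkit.

```python
# Hypothetical helpers illustrating PKCS#7 padding, as commonly used with AES-CBC.
# Not part of spiderkit; whether spiderkit pads this way internally is an assumption.

AES_BLOCK_SIZE = 16  # AES uses 16-byte blocks regardless of key size


def pkcs7_pad(data: bytes, block_size: int = AES_BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the result length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)  # always 1..block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(data: bytes, block_size: int = AES_BLOCK_SIZE) -> bytes:
    """Validate and strip PKCS#7 padding."""
    if not data or len(data) % block_size != 0:
        raise ValueError("data length is not a positive multiple of the block size")
    pad_len = data[-1]
    if pad_len < 1 or pad_len > block_size or data[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid PKCS#7 padding")
    return data[:-pad_len]


if __name__ == "__main__":
    plaintext = "Hello Dawn!".encode("utf-8")  # 11 bytes
    padded = pkcs7_pad(plaintext)
    print(len(padded), padded[-5:])  # 16 bytes; the last 5 bytes are all 0x05
    assert pkcs7_unpad(padded) == plaintext
```

Note that a plaintext that is already a multiple of 16 bytes still gets a full extra block of padding, which is what makes unpadding unambiguous.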

Asynchronous Downloads

import asyncio
from spiderkit.downloader import Downloader, M3U8Downloader

# Optional request headers (some sites use hotlink protection and require a Referer header)
headers = {
    "Referer": "https://www.example.com/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.98 Safari/537.36"
}

# Regular file download: keys are local save paths, values are URLs
downloader = Downloader(headers=headers)
file_mapping = {
    "images/image1.jpg": "https://example.com/image1.jpg",
    "images/image2.jpg": "https://example.com/image2.jpg"
}
downloader.download_files(file_mapping)

# M3U8 video download
m3u8_downloader = M3U8Downloader(headers=headers)
m3u8_downloader.download_video("https://example.com/video.m3u8", "output_video.mp4")
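An M3U8 (HLS) download comes down to two steps: read the playlist, where every non-empty line that does not start with `#` is a media segment URI (possibly relative to the playlist URL), then fetch the segments concurrently and join them in playlist order. The sketch below shows both steps with `asyncio`. It is not spiderkit's implementation: it ignores encrypted (`#EXT-X-KEY`) streams and master playlists, and it takes the fetch function as a parameter so it runs without a network. `parse_m3u8`, `download_segments`, and `fake_fetch` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable
from urllib.parse import urljoin


def parse_m3u8(playlist_text: str, playlist_url: str) -> list[str]:
    """Return absolute segment URLs from a media playlist, skipping tags and blank lines."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(urljoin(playlist_url, line))  # resolve relative URIs
    return urls


async def download_segments(
    urls: list[str],
    fetch: Callable[[str], Awaitable[bytes]],
    max_concurrency: int = 8,
) -> bytes:
    """Fetch all segments concurrently (bounded by a semaphore) and join them in order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> bytes:
        async with semaphore:  # at most max_concurrency fetches in flight
            return await fetch(url)

    # gather() returns results in argument order, not completion order
    parts = await asyncio.gather(*(fetch_one(u) for u in urls))
    return b"".join(parts)


async def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for a real HTTP call (which would send the Referer header)."""
    await asyncio.sleep(0.01)
    return url.rsplit("/", 1)[-1].encode()


playlist = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
seg1.ts
#EXT-X-ENDLIST
"""
segment_urls = parse_m3u8(playlist, "https://example.com/video/index.m3u8")
print(segment_urls)
video = asyncio.run(download_segments(segment_urls, fake_fetch))
print(video)  # b'seg0.tsseg1.ts'
```

Bounding concurrency with a semaphore keeps a long playlist from opening hundreds of connections at once, which many servers throttle or block.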

Font Parsing

from spiderkit.font import parse_font_url, decrypt_text_with_font_maps

# Parse a font file from a local path or a URL
# font_maps = parse_font_url("fonts/font.woff")
font_maps = parse_font_url("https://example.com/font.woff")

# Decrypt the text
encrypted_text = "encrypted text"  # placeholder for text rendered with the obfuscated font
decrypted_text = decrypt_text_with_font_maps(encrypted_text, font_maps)
print(decrypted_text)
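Font-based anti-scraping ships a custom web font whose glyphs for certain code points (often Private Use Area characters such as U+E001) are drawn as ordinary digits or characters, so the HTML contains meaningless code points while the rendered page looks normal. Once the glyphs have been identified, decryption is a character-for-character substitution. This page does not document the structure of `font_maps`; the sketch below assumes a plain dict from obfuscated character to real character, and `decrypt_with_map` and the mapping values are hypothetical.

```python
def decrypt_with_map(text: str, font_map: dict[str, str]) -> str:
    """Substitute each obfuscated character using font_map; other characters pass through."""
    table = str.maketrans(font_map)  # dict keys must be single characters
    return text.translate(table)


# Hypothetical mapping recovered from a custom font: private-use code points
# U+E001..U+E003 whose glyphs are drawn as the digits 1, 2 and 3
font_map = {"\ue001": "1", "\ue002": "2", "\ue003": "3"}

scraped = "Price: \ue003\ue001\ue002"
print(decrypt_with_map(scraped, font_map))  # Price: 312
```

Building the translation table once with `str.maketrans` and applying it with `str.translate` avoids one `replace` call per mapped character.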

Hashing

from spiderkit.utils.hash_utils import md5, sha1, sha256, sha512, sha3_256, blake2b

text = "Hello Dawn!"

# Hex string output by default
print(md5(text))
print(sha1(text))
print(sha256(text))
print(sha512(text))

# Other algorithms
print(sha3_256(text))
print(blake2b(text))

# Other output formats: binary / base64
print(md5(text, "binary"))
print(md5(text, "base64"))
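The three output formats are three encodings of the same digest bytes. Assuming spiderkit hashes the UTF-8 encoding of the input string and that "binary" means the raw digest bytes (both assumptions; this page does not say), the standard-library equivalents look like this. `digest_as` is a hypothetical helper, not part of spiderkit.

```python
import base64
import hashlib


def digest_as(text: str, algorithm: str = "md5", output: str = "hex"):
    """Hypothetical stand-in for spiderkit's hash helpers (assumes UTF-8 input)."""
    h = hashlib.new(algorithm, text.encode("utf-8"))
    raw = h.digest()  # raw digest bytes
    if output == "hex":
        return h.hexdigest()  # lowercase hex string, 2 characters per byte
    if output == "binary":
        return raw
    if output == "base64":
        return base64.b64encode(raw).decode("ascii")
    raise ValueError(f"unsupported output format: {output!r}")


text = "Hello Dawn!"
for algo in ("md5", "sha1", "sha256", "sha512", "sha3_256", "blake2b"):
    print(algo, digest_as(text, algo))

print(digest_as(text, "md5", "binary"))
print(digest_as(text, "md5", "base64"))
```

For a 16-byte MD5 digest, hex is 32 characters and Base64 is 24, which is why Base64 is often used in request signatures.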

Data Storage

from spiderkit.storage import save_data_to_file

data = [
    {"name": "张三", "age": 25},
    {"name": "李四", "age": 30}
]

# Save as CSV
save_data_to_file(data, "users", "csv")

# Save as JSON
save_data_to_file(data, "users", "json")

# Save as JSONL
save_data_to_file(data, "users", "jsonl")
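The three formats differ in layout: CSV writes one header row plus one row per record, JSON writes the whole list as a single array, and JSONL writes one JSON object per line, which makes it easy to append records as they are scraped. This page does not say which file names `save_data_to_file` produces or which encoding it uses. The standard-library sketch below assumes `users.csv`, `users.json`, and `users.jsonl` in UTF-8; it is an illustration, not spiderkit's implementation, and `save_records` is hypothetical.

```python
import csv
import json
from pathlib import Path


def save_records(data: list[dict], name: str, fmt: str) -> Path:
    """Hypothetical sketch: write records to <name>.<fmt> as CSV, JSON or JSONL."""
    path = Path(f"{name}.{fmt}")
    if fmt == "csv":
        # Header = union of all keys, in first-seen order
        fieldnames = list(dict.fromkeys(key for row in data for key in row))
        # newline="" is what the csv module expects; utf-8-sig adds a BOM for Excel
        with path.open("w", newline="", encoding="utf-8-sig") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(data)
    elif fmt == "json":
        # One JSON array; ensure_ascii=False keeps non-ASCII text unescaped
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    elif fmt == "jsonl":
        # One JSON object per line, so records can be appended or streamed
        with path.open("w", encoding="utf-8") as f:
            for record in data:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return path


if __name__ == "__main__":
    users = [{"name": "张三", "age": 25}, {"name": "李四", "age": 30}]
    for fmt in ("csv", "json", "jsonl"):
        print(save_records(users, "users", fmt))
```

The `utf-8-sig` choice for CSV is a design assumption here: without the BOM, Excel on Windows tends to misread Chinese text in UTF-8 CSV files.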

Download files

Source Distribution

spiderkit-0.1.0.tar.gz (105.6 kB)

Built Distribution

spiderkit-0.1.0-py3-none-any.whl (17.5 kB, Python 3)

File details

Details for the file spiderkit-0.1.0.tar.gz.

File metadata

  • Download URL: spiderkit-0.1.0.tar.gz
  • Size: 105.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.6

File hashes

Hashes for spiderkit-0.1.0.tar.gz:

  • SHA256: fd3163cdc45f105698d00252388f538a1c7c2d706de6815e919be54fa9cd9595
  • MD5: c881084a8d66a37b2f00e8eeb6af6be3
  • BLAKE2b-256: 7ba6a069bb7c97e29bbcc04c5f3e89cb3aaa5892e55f460c5bef850f35ff6b5d

File details

Details for the file spiderkit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: spiderkit-0.1.0-py3-none-any.whl
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.6

File hashes

Hashes for spiderkit-0.1.0-py3-none-any.whl:

  • SHA256: 96b134fc9f5792d53f6c47244e7486efcb602ed3d7d9f9783667bddb797b0be2
  • MD5: 740e75f1cd0812833402b41e8f80b752
  • BLAKE2b-256: 87ca323eb419a5c276d82cf76b460b3df86d4583f88c82eab88619aeaa4e425f