ReSpider is a Python crawler framework that bundles a set of practical utilities.
ReSpider
Starting a spider
    import ReSpider


    class TestSpider(ReSpider.Spider):
        # Custom settings
        __custom_setting__ = {}
        start_urls = []

        def start_requests(self):
            pass

        def parse(self, response):
            pass


    if __name__ == '__main__':
        TestSpider().start()
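Signals (planned)
Add signal parameters
- Fix the intermittent failure of the program to stop cleanly once all tasks finish (with browser rendering, the browser cannot be closed)
- Add a closed flag (is_closed) to middlewares and pipelines
- Set the closed flag from the signal parameter that is passed in, and open or close the middleware or pipeline according to the flag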
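The planned behaviour above could look roughly like the following minimal sketch. It does not use ReSpider's real classes: `Signal`, `Component`, `BrowserMiddleware` and `dispatch` are hypothetical names chosen for illustration, and only the `is_closed` flag comes from the list above.

```python
from enum import Enum


class Signal(Enum):
    """Hypothetical signal values; ReSpider's real signals may differ."""
    OPEN = "open"
    CLOSE = "close"


class Component:
    """Hypothetical base for a middleware or pipeline with a closed flag."""

    def __init__(self):
        self.is_closed = True  # nothing is running until an OPEN signal arrives

    def open(self):
        pass  # acquire resources here (e.g. launch a browser)

    def close(self):
        pass  # release resources here (e.g. quit the browser)

    def handle_signal(self, signal):
        # Assign the flag from the signal, then open or close accordingly.
        if signal is Signal.OPEN and self.is_closed:
            self.open()
            self.is_closed = False
        elif signal is Signal.CLOSE and not self.is_closed:
            self.close()
            self.is_closed = True


class BrowserMiddleware(Component):
    """Stand-in for a rendering middleware; records calls instead of driving a browser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def open(self):
        self.events.append("browser started")

    def close(self):
        self.events.append("browser closed")


def dispatch(signal, components):
    """Send one signal to every middleware and pipeline."""
    for component in components:
        component.handle_signal(signal)
```

In this sketch, sending `Signal.CLOSE` more than once is harmless because the flag guards against closing the same component twice.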
ITEM
Item
A general-purpose data entity
    from ReSpider import item
    data = item.DataItem({'name': 'ReSpider'}, **kwargs)
xxListItem
A list of data entities; it can be constructed from a list
    from ReSpider import item
    data_list_item = item.DataListItem([1, 2, 3], **kwargs)
Saving data
    # Binary data
    io_item = item.IoItem(b'hello world', filename='hello world', filetype='bin')
    # File data; filetype specifies the file type
    file_item = item.FileItem('hello world', filename='hello world', filetype='text')
    # Tabular data (a single row)
    csv_item = item.CSVItem({'name': '张三', 'age': 14}, filename='hello world')
    # Use a list for multiple rows
    csv_list = item.CSVListItem([{'name': '张三', 'age': 14}, {'name': '李四', 'age': 19}], filename='法外狂徒')
    # Yield an item from a spider callback (e.g. parse) to save it
    data = item.DataItem()
    yield data
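For orientation, saving a `CSVListItem`-style payload amounts to writing a list of dicts as rows. The sketch below does this with the standard library only. `rows_to_csv` is a hypothetical helper, not ReSpider's pipeline, and how ReSpider derives the output path from `filename` is not covered here.

```python
import csv
import io


def rows_to_csv(rows):
    """Render a list of dicts as CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Same shape of data as the CSVListItem example above.
rows = [{'name': '张三', 'age': 14}, {'name': '李四', 'age': 19}]
csv_text = rows_to_csv(rows)
```

Taking the header from the first row keeps the sketch short; rows with differing keys would need an explicit `fieldnames` list.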
Log
Global logging (done)
- Logs are written to a .log file
- Each module inherits from the Logger class (details still need revision)
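A minimal sketch of this pattern using the standard `logging` module. The `Logger` class below is a hypothetical stand-in that only shares its name with the class mentioned above; ReSpider's real interface is not shown in this document, and `configure_logging`, `Downloader` and the `respider.log` file name are assumptions made for illustration.

```python
import logging
import os
import tempfile


def configure_logging(path, level=logging.INFO):
    """Attach a file handler to the shared 'ReSpider' logger and return the handler."""
    shared = logging.getLogger("ReSpider")
    shared.setLevel(level)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    shared.addHandler(handler)
    return handler


class Logger:
    """Hypothetical base class: subclasses get a logger named after themselves."""

    @property
    def logger(self):
        # Child loggers propagate records up to the shared 'ReSpider' logger.
        return logging.getLogger("ReSpider." + type(self).__name__)


class Downloader(Logger):
    """Example module that inherits logging from the base class."""

    def fetch(self, url):
        self.logger.info("fetching %s", url)


# Write one record to a .log file in a temporary directory, then read it back.
log_path = os.path.join(tempfile.mkdtemp(), "respider.log")
handler = configure_logging(log_path)
Downloader().fetch("https://example.com")
logging.getLogger("ReSpider").removeHandler(handler)
handler.close()

with open(log_path, encoding="utf-8") as fh:
    log_text = fh.read()
```

Naming each logger after its class means every line in the file shows which module produced it, while configuration stays in one place.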
JS rendering
Cookie issues in headless mode (using 维普 as the example)
Source Distributions
No source distribution files available for this release.
Built Distribution
File details
Details for the file ReSpider-1.0.0rc2-py3-none-any.whl.
File metadata
- Download URL: ReSpider-1.0.0rc2-py3-none-any.whl
- Upload date:
- Size: 59.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.20.0 requests-toolbelt/0.9.1 urllib3/1.24.3 tqdm/4.54.1 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5a2bd35da5e8c6718e06f2603fb929af0697a11fe8b980e918a283fbeef2ac05
MD5 | a3cb7d9bdec6929560dd751de799f9ef
BLAKE2b-256 | 83023286a6eb05a3899db635c08867e871eb167844c10920f338c0de55a978f5