ReSpider is a Python crawler program that bundles a set of practical utilities.

Project description

ReSpider

Starting a spider

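import ReSpider


class TestSpider(ReSpider.Spider):
    # Custom settings for this spider
    __custom_setting__ = {}
    # URLs the spider starts from
    start_urls = []

    def start_requests(self):
        # Build the initial requests here
        pass

    def parse(self, response):
        # Handle each response here
        pass


if __name__ == '__main__':
    # Instantiate the spider defined above and run it
    TestSpider().start()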

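Signals (planned)

Add signal parameters

  • Fix the intermittent issue where the program fails to stop cleanly after all tasks finish (with browser rendering enabled, the browser cannot be closed)
  • Add a closed flag (is_closed) to middlewares and pipelines
  • Set the closed flag from the signal parameter that is passed in, and switch middlewares or pipelines on or off according to that flag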
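
The plan above can be sketched in plain Python. This is only an illustration of the idea, not ReSpider's actual API: the names `CloseSignal`, `Component`, `on_signal`, `dispatch`, and `run_chain` are hypothetical, and only the `is_closed` flag comes from the notes above.

```python
from enum import Enum


class CloseSignal(Enum):
    """Hypothetical signal values passed to components."""
    OPEN = "open"
    CLOSE = "close"


class Component:
    """Hypothetical stand-in for a middleware or pipeline with an is_closed flag."""

    def __init__(self, name):
        self.name = name
        self.is_closed = False  # the flag named in the notes above

    def on_signal(self, signal):
        # Assign the flag from the signal parameter that was passed in.
        self.is_closed = signal is CloseSignal.CLOSE

    def process(self, value):
        # A closed component does no work and passes the value through unchanged.
        if self.is_closed:
            return value
        return f"{value}|{self.name}"


def dispatch(components, signal):
    """Send the same signal to every component."""
    for component in components:
        component.on_signal(signal)


def run_chain(components, value):
    """Pass a value through every component in order."""
    for component in components:
        value = component.process(value)
    return value
```

With this shape, a shutdown routine would only need to dispatch a close signal once and every middleware or pipeline would stop doing work, which is the behavior the stop-on-completion fix is after.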

ITEM

Item

A general-purpose data entity.

from ReSpider import item

# kwargs stands for any extra keyword arguments
data = item.DataItem({'name': 'ReSpider'}, **kwargs)

xxListItem

A list of data entities; it can be constructed by passing in a list.

from ReSpider import item

# kwargs stands for any extra keyword arguments
data_list_item = item.DataListItem([1, 2, 3], **kwargs)

Saving data

from ReSpider import item

# Binary data
io_item = item.IoItem(b'hello world', filename='hello world', filetype='bin')

# File data; filetype is the file type
file_item = item.FileItem('hello world', filename='hello world', filetype='text')

# Tabular data
csv_item = item.CSVItem({'name': '张三', 'age': 14}, filename='hello world')

# Use a list for multiple rows
csv_list = item.CSVListItem([{'name': '张三', 'age': 14}, {'name': '李四', 'age': 19}], filename='法外狂徒')

# Yield an item to save it (inside a generator, e.g. a spider callback)
data = item.DataItem()
yield data
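
As a rough illustration of how a single dict (as passed to `CSVItem`) and a list of dicts (as passed to `CSVListItem`) map to CSV rows, here is a minimal sketch using only the standard library's `csv` module. `rows_to_csv` is a hypothetical helper; it does not show how ReSpider writes files internally.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize one dict or a list of dicts to CSV text.

    The header is taken from the keys of the first row (an assumption
    made for this sketch).
    """
    if isinstance(rows, dict):
        rows = [rows]  # a single row is treated as a one-element list
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


single = rows_to_csv({'name': '张三', 'age': 14})
multiple = rows_to_csv([{'name': '张三', 'age': 14}, {'name': '李四', 'age': 19}])
```

Here `single` holds a header line plus one data row, and `multiple` holds the same header followed by two data rows.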

Log

Global logging (done)

  • Logs are written to a .log file
  • Each module inherits from the Logger class (details still need revision)
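
The two points above can be approximated with the standard `logging` module. This is a sketch under assumptions, not ReSpider's own `Logger`: the `logger` property, `setup_file_logging`, and the `Downloader` class are hypothetical names chosen for illustration.

```python
import logging


class Logger:
    """Hypothetical base class: each subclass gets a logger named after itself."""

    @property
    def logger(self):
        return logging.getLogger(type(self).__name__)


def setup_file_logging(path, level=logging.INFO):
    """Attach a file handler to the root logger so every module writes to one .log file."""
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler


class Downloader(Logger):
    """Example module that inherits the logging behavior."""

    def fetch(self, url):
        self.logger.info("fetching %s", url)
```

Calling `setup_file_logging("spider.log")` once at startup and then `Downloader().fetch(...)` would append a line tagged with the class name to `spider.log`.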

JS rendering

Cookie issue in headless mode (using 维普 as an example)


Built Distribution

ReSpider-1.0.0rc2-py3-none-any.whl (59.8 kB)

Uploaded Python 3

File details

Details for the file ReSpider-1.0.0rc2-py3-none-any.whl.

File metadata

  • Download URL: ReSpider-1.0.0rc2-py3-none-any.whl
  • Upload date:
  • Size: 59.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.20.0 requests-toolbelt/0.9.1 urllib3/1.24.3 tqdm/4.54.1 importlib-metadata/4.11.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.5

File hashes

Hashes for ReSpider-1.0.0rc2-py3-none-any.whl

Algorithm     Hash digest
SHA256        5a2bd35da5e8c6718e06f2603fb929af0697a11fe8b980e918a283fbeef2ac05
MD5           a3cb7d9bdec6929560dd751de799f9ef
BLAKE2b-256   83023286a6eb05a3899db635c08867e871eb167844c10920f338c0de55a978f5
