Project description
lrabbit_scrapy
This is a small spider framework that is easy to run. If you often need to crawl a single site, you don't have to rewrite the same repetitive code every time; with this small framework you can quickly crawl data into a file or database.
Installing
$ pip3 install lrabbit_scrapy
A Simple Example
# Imports are assumed to come from the package's top-level module;
# adjust the paths if lrabbit_scrapy lays them out differently.
from lrabbit_scrapy import LrabbitSpider, RequestSession, LogUtils


class Spider(LrabbitSpider):
    """
    Redis keys used for spider_name:
        list:spider_name           task queue
        success:count:spider_name  success counter
        list:error:excepiton404    error queue
    """
    spider_name = "test"
    # maximum number of worker threads
    max_thread_num = 10
    # reset the task queue on startup
    reset_task_config = False
    # enable loop mode
    loop_task_config = False
    # skip the startup confirmation prompt
    remove_confirm_config = False

    def __init__(self):
        super().__init__()
        self.session = RequestSession()
        self.proxy_session = RequestSession(proxies=None)

    def worker(self, task):
        LogUtils.log_info(task)
        self.session.send_request(method='GET', url="https://www.lrabbit.life/233333333333333333/")
        # remove the task only after it succeeds, so a KeyboardInterrupt won't lose it
        self.task_list.remove(task)
        # update the success counter in Redis
        self.update_stat_redis()
        LogUtils.log_finish(task)

    def init_task_list(self):
        res = self.mysql_client.query("select id from rookie limit 100")
        return [item['id'] for item in res]


if __name__ == '__main__':
    spider = Spider()
    spider.run()
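The framework keeps its work queue and counters in Redis under the keys listed in the docstring. As a minimal sketch of how to inspect that state from outside the spider (assuming a local Redis instance and the key names above with spider_name = "test"; redis-py is not part of lrabbit_scrapy), you could run:

    # Minimal sketch: inspect the spider's Redis state with redis-py.
    # Assumes a local Redis on the default port and the key names from
    # the docstring above; nothing here is an API of lrabbit_scrapy.
    import redis

    r = redis.Redis(host='localhost', port=6379, decode_responses=True)

    print("pending tasks:", r.llen('list:test'))        # task queue length
    print("succeeded:", r.get('success:count:test'))    # success counter
    print("recent errors:", r.lrange('list:error:excepiton404', 0, 4))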
Links
- author: https://www.lrabbit.life/
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
lrabbit_scrapy-2.0.0.tar.gz (16.6 kB, view hashes)
Built Distribution
lrabbit_scrapy-2.0.0-py3-none-any.whl
Hashes for lrabbit_scrapy-2.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c4c99ecbf131bf7d08fd8b3364fc0c9f19ce8dea897847fd037515e130ae2dc1
MD5 | c738a39b0412078e3ce219fa340129f0
BLAKE2b-256 | 8e8933b4abcd1d43168a50ae89e02a666c498132dca8f403dd1bfe898a3540b8
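To check a downloaded file against the digests above, a minimal sketch with the standard library (the file name is the wheel listed above; the path is an assumption for illustration):

    # Minimal sketch: verify a downloaded wheel against the published SHA256.
    # Assumes the wheel sits in the current directory.
    import hashlib

    expected = 'c4c99ecbf131bf7d08fd8b3364fc0c9f19ce8dea897847fd037515e130ae2dc1'
    with open('lrabbit_scrapy-2.0.0-py3-none-any.whl', 'rb') as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    print('OK' if digest == expected else 'MISMATCH')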