An easy-to-use tool module for writing multi-threaded and multi-process programs.
QSpider
Install
QSpider can be installed with pip:
$ pip install qspider
Usage
Using the module
# 1. Import the QSpider and Task classes from the qspider module,
#    along with any other modules you need.
from qspider import QSpider, Task
import requests

# 2. Define a list of task sources.
#    Each element of this list is called a 'task_source'.
#    A 'task_source' can be of any type, e.g. str, tuple, dict, or an object
#    such as requests.Session.
source = ['https://www.baidu.com' for i in range(100)]

# 3. Create your own task class, which must extend Task.
class TestTask(Task):
    """A test task.

    Attributes:
        task_source: the source this task needs, i.e. one
            'task_source' element from the source list.
    """
    def __init__(self, task_source):
        Task.__init__(self, task_source)

    def run(self):
        # Process self.task_source here.
        res = requests.get(self.task_source, timeout=3)
        # Return the values you need.
        return res.status_code

# 4. Create the QSpider and run it.
test_spider = QSpider(source, TestTask, has_result=True)
results = test_spider.run()
print(results)
Run the script and you'll get:
[Info] 100 tasks in total.
[Input] Number of threads: 20
[ ✔ ] 100% |███████████████████████████████████| 100/100 [eta-0:00:00, 2.5s, 40.8it/s]
[200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, ... , 200]
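The pattern above (one task object per source element, run on a pool of threads, results collected into a list) can be sketched with the standard library alone. This is not QSpider's implementation, only a minimal illustration of the idea; `SquareTask`, `run_all`, and `num_threads` are hypothetical names, and the task does simple arithmetic instead of a network request so the sketch runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


class SquareTask:
    """Hypothetical stand-in for a Task subclass: wraps one task_source."""

    def __init__(self, task_source):
        self.task_source = task_source

    def run(self):
        # Process self.task_source and return the value to collect.
        return self.task_source * self.task_source


def run_all(source, task_cls, num_threads=4):
    """Run one task per element of source; return results in source order."""
    tasks = [task_cls(task_source) for task_source in source]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order even if tasks finish out of order.
        return list(pool.map(lambda task: task.run(), tasks))


results = run_all(range(5), SquareTask)
print(results)  # [0, 1, 4, 9, 16]
```

Threads suit I/O-bound work such as HTTP requests, where most of the time is spent waiting on the network.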
Using the command line
Generate a QSpider script with the genqspider command:
$ genqspider -h
usage: Generate your qspider based on templates [-h] [-p] name

positional arguments:
  name           Your spider name

optional arguments:
  -h, --help     show this help message and exit
  -p, --process  Using multi-process instead of multi-thread template
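To make the option semantics concrete, here is a small argparse sketch that accepts the same arguments as the help text above. It is an illustration only, not the actual genqspider source; `build_parser` is a hypothetical name, and using `store_true` for `-p` is an assumption based on the flag taking no value in the usage line.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented genqspider arguments."""
    parser = argparse.ArgumentParser(
        prog="genqspider",
        description="Generate your qspider based on templates",
    )
    # Required positional argument: the name of the spider to generate.
    parser.add_argument("name", help="Your spider name")
    # Optional switch: assumed to be a boolean flag that defaults to False.
    parser.add_argument(
        "-p", "--process",
        action="store_true",
        help="Using multi-process instead of multi-thread template",
    )
    return parser


args = build_parser().parse_args(["-p", "test"])
print(args.name, args.process)  # test True
```

Without `-p`, `args.process` is `False`, which corresponds to the default multi-thread template.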
Example
- Create a test crawler using QSpider.
  $ genqspider test
  A qspider named test is initialized.
  A Python script named test.py is created in your current directory.
- Open test.py and you'll see:
  # -*- coding: utf-8 -*-
  from qspider import ThreadManager, Task

  class TestSpider(ThreadManager):
      def __init__(self, has_result=False, add_failed=True):
          self.name = "test"
          self.has_result = has_result
          self.add_failed = add_failed
          self.source = [0]  # define your source list
          super(TestSpider, self).__init__(self.source, self.QTask, has_result=self.has_result, add_failed=self.add_failed)

      class QTask(Task):
          def __init__(self, task_source):
              Task.__init__(self, task_source)

          def run(self):
              # parse single task source
              pass

  if __name__=="__main__":
      qspider = TestSpider()
      qspider.test()
      # qspider.run()
- Edit the line self.source = [0] to define your source list, and implement how each task_source is processed in the method QTask.run.
  # -*- coding: utf-8 -*-
  import requests
  from qspider.core import QSpider, Task

  class TestSpider(QSpider):
      def __init__(self, has_result=False, add_failed=True):
          self.name = "test"
          self.has_result = has_result
          self.add_failed = add_failed
          # 1. Define your source list.
          self.source = ['https://www.baidu.com' for i in range(100)]
          super(TestSpider, self).__init__(self.source, self.QTask, has_result=self.has_result, add_failed=self.add_failed)

      class QTask(Task):
          def __init__(self, task_source):
              Task.__init__(self, task_source)

          # 2. Modify the run method.
          def run(self):
              # Process self.task_source here.
              res = requests.get(self.task_source, timeout=3)
              # Return the values you need.
              return res.status_code

  if __name__=="__main__":
      # 3. Set 'has_result' to True when QTask.run returns values.
      qspider = TestSpider(has_result=True)
      # 4. Receive the results after running the qspider.
      results = qspider.run()
      print(results)
- Run the script and you'll get:
  [Info] 100 tasks in total.
  [Input] Number of threads: 20
  [ ✔ ] 100% |███████████████████████████████████| 100/100 [eta-0:00:00, 2.5s, 40.8it/s]
  [200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, ... , 200]
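Since the returned value is a plain list with one entry per task, it can be post-processed with ordinary Python. As a small sketch, the status codes can be tallied with `collections.Counter`; the sample list below is made up for illustration and is not real crawler output.

```python
from collections import Counter

# Made-up sample standing in for the list returned by the spider's run().
results = [200, 200, 404, 200, 500]

# Count how many tasks returned each status code.
summary = Counter(results)
for status, count in sorted(summary.items()):
    print(status, count)
```

A summary like this makes it easy to spot how many requests did not return 200 without reading the full list.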
Releases
- v0.1.1: First release with basic classes.
- v0.1.2: Refactored the code; added ThreadManager, ProcessManager, and other tool classes.
License
Copyright (c) 2020 tishacy.
Licensed under the MIT License.