Spydy
spydy is a lightweight, high-level web-crawling framework built for fast development and high performance, inspired by Unix pipelines.
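The Unix-pipeline inspiration means each crawl step consumes the previous step's output, just as `cat urls | fetch | parse | store` would in a shell. The sketch below is purely illustrative (the function names `fetch`, `parse`, `store`, and `run_pipeline` are our own, not spydy's API):

```python
from functools import reduce

# Hypothetical illustration of the pipeline idea, NOT spydy's actual code:
# each step receives the previous step's output, Unix-pipe style.

def fetch(url):
    # stand-in for an HTTP request step
    return f"<html>content of {url}</html>"

def parse(html):
    # stand-in for a parser step that extracts a record
    return {"length": len(html)}

def store(record):
    # stand-in for a storage step; here it just passes the record through
    return record

def run_pipeline(steps, seed):
    # feed each step's output into the next step
    return reduce(lambda value, step: step(value), steps, seed)

result = run_pipeline([fetch, parse, store], "https://dmoz-odp.org")
```

In spydy, the `[PipeLine]` section of the configuration plays the role of the `steps` list above.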
Install
pip install spydy
How to use
There are two ways of running spydy:
- One way is to prepare a configuration file and run spydy from the command line:
spydy myconfig.cfg
myconfig.cfg
might look like this:
[Globals]
run_mode = async_forever
nworkers = 4
[PipeLine]
url = DummyUrls
request = AsyncHttpRequest
parser = DmozParser
log = MessageLog
store = CsvStore
[DummyUrls]
url = https://dmoz-odp.org
repeat = 10
[CsvStore]
file_name = result.csv
- Or run it from a Python file (e.g.
spider.py
):
from spydy.engine import Engine
from spydy.utils import check_configs
myconfig = {
"Globals":{
"run_mode": "async_forever",
"nworkers": "4"
},
"PipeLine":{
"url":"DummyUrls",
"request": "AsyncHttpRequest",
"parser": "DmozParser",
"log": "MessageLog"
"store": "CsvStore"
},
"DummyUrls":{
"url":"https://dmoz-odp.org",
"repeate":"10"
},
"CsvStore":{
"file_name":"result.csv"
}
}
check_configs(myconfig)
spider = Engine(myconfig)
spider.run()
Then run it:
$ python spider.py
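The two styles describe the same configuration: each `[Section]` in the .cfg file maps to a key in the nested dict. If you keep your settings in a .cfg file but want to run spydy programmatically, the standard-library configparser can bridge the two. The helper below (`cfg_to_dict` is our own name, not part of spydy) sketches that conversion:

```python
import configparser

def cfg_to_dict(path):
    # Convert a spydy-style .cfg file into the nested dict shape
    # that the Python example above passes to Engine.
    parser = configparser.ConfigParser()
    parser.read(path)
    # configparser lowercases option keys by default, which matches
    # the lowercase keys used in the examples above.
    return {section: dict(parser.items(section)) for section in parser.sections()}
```

A dict produced this way could then be handed to `check_configs` and `Engine` exactly as in spider.py.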