light-weight high-level web-crawling framework
Project description
Spydy
spydy is a light-weight, high-level web-crawling framework for fast development and high performance, inspired by the Unix pipeline.
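To illustrate the Unix-pipeline idea, here is a minimal sketch in plain Python. It is not spydy's implementation: `run_pipeline`, `fake_fetch`, and `parse_title` are hypothetical names made up for this example, and the "fetch" step returns a canned page instead of touching the network. It only shows how each step's output becomes the next step's input.

```python
from typing import Any, Callable, Iterable, Iterator, List

Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], items: Iterable[Any]) -> Iterator[Any]:
    """Pass every item through each step in order, like `a | b | c` in a shell."""
    for item in items:
        result = item
        for step in steps:
            result = step(result)
        yield result


def fake_fetch(url: str) -> dict:
    # Hypothetical stand-in for an HTTP request step; no network access.
    return {"url": url, "body": f"<title>Page for {url}</title>"}


def parse_title(page: dict) -> dict:
    # Strip the surrounding <title>...</title> tags from the canned body.
    body = page["body"]
    return {"url": page["url"], "title": body[len("<title>"):-len("</title>")]}


results = list(run_pipeline([fake_fetch, parse_title], ["https://dmoz-odp.org"]))
print(results)
```

Because every step is just a callable taking one value and returning one value, steps can be reordered or swapped without changing the others, which is the property the pipeline analogy is meant to convey.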
Install
pip install spydy
How to use
There are two ways of running spydy:
- One way is to prepare a configuration file and run spydy from the command line:
spydy myconfig.cfg
myconfig.cfg may look like this:
[Globals]
run_mode = async_forever
nworkers = 4
[PipeLine]
url = DummyUrls
request = AsyncHttpRequest
parser = DmozParser
log = MessageLog
store = CsvStore
[url]
url = https://dmoz-odp.org
repeat = 10
[store]
file_name = result.csv
- Or run it from a Python file (e.g. spider.py):
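from spydy.engine import Engine
from spydy.utils import check_configs
from spydy import urls, request, parsers, logs, store

# Output file for CsvStore; same name as in the configuration file example.
FILE_NAME = "result.csv"

myconfig = {
    "Globals": {
        "run_mode": "async_forever",
        "nworkers": "4",
    },
    # Steps run in order: each step's output feeds the next one.
    "PipeLine": [
        urls.DummyUrls(url="https://dmoz-odp.org", repeat=10),
        request.AsyncHttpRequest(),
        parsers.DmozParser(),
        logs.MessageLog(),
        store.CsvStore(file_name=FILE_NAME),
    ],
}

# Validate the configuration before building the engine.
check_configs(myconfig)
spider = Engine.from_dict(myconfig)
spider.run()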
Then run it:
$ python spider.py
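The configuration file shown earlier uses INI syntax, so its layout can be inspected with Python's standard `configparser` module. The sketch below is not part of spydy and does not show how spydy itself loads the file. It assumes, based on the matching Python example, that a section named after a pipeline step (such as `[url]` or `[store]`) holds that step's arguments; `CONFIG_TEXT` and `step_args` are names invented for this example.

```python
import configparser

# Same content as the myconfig.cfg example, inlined so the sketch is self-contained.
CONFIG_TEXT = """
[Globals]
run_mode = async_forever
nworkers = 4

[PipeLine]
url = DummyUrls
request = AsyncHttpRequest
parser = DmozParser
log = MessageLog
store = CsvStore

[url]
url = https://dmoz-odp.org
repeat = 10

[store]
file_name = result.csv
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Pipeline steps in file order, as (step name, class name) pairs.
pipeline = list(config["PipeLine"].items())

# Assumption: a section named like a step carries that step's arguments.
step_args = {
    name: dict(config[name]) if config.has_section(name) else {}
    for name, _ in pipeline
}

for name, class_name in pipeline:
    print(f"{name}: {class_name} {step_args[name]}")
```

Note that `configparser` returns every value as a string, so numeric settings such as `nworkers` or `repeat` would need an explicit conversion (for example `config["Globals"].getint("nworkers")`).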
Download files
Source Distribution
spydy-0.1.14.tar.gz (446.5 kB)