light-weight high-level web-crawling framework
Spydy
spydy is a lightweight, high-level web-crawling framework for fast development and high performance, inspired by the Unix pipeline.
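To illustrate the Unix-pipeline idea, here is a minimal sketch in plain Python in which each step receives the output of the previous one. It is not spydy's implementation: `run_pipeline`, `next_url`, `fake_request`, `parse_title` and `log` are hypothetical names made up for this example, and no network request is made.

```python
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def run_pipeline(steps: List[Step], first_input: Any = None) -> Any:
    """Feed the output of each step into the next, like `a | b | c` in a shell."""
    return reduce(lambda value, step: step(value), steps, first_input)


# Hypothetical stand-ins for a URL source, a request step, a parser and a logger.
def next_url(_: Any) -> str:
    return "https://dmoz-odp.org"


def fake_request(url: str) -> str:
    # Pretend response body; a real step would fetch the page.
    return f"<html><title>{url}</title></html>"


def parse_title(html: str) -> dict:
    start = html.index("<title>") + len("<title>")
    end = html.index("</title>")
    return {"title": html[start:end]}


def log(item: dict) -> dict:
    print("parsed:", item)
    return item  # pass the item through unchanged to the next step


result = run_pipeline([next_url, fake_request, parse_title, log])
```

The same shape appears in spydy's configuration below, where the pipeline is an ordered list of steps (url, request, parser, log, store).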
Install
pip install spydy
How to use
There are two ways to run spydy:
- Prepare a configuration file and run spydy from the command line:
spydy myconfig.cfg
myconfig.cfg may look like this:
[Globals]
run_mode = async_forever
nworkers = 4
[PipeLine]
url = DummyUrls
request = AsyncHttpRequest
parser = DmozParser
log = MessageLog
store = CsvStore
[url]
url = https://dmoz-odp.org
repeat = 10
[store]
file_name = result.csv
- Or run it from a Python file (e.g. spider.py):
from spydy.engine import Engine
from spydy.utils import check_configs
from spydy import urls, request, parsers, logs, store

# Output file for CsvStore; same value as file_name in myconfig.cfg
FILE_NAME = "result.csv"

myconfig = {
    "Globals": {
        "run_mode": "async_forever",
        "nworkers": "4",
    },
    "PipeLine": [
        urls.DummyUrls(url="https://dmoz-odp.org", repeat=10),
        request.AsyncHttpRequest(),
        parsers.DmozParser(),
        logs.MessageLog(),
        store.CsvStore(file_name=FILE_NAME),
    ],
}

# Validate the configuration, then build and run the spider
check_configs(myconfig)
spider = Engine.from_dict(myconfig)
spider.run()
Then run it:
$ python spider.py
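The myconfig.cfg example above is INI-style text, so its layout can be inspected with Python's standard `configparser` module. The sketch below only shows how the sections relate to each other; it is not how spydy itself loads the file, and the idea that the `[url]` and `[store]` sections supply arguments to the steps of the same name in `[PipeLine]` is an inference from the two examples above, not documented behavior.

```python
import configparser

# Same content as the myconfig.cfg example above.
CONFIG_TEXT = """
[Globals]
run_mode = async_forever
nworkers = 4

[PipeLine]
url = DummyUrls
request = AsyncHttpRequest
parser = DmozParser
log = MessageLog
store = CsvStore

[url]
url = https://dmoz-odp.org
repeat = 10

[store]
file_name = result.csv
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONFIG_TEXT)

# Ordered (step name, class name) pairs from the [PipeLine] section.
pipeline_steps = list(cfg["PipeLine"].items())

# Options for each step that has a section of its own (assumed mapping, see above).
step_options = {
    name: dict(cfg[name]) for name, _ in pipeline_steps if cfg.has_section(name)
}

for name, class_name in pipeline_steps:
    print(name, class_name, step_options.get(name, {}))
```

Note that `configparser` returns every value as a string, so `repeat` and `nworkers` come back as `"10"` and `"4"` here.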