A Simple Distributed Web Crawler
simplified-scrapy
simplified-scrapy, a simple web crawler
Requirements
- Python 2.7, 3.0+
- Works on Linux, Windows, Mac OSX, BSD
Run

```python
from simplified_scrapy.simplified_main import SimplifiedMain
SimplifiedMain.startThread()
```
Demo
A custom crawler class needs to extend the Spider class:

```python
from simplified_scrapy.spider import Spider

class DemoSpider(Spider):
    ...
```
Here is an example of collecting data:

```python
from simplified_scrapy.spider import Spider, SimplifiedDoc
from simplified_scrapy.simplified_main import SimplifiedMain

class DemoSpider(Spider):
    name = 'demo-spider'
    start_urls = ['http://www.scrapyd.cn/']
    allowed_domains = ['www.scrapyd.cn']

    def extract(self, url, html, models, modelNames):
        # Parse the page and return the newly discovered links
        doc = SimplifiedDoc(html)
        lstA = doc.listA(url=url["url"])
        return [{"Urls": lstA, "Data": None}]

SimplifiedMain.startThread(DemoSpider())
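The `extract` method above hands the crawler a list of links found on the page. As a rough illustration of what link collection like `doc.listA` involves, here is a minimal standard-library sketch (a hypothetical stand-in, not the simplified-scrapy implementation): it gathers every `<a href>` on a page and resolves relative links against the page URL.

```python
# Hypothetical sketch of link collection, similar in spirit to
# SimplifiedDoc.listA. Not the library's actual implementation.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects every <a href> on a page as absolute URLs."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative hrefs against the page URL
                    self.links.append({"url": urljoin(self.base_url, value)})

def list_links(html, url):
    parser = LinkCollector(url)
    parser.feed(html)
    return parser.links

# Example: a relative and an absolute link on the demo site's start page
html = '<a href="/page/1/">next</a> <a href="http://example.com/">home</a>'
print(list_links(html, "http://www.scrapyd.cn/"))
# → [{'url': 'http://www.scrapyd.cn/page/1/'}, {'url': 'http://example.com/'}]
```

Returning the links as a list of dicts mirrors the shape the demo's `extract` passes back in the `"Urls"` field.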
pip install

```
pip install simplified-scrapy
```
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
Hashes for simplified_scrapy-0.6.71-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | d0ff2798e023ad0c2f9f1344d13a463cee272b39d574c97144275e8641ad9945
MD5 | c5a1f3c6da9f0bf75e4db0f7fd9abaed
BLAKE2b-256 | 3f3532685e93b32f7bcfcb5fd80d906adbc7f3eade356c6c63aedec52b793f83