A Simple Distributed Web Crawler

Project description

simplified-scrapy

simplified-scrapy, a simple web crawler

Requirements

  • Python 2.7, 3.0+
  • Works on Linux, Windows, Mac OSX, BSD

Run

Go to the project root directory and run the following command:
python start.py

Demo

The project includes an example spider in the spiders folder, in the file demoSpider.py. A custom spider class must inherit from the Spider class.

from core.spider import Spider 
class DemoSpider(Spider):

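You need to give the spider a name, configure its entry URLs, and define the parsing method used to extract data. The following is an example of collecting data.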

from simplified_scrapy.core.spider import Spider
from simplified_scrapy.simplified_doc import SimplifiedDoc

class DemoSpider(Spider):
  name = 'demo-spider'  # name of the spider
  start_urls = ['http://www.scrapyd.cn/']  # entry URLs
  allowed_domains = ['www.scrapyd.cn']

  def extract(self, url, html, models, modelNames):
    # Parsing method used to extract data from the page
    doc = SimplifiedDoc(html)
    lstA = doc.listA(url=url["url"])
    return [{"Urls": lstA, "Data": None}]

from simplified_scrapy.simplified_main import SimplifiedMain
SimplifiedMain.startThread(DemoSpider())
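
For readers who want to see what a link-collecting extract step does conceptually, here is a minimal sketch using only the Python 3 standard library. It is not the simplified-scrapy implementation. The names LinkCollector and extract_links and the sample HTML are hypothetical and exist only for illustration. The sketch collects the href of every anchor tag, resolves it against the page URL, and keeps only links whose host is in the allowed domains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html, allowed_domains):
    """Return absolute links from html whose host is in allowed_domains."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        if urlparse(absolute).netloc in allowed_domains:
            links.append(absolute)
    return links


# Sample input (made up for this sketch): one relative link, one external link
sample_html = '<a href="/page/2/">Next</a> <a href="http://example.com/">Other</a>'
result = extract_links("http://www.scrapyd.cn/", sample_html, ["www.scrapyd.cn"])
print(result)
```

In this sketch the external link is dropped because its host is not in the allowed list, which mirrors the role allowed_domains appears to play in the spider definition above.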

Install with pip

pip install simplified-scrapy

Examples

Download files

Files for simplified-scrapy, version 0.5.61
  • Filename, size: simplified_scrapy-0.5.61-py2.py3-none-any.whl (89.3 kB)
  • File type: Wheel
  • Python version: py2.py3