a simple spider framework, and some tools for spider
Project description
spider
一个简单的爬虫框架, 性能什么的可能不行,但应该蛮简单。
可以直接运行demo_spider查看效果
贴出部分demo代码
定义一个OriginSqlItem便于数据入库
class DemoItem(OriginSqlItem):
title = Field('nvarchar(100)')
tag = Field('nvarchar(20)')
url = Field('nchar(500)')
定义一个Recorder规定数据保存方式
class DemoRecorder(BaseRecorder):
table = 'demo_table'
def __init__(self):
super().__init__()
self.sql = Sqliter(**SQL)
DemoItem.create_item(self.sql, self.table, sqltype='sqlite')
def record(self, item):
return DemoItem.save(self.sql, self.table, item)
""" 或者直接使用SqlRecorder入库 class DemoRecorder(SqlRecorder): sql = Sqliter(**SQL) model = DemoItem table = 'demo' """
与Scrapy的部分操作类似
class DemoSpider(ConeSpider):
"""
如果遇到反爬可以自定义downloader
"""
recorders = [DemoRecorder]
base_url = 'http://www.chinanews.com/scroll-news/news%d.html'
start_urls = ['http://www.chinanews.com/scroll-news/news1.html']
def parse(self, response):
pass
"""详见demo_spider"""
spider = DemoSpider() spider.start() 即可运行爬虫,Ctrl + C 暂停,连按关闭爬虫
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
py-cone-0.0.1.tar.gz
(40.6 kB
view hashes)
Built Distribution
py_cone-0.0.1-py3-none-any.whl
(56.0 kB
view hashes)