A web spider framework based on gevent and requests
Spider Example
Below is an example spider class that crawls the top headlines from Baidu News (百度新闻):
from gspider import Spider, HttpRequest, run_spider, Selector


class BaiduNewsSpider(Spider):
    def start_requests(self):
        # Initial request: the Baidu News front page.
        yield HttpRequest("http://news.baidu.com/")

    def parse(self, response):
        # Extract the link text of each hot-news item with a CSS selector.
        selector = Selector(response.text)
        hot = selector.css("div.hotnews a").text
        self.log("Hot News:")
        for i in range(len(hot)):
            self.log("%s: %s", i + 1, hot[i])


if __name__ == '__main__':
    run_spider(BaiduNewsSpider)
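The spider class defines two methods:
start_requests: returns the spider's initial requests.
parse: processes the page returned for a request; here it uses Selector and CSS selector syntax to extract the data we need.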
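
To make the flow from start_requests to parse concrete without a network connection, here is a minimal standalone sketch that uses only the Python standard library. It is not gspider's implementation: `crawl`, `fake_fetch`, `HotNewsParser`, `extract_hot_news`, and `SAMPLE_PAGE` are hypothetical names introduced for illustration, and the sample HTML is made up. It only mimics what the selector `div.hotnews a` is meant to match.

```python
from html.parser import HTMLParser


class HotNewsParser(HTMLParser):
    """Collects the text of <a> tags nested inside <div class="hotnews">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside div.hotnews (counts nested divs)
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or "hotnews" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth:
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


def extract_hot_news(html):
    """Plays the role of parse: turn a page into a list of headline strings."""
    parser = HotNewsParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]


def crawl(start_requests, fetch, parse):
    """Hypothetical driver: fetch every initial request and hand it to parse."""
    results = []
    for url in start_requests():
        results.extend(parse(fetch(url)))
    return results


# Made-up page standing in for the real response body.
SAMPLE_PAGE = """
<div class="hotnews"><ul>
  <li><a href="/a">First headline</a></li>
  <li><a href="/b">Second headline</a></li>
</ul></div>
<div class="other"><a href="/c">Not a hot item</a></div>
"""


def start_requests():
    yield "http://news.baidu.com/"


def fake_fetch(url):
    # Assumption: no network access; always return the sample page.
    return SAMPLE_PAGE


titles = crawl(start_requests, fake_fetch, extract_hot_news)
for i, title in enumerate(titles, start=1):
    print("%s: %s" % (i, title))
```

In the real framework the fetching and scheduling are handled by run_spider; the sketch only shows the division of labor between the two methods.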
Documentation