biu · PyPI

A tiny web crawler framework.

These details have not been verified by PyPI

Project links

Homepage

Project description

# Biu
A tiny web crawler framework

## Features
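* Requires Python 3.5 or later.
* Concurrency is built on Gevent, so you must `import biu` at the very top of your script, or apply the monkey patch yourself.
* Requests are sent with Requests; the request and response parameters are largely compatible with it.
* Page parsing is built on Parsel, so it is used the same way as in Scrapy.
* Essentially a stripped-down Scrapy, with very similar usage.
* For more advanced features, read the source code and explore for yourself.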

## Installation
```
pip install biu
```

## Example
```python
import biu  # Must be the first import: importing biu applies the gevent monkey patch.


class MySpider(biu.Project):
    def start_requests(self):
        for i in range(0, 301, 30):
            # Return or yield a biu.Request to fetch a page; its parameters
            # are largely compatible with those of requests.
            yield biu.Request(url="https://www.douban.com/group/explore/tech?start={}".format(i),
                              method="GET",
                              headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"},
                              callback=self.parse)

    def parse(self, resp):
        # biu.Response is much like the requests one, with a few selectors added.
        for item in resp.xpath('//*[@id="content"]/div/div[1]/div[1]/div'):
            # Return or yield a dict and it is passed to result_handler as a result.
            yield {
                "title": item.xpath("div[2]/h3/a/text()").extract_first(),
                "url": item.xpath("div[2]/h3/a/@href").extract_first(),
                "abstract": item.css("p::text").extract_first()
            }

    def result_handler(self, rv):
        print("get result:", rv)
        # Save your results here.


biu.run(MySpider(concurrent=3, interval=0.2, max_retry=5))
```

Release history

* 0.3.0 (Aug 31, 2024)
* 0.2.5 (Aug 18, 2020)
* 0.2.4 (Aug 18, 2020)
* 0.2.3 (Sep 19, 2018)
* 0.2.2 (Sep 13, 2018)
* 0.2.1 (Sep 10, 2018)
* 0.2.0 (Sep 10, 2018)
* 0.1.7 (Aug 8, 2018)
* 0.1.6 (Jul 19, 2018)
* 0.1.5 (Jul 11, 2018)
* 0.1.3 (Jun 6, 2018)
* 0.1.2 (Apr 17, 2018), this version
* 0.1.1 (Apr 17, 2018)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Biu-0.1.2.tar.gz (5.0 kB)

Uploaded Apr 17, 2018 Source

Built Distribution

Biu-0.1.2-py2.py3-none-any.whl (4.2 kB)

Uploaded Apr 17, 2018 Python 2 Python 3

Hashes for Biu-0.1.2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `7558e85f0bde237b2701d83ae852b62e9b45ebe6de377c9bcfe442750a7bf40b` |
| MD5 | `e1b916b7ffb60d16c6ec5bd063a25df1` |
| BLAKE2b-256 | `562b722c6b53080ce3f99d8d43a1cc037496f0e9d6e24d8666ac07267ce1e585` |

Hashes for Biu-0.1.2-py2.py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `e1d03c606544b5ae6000b2f097acb12a84c504d4f8ce17c4b76e416d99dd54a9` |
| MD5 | `6527e43bdd23f43ca45ac391a9814f00` |
| BLAKE2b-256 | `282a4ac05a22ff8b50b89632fcee1a4b60b9152fc9290cb96577b40ffe27997d` |
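
The published digests can be used to check a downloaded file. The following is a minimal sketch using only the standard library `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example call (assumes the sdist was saved in the current directory):
# matches_published_hash(
#     "Biu-0.1.2.tar.gz",
#     "7558e85f0bde237b2701d83ae852b62e9b45ebe6de377c9bcfe442750a7bf40b",
# )
```

A `False` result means the file differs from the one the digest was computed for, for example because the download was truncated or altered.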