A web crawler that supports custom extensions (plugins)
Project description
Description
- A web crawler that supports custom extensions (plugins)
Examples
- Basic demo
from manc.plugins import UserAgentPlugin
from manc.spider import BaseSpider
url = 'https://blog.csdn.net/MarkAdc'
# 1. Basic spider
s1 = BaseSpider()
r1 = s1.goto(url)  # the response object supports XPath and CSS selectors directly
print(type(r1))
print(r1.request.headers)
print(r1.xpath("//title/text()").get())
print()
# 2. Standard spider: equivalent to the basic spider plus the UA plugin
s2 = BaseSpider()
s2.add_plugins([UserAgentPlugin()])
r2 = s2.goto(url)  # the request now carries a User-Agent header
print(type(r2))
print(r2.request.headers)
print(r2.xpath("//title/text()").get())
print()
- Custom extension demo
from manc import Spider
from manc.plugins import SpiderPlugin
class ProxyPlugin(SpiderPlugin):
    def deal_request(self, request):
        proxy = 'http://127.0.0.1:1082'
        request.proxies = {"http": proxy, "https": proxy}
        request.name = "cMan"

    def deal_response(self, response):
        return response
s = Spider()
s.add_plugin(ProxyPlugin())
url = 'http://www.baidu.com'
r = s.goto(url)
print(type(r), type(r.request))
print(r.request.name)
print(r.request.headers)
print(r.request.proxies)
print(r.get_one("//title/text()"))
print(r.get_all("//title/text()"))
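
The hook flow in the demo above can be illustrated without the library or a network connection. The following is only a minimal sketch of how a spider might call each plugin's `deal_request` before fetching and `deal_response` afterwards. `MiniSpider`, `FakeRequest`, `FakeResponse`, `Plugin`, and `HeaderPlugin` are hypothetical stand-ins written for this illustration; they are not manc's classes, and manc's actual internals may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request object (not manc's class)."""
    url: str
    headers: dict = field(default_factory=dict)
    proxies: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response object (not manc's class)."""
    request: FakeRequest
    text: str = ""


class Plugin:
    """Hypothetical base plugin: both hooks do nothing by default."""

    def deal_request(self, request):
        return None

    def deal_response(self, response):
        return response


class HeaderPlugin(Plugin):
    """Example plugin that sets a header on every outgoing request."""

    def deal_request(self, request):
        request.headers["User-Agent"] = "demo-agent/0.1"


class MiniSpider:
    """Toy spider that runs plugin hooks around a fake fetch (no network)."""

    def __init__(self):
        self.plugins = []

    def add_plugin(self, plugin):
        self.plugins.append(plugin)

    def goto(self, url):
        request = FakeRequest(url)
        for plugin in self.plugins:
            plugin.deal_request(request)  # plugins mutate the request in place
        # A real spider would send the request here; this sketch fakes the body.
        response = FakeResponse(request, text="<title>demo</title>")
        for plugin in self.plugins:
            replaced = plugin.deal_response(response)
            if replaced is not None:  # keep the current response if a hook returns None
                response = replaced
        return response


spider = MiniSpider()
spider.add_plugin(HeaderPlugin())
result = spider.goto("http://example.invalid")
print(result.request.headers)
```

In this sketch, request hooks mutate the request in place and response hooks may return a replacement object, which mirrors the shape of `ProxyPlugin` above; whether manc treats return values the same way is an assumption.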
Download files
Source Distribution
manc-0.1.1.tar.gz (3.5 kB)
File details
Details for the file manc-0.1.1.tar.gz.
File metadata
- Download URL: manc-0.1.1.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | feebab8a3bd2c1ac874530cd8c7e7a850c8a58850cc4ff9fa077b2dea4b46f1b |
| MD5 | 37812bff6c9167849354fa36448434cd |
| BLAKE2b-256 | 57c7a41b30abeb502fff5ace29d75a5f1ee3ced26fe79052cc5408e3f7e0ebe0 |
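
A downloaded archive can be checked against the published SHA256 digest using only Python's standard library. The sketch below assumes the file has already been saved locally; the helper names `sha256_of_file` and `matches_published_hash` are made up for this illustration, and the local path is whatever you chose when downloading.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 value listed under "File hashes" for manc-0.1.1.tar.gz.
EXPECTED_SHA256 = "feebab8a3bd2c1ac874530cd8c7e7a850c8a58850cc4ff9fa077b2dea4b46f1b"


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_hash("manc-0.1.1.tar.gz")` would be called with the path of the local copy, and a `False` result suggests the download is corrupted or is not the listed file.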