A web crawler that can be extended with custom plugins

Project description

Description

  • A web crawler that can be extended with custom plugins

Examples

  • Basic demo
from manc.plugins import UserAgentPlugin
from manc.spider import BaseSpider

url = 'https://blog.csdn.net/MarkAdc'

# 1. Basic spider
s1 = BaseSpider()
r1 = s1.goto(url)  # The response object supports XPath and CSS selectors directly
print(type(r1))
print(r1.request.headers)
print(r1.xpath("//title/text()").get())
print()

# 2. Standard spider, equivalent to the basic spider + the UA plugin
s2 = BaseSpider()
s2.add_plugins([UserAgentPlugin()])
r2 = s2.goto(url)  # The request now carries a User-Agent header
print(type(r2))
print(r2.request.headers)
print(r2.xpath("//title/text()").get())
print()
  • Custom plugin demo
from manc import Spider
from manc.plugins import SpiderPlugin


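class ProxyPlugin(SpiderPlugin):
    def deal_request(self, request):
        # Route both HTTP and HTTPS traffic through a local proxy
        proxy = 'http://127.0.0.1:1082'
        request.proxies = {"http": proxy, "https": proxy}
        # Attach a custom attribute to the request; it is read back below
        request.name = "cMan"

    def deal_response(self, response):
        # Pass the response through unchanged
        return response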


s = Spider()
s.add_plugin(ProxyPlugin())

url = 'http://www.baidu.com'
r = s.goto(url)
print(type(r), type(r.request))
print(r.request.name)
print(r.request.headers)
print(r.request.proxies)
print(r.get_one("//title/text()"))
print(r.get_all("//title/text()"))
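
The plugin hooks used above (deal_request before the request is sent, deal_response after the response arrives) can be illustrated without the library or a network connection. The following is a minimal sketch, not manc's actual implementation: Request, Response, Plugin, UAPlugin, UpperCasePlugin, MiniSpider and fake_fetch are all hypothetical stand-ins, and the order in which the hooks run is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical stand-in for the library's request object
    url: str
    headers: dict = field(default_factory=dict)
    proxies: dict = field(default_factory=dict)


@dataclass
class Response:
    # Hypothetical stand-in for the library's response object
    request: Request
    text: str


class Plugin:
    """Hypothetical base class: both hooks default to no-ops."""

    def deal_request(self, request):
        pass

    def deal_response(self, response):
        return response


class UAPlugin(Plugin):
    def deal_request(self, request):
        # Add a User-Agent header unless one is already set
        request.headers.setdefault("User-Agent", "demo-agent/0.1")


class UpperCasePlugin(Plugin):
    def deal_response(self, response):
        # Return a new response whose body is upper-cased
        return Response(response.request, response.text.upper())


class MiniSpider:
    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a Request and returning body text
        self._plugins = []

    def add_plugin(self, plugin):
        self._plugins.append(plugin)

    def goto(self, url):
        request = Request(url)
        # Assumed order: every plugin sees the request before it is sent
        for plugin in self._plugins:
            plugin.deal_request(request)
        response = Response(request, self._fetch(request))
        # Then every plugin may replace the response
        for plugin in self._plugins:
            response = plugin.deal_response(response)
        return response


def fake_fetch(request):
    # Stand-in transport: no network access, just echoes the URL
    return f"<title>{request.url}</title>"


spider = MiniSpider(fake_fetch)
spider.add_plugin(UAPlugin())
spider.add_plugin(UpperCasePlugin())
result = spider.goto("http://example.test")
print(result.request.headers)
print(result.text)
```

Keeping the transport as an injected callable makes the hook order easy to check in isolation; a real spider would replace fake_fetch with an HTTP client.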

Download files

Source Distribution

manc-0.1.1.tar.gz (3.5 kB)

Uploaded Source

File details

Details for the file manc-0.1.1.tar.gz.

File metadata

  • Download URL: manc-0.1.1.tar.gz
  • Upload date:
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for manc-0.1.1.tar.gz
Algorithm Hash digest
SHA256 feebab8a3bd2c1ac874530cd8c7e7a850c8a58850cc4ff9fa077b2dea4b46f1b
MD5 37812bff6c9167849354fa36448434cd
BLAKE2b-256 57c7a41b30abeb502fff5ace29d75a5f1ee3ced26fe79052cc5408e3f7e0ebe0
