
Built on top of the requests module, PrSpiders adds XPath functionality to extend the API and handles request failures and retries.


PrSpiders: a thread-pool spider framework

Installing PrSpiders

  • pip install PrSpiders

Getting started

1. Demo

from PrSpider import PrSpiders  
  
  
class Spider(PrSpiders):  
    start_urls = 'https://www.runoob.com'  
  
    def parse(self, response):
        # print(response.text)
        print(response, response.code, response.url)
        # <Response Code=200 Len=323273> 200 https://www.runoob.com/
  
if __name__ == '__main__':  
    Spider()

2. Overriding the entry function: start_requests

start_requests is the framework's entry point, and PrSpiders.Requests is the method used to send requests; its parameters are listed below.

from PrSpider import PrSpiders  
  
  
class Spider(PrSpiders):  
  
    def start_requests(self, **kwargs):  
        start_urls = 'https://www.runoob.com'  
        PrSpiders.Requests(url=start_urls, callback=self.parse)  
  
    def parse(self, response):  
        # print(response.text)  
        print(response, response.code, response.url)  
  
  
if __name__ == '__main__':  
    Spider()

3. Basic PrSpiders configuration

ThreadPoolExecutor is used under the hood.

workers: number of threads
retry: whether to retry failed requests (enabled by default)
download_delay: interval between request batches
download_num: number of requests sent per batch (default: 5 requests per second)

Usage:

from PrSpider import PrSpiders


class Spider(PrSpiders):
    workers = 5
    retry = False
    download_delay = 3
    download_num = 10

    def start_requests(self, **kwargs):
        start_urls = 'https://www.runoob.com'
        PrSpiders.Requests(url=start_urls, callback=self.parse)

    def parse(self, response):
        # print(response.text)
        print(response, response.code, response.url)

  
  
if __name__ == '__main__':  
    Spider()
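The workers / download_num / download_delay settings describe a familiar thread-pool batching pattern. A minimal standalone sketch of that pattern with Python's own ThreadPoolExecutor (illustrative only, not PrSpiders internals; fetch is a stand-in for the real request function):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Stand-in for the real HTTP request.
    return f"fetched {url}"


def crawl(urls, workers=5, download_num=10, download_delay=3):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit requests in batches of download_num, sleeping
        # download_delay seconds between batches.
        for i in range(0, len(urls), download_num):
            batch = urls[i:i + download_num]
            results.extend(pool.map(fetch, batch))
            if i + download_num < len(urls):
                time.sleep(download_delay)
    return results
```

With workers=5, download_num=10, download_delay=3, at most 5 requests run concurrently and each batch of 10 URLs is followed by a 3-second pause, which matches the throttling behavior described above.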

4. Basic PrSpiders.Requests configuration

Basic parameters:

  • url: request URL
  • callback: callback function
  • headers: request headers
  • retry_time: number of retries after a failed request
  • method: HTTP method (GET by default)
  • meta: parameters passed through to the callback
  • encoding: encoding (utf-8 by default)
  • retry_interval: interval between retries
  • timeout: request timeout (10 s by default)
  • **kwargs: parameters inherited from requests (e.g. data, params, proxies)

    PrSpiders.Requests(url=start_urls, headers={}, method='post', encoding='gbk', callback=self.parse,
                       retry_time=10, retry_interval=0.5, meta={'hhh': 'ggg'})
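The retry_time / retry_interval semantics can be illustrated with a plain retry loop (a sketch of the behavior only, not the framework's actual implementation; send is a hypothetical callable standing in for the HTTP request):

```python
import time


def request_with_retry(send, retry_time=3, retry_interval=0.5):
    """Call send() until it succeeds or retry_time attempts are exhausted.

    Sketch of retry_time / retry_interval semantics; not PrSpiders internals.
    """
    last_error = None
    for _attempt in range(retry_time):
        try:
            return send()
        except Exception as error:
            last_error = error
            # Wait retry_interval seconds before the next attempt.
            time.sleep(retry_interval)
    raise last_error
```

In PrSpiders itself this happens automatically whenever retry is enabled; the loop above only shows what the two parameters control.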

API

GET Status Code

response.code

GET Text

response.text

GET Content

response.content

GET Url

response.url

GET History

response.history

GET Headers

response.headers

GET Text Length

response.len

GET Lxml Xpath

response.xpath

XPath API

  1. text(): converts the XPath result to text

  2. date(): converts the XPath result to a date

  3. get(): extracts the first XPath result

  4. getall(): extracts all XPath results; also supports the text() and date() methods

    from PrSpider import PrSpiders

    class Spider(PrSpiders):
        def start_requests(self, **kwargs):
            start_urls = "https://www.runoob.com"
            PrSpiders.Requests(url=start_urls, callback=self.parse)

        def parse(self, response):
            label = response.xpath("//div[@class='navto-nav']")
            label_text = response.xpath("//div[@class='navto-nav']").text()
            label_get = response.xpath("//div[@class='navto-nav']").get()
            label_getall = response.xpath("//div[@class='navto-nav']").getall()
            print(label)
            print(label_text)
            print(label_get)
            print(label_getall)

    if __name__ == "__main__":
        Spider()
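Since response.xpath is lxml-based, the text() / get() / getall() helpers correspond roughly to the following raw-lxml usage (a standalone sketch against a small inline document rather than runoob.com; the exact helper semantics are the framework's, this only shows the underlying idea):

```python
from lxml import etree

html = """
<html><body>
  <div class="navto-nav"><a href="/a">HTML</a></div>
  <div class="navto-nav"><a href="/b">CSS</a></div>
</body></html>
"""

tree = etree.HTML(html)
nodes = tree.xpath("//div[@class='navto-nav']")       # element objects
texts = [n.xpath("string()").strip() for n in nodes]  # roughly .getall() with text()
first = texts[0] if texts else None                   # roughly .get()
print(texts)   # ['HTML', 'CSS']
print(first)   # 'HTML'
```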

Please contact me if there are any bugs

email -> 1944542244@qq.com
