This is a library for web page parsing and web page requests

Project description

kingcow

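This is a library for web requests, web page parsing, and crawling.

Web page tag parsing

kingcow parses web page tags: you supply a tag name and an attribute, and it returns the values of that attribute.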

import kingcow as kc
a = kc.get('https://www.taobao.com/')  # Returns a kingcow.Request object; the request is sent with GET
print(a.html_tag('a', 'href'))

This code prints all the links on the Taobao home page; you can follow those links to reach other Taobao sites.
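To make the idea of tag parsing concrete without touching the network, here is a minimal sketch built on the standard library's `html.parser`. It is not kingcow's implementation; `AttrCollector` and `tag_attr_values` are hypothetical names chosen for illustration, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the values of one attribute from every occurrence of one tag."""

    def __init__(self, wanted_tag, wanted_attr):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.wanted_attr = wanted_attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute names
        # are already lower-cased by HTMLParser.
        if tag == self.wanted_tag:
            for name, value in attrs:
                if name == self.wanted_attr and value is not None:
                    self.values.append(value)


def tag_attr_values(html, tag, attr):
    """Return every value of `attr` found on `tag` elements, in document order."""
    collector = AttrCollector(tag.lower(), attr.lower())
    collector.feed(html)
    collector.close()
    return collector.values


sample = '<p><a href="/a">A</a> <a href="/b">B</a> <img src="x.png"></p>'
print(tag_attr_values(sample, "a", "href"))  # ['/a', '/b']
```

The same function handles any tag and attribute pair, for example `tag_attr_values(sample, "img", "src")`.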

Web page data parsing

kingcow also parses web page data quickly and conveniently; you supply a tag and, optionally, an attribute.

import kingcow as kc
a = kc.get('https://docs.python.org/zh-cn/3.8/tutorial/index.html')
a.html_analysis('p')

This code prints the content of the p tags in the official Python documentation.

You can also use the json method to get the response as JSON.
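As a rough illustration of extracting the text inside a given tag, the sketch below again uses only `html.parser`. It is an assumption about what this kind of parsing looks like, not kingcow's code; `TagTextCollector` and `tag_texts` are hypothetical names, and the sketch assumes the wanted tag is always explicitly closed.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text content of every element with a given tag name."""

    def __init__(self, wanted_tag):
        super().__init__()
        self.wanted_tag = wanted_tag
        self.depth = 0      # how many wanted tags are currently open
        self.parts = []     # text fragments of the element being read
        self.texts = []     # finished text of each element

    def handle_starttag(self, tag, attrs):
        if tag == self.wanted_tag:
            if self.depth == 0:
                self.parts = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.wanted_tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.parts).strip())

    def handle_data(self, data):
        # Text inside nested tags (such as <b>) is kept as well.
        if self.depth > 0:
            self.parts.append(data)


def tag_texts(html, tag):
    """Return the text of each `tag` element, in document order."""
    collector = TagTextCollector(tag.lower())
    collector.feed(html)
    collector.close()
    return collector.texts


sample_page = "<div><p>Hello <b>world</b></p><p>Second</p></div>"
print(tag_texts(sample_page, "p"))  # ['Hello world', 'Second']
```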

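Web page fetching

kingcow provides the Request class and the get and post functions for making requests. They take largely the same parameters, but do not pass method to get or post.

Main parameters

url: str, the address of the web page

data: the request data, for pages that expect the submission type application/x-www-form-urlencoded

json: the request data as JSON, for pages that expect the submission type application/json

headers: the request headers

method: the request method, such as GET or POST

code: the character encoding; if data is given, data is encoded with it

ip: the proxy IP to use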
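The difference between data and json comes down to how the body is serialized and which Content-Type header accompanies it. The sketch below shows that with the standard library's `urllib.request`, building a request object without sending it. It is not kingcow's implementation: `build_request` is a hypothetical helper, its parameter names only mirror the list above, the default of `utf-8` for `code` is an assumption, and proxy handling (ip) is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(url, data=None, json_body=None, headers=None,
                  method=None, code="utf-8"):
    """Build (but do not send) a urllib Request for form or JSON data."""
    headers = dict(headers or {})
    body = None
    if data is not None:
        # Form submission: key=value pairs joined with '&', then encoded.
        body = urlencode(data).encode(code)
        headers.setdefault("Content-Type", "application/x-www-form-urlencoded")
    elif json_body is not None:
        # JSON submission: the object is serialized to a JSON string.
        body = json.dumps(json_body).encode(code)
        headers.setdefault("Content-Type", "application/json")
    if method is None:
        # Assumed default: POST when there is a body, GET otherwise.
        method = "POST" if body is not None else "GET"
    return Request(url, data=body, headers=headers, method=method)


form_req = build_request("https://example.com/login",
                         data={"user": "cow", "lang": "zh"})
json_req = build_request("https://example.com/api", json_body={"q": "milk"})

# urllib stores header names capitalized as "Content-type".
print(form_req.get_method(), form_req.get_header("Content-type"), form_req.data)
print(json_req.get_method(), json_req.get_header("Content-type"), json_req.data)
```

Passing such an object to `urllib.request.urlopen` would actually send it; the sketch stops short of that so it can run offline.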

Spider crawler

The Spider crawler consists of two classes, Spider and Spider_For_Threading. We start with Spider.

Spider

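Spider is a crawler class for fetching multiple websites. Its parameters are largely the same as those of the Request class, except that Spider takes a collection of URLs (a list, not a set) instead of a single url. You can also pass step, the interval between requests.

Initializing a Spider does not fetch anything; you must call Spider.request to start crawling.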
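The pattern described here (a list of URLs, an explicit call to start fetching, and a pause between requests) can be sketched in a few lines. This is only an illustration of the idea, not Spider's code: `crawl` and `fake_fetch` are hypothetical, and the fetch function is passed in so the example runs without network access.

```python
import time


def crawl(urls, fetch, step=0.0):
    """Fetch each URL in order, sleeping `step` seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        # Wait between requests, but not before the first one.
        if index > 0 and step > 0:
            time.sleep(step)
        results.append((url, fetch(url)))
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP request; returns a made-up page body."""
    return f"<html>{url}</html>"


pages = crawl(["https://example.com/a", "https://example.com/b"],
              fake_fetch, step=0.01)
for url, body in pages:
    print(url, len(body))
```

Nothing is fetched until `crawl` is called, which mirrors the separation between creating a Spider and calling its request method.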

Besides Spider.request, Spider has five methods:

html_data: web page data parsing

html_tag: web page tag parsing

xml_data: XML data parsing

json_get: extraction of items from JSON by key

json: JSON extraction
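Extracting items from a JSON response by key can be illustrated with the standard `json` module. How json_get actually searches is not documented here, so the recursive lookup below is an assumption, and `find_key` is a hypothetical helper rather than part of kingcow.

```python
import json


def find_key(node, key):
    """Return every value stored under `key`, searching nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            # Keep searching inside the value, whatever its type.
            found.extend(find_key(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


doc = json.loads('{"items": [{"id": 1, "tags": {"id": "x"}}, {"id": 2}]}')
print(find_key(doc, "id"))  # [1, 'x', 2]
```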

Spider_For_Threading

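Spider_For_Threading is the multithreaded version of Spider. It is described as taking about a quarter of the time Spider needs, and its methods are the same as Spider's.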
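One common way to build a multithreaded crawler in Python is a thread pool, sketched below with `concurrent.futures`. This is not taken from Spider_For_Threading; `crawl_threaded` and `fake_fetch` are hypothetical, the worker count of 4 is an arbitrary choice, and the fetch function is injected so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_threaded(urls, fetch, max_workers=4):
    """Fetch all URLs using a pool of worker threads, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`,
        # even though the calls may finish in any order.
        bodies = list(pool.map(fetch, urls))
    return list(zip(urls, bodies))


def fake_fetch(url):
    """Stand-in for a real HTTP request; returns a made-up page body."""
    return f"<html>{url}</html>"


targets = [f"https://example.com/page{i}" for i in range(8)]
for url, body in crawl_threaded(targets, fake_fetch):
    print(url, len(body))
```

Because fetching is I/O-bound, threads can overlap the waiting time of several requests; the actual speedup depends on the network, the servers, and the number of workers.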
