This is a library for web page parsing and web page requests

Project description

kingcow

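This is a library for web requests, web page parsing, and crawling.

HTML tag parsing

kingcow parses HTML tags: give it a tag name and an attribute name, and it returns the values of that attribute.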

import kingcow as kc
a = kc.get('https://www.taobao.com/')  # sends a GET request and returns a kingcow.Request object
print(a.html_tag('a', 'href'))

This code prints every link on the Taobao home page; you can follow those links to reach other Taobao pages.
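To make the idea of tag and attribute extraction concrete, here is a minimal offline sketch built on the standard library's html.parser. It is not kingcow's implementation; the names AttrCollector and collect_attr and the sample HTML are assumptions made up for illustration.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the values of one attribute from every occurrence of one tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attr and value is not None:
                    self.values.append(value)


def collect_attr(html, tag, attr):
    """Hypothetical helper: return all values of `attr` found on `tag` elements."""
    parser = AttrCollector(tag, attr)
    parser.feed(html)
    parser.close()
    return parser.values


# Made-up sample markup so the sketch runs without network access.
sample = '<p><a href="/home">Home</a> <a href="https://example.com/x">X</a> <a>no link</a></p>'
links = collect_attr(sample, 'a', 'href')
print(links)
```

Anchors without an href are skipped, so the result contains only real link targets.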

Page data parsing

kingcow can also extract page data quickly and conveniently: give it a tag name and, optionally, an attribute.

import kingcow as kc
a = kc.get('https://docs.python.org/zh-cn/3.8/tutorial/index.html')
a.html_analysis('p')

This code outputs the content of the p tags on that page of the official Python documentation.
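As a rough illustration of what extracting the data of a tag involves, the following standard-library sketch gathers the text inside each occurrence of a tag. It is an assumption about the general technique, not kingcow's code; TextCollector, collect_text, and the sample markup are hypothetical, and the optional attribute filter is left out.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text content of every occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # how many matching tags are currently open
        self.buffer = []    # text pieces of the element being read
        self.texts = []     # finished text, one entry per outermost element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append(''.join(self.buffer).strip())

    def handle_data(self, data):
        # Keep text only while inside a matching element (nested tags included).
        if self.depth > 0:
            self.buffer.append(data)


def collect_text(html, tag):
    """Hypothetical helper: return the text of each `tag` element in `html`."""
    parser = TextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.texts


# Made-up sample markup so the sketch runs without network access.
sample = '<div><p>First <b>bold</b> line.</p><span>skip</span><p>Second line.</p></div>'
paragraphs = collect_text(sample, 'p')
print(paragraphs)
```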

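You can also use the json method to get the response as JSON.

Web requests

kingcow provides the Request class and the get and post functions for sending requests. They take largely the same parameters, but do not pass method to get or post.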

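Main parameters

url: str, the address of the page

data: request data, for pages that expect the content type application/x-www-form-urlencoded

json: request data sent as JSON, for pages that expect the content type application/json

headers: request headers

method: HTTP method, such as GET or POST

code: if data is given, data is encoded using the encoding named by code

ip: proxy IP to use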
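The difference between the data and json parameters comes down to how the request body is serialized. The sketch below shows the two encodings with the standard library only; it builds requests without sending them and makes no claim about how kingcow does this internally. The URL and payload are made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

payload = {'q': 'kingcow', 'page': 2}  # made-up example payload

# data-style body: key=value pairs joined with '&'
form_body = urlencode(payload).encode('utf-8')
# json-style body: the same payload serialized as a JSON document
json_body = json.dumps(payload).encode('utf-8')

# Build (but do not send) two POST requests that differ only in body and Content-Type.
form_request = Request(
    'https://example.com/search',  # hypothetical URL
    data=form_body,
    headers={'Content-Type': 'application/x-www-form-urlencoded'},
    method='POST',
)
json_request = Request(
    'https://example.com/search',  # hypothetical URL
    data=json_body,
    headers={'Content-Type': 'application/json'},
    method='POST',
)

print(form_body)
print(json_body)
```

A server that expects one content type will generally not parse the other, which is why the two parameters are kept separate.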

Spider crawlers

There are two spider classes, Spider and Spider_For_Threading. Spider is covered first.

Spider

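Spider is a crawler class for fetching multiple sites. Its parameters are largely the same as those of the Request class, except that instead of a single url it takes a collection of URLs (a list, not a set). It also accepts step, the interval between requests.

Initializing a Spider does not fetch anything; you must call Spider.request to start crawling.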
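The pattern described here, a list of URLs fetched one after another with a pause in between, can be sketched generically as follows. This is not the Spider class; crawl_sequentially and fake_fetch are hypothetical names, and treating step as seconds is an assumption.

```python
import time


def crawl_sequentially(urls, fetch, step=0.0):
    """Fetch each URL in order, sleeping `step` seconds between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0 and step > 0:
            time.sleep(step)  # pause between consecutive requests
        results.append(fetch(url))
    return results


def fake_fetch(url):
    # Stand-in for a real HTTP request so the sketch runs offline.
    return f'<html>{url}</html>'


pages = crawl_sequentially(
    ['https://example.com/a', 'https://example.com/b'],  # hypothetical URLs
    fake_fetch,
    step=0.1,
)
print(pages)
```

Passing the fetch function in as an argument keeps the pacing logic separate from the HTTP client, so a real request function can be swapped in later.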

Besides Spider.request, Spider has five more methods:

html_data: page data parsing

html_tag: HTML tag parsing

xml_data: XML data parsing

json_get: extraction of a single item from the JSON

json: JSON extraction

Spider_For_Threading

Spider_For_Threading is the multithreaded version of Spider. It runs in about a quarter of the time Spider takes, and its methods are the same as Spider's.
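The reason a threaded crawler can finish sooner is that the waiting time of several requests overlaps. Here is a minimal sketch of that idea using concurrent.futures from the standard library; it is not the Spider_For_Threading implementation, and crawl_threaded, slow_fetch, and the worker count of 4 are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(url):
    # Stand-in for a network request: the sleep simulates waiting on the server.
    time.sleep(0.2)
    return f'<html>{url}</html>'


def crawl_threaded(urls, fetch, workers=4):
    """Fetch all URLs using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


urls = [f'https://example.com/page{i}' for i in range(4)]  # hypothetical URLs
start = time.perf_counter()
pages = crawl_threaded(urls, slow_fetch)
elapsed = time.perf_counter() - start
print(len(pages), f'{elapsed:.2f}s')
```

With four workers and four simulated requests, the waits overlap instead of adding up. The actual gain for real sites depends on network latency, server limits, and the number of threads.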

Project details

Release history

This version

1.0.1

Aug 15, 2021

Download files

Source Distribution

kingcow-1.0.1.tar.gz (5.5 kB)

Uploaded Aug 15, 2021 Source

Built Distribution

kingcow-1.0.1-py3-none-any.whl (7.3 kB)

Uploaded Aug 15, 2021 Python 3

File details

Details for the file kingcow-1.0.1.tar.gz.

File metadata

Download URL: kingcow-1.0.1.tar.gz
Upload date: Aug 15, 2021
Size: 5.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.6rc1

File hashes

Hashes for kingcow-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`71a616e46436ad1d1e029e4cb8ecf394666e8a4bcb89684e42a5d75e6aa884fb`
MD5	`eec5344f2d73d382b451039ca354abde`
BLAKE2b-256	`ad471745a485d701ccfcc06a0fc9c1abb31013b0e129cc896a7fb6e3a6d3e1f1`

File details

Details for the file kingcow-1.0.1-py3-none-any.whl.

File metadata

Download URL: kingcow-1.0.1-py3-none-any.whl
Upload date: Aug 15, 2021
Size: 7.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.6rc1

File hashes

Hashes for kingcow-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b7b956a8cba3fe870a272b211bd90d57fa0c05509a1842af18edc40d86985d50`
MD5	`6a4c248162f8cf0df68657dadada198f`
BLAKE2b-256	`d7cd241e729c29fef010b08a6ce17e138d7b7cf59eaea827c9394931fb5dddbc`
