An easier way to use a web spider

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

SimpleSpider Instructions

This module makes it easier to write and run a web spider.

How to install

pip install SimpleSpider

Using the command line

The command accepts 9 arguments:

argument	type	default	description
url	str	None	The URL to fetch.
single	bool	True	Set it to False to get content from a series of pages, then set the index.
re	str	None	The regular expression to apply. Remember to quote it, e.g. --re "ab*c"
xpath	str	None	The XPath expression to apply. Remember to quote it, e.g. --xpath "//div[1]/text()"
index	str	default	The indexes, separated by ",", e.g. --index 1,2,3,4,5,6,7
print	bool	True	Set it to False if you do not want the result printed to the console.
output	str	None	The path to export the result to, e.g. --output "D:/data.xlsx"
mode	str	None	Use "img" to get image URLs, "xp" to use XPath, or "re" to use a regular expression.
indexfile	str	None	Read the indexes (links) directly from a file.

Example 1: get data from a single page with a regular expression

SimpleSpider --mode re --url https://www.163.com --re "<title>(.*.?)</title>"

output: 网易
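
To show what regular-expression mode amounts to, here is a minimal sketch that uses only Python's standard re module on an in-memory HTML string. The helper name extract_with_regex and the sample HTML are assumptions made for illustration; this is not SimpleSpider's actual implementation.

```python
import re


def extract_with_regex(html: str, pattern: str) -> list[str]:
    """Return the capture-group matches of `pattern` in `html` (hypothetical helper)."""
    # re.S lets "." match newlines, which real HTML often contains.
    return re.findall(pattern, html, flags=re.S)


# Stand-in for a downloaded page; no network access is needed.
sample_html = "<html>\n<title>网易</title>\n</html>"
titles = extract_with_regex(sample_html, r"<title>(.*?)</title>")
print(titles)
```

The lazy group (.*?) stops at the first closing tag, so a page with several matching elements yields one entry per element.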

Example 2: get data from a single page with XPath
SimpleSpider --mode xp --url https://www.163.com --xpath "//title/text()"

output:
网易
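
The idea behind XPath mode can be sketched with the standard library alone. This is an illustration under assumptions: xml.etree.ElementTree supports only a subset of XPath (there is no text() step, so the text is read from each element), it needs well-formed markup, and the helper name extract_title_texts is made up here. Real-world HTML usually calls for a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET


def extract_title_texts(markup: str) -> list[str]:
    """Return the text of every <title> element (hypothetical helper)."""
    root = ET.fromstring(markup)
    # ".//title" selects <title> elements at any depth below the root.
    return [element.text or "" for element in root.findall(".//title")]


# Well-formed stand-in for a downloaded page.
sample = "<html><head><title>网易</title></head><body/></html>"
print(extract_title_texts(sample))
```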

Example 3: get data from multiple pages with a regular expression
SimpleSpider --mode re --url https://ent.163.com/20/0323/ --re "<title>(.*.?)</title>" --single False --index 08/F8D2BVI700038FO9.html,10/F8D8B35800038FO9.html

output:
'疫情期间还出游？网友在巴厘岛偶遇霍建华林心如_网易娱乐'
'台湾女星刘真去世：上《康熙》走红当郭台铭红娘_网易娱乐'
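
The multi-page example suggests that each index is appended to the base URL to form one page address. Below is a small sketch of that step, assuming plain string concatenation; build_page_urls is a hypothetical name, not a SimpleSpider function.

```python
def build_page_urls(base_url: str, index_arg: str) -> list[str]:
    """Split a comma-separated index string and append each part to base_url."""
    # Ignore empty parts so a trailing comma does not produce a bare base URL.
    indexes = [part.strip() for part in index_arg.split(",") if part.strip()]
    return [base_url + index for index in indexes]


urls = build_page_urls(
    "https://ent.163.com/20/0323/",
    "08/F8D2BVI700038FO9.html,10/F8D8B35800038FO9.html",
)
for url in urls:
    print(url)
```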

Example 4: get data from multiple pages, reading the indexes from a file
SimpleSpider --mode re --url https://ent.163.com/20/0323/ --re "<title>(.*.?)</title>" --single False --indexfile data.txt

The index file should be written like this:
1.html
2.html
3.html
and the URLs take the form http://example.com/test(here is the index)
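
Reading such an index file can be sketched as follows. The one-index-per-line layout and the helper name read_indexes are assumptions for illustration; the example writes a temporary data.txt so it runs without any existing file.

```python
import tempfile
from pathlib import Path


def read_indexes(path) -> list[str]:
    """Read one index per line, skipping blank lines (hypothetical helper)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Create a throwaway index file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    index_file = Path(tmp_dir) / "data.txt"
    index_file.write_text("1.html\n2.html\n\n3.html\n", encoding="utf-8")
    indexes = read_indexes(index_file)

print(indexes)
```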

Example 5: get the image URLs from a single page
SimpleSpider --mode img --url https://www.baidu.com

output:
//www.baidu.com/img/gs.gif

If you want to use the functions in this module from Python, you just need to:

from SimpleSpider import SimpleSpider

There are some functions to simplify your code.
Example 1:

result = SinglePageGetByRegEx(Url="http://www.163.com", Re="<title>(.*?.)</title>")
The value of result is ['网易']

Example 2:
List = [53,54,55,56]
result = MulityPageGetByRegEx(Url="http://www.oursteps.com.au/bbs/forum.php?mod=forumdisplay&fid=", IndexList=List, RegEx="<title>(.*?.)</title>")
The value of result is [['生活其他 - 新足迹 - 新足迹澳洲华人生活大全'], ['证券外汇 - 新足迹澳洲华人生活大全'], ['个人理财 - 新足迹澳洲华人生活大全'], ['生意种种 - 新足迹澳洲华人生活大全']]

Both XPath and regular expressions are available.

You can also directly get the string between two markers in a page.
Example 3: the HTML page is

<html>
<title>网易</title>
</html>

result = SinglePageGetMiddleStr("http://www.163.com", front="<title>", back="</title>")
output:
['网易']
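
The "middle string" idea, returning whatever sits between a front marker and a back marker, can be sketched with plain string searching. The name get_middle_strings is hypothetical and the code is an illustration, not the library's own implementation.

```python
def get_middle_strings(text: str, front: str, back: str) -> list[str]:
    """Return every substring found between `front` and `back` markers."""
    if not front or not back:
        raise ValueError("front and back markers must be non-empty")
    results = []
    position = 0
    while True:
        start = text.find(front, position)
        if start == -1:
            break
        start += len(front)
        end = text.find(back, start)
        if end == -1:
            break
        results.append(text[start:end])
        # Continue searching after the closing marker.
        position = end + len(back)
    return results


page = "<html>\n<title>网易</title>\n</html>"
print(get_middle_strings(page, front="<title>", back="</title>"))
```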

You can also directly get the image URLs in a page.
result = SinglePageGetImgUrl("http://www.baidu.com")
output:
//www.baidu.com/img/gs.gif
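
Collecting image URLs comes down to reading the src attribute of every img tag. Here is a minimal sketch using the standard html.parser module; the class and function names are made up for illustration and say nothing about how SimpleSpider does it internally. The result above is a protocol-relative URL, so the sketch also resolves it against the page address with urllib.parse.urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def extract_img_urls(html: str) -> list[str]:
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


sample = '<body><img src="//www.baidu.com/img/gs.gif"><img alt="no source"></body>'
img_urls = extract_img_urls(sample)
print(img_urls)

# Turn protocol-relative URLs into absolute ones, using the page URL as the base.
absolute_urls = [urljoin("https://www.baidu.com", url) for url in img_urls]
print(absolute_urls)
```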

If you want to know more, please visit: https://github.com/shanzhengliu/SimpleSpider

Release history

0.1.3

Mar 24, 2020

0.1.2

Mar 24, 2020

0.1.1

Mar 24, 2020

0.1.0

Mar 24, 2020

0.0.9

Mar 24, 2020

0.0.8

Mar 24, 2020

0.0.7

Mar 24, 2020

0.0.6

Mar 24, 2020

0.0.5

Mar 24, 2020

0.0.4

Mar 24, 2020

0.0.3

Mar 23, 2020

0.0.2

Mar 23, 2020

Download files

Source Distribution

SimpleSpider-0.1.3.tar.gz (6.1 kB)

Uploaded Mar 24, 2020 Source

File details

Details for the file SimpleSpider-0.1.3.tar.gz.

File metadata

Download URL: SimpleSpider-0.1.3.tar.gz
Upload date: Mar 24, 2020
Size: 6.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.4

File hashes

Hashes for SimpleSpider-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`3c35fe8930e492d2ba822329f600a738532102caa51a719a8f4b2e7f526e9c9a`
MD5	`8c7b394325408809b2bd562cb79ff85d`
BLAKE2b-256	`1ec3330546d29857553aa614ce4c2c967c562a4bb0c431c66a1bf2d23c9847a2`
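
A downloaded archive can be checked against the SHA256 digest in the table with Python's hashlib. The sketch below hashes any binary stream in chunks; sha256_of_stream is a hypothetical helper name, and the demo hashes the bytes b"abc" in memory so that it runs without the actual file.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# In-memory demo; replace with open(path, "rb") for a real file.
demo_digest = sha256_of_stream(io.BytesIO(b"abc"))
print(demo_digest)
```

To verify the real download, open SimpleSpider-0.1.3.tar.gz in binary mode, pass the file object to the function, and compare the result with the SHA256 value listed above.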
