A simple web scraper
Project description
PageCrawler
How to use
_request
- call the _request() function; it first tries the request with the requests library and then falls back to Selenium (see the sketch after this list)
- fill out these keyword arguments: url: str, keyword: str, headers: dict = None, soup: bool = False, max_retry: int = 2, wait: int = 0
- Explanation:
  - url: the request URL
  - keyword: a keyword that must appear in the page so the crawler knows it reached the right website; use '' to ignore
  - headers: request headers in dict form; use {} for no headers, or leave it empty for a basic default request header
  - soup: whether the response is returned as a BeautifulSoup object
  - max_retry: how often the request is retried (both the plain and the Selenium attempt) to get a response containing the keyword
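A minimal usage sketch; the import path and the assumption that _request is called on a PageCrawler instance are not confirmed by the description above, so treat them as illustrative only:

```python
# Minimal sketch, assuming the package exposes a PageCrawler class at this
# import path and that _request is an instance method; both are assumptions.
from pagecrawler import PageCrawler

crawler = PageCrawler()

# Fetch a page, check that the word "Example" appears in it (to confirm the
# right site was reached), and get the result back as a BeautifulSoup object.
page = crawler._request(
    url="https://example.com",
    keyword="Example",
    headers={},      # {} = send no headers; omit to use the basic default header
    soup=True,       # return a BeautifulSoup object instead of the raw response
    max_retry=2,     # retry (requests, then Selenium) up to 2 times
)
```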
multi_request
- calls _request with multiprocessing (see the sketch after this list)
- the first argument takes a list of lists of these 3 arguments: [url, keyword, headers] (the length of the list determines how many requests are made)
- new argument: process: int = 1, which determines how many processes run at the same time
- the remaining arguments are the same as for _request, but apply to every request
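A sketch of a parallel run under the same assumptions as above; the shape of the return value is not documented here, so this is only an illustration:

```python
# Minimal multi_request sketch, under the same import/instance assumptions.
from pagecrawler import PageCrawler

crawler = PageCrawler()

# One [url, keyword, headers] triple per request; three triples -> three requests.
jobs = [
    ["https://example.com", "Example", {}],
    ["https://example.org", "Example", {}],
    ["https://example.net", "Example", {}],
]

# process sets how many processes run at the same time; the remaining keyword
# arguments (soup, max_retry, wait) apply to every request in the list.
results = crawler.multi_request(jobs, process=2, soup=True)
```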
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pagecrawler-1.1.2.tar.gz (16.5 kB)
Built Distribution
pagecrawler-1.1.2-py3-none-any.whl (18.1 kB)
File details
Details for the file pagecrawler-1.1.2.tar.gz.
File metadata
- Download URL: pagecrawler-1.1.2.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.12.7 Linux/6.13.2-zen1-1-zen
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f16a18ce0793489ddeb53d686628ae978f9d70f6a035c49cf0eca505688efd0 |
| MD5 | 65180eca66ab41b1f19797b334ec75c9 |
| BLAKE2b-256 | 3b2da1faa19dd1bd92ca6539f7e7ae618d5dfdcc0e3de6cc3ab141941818a460 |
File details
Details for the file pagecrawler-1.1.2-py3-none-any.whl.
File metadata
- Download URL: pagecrawler-1.1.2-py3-none-any.whl
- Upload date:
- Size: 18.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.1 CPython/3.12.7 Linux/6.13.2-zen1-1-zen
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c2a74e5f2c44057c26e2330701f62bac7af7a33f36179125f784de70def79699 |
| MD5 | 974b44cdab76bf93366abeb9c472f496 |
| BLAKE2b-256 | 49ab2650d2763efe06a377bd7d94ac1a9a8b074b6cbb7a6b333cde7a4b67ea23 |