
A simple web scraper

Project description

PageCrawler

How to use

_request

  • Call the _request() function; it first tries the request with the requests library and then with Selenium. A usage sketch follows this list.
  • Fill out these keyword arguments: url: str, keyword: str, headers: dict = None, soup: bool = False, max_retry: int = 2, wait: int = 0
  • Explanation:
    • url: the request URL
    • keyword: a keyword that must appear in the page so the crawler knows it got the right website; use '' to ignore this check
    • headers: request headers as a dict; use {} for no headers, or leave it out for a basic default header
    • soup: whether the result is returned as a BeautifulSoup object
    • max_retry: how often the request is retried (both the plain request and the Selenium one) until the response contains the keyword
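
A minimal usage sketch. The import path and the PageCrawler class name are assumptions based on this page (adjust them to match the installed package), and the undocumented wait argument is only guessed at in the comments:

```python
# Minimal sketch; the import path and class name are assumptions based on the
# project description -- adjust them to match the installed package.
from pagecrawler import PageCrawler

crawler = PageCrawler()

# Fetch a page: first with the requests library, then with Selenium, retrying
# until the response contains the keyword.
page = crawler._request(
    url="https://example.com",
    keyword="Example Domain",  # '' would disable the keyword check
    headers={},                # {} = no headers; omit for the basic default header
    soup=True,                 # return the page as a BeautifulSoup object
    max_retry=2,               # retry both backends up to 2 times
    wait=0,                    # not explained above; presumably a delay in seconds (assumption)
)

print(page.title)  # with soup=True the result behaves like a BeautifulSoup object
```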

multi_request

  • Calls _request() using multiprocessing; a usage sketch follows this list.
  • The first argument is a list of lists, each holding these 3 arguments: [url, keyword, headers] (the length of the list determines how many requests are made).
  • New argument: process: int = 1, which determines how many processes run at the same time.
  • The remaining arguments are the same as for _request(), but they apply to every request.
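
A minimal multi_request sketch under the same assumptions (import path and class name are guesses from this page):

```python
# Minimal sketch; import path and class name are assumptions, as above.
from pagecrawler import PageCrawler

crawler = PageCrawler()

# One [url, keyword, headers] list per request; the length of this list
# determines how many requests are made.
jobs = [
    ["https://example.com", "Example Domain", {}],
    ["https://example.org", "Example Domain", {}],
]

# process=2 runs two requests at the same time; the remaining keywords
# (soup, max_retry, wait) apply to every request.
results = crawler.multi_request(jobs, process=2, soup=True, max_retry=2)
```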

Project details


Download files

Download the file for your platform.

Source Distribution

pagecrawler-1.1.2.tar.gz (16.5 kB)

Uploaded Source

Built Distribution


pagecrawler-1.1.2-py3-none-any.whl (18.1 kB)

Uploaded Python 3

File details

Details for the file pagecrawler-1.1.2.tar.gz.

File metadata

  • Download URL: pagecrawler-1.1.2.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.7 Linux/6.13.2-zen1-1-zen

File hashes

Hashes for pagecrawler-1.1.2.tar.gz

  • SHA256: 6f16a18ce0793489ddeb53d686628ae978f9d70f6a035c49cf0eca505688efd0
  • MD5: 65180eca66ab41b1f19797b334ec75c9
  • BLAKE2b-256: 3b2da1faa19dd1bd92ca6539f7e7ae618d5dfdcc0e3de6cc3ab141941818a460
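
To check a downloaded archive against these values, here is a minimal sketch with Python's hashlib (the local file path is an assumption):

```python
import hashlib

# Assumed local path of the downloaded source archive.
path = "pagecrawler-1.1.2.tar.gz"

# Expected SHA256 digest, copied from the list above.
expected_sha256 = "6f16a18ce0793489ddeb53d686628ae978f9d70f6a035c49cf0eca505688efd0"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected_sha256 else "hash mismatch")
```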


File details

Details for the file pagecrawler-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: pagecrawler-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.7 Linux/6.13.2-zen1-1-zen

File hashes

Hashes for pagecrawler-1.1.2-py3-none-any.whl

  • SHA256: c2a74e5f2c44057c26e2330701f62bac7af7a33f36179125f784de70def79699
  • MD5: 974b44cdab76bf93366abeb9c472f496
  • BLAKE2b-256: 49ab2650d2763efe06a377bd7d94ac1a9a8b074b6cbb7a6b333cde7a4b67ea23

