
A simple web scraper

Project description

PageCrawler

How to use

_request

  • Call the _request() function. It first tries the request with the requests library and then falls back to Selenium.
  • Fill out these keyword arguments: url: str, keyword: str, headers: dict = None, soup: bool = False, max_retry: int = 2, wait: int = 0
  • Explanation:
    • url: the request URL
    • keyword: a keyword that must appear in the response to confirm the right page was fetched; use '' to skip this check
    • headers: request headers as a dict; use {} for no headers, or leave it out for a basic default header
    • soup: whether the response is returned as a soup object
    • max_retry: how often the request is retried (both the plain request and Selenium) until the response contains the keyword
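A minimal usage sketch of the call described above. The import path and whether _request lives on a PageCrawler instance are assumptions; the parameter names and defaults come from the signature listed above, and the URL/keyword values are placeholders.

```python
# Arguments for _request, following the documented signature.
kwargs = {
    "url": "https://example.com",
    "keyword": "Example Domain",  # '' would skip the keyword check
    "headers": {},                # {} sends no headers; omit for the default header
    "soup": False,                # True would return a soup object instead of text
    "max_retry": 2,               # retries both the plain request and Selenium
    "wait": 0,
}

try:
    # Assumed import path and class usage -- adjust to the actual package layout.
    from pagecrawler import PageCrawler
    html = PageCrawler()._request(**kwargs)
except ImportError:
    html = None  # pagecrawler not installed in this environment
```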

multi_request

  • Calls _request with multiprocessing.
  • The first argument is a list of lists, each holding these 3 items: [url, keyword, headers] (the length of the list determines how many requests are made).
  • New argument: process: int = 1 determines how many processes run at the same time.
  • The remaining arguments are the same as for _request, but apply to every request.
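A sketch of the list-of-lists argument described above, with a hedged call at the end. The [url, keyword, headers] layout and the process keyword come from the description; the import path and the call through a PageCrawler instance are assumptions, and the URLs/keywords are placeholders.

```python
# Each entry is [url, keyword, headers]; the list length sets how many
# requests multi_request performs.
jobs = [
    ["https://example.com", "Example Domain", {}],
    ["https://www.python.org", "Python", {}],
    ["https://pypi.org", "PyPI", {}],
]

try:
    # Assumed import path and class usage -- adjust to the actual package layout.
    from pagecrawler import PageCrawler
    # process=2: run up to two requests in parallel; the remaining keywords
    # (soup, max_retry, wait) would apply to every request.
    results = PageCrawler().multi_request(jobs, process=2)
except ImportError:
    results = None  # pagecrawler not installed in this environment
```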

Project details


Download files

Download the file for your platform.

Source Distribution

pagecrawler-1.1.1.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution


pagecrawler-1.1.1-py3-none-any.whl (18.1 kB view details)

Uploaded Python 3

File details

Details for the file pagecrawler-1.1.1.tar.gz.

File metadata

  • Download URL: pagecrawler-1.1.1.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.7 Linux/6.13.2-zen1-1-zen

File hashes

Hashes for pagecrawler-1.1.1.tar.gz

  • SHA256: 3dd2a93486dc87d0396cb10c21598da14237357f9890fa60239a92f37e7147de
  • MD5: 6caa0f4049462b4e5633bf5c27c5c1e0
  • BLAKE2b-256: 61948188a26d4b519698ef6dc423424ca83f3671f56e7baa55b6cd4d6822c75c


File details

Details for the file pagecrawler-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: pagecrawler-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.1 CPython/3.12.7 Linux/6.13.2-zen1-1-zen

File hashes

Hashes for pagecrawler-1.1.1-py3-none-any.whl

  • SHA256: d5d664a8ac7796b0ed1656592daac57cb7ae6e3cfa8302d5b5fad62142227d3e
  • MD5: d400b6ee6bd31d20be0a9a92c1b1d034
  • BLAKE2b-256: 022da602a81c083a17f4bfd5876167e33e414c7d404543045c069ce98e46c1a3

