A simple webscraper library

Project description

XSCRAPERS

The XSCRAPERS package provides an OOP interface to some simple webscraping techniques.

A basic use case is loading a set of pages into BeautifulSoup elements. The package loads the URLs concurrently using multiple threads, which can save a considerable amount of time compared to loading them one after another.

import xscrapers.webscraper as ws

URLS = [
    "https://www.google.com/",
    "https://www.amazon.com/",
    "https://www.youtube.com/",
]
# Name of the parser used to build the BeautifulSoup elements
PARSER = "html.parser"
web_scraper = ws.Webscraper(PARSER, verbose=True)
# Load all URLs concurrently
web_scraper.load(URLS)
# Parse the loaded pages
web_scraper.parse()

Note that the scraped data is stored in the data attribute of the web scraper, and the parsed URLs are stored in the url attribute.
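
To give a feel for why threaded loading helps, here is a minimal sketch of the general technique using only the Python standard library. It is not the XSCRAPERS implementation: the function names fetch_html and load_concurrently, the max_workers default, and the dictionary return value are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def load_concurrently(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch every URL in a thread pool and map each URL to its HTML.

    The threads spend most of their time waiting on the network, so the
    downloads overlap instead of running back to back.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs
        pages = list(pool.map(fetch, urls))
    return dict(zip(urls, pages))
```

Each returned HTML string could then be handed to BeautifulSoup together with a parser name such as "html.parser". The fetch parameter is there so that the download step can be swapped out, for example with a stub during testing.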

Downloading the Firefox Geckodriver

Linux

See this link for a good explanation. In short, the steps are:

  1. Download geckodriver from the Mozilla GitHub releases page, replacing X.XX.X in the URL with the version you want to download

    wget https://github.com/mozilla/geckodriver/releases/download/vX.XX.X/geckodriver-vX.XX.X-linux64.tar.gz
    
  2. Extract the file with

    tar -xvzf geckodriver*
    
  3. Make it executable

    chmod +x geckodriver
    
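  4. Finally, make the driver available in one of three ways: add its directory to the PATH environment variable, move it to the /usr/local/bin folder, or pass its full path to the Webdriver class as the exe_path argument

    # Option 1: add the directory containing the extracted file to PATH
    export PATH=$PATH:/path-to-extracted-file/
    # Option 2: move the driver to /usr/local/bin
    sudo mv geckodriver /usr/local/bin/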
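
After following the steps above, it can be useful to confirm from Python that the driver is actually discoverable. The sketch below uses only the standard library; the helper name find_driver and its extra_dirs parameter are hypothetical and not part of XSCRAPERS.

```python
import os
import shutil
from typing import Iterable, Optional


def find_driver(
    name: str = "geckodriver", extra_dirs: Iterable[str] = ()
) -> Optional[str]:
    """Return the full path of an executable driver, or None if not found.

    Directories in extra_dirs are searched first, followed by the
    directories listed in the PATH environment variable.
    """
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # Drop empty entries so the current directory is not searched implicitly
    search_dirs = [d for d in list(extra_dirs) + path_dirs if d]
    return shutil.which(name, path=os.pathsep.join(search_dirs))
```

If the function returns None, the driver is neither on PATH nor in any of the extra directories. A non-None result is a full path, which is the kind of value the exe_path argument mentioned in step 4 expects.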
    

Download files

Source Distribution

xscrapers-0.0.7.tar.gz (11.8 kB)

Uploaded Source

Built Distribution

xscrapers-0.0.7-py3-none-any.whl (13.2 kB)

Uploaded Python 3

File details

Details for the file xscrapers-0.0.7.tar.gz.

File metadata

  • Download URL: xscrapers-0.0.7.tar.gz
  • Upload date:
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.8 Linux/5.4.72-microsoft-standard-WSL2

File hashes

Hashes for xscrapers-0.0.7.tar.gz

Algorithm    Hash digest
SHA256       387f4b8cdc5edf13feb9f1cf71794a6367dec9546eb8854363eee845e9d7b568
MD5          91bc7a9662dcd2ce3c057768a9020055
BLAKE2b-256  f479b8ce2cd64a3fea7086e6c1351d7c0808710807a49a807de2d4b63f53d54c

File details

Details for the file xscrapers-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: xscrapers-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 13.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.8 Linux/5.4.72-microsoft-standard-WSL2

File hashes

Hashes for xscrapers-0.0.7-py3-none-any.whl

Algorithm    Hash digest
SHA256       fbe08f1426007a58b8678981fc890f36ed21c43443ff61c9765822f690d07d43
MD5          3d983d45ad37e2467b70d3f3e0c2cdb2
BLAKE2b-256  bb591028f19bc039e212394d60448d9db4f3421a9dc2a4bd28c218e086d19f12
