
A web scraper that downloads tables, images, and text from a webpage


WebDataMiner

This project contains a Python script for a web scraper that extracts tables, images, and text from a given website.
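The following sketch illustrates the kind of extraction described above by pulling table rows, image URLs and visible text out of an HTML string. It is not WebDataMiner's implementation. It uses the standard library's `html.parser` instead of Beautiful Soup, and it works on HTML you have already downloaded instead of driving a browser with Selenium. `PageExtractor` is a hypothetical name, and nested tables are not handled.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collect tables, image URLs, and text from HTML.

    Illustrative sketch only, not WebDataMiner's code. Nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row is a list of cell strings
        self.images = []   # src attribute of every <img> tag
        self.text = []     # non-empty text fragments outside <script>/<style>
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text pieces of the <td>/<th> currently open, if any
        self._skip = 0     # nesting depth inside <script>/<style>

    def _close_cell(self):
        # Finish the open cell (HTML allows </td> and </th> to be omitted).
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join(self._cell))
        self._cell = None

    def _close_row(self):
        # Finish the open row (HTML allows </tr> to be omitted).
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr":
            self._close_row()
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is not None:
                self._cell = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        fragment = data.strip()
        if not fragment:
            return
        self.text.append(fragment)
        if self._cell is not None:
            self._cell.append(fragment)


if __name__ == "__main__":
    parser = PageExtractor()
    parser.feed("<h1>Prices</h1><table><tr><th>Item</th><th>Cost</th></tr>"
                "<tr><td>Tea</td><td>3</td></tr></table><img src='logo.png'>")
    parser.close()
    print(parser.tables)  # [[['Item', 'Cost'], ['Tea', '3']]]
    print(parser.images)  # ['logo.png']
```

Each extracted table is a list of rows. If the first row holds the headers, you could load the table into Pandas with `pandas.DataFrame(rows[1:], columns=rows[0])`.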

Requirements

  • Python 3.7 or later
  • Selenium
  • Beautiful Soup
  • Pandas
  • tqdm
  • Requests
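Before you run the scraper, you can check that these packages can be imported. The sketch below is not part of WebDataMiner, and `missing_requirements` and `REQUIRED_MODULES` are hypothetical names. It assumes each package's usual import name. Note that Beautiful Soup is imported as `bs4`.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above (Beautiful Soup installs as "bs4").
REQUIRED_MODULES = {
    "Selenium": "selenium",
    "Beautiful Soup": "bs4",
    "Pandas": "pandas",
    "tqdm": "tqdm",
    "Requests": "requests",
}


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 7)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or later is required" % min_python)
    for package, module in modules.items():
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(module) is None:
            problems.append("%s (import name %r) is not installed" % (package, module))
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("missing:", problem)
```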

Installation

pip

pip install WebDataMiner

From source (GitLab)

  1. Clone the asaid branch of this repository and enter its directory:

git clone -b asaid https://gitlab.kaisens.fr/kaisensdata/apps/4inshield/drivers/generic-crawler.git
cd generic-crawler

  2. Install the required Python packages:

pip install -r requirements.txt

Usage

  1. Download the ChromeDriver build that matches your installed version of Chrome. Then either add it to your system's PATH or pass its path when you initialize the WebScraper class.

  2. Use the following example code to run the scraper:

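from WebDataMiner import WebScraper

# Path to the ChromeDriver executable on your machine
chrome_driver_path = "<path_to_your_chromedriver>"
# Page to scrape
url = "https://example.com"

scraper = WebScraper(chrome_driver_path)
# Extract tables, images, and text from the page
scraper.process_website(url)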
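Step 1 gives two ways to locate ChromeDriver. The sketch below chooses between them. It prefers an explicit path and otherwise searches PATH. It assumes the executable is named `chromedriver`. On Windows, `shutil.which` also finds `chromedriver.exe` through PATHEXT. `resolve_chromedriver` is a hypothetical helper, not part of WebDataMiner.

```python
import os
import shutil


def resolve_chromedriver(explicit_path=None):
    """Hypothetical helper: return a usable ChromeDriver path or raise FileNotFoundError.

    Prefers an explicitly given path; otherwise searches the directories on PATH.
    """
    if explicit_path:
        if os.path.isfile(explicit_path) and os.access(explicit_path, os.X_OK):
            return explicit_path
        raise FileNotFoundError("ChromeDriver not found or not executable: %s" % explicit_path)
    # On Windows, which() also tries PATHEXT extensions, so chromedriver.exe matches
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver is not on PATH; pass its path explicitly")
    return found
```

You can then pass the result to the constructor shown above, for example `WebScraper(resolve_chromedriver())`. This page does not document whether `WebScraper` performs a similar lookup on its own.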



Release history

This version: 0.2

Download files

Either file works on any platform. The wheel's tag, py3-none-any, marks it as a pure-Python build for Python 3.

Source Distribution

  • WebDataMiner-0.2.tar.gz (4.1 kB)

Built Distribution

  • WebDataMiner-0.2-py3-none-any.whl (4.3 kB, Python 3)

File details

Details for the file WebDataMiner-0.2.tar.gz.

File metadata

  • Download URL: WebDataMiner-0.2.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for WebDataMiner-0.2.tar.gz

  • SHA256: 70c2b8936e775edfc48b11fce88abe62d4090ab0c3273fb001a06a6bea8a552d
  • MD5: fd78f46ffe56d7811ee9a59b5408b827
  • BLAKE2b-256: 7b414d771b9972f9dceeb3072c9f369b9fdd112174e8bfaa6b45a55a41374a04


File details

Details for the file WebDataMiner-0.2-py3-none-any.whl.

File metadata

  • Download URL: WebDataMiner-0.2-py3-none-any.whl
  • Upload date:
  • Size: 4.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for WebDataMiner-0.2-py3-none-any.whl

  • SHA256: ed21c2c7ec3e3c1eeb76e213f4add8e7506ccbf2c5d35fa45209a42de5574a96
  • MD5: 1a954e89ebce2d13e1cbc11e2cb5e7f6
  • BLAKE2b-256: d709f263bcf77d0b46a3cbd07fab8731f0cc784fe8ef1a01f1974fa639c267de
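A short standard-library sketch is enough to check a downloaded file against the SHA256 digests published for this release. `sha256_of`, `verify_download` and `EXPECTED_SHA256` are hypothetical names, not part of WebDataMiner. The digests are copied from the file details on this page.

```python
import hashlib
import os

# SHA256 digests published on this page for the 0.2 release files.
EXPECTED_SHA256 = {
    "WebDataMiner-0.2.tar.gz":
        "70c2b8936e775edfc48b11fce88abe62d4090ab0c3273fb001a06a6bea8a552d",
    "WebDataMiner-0.2-py3-none-any.whl":
        "ed21c2c7ec3e3c1eeb76e213f4add8e7506ccbf2c5d35fa45209a42de5574a96",
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=None):
    """Return True if the file matches its expected SHA256 digest.

    If `expected` is omitted, the digest is looked up by file name in EXPECTED_SHA256.
    """
    if expected is None:
        expected = EXPECTED_SHA256[os.path.basename(path)]
    return sha256_of(path) == expected.lower()
```

For example, `verify_download("WebDataMiner-0.2.tar.gz")` returns True only when the file in the current directory is byte-for-byte identical to the published source distribution. pip can also enforce this check itself in hash-checking mode. To use it, add `--hash=sha256:...` entries to a requirements file and install with `--require-hashes`.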

