A web scraper that downloads tables, images, and text from a webpage
WebDataMiner
This project contains a Python script for a web scraper that extracts tables, images, and text from a given website.
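This page does not show WebDataMiner's internals; it only lists Selenium, Beautiful Soup, Pandas, tqdm and Requests as dependencies. As a rough illustration of what pulling the three kinds of content (text, images, tables) out of one HTML document involves, here is a minimal sketch. It deliberately uses the standard library's html.parser instead of Beautiful Soup so it runs without extra packages. `PageExtractor` and `extract_page` are hypothetical names, not part of the WebDataMiner API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Hypothetical extractor: collects visible text, image URLs and table rows."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.text_chunks = []  # visible text fragments, in document order
        self.image_urls = []   # absolute URLs taken from <img src="...">
        self.tables = []       # one list of rows per <table>; each row is a list of cell strings
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._row = None       # cells of the <tr> currently being read
        self._cell = None      # text fragments of the <td>/<th> currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths such as "/logo.png" against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))
        elif tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style bodies
        stripped = data.strip()
        if not stripped:
            return  # ignore whitespace between tags
        self.text_chunks.append(stripped)
        if self._cell is not None:
            self._cell.append(stripped)


def extract_page(html, base_url=""):
    """Parse one HTML string and return its text, image URLs and tables."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"text": parser.text_chunks, "images": parser.image_urls, "tables": parser.tables}
```

The sketch expects explicit closing tags for `<tr>`, `<td>` and `<th>`; real-world HTML often omits them, which is one reason a lenient parser such as Beautiful Soup (or `pandas.read_html` for tables) is the more practical choice.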
Requirements
- Python 3.7 or later
- Selenium
- Beautiful Soup
- Pandas
- tqdm
- Requests
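Before installing, it can help to check which of these requirements are already importable. The following is a small sketch; the mapping from package names to import names is an assumption (Beautiful Soup is imported as `bs4`, the others under their own lowercase names), and `missing_requirements` is a hypothetical helper.

```python
import importlib.util
import sys

# Import names for the packages listed under Requirements (assumed mapping).
REQUIRED_MODULES = {
    "Selenium": "selenium",
    "Beautiful Soup": "bs4",
    "Pandas": "pandas",
    "tqdm": "tqdm",
    "Requests": "requests",
}


def missing_requirements(modules=REQUIRED_MODULES):
    """Return the display names of requirements whose top-level module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        sys.exit("Python 3.7 or later is required")
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements are importable")
```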
Installation
pip
pip install WebDataMiner
GitLab
- Clone the repository (branch asaid) and change into it:
git clone -b asaid https://gitlab.kaisens.fr/kaisensdata/apps/4inshield/drivers/generic-crawler.git
cd generic-crawler
- Install the required Python packages:
pip install -r requirements.txt
Usage
- Download the ChromeDriver build that matches your system and your installed Chrome version, then either add it to your system's PATH or pass its path when initializing the WebScraper class.
- Run the scraper with the following example code:
from WebDataMiner import WebScraper
chrome_driver_path = "<path_to_your_chromedriver>"
url = "https://example.com"
scraper = WebScraper(chrome_driver_path)
scraper.process_website(url)
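The two usage steps can be combined into a small driver script. This is a sketch under assumptions: `resolve_chromedriver` and `scrape_all` are hypothetical helpers, not part of WebDataMiner, and the only library calls assumed are the `WebScraper(chrome_driver_path)` constructor and `process_website(url)` shown above. Whether `WebScraper` falls back to PATH on its own when no path is given is not documented here, so the sketch always passes a concrete path.

```python
import os
import shutil


def resolve_chromedriver(explicit_path=None):
    """Return a chromedriver path: the explicit one if given, else the one found on PATH."""
    if explicit_path:
        if os.path.isfile(explicit_path):
            return explicit_path
        raise FileNotFoundError(f"chromedriver not found at {explicit_path!r}")
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver is not on PATH; pass its path explicitly")
    return found


def scrape_all(scraper, urls):
    """Call scraper.process_website(url) for each URL, collecting failures instead of stopping."""
    failures = {}
    for url in urls:
        try:
            scraper.process_website(url)
        except Exception as exc:  # exception types raised by process_website are not documented
            failures[url] = exc
    return failures


# Example usage (requires WebDataMiner and a working chromedriver):
# from WebDataMiner import WebScraper
# scraper = WebScraper(resolve_chromedriver())  # or resolve_chromedriver("<path_to_your_chromedriver>")
# failures = scrape_all(scraper, ["https://example.com", "https://example.org"])
# for url, exc in failures.items():
#     print(url, "failed:", exc)
```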
Download files
Source Distribution
WebDataMiner-0.2.tar.gz (4.1 kB)
Built Distribution
Hashes for WebDataMiner-0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ed21c2c7ec3e3c1eeb76e213f4add8e7506ccbf2c5d35fa45209a42de5574a96
MD5 | 1a954e89ebce2d13e1cbc11e2cb5e7f6
BLAKE2b-256 | d709f263bcf77d0b46a3cbd07fab8731f0cc784fe8ef1a01f1974fa639c267de
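To check a downloaded wheel against the digests in this table (for example one fetched with `pip download WebDataMiner==0.2 --no-deps`), a short sketch with the standard hashlib module follows. `file_digests` and `verify` are hypothetical helper names; BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib

# Digests for WebDataMiner-0.2-py3-none-any.whl, copied from the table above.
EXPECTED = {
    "sha256": "ed21c2c7ec3e3c1eeb76e213f4add8e7506ccbf2c5d35fa45209a42de5574a96",
    "md5": "1a954e89ebce2d13e1cbc11e2cb5e7f6",
    "blake2b_256": "d709f263bcf77d0b46a3cbd07fab8731f0cc784fe8ef1a01f1974fa639c267de",
}


def file_digests(path, chunk_size=65536):
    """Compute SHA256, MD5 and BLAKE2b-256 of a file in a single streaming pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path, expected=EXPECTED):
    """Return True only if every expected digest matches the file."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())


# Example (the path is wherever pip download saved the wheel):
# print(verify("WebDataMiner-0.2-py3-none-any.whl"))
```

Checking SHA256 alone is sufficient in practice; MD5 is included only because the table lists it.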