A web scraper that downloads tables, images, and text from a webpage
generic-crawler
This project contains a Python script for a web scraper that extracts tables, images, and text from a given website.
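To make the three kinds of output concrete, here is a minimal sketch of pulling tables, image URLs, and text out of an HTML string. It is not the project's implementation: the project relies on Selenium and Beautiful Soup, while this sketch uses only the standard-library html.parser so it runs without extra packages. The names PageContentParser and extract_content are hypothetical, and the sketch assumes flat (non-nested) tables.

```python
from html.parser import HTMLParser


class PageContentParser(HTMLParser):
    """Collect image URLs, table rows, and visible text (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []   # values of <img src="...">
        self.tables = []   # each table is a list of rows; each row is a list of cell strings
        self.text = []     # text chunks found outside table cells
        self._row = None   # row currently being filled, if any
        self._cell = None  # text pieces of the current cell, if any
        self._skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append(" ".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        chunk = data.strip()
        if not chunk or self._skip:
            return  # ignore whitespace and script/style bodies
        if self._cell is not None:
            self._cell.append(chunk)
        else:
            self.text.append(chunk)


def extract_content(html):
    """Parse an HTML string and return (tables, images, text)."""
    parser = PageContentParser()
    parser.feed(html)
    parser.close()
    return parser.tables, parser.images, parser.text


if __name__ == "__main__":
    sample = '<p>Hello</p><img src="a.png"><table><tr><td>1</td></tr></table>'
    tables, images, text = extract_content(sample)
    print(tables, images, text)
```

A real page usually needs a browser (hence Selenium) to render JavaScript before the HTML is parsed; the sketch only covers the parsing step.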
Requirements
- Python 3.7 or later
- Selenium
- Beautiful Soup
- Pandas
- tqdm
- Requests
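As a quick way to confirm the requirements above are met, the following sketch checks the interpreter version and looks up each package by its import name. The helper names missing_modules and python_ok are hypothetical and not part of this project; note that Beautiful Soup is imported as bs4.

```python
import importlib.util
import sys

# Import names for the listed requirements. Beautiful Soup is imported
# as "bs4"; the other import names match the names in the list above.
REQUIRED_MODULES = ["selenium", "bs4", "pandas", "tqdm", "requests"]
MIN_PYTHON = (3, 7)


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.7 or later is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found")
```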
Installation
pip
pip install SiteCrawler
GitLab
- Clone this repository (branch asaid):
git clone --branch asaid https://gitlab.kaisens.fr/kaisensdata/apps/4inshield/drivers/generic-crawler.git
- Install the required Python packages from the repository root:
cd generic-crawler
pip install -r requirements.txt
Usage
- Download the appropriate chromedriver for your system and add it to your system's PATH, or specify the path when initializing the WebScraper class.
- Use the following example code to run the scraper:
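from scraper import WebScraper

# Path to the chromedriver binary downloaded in the previous step
chrome_driver_path = "<path_to_your_chromedriver>"
# Page to scrape
url = "https://example.com"

scraper = WebScraper(chrome_driver_path)
# Run the scraper against the target page
scraper.process_website(url)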
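Since the driver can either be on PATH or be passed explicitly, a small helper can pick whichever is available before WebScraper is created. This is a sketch only: resolve_chromedriver is a hypothetical function, not part of this package, and it assumes the driver executable is named chromedriver.

```python
import os
import shutil
from typing import Optional


def resolve_chromedriver(explicit_path: Optional[str] = None,
                         name: str = "chromedriver") -> str:
    """Return a chromedriver path (hypothetical helper, not part of the package)."""
    # Prefer an explicitly supplied path when it points at an existing file.
    if explicit_path and os.path.isfile(explicit_path):
        return explicit_path
    # Otherwise search the directories listed in PATH.
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(
            f"{name} not found; pass its path explicitly or add it to PATH"
        )
    return found


if __name__ == "__main__":
    # With the package installed, the result can be passed on as in the
    # example above:
    #   from scraper import WebScraper
    #   scraper = WebScraper(resolve_chromedriver())
    #   scraper.process_website("https://example.com")
    try:
        print(resolve_chromedriver())
    except FileNotFoundError as exc:
        print(exc)
```

Failing early with a clear message avoids a less readable error later, when the browser session is started.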
Download files
Source Distribution
- SiteCrawler-0.1.tar.gz (4.1 kB)
Built Distribution
- SiteCrawler-0.1-py3-none-any.whl
Hashes for SiteCrawler-0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7c673abffdcaf669d8eec07b42c0a2db61e02d43253bcc1f73ce5251a378f16d
MD5 | e671cec4e844fe328a648ce768c256cc
BLAKE2b-256 | f31a5983b4ad59d97d9eadb222184ef1b1ae664053c76920d8523eef30f7155d