Scrape and Ntfy
An extremely customizable web scraper with a modular notification system and persistent storage via SQLite.
Features
- Modular notification system
  - Currently supports webhooks (e.g. Discord, Slack) and ntfy.sh
- Web scraping via Selenium
- Simple configuration of multiple scrapers with conditional notifications
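The "modular notification system" pattern can be sketched roughly as follows. This is an illustrative sketch, not the project's actual API: the class names, the injected `transport` callable, and the webhook payload shape are assumptions (the JSON body shown matches Discord's webhook format; other services differ).

```python
import json
import urllib.request


def post(url: str, data: bytes, headers: dict) -> None:
    """Send an HTTP POST request with the given body and headers."""
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    urllib.request.urlopen(req)


class NtfyNotifier:
    """Publish a plain-text message to an ntfy.sh topic."""

    def __init__(self, topic: str, transport=post):
        self.url = f"https://ntfy.sh/{topic}"
        self.transport = transport

    def notify(self, message: str) -> None:
        self.transport(self.url, message.encode(), {"Content-Type": "text/plain"})


class WebhookNotifier:
    """POST a JSON payload to a webhook URL (Discord-style body shown)."""

    def __init__(self, url: str, transport=post):
        self.url = url
        self.transport = transport

    def notify(self, message: str) -> None:
        body = json.dumps({"content": message}).encode()
        self.transport(self.url, body, {"Content-Type": "application/json"})


def broadcast(notifiers, message: str) -> None:
    # Because every notifier exposes the same notify() method, one scraper
    # event can fan out to any number of configured backends.
    for n in notifiers:
        n.notify(message)
```

The point of the pattern is that adding a new backend only means writing another class with a `notify()` method; the scraping side never changes.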
Usage
Prerequisites
- A browser
  - Preferably a Firefox-based browser or some Chromium-based browsers
  - Edge is not recommended
Basic Configuration
- Configuration for the web scraper is handled through a TOML file
  - To see an example configuration, see `config.example.toml`
  - This can be copied to `config.toml` and edited to suit your needs
- To get the CSS selector for an element, you can use your browser's developer tools (`F12`, `Ctrl+Shift+I`, right-click -> Inspect Element, etc.)
  - If you're not already in inspect element mode, you can press `Ctrl+Shift+C` to enter it (or just click the inspect button in the developer tools)
  - Click on the element you want to select
  - Right-click on the element in the HTML pane
  - Click "Copy" -> "Copy selector"
- Some other configuration is handled through environment variables and/or command-line arguments (`--help` for more information)
  - For example, to set the path to the configuration file, you can set the `PATH_TO_TOML` environment variable or use the `--path-to-toml` command-line argument
Docker (Recommended)
Specific prerequisites
- Docker
  - Docker is a platform for developing, shipping, and running applications in containers
- Docker Compose
Installation and usage
- Clone the repository
  - `git clone https://github.com/slashtechno/scrape-and-ntfy`
- Change directory into the repository
  - `cd scrape-and-ntfy`
- Configure via `config.toml`
  - Optionally, you can configure some other options via environment variables or command-line arguments
- `docker compose up -d`
  - The `-d` flag runs the containers in the background
  - If you want, you can run `sqlite-web` by uncommenting the appropriate lines in `docker-compose.yml` to view the database in a browser on `localhost:5050`
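The SQLite database that `sqlite-web` exposes holds the scrapers' persisted state. A minimal sketch of that kind of persistence layer is below; the table and column names are made up for illustration (inspect the real database at `localhost:5050` for the actual schema), but it shows the core idea of a persistent store enabling change detection between runs.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a table mapping each scraped URL to its last-seen value."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS last_seen (url TEXT PRIMARY KEY, value TEXT)"
    )


def value_changed(conn: sqlite3.Connection, url: str, value: str) -> bool:
    """Store the new value and report whether it differs from the stored one.

    Returns True on first sight or on change; in this sketch, the caller
    would only send a notification when this returns True.
    """
    row = conn.execute(
        "SELECT value FROM last_seen WHERE url = ?", (url,)
    ).fetchone()
    conn.execute(
        "INSERT INTO last_seen (url, value) VALUES (?, ?) "
        "ON CONFLICT(url) DO UPDATE SET value = excluded.value",
        (url, value),
    )
    conn.commit()
    return row is None or row[0] != value
```

Keeping the state in SQLite (rather than in memory) is what lets the scraper survive container restarts without re-notifying about values it has already seen.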
PDM
Specific prerequisites
- Python (3.11+)
- PDM
Installation and usage
- Clone the repository
  - `git clone https://github.com/slashtechno/scrape-and-ntfy`
- Change directory into the repository
  - `cd scrape-and-ntfy`
- Configure via `config.toml`
  - Optionally, you can configure some other options via environment variables or command-line arguments
- `pdm run python -m scrape_and_ntfy`
  - This will run the bot with the configuration in `config.toml`