A tool to download website contents through Tor with German exit nodes and extract images
download-webpage-data
A Python tool to download website contents through Tor with German exit nodes and extract images from downloaded websites.
Prerequisites
- Python 3.8 or higher
- Tor service installed on your system
- torrc configuration file (will be created automatically)
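For orientation, a torrc that limits Tor to German exit nodes could be generated roughly as shown here. This is a sketch under assumptions, not the tool's actual code: the function name `build_torrc` is hypothetical, and the port numbers are only Tor's conventional defaults (9050 for SOCKS, 9051 for the control port). `SocksPort`, `ControlPort`, `ExitNodes`, and `StrictNodes` are standard Tor options.

```python
def build_torrc(socks_port: int = 9050, control_port: int = 9051) -> str:
    """Return torrc text that restricts Tor to German exit nodes.

    Hypothetical helper: the real tool may write different options.
    """
    lines = [
        f"SocksPort {socks_port}",      # local SOCKS proxy that clients connect to
        f"ControlPort {control_port}",  # needed to request a new identity later
        "ExitNodes {de}",               # only use exit relays located in Germany
        "StrictNodes 1",                # do not fall back to other countries
    ]
    return "\n".join(lines) + "\n"
```

Writing the result of `build_torrc()` to a file and pointing Tor at it with `tor -f <path>` would apply the settings.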
Installation
- Clone this repository
- Install dependencies:
pip install .
Features
Website Downloader
- Routes all traffic through Tor
- Uses German exit nodes exclusively
- Downloads complete website contents
- Preserves website structure
- Handles errors gracefully
- Supports sites with invalid SSL certificates
- Retries failed downloads with a new Tor identity
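The retry behaviour in the last bullet can be pictured with a small sketch. It assumes two caller-supplied callables, both hypothetical names: `fetch`, which downloads a URL through Tor, and `renew_identity`, which asks Tor for a fresh circuit (for example by sending the NEWNYM signal over the control port). The real tool may structure this differently.

```python
import time
from typing import Callable, Optional


def download_with_retries(
    fetch: Callable[[str], bytes],
    renew_identity: Callable[[], None],
    url: str,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> bytes:
    """Call fetch(url); after each failure, request a new identity and retry."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
            if attempt < max_attempts:
                renew_identity()           # switch to a new Tor circuit
                time.sleep(delay_seconds)  # give Tor time to build it
    raise RuntimeError(
        f"giving up on {url} after {max_attempts} attempts"
    ) from last_error
```

Passing the two operations in as callables keeps the retry logic independent of any particular HTTP or Tor control library.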
Image Extractor
- Extracts all images from downloaded websites
- Supports multiple image formats (jpg, jpeg, png, gif, webp, svg, ico)
- Finds both direct image files and HTML-referenced images
- Preserves original filenames
- Creates organized output structure
- Handles duplicate files
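As a rough illustration of the features listed above, the sketch below copies image files by extension, collects `<img src>` references from HTML, and renames duplicates with a numeric suffix. All function and class names are hypothetical, and the `_1`, `_2` suffix scheme is an assumption; the tool's own duplicate handling is not documented here.

```python
import shutil
from html.parser import HTMLParser
from pathlib import Path
from typing import List

# Extensions taken from the feature list above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".ico"}


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def referenced_images(html: str) -> List[str]:
    """Return the src values of all <img> tags in the given HTML."""
    parser = ImgSrcParser()
    parser.feed(html)
    return parser.sources


def unique_target(directory: Path, filename: str) -> Path:
    """Return a not-yet-existing path in directory, adding _1, _2, ... if needed."""
    candidate = directory / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def extract_images(site_dir: Path, out_dir: Path) -> List[Path]:
    """Copy image files found under site_dir into out_dir; return the new paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(site_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            target = unique_target(out_dir, path.name)
            shutil.copy2(path, target)  # keeps the original filename where possible
            copied.append(target)
    return copied
```

The sketch expects `out_dir` to live outside `site_dir`, as in the `downloads/` and `images/` layout, so copied files are not picked up again on a later run.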
Usage
Downloading Websites
- Ensure the Tor service is running on your system:
# On Manjaro/Arch:
sudo systemctl start tor
- Download a website:
# Interactive mode
python -m download_webpage_data
# Direct URL mode
python -m download_webpage_data -u https://example.com
# With SSL verification
python -m download_webpage_data --verify-ssl -u https://example.com
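Before starting a download, it can help to confirm that Tor is actually listening. The check below is a minimal sketch, not part of the tool: `is_port_open` is a hypothetical helper, and 9050 is assumed only because it is Tor's conventional SOCKS port; your torrc may use a different one.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 9050, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

A `False` result usually means the Tor service has not been started or is bound to another port.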
Extracting Images
- After downloading one or more websites, run:
python -m download_webpage_data.extract_images
- Select the website from the list
- Images will be extracted to the images/<website>/ directory
Command-line Options
Website Downloader
- -u, --url: URL to download (if not provided, will prompt)
- --verify-ssl: Enable SSL certificate verification (disabled by default)
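The two downloader options map naturally onto `argparse`. The sketch below shows one way they could be declared; `build_parser` is a hypothetical name and the package's real parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the -u/--url and --verify-ssl options described above."""
    parser = argparse.ArgumentParser(prog="download_webpage_data")
    parser.add_argument(
        "-u", "--url",
        help="URL to download (prompted for if omitted)",
    )
    parser.add_argument(
        "--verify-ssl",
        action="store_true",  # flag is off unless given, so verification is disabled by default
        help="Enable SSL certificate verification",
    )
    return parser
```

With this declaration, `args.url` is `None` when no URL is passed, which is the point at which an interactive prompt would take over.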
Image Extractor
- Interactive menu to select from downloaded websites
- Press 'q' to quit at any time
Directory Structure
.
├── downloads/           # Downloaded websites
│   └── example.com/     # Website content
└── images/              # Extracted images
    └── example.com/     # Images from website
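The layout above suggests that each site gets a folder named after its host. Assuming that is the rule (the README only shows the `example.com` case), the two paths for a URL could be derived like this; `site_directories` is a hypothetical helper.

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


def site_directories(url: str, base: Path = Path(".")) -> Tuple[Path, Path]:
    """Return (download_dir, image_dir) for the host part of url."""
    host = urlparse(url).hostname or "unknown"  # fallback name is an assumption
    return base / "downloads" / host, base / "images" / host
```

For `https://example.com/page`, this yields `downloads/example.com` and `images/example.com` relative to `base`.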
Security Note
This tool is for legitimate use only. Ensure you have permission to download website contents before using this tool.
License
MIT
File details
Details for the file download_webpage_data-1.0.2.tar.gz.
File metadata
- Download URL: download_webpage_data-1.0.2.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9f3957d2d9d6159680599c8cde5f40a44aa4b65f86f11a842cc26f5174548276 |
| MD5 | 96f1dd5c6ff2de867309fe9e1971aafd |
| BLAKE2b-256 | 3244c1e748a40d0c13c9a0bb6bb94b958d9fed3733ba3ea46c8fd8ec3e3d0c82 |
File details
Details for the file download_webpage_data-1.0.2-py3-none-any.whl.
File metadata
- Download URL: download_webpage_data-1.0.2-py3-none-any.whl
- Upload date:
- Size: 13.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2eee7096917f00ff27044e1b57c5dc79b236f36bc3fa0808d6f4fbea2eb486e |
| MD5 | 09c005f2cb4574f18c5e81081a73005d |
| BLAKE2b-256 | 19bd02ad9f65716f4deeca8c80108bdb9a1c9a7202c9ee2a34c410077345e688 |