A tool to download website contents through Tor with German exit nodes and extract images

download-webpage-data

A Python tool to download website contents through Tor with German exit nodes and extract images from downloaded websites.

Prerequisites

Python 3.8 or higher
Tor service installed on your system
torrc configuration file (will be created automatically)
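
As a rough illustration of what an automatically created torrc could contain for German-only exits, the sketch below renders one from a few parameters. The helper names and the default ports are assumptions made for this example, not taken from the package; `SocksPort`, `ControlPort`, `CookieAuthentication`, `ExitNodes`, and `StrictNodes` are standard Tor options.

```python
def build_torrc(socks_port: int = 9050, control_port: int = 9051,
                country: str = "de") -> str:
    """Render a minimal torrc that restricts exits to one country.

    Hypothetical helper: the real tool may write different options.
    """
    lines = [
        f"SocksPort {socks_port}",        # local SOCKS proxy for HTTP clients
        f"ControlPort {control_port}",    # needed to request a new identity
        "CookieAuthentication 1",         # protect the control port
        f"ExitNodes {{{country}}}",       # e.g. "ExitNodes {de}"
        "StrictNodes 1",                  # never fall back to other countries
    ]
    return "\n".join(lines) + "\n"


def tor_proxies(socks_port: int = 9050) -> dict:
    """Proxy mapping in the style used by HTTP clients such as requests.

    The socks5h scheme asks the proxy to resolve hostnames, so DNS
    lookups also go through Tor (assumes a SOCKS-capable client).
    """
    address = f"socks5h://127.0.0.1:{socks_port}"
    return {"http": address, "https": address}


if __name__ == "__main__":
    print(build_torrc(), end="")
```

With `StrictNodes 1`, Tor refuses to build a circuit rather than use an exit outside the listed country, which matches the "exclusively" wording in the feature list.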

Installation

Clone this repository
Install the package and its dependencies from the repository root:

pip install .

Features

Website Downloader

Routes all traffic through Tor
Uses German exit nodes exclusively
Downloads complete website contents
Preserves website structure
Handles errors gracefully
Supports sites with invalid SSL certificates (certificate verification is disabled unless --verify-ssl is passed)
Retries failed downloads with new Tor identity
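
The retry behaviour can be sketched as a small loop that requests a fresh Tor identity between attempts. This is an assumption about the general shape, not the package's actual code: `fetch` and `renew_identity` are hypothetical callables supplied by the caller. In practice `renew_identity` would typically send the NEWNYM signal to Tor's control port, and Tor may rate-limit such requests, hence the optional wait.

```python
import time
from typing import Callable


def download_with_retries(fetch: Callable[[str], bytes],
                          renew_identity: Callable[[], None],
                          url: str,
                          max_attempts: int = 3,
                          wait_seconds: float = 0.0) -> bytes:
    """Call fetch(url), switching Tor identity after each failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # broad on purpose: this is a sketch
            last_error = error
            if attempt < max_attempts:
                renew_identity()          # ask Tor for a new circuit
                time.sleep(wait_seconds)  # give the new circuit time to build
    raise RuntimeError(
        f"giving up on {url} after {max_attempts} attempts"
    ) from last_error
```

Passing the two callables in keeps the loop testable without a running Tor service.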

Image Extractor

Extracts all images from downloaded websites
Supports multiple image formats (jpg, jpeg, png, gif, webp, svg, ico)
Finds both direct image files and HTML-referenced images
Preserves original filenames
Creates organized output structure
Handles duplicate files
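
A minimal sketch of how such an extractor might work, using only the standard library. All names here are hypothetical, and the duplicate-handling rule (appending `_1`, `_2`, ... to the filename) is an assumption; the package may resolve name clashes differently.

```python
import shutil
from html.parser import HTMLParser
from pathlib import Path
from typing import List

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg", ".ico"}


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_image_files(site_dir: Path) -> List[Path]:
    """Return all files under site_dir whose extension marks an image."""
    return sorted(
        path for path in site_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def unique_target(out_dir: Path, name: str) -> Path:
    """Keep the original filename, adding a counter only on a clash."""
    candidate = out_dir / name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = out_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def extract_images(site_dir: Path, out_dir: Path) -> List[Path]:
    """Copy every image found under site_dir into out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in find_image_files(site_dir):
        target = unique_target(out_dir, source.name)
        shutil.copy2(source, target)
        copied.append(target)
    return copied
```

`ImageSrcParser` covers the HTML-referenced case: feeding it a downloaded page yields the referenced image paths, which could then be resolved against the site directory.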

Usage

Downloading Websites

Ensure Tor service is running on your system:

# On Manjaro/Arch:
sudo systemctl start tor

Download a website:

# Interactive mode
python -m download_webpage_data

# Direct URL mode
python -m download_webpage_data -u https://example.com

# With SSL verification
python -m download_webpage_data --verify-ssl -u https://example.com

Extracting Images

After downloading one or more websites, run:

python -m download_webpage_data.extract_images

Select the website from the list
Images are extracted to the images/<website>/ directory

Command-line Options

Website Downloader

-u, --url: URL to download (if not provided, will prompt)
--verify-ssl: Enable SSL certificate verification (disabled by default)
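
For illustration, the two documented options could be declared with `argparse` roughly as follows. This is a sketch of an equivalent parser, not the package's source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented downloader options."""
    parser = argparse.ArgumentParser(
        prog="download_webpage_data",
        description="Download website contents through Tor.",
    )
    parser.add_argument(
        "-u", "--url",
        help="URL to download (if not provided, will prompt)",
    )
    parser.add_argument(
        "--verify-ssl",
        action="store_true",  # off unless the flag is given
        help="Enable SSL certificate verification (disabled by default)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Fall back to interactive mode when no URL was given.
    url = args.url or input("URL to download: ")
    print(f"url={url} verify_ssl={args.verify_ssl}")
```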

Image Extractor

Interactive menu to select from downloaded websites
Press 'q' to quit at any time
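
The menu logic might look like the following sketch, which turns one line of user input into a selected site. The function name and the 1-based numbering are assumptions for this example.

```python
from typing import List, Optional


def choose_site(sites: List[str], answer: str) -> Optional[str]:
    """Map a menu answer to a site name; return None when the user quits."""
    answer = answer.strip().lower()
    if answer == "q":
        return None
    if answer.isdecimal() and 1 <= int(answer) <= len(sites):
        return sites[int(answer) - 1]  # menu entries are numbered from 1
    raise ValueError(f"invalid selection: {answer!r}")
```

A caller would list the directory names under downloads/, print them with their numbers, and loop on `input()` until `choose_site` stops raising `ValueError`.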

Directory Structure

.
├── downloads/           # Downloaded websites
│   └── example.com/    # Website content
└── images/             # Extracted images
    └── example.com/    # Images from website
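
The layout above suggests that each site is stored under its hostname. A small sketch of that mapping, assuming the directory name is exactly the URL's hostname (the helper name is hypothetical):

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


def site_directories(url: str, base: Path = Path(".")) -> Tuple[Path, Path]:
    """Return the (downloads, images) directories for the site behind url."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"cannot determine hostname from {url!r}")
    return base / "downloads" / host, base / "images" / host
```

For `https://example.com/page`, this yields `downloads/example.com` and `images/example.com` relative to the working directory.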

Security Note

This tool is for legitimate use only. Ensure you have permission to download website contents before using this tool.

License

MIT

Release history

1.0.2 (Feb 14, 2025)
1.0.1 (Feb 14, 2025), this version
1.0.0 (Feb 14, 2025)

Download files

Source Distribution

download_webpage_data-1.0.1.tar.gz (2.2 kB)

Upload date: Feb 14, 2025
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

Hashes for download_webpage_data-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`e5e1932feb4406cfc5582b55da0467aaf814ea0608058181d4d2664deb090ed2`
MD5	`04c7b4cac4058cf654d94d7601c877d8`
BLAKE2b-256	`236485976c4a6d41584daeb129daf48a9824bb1db376375c461159c6a8688ba9`

Built Distribution

download_webpage_data-1.0.1-py3-none-any.whl (2.6 kB)

Upload date: Feb 14, 2025
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.1

Hashes for download_webpage_data-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c64bf3c671849bf55959c9917e4375548d6cc5127980ff2f35b222e181f5674f`
MD5	`935f18807b3791a9a077faee68fe001a`
BLAKE2b-256	`a17129568a7f716d096156b8c5a08be46ca505e7ce6c9abeb4ba9449cfbfd09d`
