Tor onion site scraping tool

Project description

torspy

torspy is a Python package for scraping .onion sites using Tor. It provides a simple interface for fetching HTML content from .onion URLs, searching for specific text within the content, and saving the results to a file.

Installation

You can install torspy via pip:

pip install torspy

Usage

Command-Line Interface

torspy allows you to interact with .onion sites from the command line:

  • To display the content of a .onion site:

    torspy http://example.onion

  • To save the displayed content to a file (the -s flag indicates saving, and you can specify any file name):

    torspy http://example.onion -s file.html

  • To search for specific text within the content and save the results to a file (the --find flag takes the search query; the -s flag names the file for the search results):

    torspy http://example.onion --find "search query" -s search_results.html

  • To save the content to a specified directory (the -d flag followed by the directory path indicates where to save the file):

    torspy http://example.onion -s file.html -d /path/to/directory

  • For more information on available options, use the --help flag:

    torspy --help

Additional Examples

  • Display the content of a .onion site, search for "important information", and save the results to a file named results.html in the specified directory:

    torspy http://example.onion --find "important information" -s results.html -d /path/to/directory

  • Save the entire HTML content of a .onion site to a file named full_content.html in the current directory:

    torspy http://example.onion -s full_content.html

  • Display the content of a .onion site and save it to a file named output.txt in the current directory:

    torspy http://example.onion -s output.txt

Using torspy in a Bash Script

You can incorporate torspy into your Bash scripts for automated tasks. Here's an example script that fetches content from a list of .onion URLs and saves it to individual files:

#!/bin/bash

# List of .onion URLs
urls=("http://example1.onion" "http://example2.onion" "http://example3.onion")

# Loop through each URL
for url in "${urls[@]}"; do
    # Derive the output file name from the onion address:
    # ${url##*/} strips everything up to the last "/", so
    # "http://example1.onion" becomes "example1.onion".
    torspy "$url" -s "${url##*/}.html"
done

How torspy Works

torspy utilizes the following process to interact with .onion sites:

  1. Checking Site Existence: It verifies if the .onion site exists and is reachable through the Tor network.

  2. Fetching HTML Content: It retrieves the HTML content of the .onion site using Tor for anonymity.

  3. Scraping and Searching: If specified, torspy searches for specific text within the content and extracts matching results.

  4. Saving Results: Optionally, torspy allows you to save the retrieved content, either the entire HTML or the search results, to a file.
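
To make this concrete, the sketch below shows how fetching a .onion page through Tor generally works. It is illustrative only, not torspy's actual implementation: it assumes a local Tor daemon exposing a SOCKS5 proxy on 127.0.0.1:9050 and uses the third-party requests library (with SOCKS support, e.g. pip install "requests[socks]") and beautifulsoup4.

import requests
from bs4 import BeautifulSoup

# Assumption: a local Tor daemon exposes a SOCKS5 proxy on port 9050.
# The "socks5h" scheme resolves hostnames inside Tor, which is required
# for .onion addresses.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_onion(url, timeout=60):
    """Fetch the HTML of a .onion page through the Tor proxy."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()  # raise for HTTP error status codes
    return response.text

def find_text(html, query):
    """Return the lines of visible page text that contain the query."""
    soup = BeautifulSoup(html, "html.parser")
    return [line for line in soup.get_text().splitlines() if query in line]

# Example:
#   html = fetch_onion("http://example.onion")
#   matches = find_text(html, "search query")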

Code Overview

torspy consists of the following components:

  • check_onion_site(url): Checks if the .onion site exists and is reachable.

  • scrape_onion_site(url, search_query, save_file, save_directory): Scrapes the .onion site, searches for specific text, and saves results if required.

  • main(): Handles command-line arguments and invokes the scraping functionality.
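
If you prefer to call these functions from Python rather than the command line, a call might look like the sketch below. The import path and the return value of check_onion_site are assumptions inferred from the overview above, so check the torspy source for the actual module layout.

# Hypothetical usage sketch. Importing these names directly from the
# torspy package, and check_onion_site() returning a truthy value for
# reachable sites, are assumptions, not documented behaviour.
from torspy import check_onion_site, scrape_onion_site

url = "http://example.onion"

# Verify the site is reachable over Tor before scraping it.
if check_onion_site(url):
    # Search for a phrase and write the matches to results.html
    # in /path/to/directory.
    scrape_onion_site(url, "search query", "results.html", "/path/to/directory")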

Contributing

If you're interested in contributing to torspy, you can:

  • Report issues encountered while using torspy.
  • Suggest new features or enhancements.
  • Submit pull requests on GitHub with improvements or fixes.

Download files

Download the file for your platform.

Source Distribution

torspy-0.2.tar.gz (4.9 kB)

Built Distribution

torspy-0.2-py3-none-any.whl (5.5 kB)
