A web-proxy IP and user agent anonymizing framework for web scrapers, penetration testers, and ethical hackers

Project description

slither

A simple, easy-to-use framework for adding randomized, anonymous IP addresses and user agents to web scrapers, crawlers, and penetration testing solutions.

Slither is designed to add randomized User-Agent strings and anonymous IP addresses, drawn from proxy sources around the web, to your web scraping, penetration testing, or data aggregation projects. Please respect the owners and hardware of the data you are scraping. I BEAR NO RESPONSIBILITY IF YOU USE THIS SOFTWARE WITHOUT PERMISSION OR TO DO HARM TO WEB ASSETS. RESPECT THE DATA AND PLATFORMS. BE ETHICAL WITH YOUR SCRAPING AND UNDERSTAND THAT ALL ASSETS HAVE FINITE RESOURCES. DON'T DDOS THINGS. IT'S BAD.

SLITHER IS ONLY COMPATIBLE WITH PYTHON 3. NO SUPPORT FOR PYTHON 2 IS PLANNED. To install on a machine with only Python 3 installed:

pip install slitherlib

On a machine with both Python 2 and 3 installed:

pip3 install slitherlib

From there, add the line from slitherlib.slither import Snake to the project file that contains your scraping code.

Each instance of the Snake class has two attributes, ips and uas, holding IP addresses and User-Agent strings respectively. The IPs are pulled in real time from web proxy sources every time you create an instance of the class, so there is no need to worry about IPs going stale. The majority of the addresses are less than 20 minutes old when pulled down, and many are less than 10. An example of anonymizing your scraper with the Requests library looks like this:

import requests

from random import choice
from slitherlib.slither import Snake

s = requests.Session()
snake = Snake()
ip_addresses = snake.ips
user_agents = snake.uas
# Pick one random User-Agent for this session's requests.
headers = {"User-Agent": choice(user_agents)}

# Keep trying proxies until one works or the list is exhausted.
while ip_addresses:
    ip = ip_addresses.pop()
    try:
        r = s.get('https://www.google.com', proxies={'https': ip, 'http': ip}, headers=headers)
        print(r.status_code)
        break
    except requests.exceptions.ProxyError:
        print('Proxy Timed Out. Removing and Retrying')
else:
    # The loop ended without a successful request.
    print("We've run out of IPs! Re-run your script to get more!")
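The retry pattern in the example above can be factored into a small reusable helper. The following is a minimal sketch and not part of slitherlib: try_proxies and its attempt callback are hypothetical names, and it assumes the addresses are held in a plain list, as the example above suggests. It catches OSError by default because the Requests exception classes derive from it.

```python
def try_proxies(ip_addresses, attempt, errors=(OSError,)):
    """Pop addresses off the list until attempt(ip) succeeds.

    Hypothetical helper: `attempt` is any callable that takes one proxy
    address and either returns a result or raises one of `errors`.
    Returns a (ip, result) tuple for the first address that works.
    """
    while ip_addresses:
        ip = ip_addresses.pop()
        try:
            return ip, attempt(ip)
        except errors:
            # Dead or slow proxy: discard it and move on to the next one.
            continue
    raise LookupError("no working proxy left")
```

With Requests, attempt could be a lambda that calls s.get with the proxies and headers shown in the example above.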

This approach also supports concurrency: you can assign an individual IP and/or UA to each thread or process spawned by your project. This is done as follows:

import requests

from slither import Slither
from concurrent.futures import ThreadPoolExecutor, wait, as_completed

# Specify the number of threads your scraper will use as the "thread count".
# By default, thread_count is set to "all", meaning that you will pull down
# all available IP addresses and user agents. To request a specific number,
# pass the number of desired items as an int to the named `thread_count`
# argument.
num_of_threads = 7
futures = list_of_urls_to_scrape
new_slither = Slither(thread_count=num_of_threads)

# new_slither.masks is a list of dictionaries of IPs and User-Agents.
for i in new_slither.masks:
    # Spawn your threads here, assigning i['address'] to the thread's proxy
    # parameter and i['user-agent'] to the thread's 'User-Agent' header.
    ...
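To make the thread-spawning step more concrete, here is a minimal sketch of pairing each URL with one mask in a thread pool. It is an illustration under stated assumptions, not slitherlib's own code: fetch_with_mask and scrape_all are hypothetical names, masks are assumed to be dictionaries with the 'address' and 'user-agent' keys mentioned above, and each address is assumed to be a host:port string. The sketch uses the standard library's urllib instead of Requests so that it has no third-party dependencies.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def fetch_with_mask(url, mask, timeout=10):
    """Fetch url through the mask's proxy, sending the mask's User-Agent."""
    proxy = mask["address"]  # assumed to look like "host:port"
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    request = urllib.request.Request(url, headers={"User-Agent": mask["user-agent"]})
    with opener.open(request, timeout=timeout) as response:
        return response.status


def scrape_all(urls, masks, fetch=fetch_with_mask):
    """Pair each URL with a mask (reusing masks if URLs outnumber them)
    and run the fetches on a pool with up to one worker per mask."""
    if not masks:
        raise ValueError("no masks available")
    results = {}
    with ThreadPoolExecutor(max_workers=len(masks)) as pool:
        futures = {
            pool.submit(fetch, url, mask): url
            for url, mask in zip(urls, cycle(masks))
        }
        for future, url in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:
                # A dead proxy should not stop the other fetches;
                # record the error in place of a status code.
                results[url] = exc
    return results
```

Passing new_slither.masks as the masks argument would give each submitted fetch its own proxy and User-Agent. The fetch parameter is injectable so the pairing logic can be exercised without network access.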

Have fun and happy scraping!

Release history

0.1.4 (this version): Jul 12, 2019
0.1.3: Jul 12, 2019
0.1.2: Jul 12, 2019
0.1.1: Jul 12, 2019
0.1.0: Jul 12, 2019

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

slitherlib-0.1.4-py3-none-any.whl (7.0 kB)

Uploaded Jul 12, 2019 Python 3

File details

Details for the file slitherlib-0.1.4-py3-none-any.whl.

File metadata

Download URL: slitherlib-0.1.4-py3-none-any.whl
Upload date: Jul 12, 2019
Size: 7.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.19.1 setuptools/40.0.0 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.0

File hashes

Hashes for slitherlib-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c881e1dd81b908ae4bd54c71dd88ebbd603f384847c0add87340ea7f5ba4d46c`
MD5	`1b0c47aa91889ce3382827920a655513`
BLAKE2b-256	`2d8ead1e6b8885286d903506a6f4c24a04b59ec7d3b7bd51857b5adabb9abe8d`
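A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module; sha256_of and verify are hypothetical helper names, and the file path is whatever location you saved the wheel to.

```python
import hashlib

# SHA256 digest listed above for slitherlib-0.1.4-py3-none-any.whl.
EXPECTED_SHA256 = "c881e1dd81b908ae4bd54c71dd88ebbd603f384847c0add87340ea7f5ba4d46c"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, verify("slitherlib-0.1.4-py3-none-any.whl") should return True only for a file whose contents match the published digest.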
