A package that uses public proxies to randomise HTTP requests

Project description

Vietnamese version

A convenient way to make HTTP requests in Python is the requests library. One of its most popular features is simple proxying support. HTTP as a protocol has well-defined semantics for dealing with proxies, and this contributed to the widespread deployment of HTTP proxies.

Proxying is very useful for intensive web crawling or scraping, or when you just want to hide your identity (anonymization).

In this project I use public proxies to randomise HTTP requests over a number of IP addresses. By also rotating through a variety of known user agent headers, the requests appear to have been produced by different applications and operating systems.

Proxies

Proxies provide a way to use server P (the middleman) to contact server A and then route the response back to you. In more nefarious circles, this is a prime way to hide your presence and pose as many clients to a website instead of just one. Websites often block IPs that make too many requests, and proxies are a way to get around this. Even if you only want to simulate an attack, you should know how it is done.
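
As a rough illustration of the idea above, the sketch below routes a single request through one proxy using the `proxies` argument of the requests library. The helper names `build_proxies` and `fetch_via_proxy` and the address 203.0.113.10:8080 are placeholders made up for this example; they are not part of this project.

```python
def build_proxies(address):
    """Return a proxies mapping in the shape the requests library accepts.

    `address` is a "host:port" string. The same HTTP proxy is used for both
    http:// and https:// targets, which is an assumption of this sketch.
    """
    proxy_url = "http://{0}".format(address)
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url, address, timeout=30):
    """Send a GET request to `url` through the proxy at `address`."""
    import requests  # imported lazily so build_proxies works without it

    return requests.get(url, proxies=build_proxies(address), timeout=timeout)


if __name__ == "__main__":
    # 203.0.113.10 belongs to a range reserved for documentation.
    print(build_proxies("203.0.113.10:8080"))
```

From the point of view of server A, the request appears to come from the proxy's IP address instead of yours.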

User Agent

Surprisingly, the only thing that tells a server which application triggered a request (a particular browser, or a script) is a header called the “user agent”, which is included in the HTTP request.
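
To make this concrete, here is a minimal sketch of picking a random user agent for each request. It assumes the agents are available as lines of text, one per line; the function names `parse_user_agents` and `random_headers` and the sample strings are illustrative only and do not come from this project.

```python
import random


def parse_user_agents(lines):
    """Return the non-empty, stripped lines as a list of user agent strings."""
    return [line.strip() for line in lines if line.strip()]


def random_headers(user_agents, rng=random):
    """Build request headers with a randomly chosen User-Agent value."""
    return {"User-Agent": rng.choice(user_agents)}


if __name__ == "__main__":
    # Two sample values for illustration only.
    agents = parse_user_agents([
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0\n",
        "\n",
        "curl/8.4.0\n",
    ])
    print(random_headers(agents))
```

The resulting dictionary can be passed as the `headers` argument of `requests.get`, so the server sees whichever agent string was chosen.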

The source code

The project code in this repository crawls five different public proxy websites:

* http://proxyfor.eu/geo.php
* http://free-proxy-list.net
* http://rebro.weebly.com/proxy-list.html
* http://www.samair.ru/proxy/time-01.htm
* https://www.sslproxies.org

After collecting the proxy data and filtering out the slowest proxies, the application randomly selects one of them to query the target URL. The request timeout is set to 30 seconds, and if the proxy fails to return a response it is deleted from the application's proxy list. Each request uses a different user agent header. The headers are stored in the /data/user_agents.txt file, which contains around 900 different agents.
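
The select-and-evict behaviour described above can be sketched roughly as follows. This is not the project's actual code: `proxied_request` and the caller-supplied `fetch` function are hypothetical names, and the sketch assumes that a failed or timed-out request raises an exception.

```python
import random


def proxied_request(url, proxies, fetch, rng=random):
    """Make one attempt through a randomly chosen proxy.

    `proxies` is a mutable list that is modified in place.
    `fetch(url, proxy)` is supplied by the caller and is expected to return
    a response, or to raise an exception on failure or timeout.
    Returns the response, or None if the attempt failed or no proxies remain.
    """
    if not proxies:
        return None
    proxy = rng.choice(proxies)
    try:
        return fetch(url, proxy)
    except Exception:
        # The proxy did not answer, so drop it from the pool.
        proxies.remove(proxy)
        return None
```

Passing `fetch` in as a parameter keeps the selection logic independent of the HTTP client, which also makes it easy to exercise with a fake function.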

Installation

If you wish to use this module as a CLI tool, install it globally via pip:

pip install http-request-randomizer

Otherwise, you can clone the repository and install it with setuptools:

python setup.py install

Dev testing

Clone the repository, install the requirements, then develop and run the tests:

pip install -r requirements.txt
tox -e pyDevVerbose

How to use

Command-line interface

Assuming that you have http-request-randomizer installed, you can use the commands below:

show help message:

proxyList   -h, --help

specify proxy provider(s) (required):

-s {proxyforeu,rebro,samair,freeproxy,all}

specify output stream (default: sys.stdout); can also be a file:

-o, --outfile

specify provider timeout threshold in seconds:

-t, --timeout

specify proxy bandwidth threshold in KBs:

-bw, --bandwidth

show program’s version number:

-v, --version
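
For readers who want to see how the options above fit together, the following is a hypothetical argparse declaration that mirrors them. It is a sketch written for this description, not the project's own parser: the `dest` names, the argument types, and the assumption that several providers are given as space-separated values are all mine, and the version flag is left out.

```python
import argparse

# Provider names as listed for the -s option above.
PROVIDERS = ["proxyforeu", "rebro", "samair", "freeproxy", "all"]


def build_parser():
    """Declare a parser that resembles the documented proxyList options."""
    parser = argparse.ArgumentParser(prog="proxyList")
    parser.add_argument("-s", dest="sources", nargs="+", choices=PROVIDERS,
                        required=True, help="proxy provider(s) to crawl")
    parser.add_argument("-o", "--outfile", default=None,
                        help="output file path; stdout when omitted")
    parser.add_argument("-t", "--timeout", type=float, default=None,
                        help="provider timeout threshold in seconds")
    parser.add_argument("-bw", "--bandwidth", type=float, default=None,
                        help="proxy bandwidth threshold")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-s", "rebro", "-t", "5"]))
```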

API

To use http-request-randomizer as a library, include it in your requirements.txt file. Then you can simply generate a proxied request using a method call:

import time

from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

if __name__ == '__main__':
    # Creating the RequestProxy collects the proxy list, so time it.
    start = time.time()
    req_proxy = RequestProxy()
    print("Initialization took: {0} sec".format(time.time() - start))
    print("Size: {0}".format(len(req_proxy.get_proxy_list())))
    print("ALL = {0}".format([proxy.get_address() for proxy in req_proxy.get_proxy_list()]))

    test_url = 'http://ipv4.icanhazip.com'

    while True:
        start = time.time()
        # The result may be None, so check it before reading the body.
        response = req_proxy.generate_proxied_request(test_url)
        print("Proxied Request Took: {0} sec => Status: {1}".format(time.time() - start, response))
        if response is not None:
            print("\t Response: ip={0}".format(response.text.strip()))
        print("Proxy List Size: {0}".format(len(req_proxy.get_proxy_list())))

        print("-> Going to sleep..")
        time.sleep(10)
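
Because a proxied request can come back as None, callers may want a bounded retry instead of an endless loop. The sketch below is one possible wrapper; `fetch_with_retries` and `max_attempts` are names invented for this example, and it only relies on the two methods used in the example above (`get_proxy_list` and `generate_proxied_request`).

```python
def fetch_with_retries(req_proxy, url, max_attempts=5):
    """Call generate_proxied_request until it succeeds or attempts run out.

    `req_proxy` is any object exposing get_proxy_list() and
    generate_proxied_request(url), such as a RequestProxy instance.
    Returns the first non-None response, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        if not req_proxy.get_proxy_list():
            break  # no proxies left to try
        response = req_proxy.generate_proxied_request(url)
        if response is not None:
            return response
    return None
```

With a real instance this could be called as `fetch_with_retries(RequestProxy(), 'http://ipv4.icanhazip.com')`.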

Documentation

http-request-randomizer documentation

Contributing

Many thanks to the open-source community for contributing to this project!

Faced an issue?

Open an issue here, and be as detailed as possible :)

Feels like a feature is missing?

Feel free to open a ticket! PRs are always welcome!

License

This project is licensed under the terms of the MIT license.

Download files

Source Distribution

http_request_randomizer-1.3.1.tar.gz (37.3 kB view details)

Uploaded Source

File details

Details for the file http_request_randomizer-1.3.1.tar.gz.

File metadata

  • Download URL: http_request_randomizer-1.3.1.tar.gz
  • Size: 37.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.1 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.9.0

File hashes

Hashes for http_request_randomizer-1.3.1.tar.gz
Algorithm Hash digest
SHA256 6515643c9fda4076f5246b7344a497efa4fe4c2829da967c313471e2c88b79f4
MD5 0018360cda944b0dcc6fc0af02793903
BLAKE2b-256 26e98dbf4548dc45832f9159aef78e1fed1285495369d33d90f5502d864ded87
