http-request-randomizer

A package using public proxies to randomise http requests.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Environment
- Web Environment
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- OS Independent
Programming Language
Topic
- Internet :: WWW/HTTP
- Software Development :: Libraries :: Python Modules

Project description

A convenient way to implement HTTP requests is to use Python’s requests library. One of requests’ most popular features is its simple proxying support. HTTP as a protocol has very well-defined semantics for dealing with proxies, and this contributed to the widespread deployment of HTTP proxies.
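In requests, proxying is configured through the `proxies` argument, which maps a URL scheme to a proxy URL. The sketch below builds that mapping and passes it to `requests.get`. The helper names `make_proxies` and `fetch_via_proxy` are hypothetical, and the address `203.0.113.10:8080` is a placeholder from a documentation-only range, not a working proxy.

```python
from typing import Dict


def make_proxies(host: str, port: int) -> Dict[str, str]:
    """Build the scheme -> proxy URL mapping that requests accepts via `proxies=`."""
    proxy_url = "http://{0}:{1}".format(host, port)
    # Send both plain-HTTP and HTTPS traffic through the same HTTP proxy
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_proxy(url: str, proxies: Dict[str, str], timeout: float = 30.0) -> str:
    """Fetch `url` through the given proxies and return the response body."""
    import requests  # imported lazily so make_proxies() works without requests installed

    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With a live proxy in place of the placeholder, `fetch_via_proxy("http://ipv4.icanhazip.com", make_proxies(host, port))` should return the proxy's IP address rather than your own.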

Proxying is very useful when conducting intensive web crawling/scraping or when you just want to hide your identity (anonymization).

In this project I use public proxies to randomise HTTP requests over a number of IP addresses. By also using a variety of known user-agent headers, these requests appear to have been produced by different applications and operating systems.

Proxies

Proxies provide a way to use server P (the middleman) to contact server A and then route the response back to you. In more nefarious circles, it’s a prime way to make your presence unknown and pose as many clients to a website instead of just one. Websites often block IPs that make too many requests, and proxies are a way to get around this. But even if you only want to simulate an attack, you should know how it’s done.
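Why spreading requests over several proxies gets around per-IP blocking can be shown with a toy model: a server that rejects any IP once it exceeds a fixed request budget. This is a simulation only; the addresses and the limit are made up, and real websites apply their own, usually undisclosed, rules.

```python
from collections import Counter
from itertools import cycle
from typing import List


def blocked_requests(source_ips: List[str], limit: int) -> int:
    """Count requests a toy per-IP limiter would reject.

    The "server" allows at most `limit` requests from each IP; every request
    beyond that from the same IP is counted as blocked.
    """
    seen = Counter()
    blocked = 0
    for ip in source_ips:
        seen[ip] += 1
        if seen[ip] > limit:
            blocked += 1
    return blocked


# Ten requests sent directly from one (placeholder) address...
direct = ["198.51.100.1"] * 10
# ...versus the same ten requests spread round-robin over five (placeholder) proxies
proxy_pool = cycle(["203.0.113.{0}".format(n) for n in range(1, 6)])
rotated = [next(proxy_pool) for _ in range(10)]

print(blocked_requests(direct, limit=2))   # 8
print(blocked_requests(rotated, limit=2))  # 0
```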

User Agent

Surprisingly, the only thing that explicitly tells a server which application triggered the request (such as a browser type, or a script) is a header called the “User-Agent”, which is included in the HTTP request.
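Setting that header is a matter of passing a `headers` mapping with each request. The sketch below picks a random User-Agent from a list; `random_headers` is a hypothetical helper and the three strings are illustrative samples, not the package's own list.

```python
import random
from typing import Dict, Optional, Sequence

# A few illustrative user-agent strings (the package ships its own, much longer list)
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.1.1 Safari/603.2.4",
    "Mozilla/5.0 (X11; Linux x86_64; rv:54.0) Gecko/20100101 Firefox/54.0",
]


def random_headers(agents: Sequence[str], rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Return request headers carrying a randomly chosen User-Agent."""
    rng = rng or random.Random()
    return {"User-Agent": rng.choice(agents)}
```

The result can be passed as `requests.get(url, headers=random_headers(SAMPLE_USER_AGENTS))`. Without it, requests identifies itself as `python-requests/<version>`, which is exactly the kind of script signature described above.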

The source code

The project code in this repository crawls four different public proxy websites:
- http://proxyfor.eu/geo.php
- http://free-proxy-list.net
- http://rebro.weebly.com/proxy-list.html
- http://www.samair.ru/proxy/time-01.htm
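Once a proxy-list page has been downloaded, the addresses have to be pulled out of it. The sketch below assumes the proxies appear in the page text as `ip:port` pairs; it is not the package's parser, and sites that put the IP and port in separate table cells would need an HTML parser instead. `extract_proxies` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Matches "IPv4:port" pairs such as "203.0.113.7:3128" anywhere in the text.
# Octet ranges are not validated; this is only a rough filter.
PROXY_PATTERN = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")


def extract_proxies(page_text: str) -> List[Tuple[str, int]]:
    """Pull unique (ip, port) pairs out of raw page text, keeping first-seen order."""
    found: List[Tuple[str, int]] = []
    seen = set()
    for ip, port in PROXY_PATTERN.findall(page_text):
        key = (ip, int(port))
        # Skip impossible port numbers and duplicates
        if not 0 < key[1] <= 65535 or key in seen:
            continue
        seen.add(key)
        found.append(key)
    return found
```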

After collecting the proxy data and filtering out the slowest proxies, it randomly selects one of them to query the target URL. The request timeout is set to 30 seconds, and if the proxy fails to return a response it is removed from the application's proxy list. Note that a different user-agent header is used for each request. The headers are stored in the /data/user_agents.txt file, which contains around 900 different agents.
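The cycle described above (keep the fastest proxies, pick one at random, send the request with a fresh user agent and a 30-second timeout, evict the proxy on failure) can be sketched as follows. This is not the package's actual implementation: `ProxyPool`, `fastest` and `load_user_agents` are hypothetical names, and the HTTP call is injected as `fetch` so the logic can be tried without network access.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence


class ProxyPool:
    """Hypothetical sketch of the select / request / evict cycle."""

    def __init__(self, proxies: Sequence[str], user_agents: Sequence[str],
                 fetch: Callable[..., Optional[str]], timeout: float = 30.0,
                 rng: Optional[random.Random] = None) -> None:
        self.proxies: List[str] = list(proxies)
        self.user_agents = list(user_agents)
        self.fetch = fetch
        self.timeout = timeout
        self.rng = rng or random.Random()

    def request(self, url: str) -> Optional[str]:
        if not self.proxies:
            return None  # every proxy has been evicted
        proxy = self.rng.choice(self.proxies)
        # A fresh random user agent for every request
        headers = {"User-Agent": self.rng.choice(self.user_agents)}
        try:
            return self.fetch(url, proxy=proxy, headers=headers, timeout=self.timeout)
        except Exception:  # with requests, catch requests.exceptions.RequestException
            # The proxy failed or timed out: drop it from the pool
            self.proxies.remove(proxy)
            return None


def fastest(latencies: Dict[str, float], keep: int) -> List[str]:
    """Keep the `keep` proxies with the lowest measured latency (seconds)."""
    return sorted(latencies, key=latencies.get)[:keep]


def load_user_agents(path: str) -> List[str]:
    """Read one user-agent string per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

In a real setup, `fetch` could wrap `requests.get(url, proxies={"http": "http://" + proxy, "https": "http://" + proxy}, headers=headers, timeout=timeout)` and return its `.text`.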

How to use

The project is now distributed as a PyPI package! To run an example, simply include http-request-randomizer in your requirements.txt file (or install it with pip install http-request-randomizer). Then run the code below:

import time
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

if __name__ == '__main__':

    # Initialisation crawls the public proxy lists, which can take a while
    start = time.time()
    req_proxy = RequestProxy()
    print("Initialization took: {0} sec".format(time.time() - start))
    print("Size: {0}".format(len(req_proxy.get_proxy_list())))
    print("ALL = {0} ".format(req_proxy.get_proxy_list()))

    # icanhazip echoes the caller's IP, so the output shows which proxy was used
    test_url = 'http://ipv4.icanhazip.com'

    while True:
        start = time.time()
        # Goes through a randomly chosen proxy; may return None if that proxy fails
        request = req_proxy.generate_proxied_request(test_url)
        print("Proxied Request Took: {0} sec => Status: {1}".format(time.time() - start, request))
        if request is not None:
            print("\t Response: ip={0}".format(request.text.strip()))
        # Failed proxies are dropped, so this number can shrink over time
        print("Proxy List Size: {0}".format(len(req_proxy.get_proxy_list())))

        print("-> Going to sleep..")
        time.sleep(10)
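The loop above runs forever, which suits a demo. For a single fetch, a bounded retry is often more practical, since individual public proxies fail frequently. The helper below is a hypothetical sketch: it relies only on `generate_proxied_request(url)` as used in the example, and assumes a failed attempt returns None, as the example's None check suggests.

```python
import time
from typing import Any, Optional


def fetch_with_retries(req_proxy: Any, url: str, attempts: int = 3,
                       pause: float = 1.0) -> Optional[Any]:
    """Try a proxied request up to `attempts` times; return the first non-None response.

    `req_proxy` is expected to expose generate_proxied_request(url), as the
    RequestProxy object in the example above does.
    """
    for attempt in range(attempts):
        response = req_proxy.generate_proxied_request(url)
        if response is not None:
            return response
        if attempt < attempts - 1:
            time.sleep(pause)  # brief pause before trying another randomly chosen proxy
    return None
```

For example, `fetch_with_retries(RequestProxy(), 'http://ipv4.icanhazip.com')` would give up after three failed proxies instead of looping indefinitely.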

Documentation

http-request-randomizer documentation

Contributing

Contributions are always welcome! Feel free to send a pull request.

Faced an issue?

Open an issue here, and be as detailed as possible :)

License

This project is licensed under the terms of the MIT license.


Release history

- 1.3.2 (Nov 15, 2020)
- 1.3.1 (Oct 16, 2020)
- 1.2.3 (Jul 5, 2018)
- 1.2.2 (Feb 21, 2018)
- 1.2.1 (Jan 9, 2018)
- 1.1.1 (Aug 8, 2017), this version
- 1.1.0 (Jul 30, 2017)
- 1.0.7 (Jul 6, 2017)
- 1.0.5 (Jun 11, 2017)
- 1.0.4 (May 5, 2017)
- 1.0.3 (Apr 12, 2017)
- 0.0.5 (Oct 31, 2016)
- 0.0.3 (Sep 14, 2016)

Download files


Source Distribution

http_request_randomizer-1.1.1.tar.gz (19.4 kB)

Uploaded Aug 8, 2017 Source

File details

Details for the file http_request_randomizer-1.1.1.tar.gz.

File metadata

Download URL: http_request_randomizer-1.1.1.tar.gz
Upload date: Aug 8, 2017
Size: 19.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for http_request_randomizer-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e1c232f48df3f623894374c776ca6ace5e7a3821d1db59c3394e2200dbcca464`
MD5	`d1d7e4f7b654acb751ba63fdbe40a297`
BLAKE2b-256	`8df999d3f9661569a3c5a6b61068381248f446bc83f64252afab65bf4cc01007`

