
The new (auto-rotating) Proxy [Finder | Checker | Server]. HTTP(S) & SOCKS.

Project description

ProxyBroker
===========

ProxyBroker is an open source tool that asynchronously finds public proxies from multiple sources and concurrently checks them.


Features

  • Finds more than 7000 working proxies from ~50 sources.
  • Supported protocols: HTTP(S), SOCKS4/5. Also the CONNECT method to ports 80 and 25 (SMTP).
  • Proxies may be filtered by type, anonymity level, response time, country and DNSBL status.
  • Works as a proxy server that distributes incoming requests to external proxies, with automatic proxy rotation.
  • All proxies are checked to support Cookies and Referer (and POST requests if required).
  • Automatically removes duplicate proxies.
  • Is asynchronous.

Docker

Docker Hub https://hub.docker.com/r/bluet/proxybroker2

$ docker run --rm bluet/proxybroker2 --help
  usage: proxybroker [--max-conn MAX_CONN] [--max-tries MAX_TRIES]
                     [--timeout SECONDS] [--judge JUDGES] [--provider PROVIDERS]
                     [--verify-ssl]
                     [--log [{NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]]
                     [--min-queue MINIMUM_PROXIES_IN_QUEUE]
                     [--version] [--help]
                     {find,grab,serve,update-geo} ...

  Proxy [Finder | Checker | Server]

  Commands:
    These are common commands used in various situations

    {find,grab,serve,update-geo}
      find                Find and check proxies
      grab                Find proxies without a check
      serve               Run a local proxy server
      update-geo          Download and use a detailed GeoIP database

  Options:
    --max-conn MAX_CONN   The maximum number of concurrent checks of proxies
    --max-tries MAX_TRIES
                          The maximum number of attempts to check a proxy
    --timeout SECONDS, -t SECONDS
                          Timeout of a request in seconds. The default value is
                          8 seconds
    --judge JUDGES        Urls of pages that show HTTP headers and IP address
    --provider PROVIDERS  Urls of pages where to find proxies
    --verify-ssl, -ssl    Flag indicating whether to check the SSL certificates
    --min-queue MINIMUM_PROXIES_IN_QUEUE   The minimum number of proxies in the queue for checking connectivity
    --log [{NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                          Logging level
    --version, -v         Show program's version number and exit
    --help, -h            Show this help message and exit

  Run 'proxybroker <command> --help' for more information on a command.
  Suggestions and bug reports are greatly appreciated:
  <https://github.com/bluet/proxybroker2/issues>

Requirements

Installation

Install locally

To install the latest stable release from PyPI:

NOT RECOMMENDED. This installs the outdated original proxybroker package (https://github.com/constverum/ProxyBroker), which is no longer maintained by its original author. The up-to-date package will be published under the new name (proxybroker2) once Python 3.10 support is ready: https://github.com/bluet/proxybroker2/issues/89

$ pip install proxybroker

To install the latest development version from GitHub:

$ pip install -U git+https://github.com/bluet/proxybroker2.git

Use pre-built Docker image

$ docker pull bluet/proxybroker2

Build bundled one-file executable with pyinstaller

Requirements

On UNIX-like systems (Linux / macOS / BSD)

Install these tools:

  • upx
  • objdump (usually provided by the binutils package)

On Ubuntu / Debian: $ sudo apt install upx-ucl binutils

Build

pip install pyinstaller \
&& pip install . \
&& mkdir -p build \
&& cd build \
&& pyinstaller --onefile --name proxybroker --add-data "../proxybroker/data:data" --workpath ./tmp --distpath . --clean ../py2exe_entrypoint.py \
&& rm -rf tmp

The executable is now in the build directory.

Usage

CLI Examples

Find

Find and show 10 HTTP(S) proxies from the United States with a high level of anonymity:

$ proxybroker find --types HTTP HTTPS --lvl High --countries US --strict -l 10

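A rough Python equivalent of the same query (compare the Basic code example below). This is a sketch only: it assumes the anonymity level is passed as part of types (e.g. ('HTTP', ('High',))) and that countries, strict and limit mirror the --countries, --strict and -l flags; check the API reference for the exact keywords.

import asyncio
from proxybroker import Broker

async def show(proxies):
    while True:
        proxy = await proxies.get()
        if proxy is None:  # a None entry signals that the search is finished
            break
        print('Found proxy: %s' % proxy)

proxies = asyncio.Queue()
broker = Broker(proxies)
tasks = asyncio.gather(
    # mirrors: --types HTTP HTTPS --lvl High --countries US --strict -l 10
    broker.find(types=[('HTTP', ('High',)), 'HTTPS'],
                countries=['US'], strict=True, limit=10),
    show(proxies))

loop = asyncio.get_event_loop()
loop.run_until_complete(tasks)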

Grab

Find 10 US proxies and save them to a file (without checking them):

$ proxybroker grab --countries US --limit 10 --outfile ./proxies.txt

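A rough Python sketch of the same operation, assuming Broker.grab() accepts countries and limit and that each found Proxy exposes host and port attributes:

import asyncio
from proxybroker import Broker

async def save(proxies, filename):
    # Write each grabbed (unchecked) proxy as host:port, one per line
    with open(filename, 'w') as f:
        while True:
            proxy = await proxies.get()
            if proxy is None:  # the search is finished
                break
            f.write('%s:%d\n' % (proxy.host, proxy.port))

proxies = asyncio.Queue()
broker = Broker(proxies)
tasks = asyncio.gather(
    broker.grab(countries=['US'], limit=10),
    save(proxies, 'proxies.txt'))

loop = asyncio.get_event_loop()
loop.run_until_complete(tasks)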

Serve

Run a local proxy server that distributes incoming requests to a pool of found HTTP(S) proxies with a high level of anonymity:

$ proxybroker serve --host 127.0.0.1 --port 8888 --types HTTP HTTPS --lvl High --min-queue 5

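The same server can also be started from Python. A minimal sketch, assuming Broker.serve() accepts the same filter keyword arguments as Broker.find() (types, countries, limit, ...); see the API documentation for the full list of server options:

import asyncio
from proxybroker import Broker

loop = asyncio.get_event_loop()
broker = Broker(loop=loop)

# Start the local rotating proxy server on 127.0.0.1:8888
broker.serve(host='127.0.0.1', port=8888,
             types=[('HTTP', ('High',)), 'HTTPS'],  # mirrors --types/--lvl
             limit=10)

try:
    loop.run_forever()   # keep serving until interrupted
except KeyboardInterrupt:
    broker.stop()        # stop the server and any pending checks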

Run proxybroker --help for more information on the options available. Run proxybroker <command> --help for more information on a command.

Basic code example

Find and show 10 working HTTP(S) proxies:

import asyncio
from proxybroker import Broker

async def show(proxies):
    while True:
        proxy = await proxies.get()
        if proxy is None:  # a None entry signals that the search is finished
            break
        print('Found proxy: %s' % proxy)

proxies = asyncio.Queue()
broker = Broker(proxies)
tasks = asyncio.gather(
    broker.find(types=['HTTP', 'HTTPS'], limit=10),
    show(proxies))

loop = asyncio.get_event_loop()
loop.run_until_complete(tasks)

More examples.

Proxy information per request

HTTP

Check the X-Proxy-Info header in the response.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://httpbin.org/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://httpbin.org/get HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< X-Proxy-Info: 174.138.42.112:8080
< Date: Mon, 04 May 2020 03:39:40 GMT
< Content-Type: application/json
< Content-Length: 304
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
< X-Cache: MISS from ADM-MANAGER
< X-Cache-Lookup: MISS from ADM-MANAGER:880
< Connection: keep-alive
<
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Cache-Control": "max-age=259200",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.58.0",
    "X-Amzn-Trace-Id": "Root=1-5eaf8e7c-6a1162a1387a1743a49063f4"
  },
  "origin": "...",
  "url": "http://httpbin.org/get"
}
* Connection #0 to host 127.0.0.1 left intact
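
The same header can be read programmatically. A small sketch using the requests library, assuming the local server from the Serve example is listening on 127.0.0.1:8888:

import requests

# Route the request through the local proxybroker server
proxies = {
    'http': 'http://127.0.0.1:8888',
    'https': 'http://127.0.0.1:8888',
}

resp = requests.get('http://httpbin.org/get', proxies=proxies, timeout=30)
# For plain HTTP, the server injects the external proxy it used
print('Served via:', resp.headers.get('X-Proxy-Info'))
print('Origin IP:', resp.json().get('origin'))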

HTTPS

We are not able to modify HTTPS traffic to inject a custom header once it is encrypted. An X-Proxy-Info header is sent to the client along with HTTP/1.1 200 Connection established, but it is unclear how clients can read it.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v https://httpbin.org/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to httpbin.org:443
> CONNECT httpbin.org:443 HTTP/1.1
> Host: httpbin.org:443
> User-Agent: curl/7.58.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
< X-Proxy-Info: 207.148.22.139:8080
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
...
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5560b2e93580)
> GET /get HTTP/2
> Host: httpbin.org
> User-Agent: curl/7.58.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Mon, 04 May 2020 03:39:35 GMT
< content-type: application/json
< content-length: 256
< server: gunicorn/19.9.0
< access-control-allow-origin: *
< access-control-allow-credentials: true
<
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.58.0",
    "X-Amzn-Trace-Id": "Root=1-5eaf8e77-efcb353b0983ad6a90f8bdcd"
  },
  "origin": "...",
  "url": "https://httpbin.org/get"
}
* Connection #0 to host 127.0.0.1 left intact

HTTP API

Get info about the proxy used to retrieve a specific URL

For HTTP, it's easy.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:http://httpbin.org/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/history/url:http://httpbin.org/get HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 34
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{"proxy": "..."}

For HTTPS, the proxy cannot see the encrypted payload (the request itself), so only the hostname and port can be used.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:httpbin.org:443
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/history/url:httpbin.org:443 HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 34
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{"proxy": "..."}
* Connection #0 to host 127.0.0.1 left intact

Remove a specific proxy from the queue

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/remove/PROXY_IP:PROXY_PORT
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/remove/... HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 204 No Content
<
* Connection #0 to host 127.0.0.1 left intact
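
A small sketch wrapping both control endpoints with the requests library. The helper names proxy_used_for and remove_proxy are hypothetical, and the proxycontrol requests must themselves be routed through the local server (assumed at 127.0.0.1:8888):

import requests

PROXIES = {'http': 'http://127.0.0.1:8888'}

def proxy_used_for(target):
    # target is a full URL for HTTP, or host:port for HTTPS (e.g. 'httpbin.org:443')
    resp = requests.get('http://proxycontrol/api/history/url:%s' % target,
                        proxies=PROXIES)
    return resp.json().get('proxy')

def remove_proxy(host_port):
    # Drop a misbehaving proxy (e.g. '1.2.3.4:8080') from the rotation queue
    resp = requests.get('http://proxycontrol/api/remove/%s' % host_port,
                        proxies=PROXIES)
    return resp.status_code == 204

print(proxy_used_for('http://httpbin.org/get'))
print(proxy_used_for('httpbin.org:443'))
remove_proxy('1.2.3.4:8080')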

Documentation

https://proxybroker.readthedocs.io/

TODO

  • Check the ping, response time and speed of data transfer
  • Check site access (Google, Twitter, etc.) and even your own custom URLs
  • Information about uptime
  • Checksum of data returned
  • Support for proxy authentication
  • Finding the outgoing IP for cascading proxies
  • The ability to specify a proxy address without a port (try to connect on default ports)

Contributing

License

Licensed under the Apache License, Version 2.0

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.

Refs

Contributors ✨

Thanks goes to these wonderful people (emoji key):


  • a5r0n: 💻
  • C.M. Yang: 💻 🤔 👀
  • Ivan Villareal: 💻
  • Quancore: 💻
  • Felipe: 🤔
  • vincentinttsh: 💻 👀
  • Ziloka: 💻
  • hms5232: 💻

This project follows the all-contributors specification. Contributions of any kind welcome!



Download files

Download the file for your platform.

Source Distribution

proxybroker2-2.0.0a4.tar.gz (1.6 MB)

Uploaded Source

Built Distribution

proxybroker2-2.0.0a4-py3-none-any.whl (1.6 MB)

Uploaded Python 3

File details

Details for the file proxybroker2-2.0.0a4.tar.gz.

File metadata

  • Download URL: proxybroker2-2.0.0a4.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.10.9 Darwin/22.2.0

File hashes

Hashes for proxybroker2-2.0.0a4.tar.gz
Algorithm Hash digest
SHA256 10d325e940959957bd795b628c5c5c44208d780f55ad7d93e7200c529655f17e
MD5 720082290dd6b044d49ad639db5b7beb
BLAKE2b-256 ffcf19f70bccfa9f0f509c59743e19452dd42c29a8cead74332d9780157b1e00


File details

Details for the file proxybroker2-2.0.0a4-py3-none-any.whl.

File metadata

  • Download URL: proxybroker2-2.0.0a4-py3-none-any.whl
  • Upload date:
  • Size: 1.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.10.9 Darwin/22.2.0

File hashes

Hashes for proxybroker2-2.0.0a4-py3-none-any.whl
Algorithm Hash digest
SHA256 f33d61ddc86d1c1826467f9c4679c5f3b5f93b03efe7bc147125d418f063a9aa
MD5 2dade2f414758ac6832fcd6e938be2b9
BLAKE2b-256 b7813c4d061d767628d7b90e2ce9563a556de0eb7b7c54869a4a35a7f68c2784

