Skip to main content

[Finder/Grabber/Checker] Finds public proxies on multiple sources and concurrently checks them (type, anonymity, country). Supports HTTP(S) & SOCKS

Project description

https://img.shields.io/pypi/v/proxybroker.svg https://img.shields.io/pypi/pyversions/proxybroker.svg https://img.shields.io/travis/constverum/ProxyBroker.svg https://img.shields.io/pypi/wheel/proxybroker.svg https://img.shields.io/pypi/dm/proxybroker.svg https://img.shields.io/pypi/l/proxybroker.svg

ProxyBroker is an asynchronous finder public proxies on multiple sources and concurrently checks them (wtype, anonymity, country). Supports HTTP(S) and SOCKS!

https://raw.githubusercontent.com/constverum/ProxyBroker/master/proxybroker/data/example.gif

Features

  • Finds proxies on 50+ sources (~7k working proxies)

  • Identifies proxy in raw input data

  • Checks proxies on working with protocols: HTTP, HTTPS, SOCKS4, SOCKS5

  • Checks the level of anonymity proxy

  • Removes duplicates

Installation

To install ProxyBroker, simply:

$ pip install proxybroker

Requirements

Examples

Basic example

import asyncio
from proxybroker import Broker

loop = asyncio.get_event_loop()

proxies = asyncio.Queue(loop=loop)
broker = Broker(proxies, loop=loop)

loop.run_until_complete(broker.find())

while True:
    proxy = proxies.get_nowait()
    if proxy is None: break
    print('Found proxy: %s' % proxy)

In result you get a proxy objects:

Found proxy: <Proxy AU 0.72s [HTTP: Transparent] 1.1.1.1:80>
Found proxy: <Proxy FR 0.33s [HTTP: High, HTTPS] 2.2.2.2:3128>
Found proxy: <Proxy US 1.11s [HTTP: Anonymous, HTTPS] 8.8.8.8:8000>
Found proxy: <Proxy -- 0.45s [SOCKS4, SOCKS5] 192.168.1.2:1080>
...

Advanced example

import asyncio
from proxybroker import Broker

async def use_example(pQueue):
    while True:
        proxy = await pQueue.get()
        if proxy is None:
            break
        print('Received: %s' % proxy)

async def find_advanced_example(pQueue, loop):
    broker = Broker(queue=pQueue,
                    timeout=8,
                    attempts_conn=3,
                    max_concurrent_conn=200,
                    judges=['https://httpheader.net/', 'http://httpheader.net/'],
                    providers=['http://www.proxylists.net/', 'http://fineproxy.org/eng/'],
                    verify_ssl=False,
                    loop=loop)

    # only anonymous & high levels of anonymity for http protocol and high for others:
    types = [('HTTP', ('Anonymous', 'High')), 'HTTPS', 'SOCKS4', 'SOCKS5']
    countries = ['US', 'GB', 'DE']
    limit = 10

    await broker.find(types=types, countries=countries, limit=limit)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    pQueue = asyncio.Queue(loop=loop)
    # Start searching and checking.
    # At the same time, using the received proxies to another part of the program
    tasks = asyncio.gather(find_advanced_example(pQueue, loop), use_example(pQueue))
    loop.run_until_complete(tasks)

Example with your raw data instead of providers

import asyncio
from proxybroker import Broker

loop = asyncio.get_event_loop()

proxies = asyncio.Queue(loop=loop)
broker = Broker(proxies, loop=loop)

data = '''10.0.0.1:80
          OK 10.0.0.2:   80 HTTP 200 OK 1.214
          10.0.0.3;80;SOCKS5 check date 21-01-02
          >>>10.0.0.4@80 HTTP HTTPS status OK
          ...'''

# Note: At the moment, information about the type of proxies in the raw data is ignored
loop.run_until_complete(broker.find(data=data))

found_proxies = [proxies.get_nowait() for _ in range(proxies.qsize())]

Example only collect proxies (without checking)

# ...
await broker.grab(countries=['US'], limit=100)
# ...

API

Proxy properties

Property

Type

Example

Description

host

str

‘8.8.8.8’

The IP address of the proxy

port

int

80

The port of the proxy

types

dict

{‘HTTP’: ‘Anonymous’, ‘HTTPS’: None}

The dict of supported protocols and their levels of anonymity

geo

dict

{‘code’: ‘US’, ‘name’: ‘United States’}

The dict of ISO code and the full name of the country proxy location

avgRespTime

str

‘1.11’

The string with the average response time of proxy

Broker parameters

Parameter

Required

Type

Default

Description

queue

Yes

str

Queue to which will be added proxies.

timeout

No

int

8

Timeout is set to all the actions carried by the network. In seconds.

attempts_conn

No

int

3

Limiting the maximum number of connection attempts.

max_concurrent_conn

No

int or asyncio.Semaphore()

200

Limiting the maximum number of concurrent connections (as a number, or have used in your program semaphore).

providers

No

list of strings

list of ~50 sites

The list of sites that distribute proxy lists (proxy providers).

judges

No

list of strings

list of ~10 sites

The list of sites that show http-headers (proxy judges).

verify_ssl

No

bool

False

Check ssl certifications.

loop

No

asyncio event loop

None

Event loop

Broker methods

Method

Optional parameters

Description

Parameter

Description

find

data

As a source of proxies can be specified raw data. In this case, search on the sites with a proxy does not happen. By default is empy.

Searching and checking proxies with requested parameters.

types

The list of types (protocols) which must be checked. Use a tuple if you want to specify the levels of anonymity: (Type, AnonLvl). By default, checks are enabled for all types at all levels of anonymity.

countries

List of ISO country codes, which must be located proxies.

limit

Limit the search to a definite number of working proxies.

grab

countries

List of ISO country codes, which must be located proxies.

Only searching the proxies without checking their working.

limit

Limit the search to a definite number of working proxies.

show_stats

full

If is False (by default) - will show a short version of stats (without proxieslog), if is True - show full version of stats (with proxies log).

Limiting the maximum number of connection attempts.

TODO

  • Check the ping, response time and speed of data transfer

  • Check on work with the Cookies/Referrer/POST

  • Check site access (Google, Twitter, etc)

  • Check proxy on spam. Search proxy ip in spam databases (DNSBL)

  • Information about uptime

  • Checksum of data returned

  • Support for proxy authentication

  • Finding outgoing IP for cascading proxy

  • The ability to send mail. Check on open 25 port (SMTP)

  • The ability to specify the address of the proxy without port (try to connect on defaulted ports)

  • The ability to save working proxies to a file (text/json/xml)

License

Licensed under the Apache License, Version 2.0

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.

Project details


Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page