Project Description

ProxyBroker is an open source tool that asynchronously finds public proxies from multiple sources and concurrently checks them (type, level of anonymity, country). Supports HTTP(S) and SOCKS.

Features

  • Gathers proxies from 50+ sources, finds ~7000 HTTP(S) and ~500 SOCKS working proxies

    Sources are websites that publish free public proxy lists daily. You can also add custom sources, either websites or raw data. Proxies are detected and recognized in the text, no matter how dirty the data is.

  • Supports all protocols

    Proxies can be checked over the HTTP, HTTPS (via CONNECT), SOCKS4, and SOCKS5 protocols.

  • Checks the anonymity level of proxies

    Supports levels: Transparent, Anonymous, High. You can add your own judges.

  • Filters proxies by country

    Determines the location (country) of each proxy and checks only proxies from the specified countries.

  • Is asynchronous

    This increases checking speed and decreases waiting time. It’s really fast: in just a minute, it gives you ~250 working HTTP proxies.

  • Automatically removes duplicate proxies.
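The anonymity levels listed above can be made concrete with a small sketch. It assumes the usual definitions: a judge page echoes back the request headers it received, a Transparent proxy leaks the client's real IP, an Anonymous proxy hides the IP but reveals itself through proxy headers, and a High proxy does neither. The function name, the header list, and the sample values are illustrative assumptions, not ProxyBroker's actual checking code.

```python
# Headers that typically reveal that a request went through a proxy (assumed list).
PROXY_HEADERS = ('via', 'x-forwarded-for', 'forwarded', 'proxy-connection')


def anonymity_level(real_ip, echoed_headers):
    """Classify a proxy from the headers a judge page echoed back.

    real_ip is the client's own external IP; echoed_headers is a dict of the
    headers the judge saw. The IP test is a plain substring check.
    """
    headers = {name.lower(): value for name, value in echoed_headers.items()}
    if any(real_ip in value for value in headers.values()):
        return 'Transparent'  # the real IP leaked through
    if any(name in headers for name in PROXY_HEADERS):
        return 'Anonymous'    # the IP is hidden, but the proxy is visible
    return 'High'             # no trace of the client or the proxy


print(anonymity_level('9.9.9.9', {'Via': '1.1 proxy'}))
```

A real checker would also need to compare against the headers sent without a proxy, which this sketch leaves out.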

Requirements

Installation

To install the latest stable release from PyPI:

$ pip install proxybroker

If you use Mac OS X (without Xcode) or Microsoft Windows (without Visual Studio or the Windows SDK), compiling the dependencies (pycares) may fail. In that case, install from the bundled wheels:

$ git clone https://github.com/constverum/ProxyBroker.git
$ cd ProxyBroker
$ pip install -r requirements.txt proxybroker -f proxybroker/data/wheels

To install the development version:

$ git clone https://github.com/constverum/ProxyBroker.git
$ cd ProxyBroker
$ python setup.py install

Examples

Basic example

import asyncio
from proxybroker import Broker

loop = asyncio.get_event_loop()

proxies = asyncio.Queue(loop=loop)
broker = Broker(proxies, loop=loop)

loop.run_until_complete(broker.find(limit=4))

# A None item in the queue marks the end of the results
while True:
    proxy = proxies.get_nowait()
    if proxy is None:
        break
    print('Found proxy: %s' % proxy)

As a result, we get Proxy objects. All the information we need is available through their properties (see Proxy properties).

Found proxy: <Proxy AU 0.72s [HTTP: Transparent] 1.1.1.1:80>
Found proxy: <Proxy FR 0.33s [HTTP: High, HTTPS] 2.2.2.2:3128>
Found proxy: <Proxy US 1.11s [HTTP: Anonymous, HTTPS] 3.3.3.3:8000>
Found proxy: <Proxy DE 0.45s [SOCKS4, SOCKS5] 4.4.4.4:1080>

Advanced example

import asyncio
from proxybroker import Broker

async def use(proxies):
    while True:
        proxy = await proxies.get()
        if proxy is None:
            break
        elif 'SOCKS5' in proxy.types:  # filter by type
            print('Found SOCKS5 proxy: %s' % proxy)
        else:
            print('Found proxy: %s' % proxy)

async def find(proxies, loop):
    broker = Broker(queue=proxies,
                    timeout=8,
                    attempts_conn=3,
                    max_concurrent_conn=200,
                    judges=['https://httpheader.net/', 'http://httpheader.net/'],
                    providers=['http://www.proxylists.net/', 'http://fineproxy.org/eng/'],
                    verify_ssl=False,
                    loop=loop)

    # Only the Anonymous and High levels of anonymity for HTTP, and High for the other types:
    types = [('HTTP', ('Anonymous', 'High')), 'HTTPS', 'SOCKS4', 'SOCKS5']
    countries = ['US', 'GB', 'DE']
    limit = 10

    await broker.find(types=types, countries=countries, limit=limit)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    proxies = asyncio.Queue(loop=loop)
    tasks = asyncio.gather(find(proxies, loop), use(proxies))
    loop.run_until_complete(tasks)

In this example, we explicitly specify the parameters that directly affect the speed of gathering and checking proxies (see Broker parameters). In most cases, this is unnecessary.

Usually, we want to find:

  • a certain number of proxies of a specific type
  • with a high level of anonymity
  • and from specific countries

To do this, we pass the parameters types, countries, and limit to the find method (see Broker methods).
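To make the shape of the types parameter concrete, here is a minimal sketch of how such a list could be matched against a proxy's types dictionary. It assumes that a plain string means "any level of anonymity" and that a proxy qualifies when it supports at least one requested type at an allowed level. The helper names normalize_types and accepts are hypothetical; ProxyBroker's own matching logic may differ.

```python
def normalize_types(types):
    """Map each requested type to a set of allowed levels (None means any level)."""
    wanted = {}
    for item in types:
        if isinstance(item, str):
            wanted[item] = None               # e.g. 'HTTPS'
        else:
            name, levels = item               # e.g. ('HTTP', ('Anonymous', 'High'))
            if isinstance(levels, str):
                levels = (levels,)
            wanted[name] = set(levels)
    return wanted


def accepts(proxy_types, types):
    """Return True if the proxy supports a requested type at an allowed level."""
    wanted = normalize_types(types)
    for name, level in proxy_types.items():
        if name in wanted and (wanted[name] is None or level in wanted[name]):
            return True
    return False


types = [('HTTP', ('Anonymous', 'High')), 'HTTPS', 'SOCKS4', 'SOCKS5']
print(accepts({'HTTP': 'Transparent'}, types))
print(accepts({'HTTP': 'High', 'HTTPS': None}, types))
```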

We use two asynchronous functions that run concurrently:

  • find() - gathers proxies from the providers, checks them, and passes them to the async queue proxies
  • use() - uses the checked proxies from proxies without waiting for the gathering to finish

Note: You can start using the checked proxies a couple of seconds after gathering starts. Gathering and checking of new proxies continue until the limit is reached, or until all the providers have been visited and all the proxies received from them have been checked.
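The producer/consumer pattern behind find() and use() can be tried without any network access. In the sketch below, fake_find is a hypothetical stand-in for broker.find(): it puts a few made-up addresses into the queue and then a final None, which the consumer treats as the end marker, as in the examples above. It uses asyncio.run, so it assumes Python 3.7 or later.

```python
import asyncio


async def fake_find(queue, items):
    """Stand-in producer: emits items one by one, then None as the end marker."""
    for item in items:
        await asyncio.sleep(0)   # yield control, as real network I/O would
        await queue.put(item)
    await queue.put(None)


async def use(queue, seen):
    """Consumer: handles each item as soon as it arrives."""
    while True:
        proxy = await queue.get()
        if proxy is None:
            break
        seen.append(proxy)


async def main():
    queue = asyncio.Queue()
    seen = []
    # Both coroutines run concurrently on the same event loop.
    await asyncio.gather(fake_find(queue, ['1.1.1.1:80', '2.2.2.2:3128']),
                         use(queue, seen))
    return seen


print(asyncio.run(main()))
```

Because the consumer awaits queue.get(), it sleeps while the queue is empty and wakes up as soon as the producer delivers the next item.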

Example #3: find and check proxies from raw data

# raw_data.txt
10.0.0.1:80
OK 10.0.0.2:   80 HTTP 200 OK 1.214
10.0.0.3;80;SOCKS5 check date 21-01-02
>>>10.0.0.4@80 HTTP HTTPS status OK

# example.py
# ...
broker = Broker(proxies, loop=loop)

with open('raw_data.txt', 'r') as f:
    data = f.read()

await broker.find(data=data)
# ...

Instead of the providers, you can use your own data as the source of proxies (usually a local .txt file). Simply pass it to the data parameter. Note: At the moment, information about the proxy type in the raw data is ignored.
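To give a feel for what recognizing proxies in dirty text involves, here is a rough sketch that pulls host and port pairs out of the raw_data.txt sample above with a regular expression. The pattern and the extract_proxies helper are assumptions made for illustration; they are not ProxyBroker's actual parser, which may accept more formats.

```python
import re

# An IPv4 address, a short run of non-digit separator characters, then a port.
PROXY_RE = re.compile(
    r'(?<![\d.])((?:\d{1,3}\.){3}\d{1,3})(?!\d)'  # host
    r'[^\d\n]{1,6}?'                              # separator such as ":", ";" or "@"
    r'(\d{1,5})(?!\d)'                            # port
)


def extract_proxies(text):
    """Return unique (host, port) pairs in the order they first appear."""
    found = []
    for host, port in PROXY_RE.findall(text):
        port = int(port)
        valid_host = all(int(octet) <= 255 for octet in host.split('.'))
        if valid_host and 0 < port <= 65535 and (host, port) not in found:
            found.append((host, port))
    return found


RAW_DATA = """10.0.0.1:80
OK 10.0.0.2:   80 HTTP 200 OK 1.214
10.0.0.3;80;SOCKS5 check date 21-01-02
>>>10.0.0.4@80 HTTP HTTPS status OK
10.0.0.1:80
"""

for host, port in extract_proxies(RAW_DATA):
    print('%s:%d' % (host, port))
```

The repeated first line is included to show that duplicates are dropped.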

Example #4: only gather proxies (without a check)

# ...
await broker.grab(countries=['US'], limit=100)
# ...

Use the grab method if you only want to gather proxies without checking them. Note: The number of proxies found can exceed 40k.

API

Proxy properties

Property     Type  Example                                  Description
host         str   '8.8.8.8'                                IP address of the proxy
port         int   80                                       Port of the proxy
types        dict  {'HTTP': 'Anonymous', 'HTTPS': None}     Supported protocols and their levels of anonymity
geo          dict  {'code': 'US', 'name': 'United States'}  ISO code and full name of the country where the proxy is located
avgRespTime  str   '1.11'                                   Average response time of the proxy
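As a small illustration of reading these properties, the sketch below builds a one-line summary from them. The describe helper is hypothetical, and SimpleNamespace is only a stand-in carrying the example values from the table, so that the snippet runs without searching for real proxies.

```python
from types import SimpleNamespace


def describe(proxy):
    """Summarize a proxy using the host, port, types, geo and avgRespTime properties."""
    protocols = []
    for name in sorted(proxy.types):
        level = proxy.types[name]
        # Show the level of anonymity only where one is set.
        protocols.append('%s: %s' % (name, level) if level else name)
    return '%s:%d [%s] %s %ss' % (proxy.host, proxy.port, ', '.join(protocols),
                                  proxy.geo['code'], proxy.avgRespTime)


# Stand-in object with the example values from the table.
fake_proxy = SimpleNamespace(
    host='8.8.8.8',
    port=80,
    types={'HTTP': 'Anonymous', 'HTTPS': None},
    geo={'code': 'US', 'name': 'United States'},
    avgRespTime='1.11',
)

print(describe(fake_proxy))
```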

Broker parameters

Parameter            Type                                 Default       Description
queue                asyncio.Queue                        required      Queue that stores the checked proxies
timeout              int                                  8             Timeout, in seconds, applied to almost all network operations
attempts_conn        int                                  3             Maximum number of connection attempts
max_concurrent_conn  int or asyncio.Semaphore()           200           Maximum number of concurrent connections
providers            list of strings or Provider objects  ~50 websites  Websites that publish free public proxy lists daily
judges               list of strings or Judge objects     ~10 websites  Websites that show HTTP headers and the IP address
verify_ssl           bool                                 False         Whether to verify SSL certificates
loop                 asyncio event loop                   -             Event loop
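Since max_concurrent_conn accepts either a number or an asyncio.Semaphore, it may help to see how a semaphore caps concurrency in plain asyncio. This is a generic sketch, not ProxyBroker's internals: check and run_checks are hypothetical names, and the sleep stands in for a network round trip.

```python
import asyncio


async def check(n, sem, state):
    """Pretend to check proxy number n while holding one semaphore slot."""
    async with sem:
        state['active'] += 1
        state['peak'] = max(state['peak'], state['active'])
        await asyncio.sleep(0.01)   # stand-in for a network round trip
        state['active'] -= 1
    return n


async def run_checks(total, max_concurrent_conn):
    """Run `total` checks, never more than `max_concurrent_conn` at a time."""
    sem = asyncio.Semaphore(max_concurrent_conn)
    state = {'active': 0, 'peak': 0}
    results = await asyncio.gather(*(check(n, sem, state) for n in range(total)))
    return results, state['peak']


results, peak = asyncio.run(run_checks(20, 5))
print('checked %d items, peak concurrency %d' % (len(results), peak))
```

A lower limit reduces the load on the network at the cost of a longer total run time.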

Broker methods

Method      Optional parameter  Description
find                            Gathers and checks proxies with the specified parameters
            data                Your own source data, used as the source of proxies instead of the providers
            types               List of types (protocols) to check. Use a tuple to specify the levels of anonymity: (Type, AnonLvl). Default: all types with any level of anonymity
            countries           List of ISO codes of the countries where the proxies must be located
            limit               Maximum number of working proxies
grab                            Gathers proxies without checking them
            countries           List of ISO codes of the countries where the proxies must be located
            limit               Maximum number of proxies
show_stats                      Shows work statistics
            full                If False (the default), shows a short version of the stats (without the proxies log); if True, shows the full version (with the proxies log)

TODO

  • Check the ping, response time and speed of data transfer
  • Check that proxies work with cookies, Referer headers, and POST requests
  • Check site access (Google, Twitter, etc.) and even your own custom URLs
  • Check proxies for spam: search for the proxy IP in spam databases (DNSBL)
  • Information about uptime
  • Checksum of data returned
  • Support for proxy authentication
  • Find the outgoing IP for cascading proxies
  • The ability to send mail: check for an open port 25 (SMTP)
  • The ability to specify a proxy address without a port (try to connect on the default ports)
  • The ability to save working proxies to a file (text/json/xml)

Contributing

  • Fork it: https://github.com/constverum/ProxyBroker/fork
  • Create your feature branch: git checkout -b my-new-feature
  • Commit your changes: git commit -am 'Add some feature'
  • Push to the branch: git push origin my-new-feature
  • Submit a pull request!

License

Licensed under the Apache License, Version 2.0

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.

Release History

  • 0.1.4 (this version)
  • 0.1.3
  • 0.1.2
  • 0.1.1
  • 0.1
  • 0.1b4

Download Files

File name                                    Version  File type  Upload date
proxybroker-0.1.4-py3-none-any.whl (1.4 MB)  py3      Wheel      Apr 7, 2016
proxybroker-0.1.4.tar.gz (1.4 MB)            -        Source     Apr 7, 2016
