Skip to main content

A library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).

Project description

A library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5). Supports Python 2.7+ and 3.4+.

NOTE: This library isn’t designed for production use. It’s advised to use your own proxies or purchase a service which provides an API. These are merely free ones that are retrieved from sites and should only be used for development or testing purposes.

import proxyscrape

collector = proxyscrape.create_collector('default', 'http')  # Create a collector for http resources
proxy = collector.get_proxy({'country': 'united states'})  # Retrieve a united states proxy

Installation

The latest version of proxyscrape is available via pip:

$ pip install proxyscrape

Alternatively, you can download and install from source:

$ python setup.py install

Provided Proxies

Current proxies provided are scraped from various sites which offer free HTTP, HTTPS, SOCKS4, and SOCKS5 proxies; and don’t require headless browsers or selenium to retrieve. The list of sites proxies retrieved are shown below.

resource

resource type

url

anonymous-proxy

http, https

https://free-proxy-list.net/anonymous-proxy.html

free-proxy-list

http, https

https://free-proxy-list.net

proxy-daily-http proxy-daily-socks4 proxy-daily-socks5

http socks4 socks5

http://www.proxy-daily.com

socks-proxy

socks4, socks5

https://www.socks-proxy.net

ssl-proxy

https

https://www.sslproxies.org

uk-proxy

http, https

https://free-proxy-list.net/uk-proxy.html

us-proxy

http, https

https://www.us-proxy.org

See Integration section for additional proxies.

Getting Started

Proxy Scrape is a library aimed at providing an efficient an easy means of retrieving proxies for web-scraping purposes. The proxies retrieved are available from sites providing free proxies. The proxies provided, as shown in the above table, can be of one of the following types (referred to as a resource type): http, https, socks4, and socks5.

Collectors

Collectors serve as the interface to retrieving proxies. They are instantiating at module-level and can be retrieved and re-used in different parts of the application (similar to the Python logging library). Collectors can be created and retrieved via the create_collector(…) and get_collector(…) functions.

from proxyscrape import create_collector, get_collector

collector = create_collector('my-collector', ['socks4', 'socks5'])

# Some other section of code
collector = get_collector('my-collector')

Each collector should have a unique name and be initialized only once. Typically, only a single collector of a given resource type should be utilized. Filters can then be applied to the proxies if specific criteria is desired.

When given one or more resources, the collector will use those to retrieve proxies. If one or more resource types are given, the resources for each of the types will be used to retrieve proxies.

Once created, proxies can be retrieved via the get_proxy(…) or the get_proxies(…) function. This optionally takes a filter_opts parameter which can filter by the following:

  • code (us, ca, …)

  • country (united states, canada, …)

  • anonymous (True, False)

  • type (http, https, socks4, socks5, …)

from proxyscrape import create_collector

collector = create_collector('my-collector', 'http')

# Retrieve any http proxy
proxy = collector.get_proxy()

# Retrieve only 'us' proxies
proxy = collector.get_proxy({'code': 'us'})

# Retrieve only anonymous 'uk' or 'us' proxies
proxy = collector.get_proxy({'code': ('us', 'uk'), 'anonymous': True})

# Retrieve all 'ca' proxies
proxies = collector.get_proxies({'code': 'ca'})

Filters can be applied to every proxy retrieval from the collector via apply_filter(…). This is useful when the same filter is expected for any proxy retrieved.

from proxyscrape import create_collector

collector = create_collector('my-collector', 'http')

# Only retrieve 'uk' and 'us' proxies
collector.apply_filter({'code': 'us'})

# Filtered proxies
proxy = collector.get_proxy()

# Clear filter
collector.clear_filter()

Note that some filters may instead use specific resources to achieve the same results (i.e. ‘us-proxy’ or ‘uk-proxy’ for ‘us’ and ‘uk’ proxies).

Blacklists can be applied to a collector to prevent specific proxies from being retrieved. They accept either one or more Proxy objects, or a host + port number combination and won’t allow retrieval of matching proxies. Proxies can be individually removed from blacklists or the entire blacklist can be cleared.

from proxyscrape import create_collector

collector = create_collector('my-collector', 'http')

# Add proxy to blacklist
collector.blacklist_proxy(Proxy('192.168.1.1', '80', None, None, None, 'http', 'my-resource'))
collector.blacklist_proxy(host='192.168.1.2', port='8080')

# Blacklisted proxies won't be included
proxy = get_proxy()

# Remove individual proxies
collector.remove_blacklist(host='192.168.1.1', port='80')

# Clear blacklist
collector.clear_blacklist()

Instead of permanently blacklisting a particular proxies, a proxy can instead be removed from internal memory. This allows it to be re-added to the pool upon a subsequent refresh.

from proxyscrape import create_collector

collector = create_collector('my-collector', 'http')

# Remove proxy from internal pool
collector.remove_proxy(Proxy('192.168.1.1', '80', None, None, 'http', 'my-resource'))

Apart from automatic refreshes when retrieving proxies, they can also be forcefully refreshed via the refresh_proxies(…) function.

from proxyscrape import create_collector

collector = create_collector('my-collector', 'http')

# Forcefully refresh
collector.refresh_proxies(force=True)

# Refresh only if proxies not refreshed within `refresh_interval`
collector.refresh_proxies(force=False)

Resources

Resources refer to a specific function that retrieves a set of proxies; the currently implemented proxies are all retrieves from scraping a particular web site.

Additional user-defined resources can be added to the pool of proxy retrieval functions via the add_resource(…) function. Resources can belong to multiple resource types.

from proxyscrape import add_resource

def func():
    return {Proxy('192.168.1.1', '80', 'us', 'united states', False, 'http', 'my-resource'), }

add_resource('my-resource', func, 'http')

As shown above, a resource doesn’t necessarily have to scrape proxies from a web site. It can be return a hard-coded list of proxies, make a call to an api, read from a file, etc.

The set of library- and user-defined resources can be retrieved via the get_resources(…) function.

from proxyscrape import get_resources
resources = get_resources()

Resource Types

Resource types are groupings of resources that can be specified when defining a collector (opposed to giving a collection of resources.

Additional user-defined resource types can be added via the add_resource_type(…) function. Resources can optionally be added to a resource type when defining it.

from proxyscrape import add_resource_type
add_resource_type('my-resource-type')
add_resource_type('my-other-resource-type', 'my-resource')  # Define resources for resource type

The set of library- and user-defined resource types can be retrieved via the get_resource_types(…) function.

from proxyscrape import get_resource_types
resources = get_resource_types()

Integration

Integrations are proxy implementations that are specific to a particular website or API and have a distinctively separate use case.

ProxyScrape

The ProxyScrape.com API provides a means of accessing thousands of proxies of various types (HTTP, SOCKS4, SOCKS5) in an efficient manner. These are vetted and validated with a minimal response time.

The get_proxyscrape_resource(…) function is used to dynamically create a new resource for using the proxyscrape API. The resource name can then be added to a resource type and used like any other library- or user-defined resource. The following parameters are used for the API:

  • proxytype (http, socks4, socks5, all)

  • timeout (milliseconds > 0)

  • ssl (yes, no, all)

  • anonymity (elite, anonymous, transparent, all)

  • country (any Alpha 2 ISO country code, all)

from proxyscrape import get_proxyscrape_resource
resource_name = get_proxyscrape_resource(proxytype='http', timeout=5000, ssl='yes', anonymity='all', country='us')

Contribution

Contributions or suggestions are welcome! Feel free to open an issue if a bug is found or an enhancement is desired, or even a pull request.

Changelog

All changes and versioning information can be found in the CHANGELOG.

License

Copyright (c) 2020 Sriram Kumar. See LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sriram-proxyscrape-0.4.0.tar.gz (19.5 kB view details)

Uploaded Source

Built Distribution

sriram_proxyscrape-0.4.0-py2.py3-none-any.whl (23.0 kB view details)

Uploaded Python 2Python 3

File details

Details for the file sriram-proxyscrape-0.4.0.tar.gz.

File metadata

  • Download URL: sriram-proxyscrape-0.4.0.tar.gz
  • Upload date:
  • Size: 19.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for sriram-proxyscrape-0.4.0.tar.gz
Algorithm Hash digest
SHA256 8b797387c1967eb2efa4072896b166296ea0bb3f46c7b8604edc744b1e40a321
MD5 06799baf0f1172c3db1d31d2472c0655
BLAKE2b-256 2ed032fa480f1035a8272c51bc23de884eebf41cc50fb89c021eef8fae6f6f64

See more details on using hashes here.

File details

Details for the file sriram_proxyscrape-0.4.0-py2.py3-none-any.whl.

File metadata

  • Download URL: sriram_proxyscrape-0.4.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 23.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.6.9

File hashes

Hashes for sriram_proxyscrape-0.4.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 6aeda06abbab27302759bac3d09c68d6ab82d5b7ff06d9ccf707afe6dd216b51
MD5 4cbb74d47314abbb461f72a5bd4ed532
BLAKE2b-256 ddb5b8ea3180580d7f3d927c4af93eee2d7b756d43fe51176b7d033e97349b8e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page