Skip to main content

Greendeck Proxy Grabber Package

Project description

greendeck-proxygrabber ๐ŸŽญ

Gd Logo

This package is developed by Greendeck

Install from pip

https://pypi.org/project/greendeck-proxygrabber/

pip install greendeck-proxygrabber


WHATS NEW?

Added proxy grabbing support of 4 new regions to proxy service, proxy grabber and proxy scraper.


๐Ÿ‘‰ What is proxy service?

Proxy service is a service that keeps and updates a Mongo Database with latest up and running proxies.

๐Ÿ‘‰ How to use?

import the service class
from greendeck_proxygrabber import ProxyService
service = ProxyService(MONGO_URI = 'mongodb://127.0.0.1:27017',
                       update_time = 300,
                       pool_limit = 1000,
                       update_count = 200,
                       database_name = 'proxy_pool',
                       collection_name_http = 'http',
                       collection_name_https = 'https',
                       country_code = 'ALL'
                       )

This creates a service object.

Args
  • update_time = Time after which proxies will be updated (in seconds)
  • pool_limit = Limit after which insertion will change to updating
  • update_count = Number of proxies to request grabber at a time
  • database_name = Mongo Database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of regions supported

List of supported regions is:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Starting the service

service.start()

Starting service gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...

This will run forever and will push/update proxies in mongodb after every {update_time} seconds.

๐Ÿ‘‰ What is proxy to mongo?

Proxy to mongo is a functionality that lets you grab a set of valid proxies from the Internet and store it to the desired MongoDB database. You can schedule this to update or insert a given set of proxies to your database of pool, i.e. put it on airflow or any task scheduler.

๐Ÿ‘‰ How to use?

import the ProxyToMongo class
from greendeck_proxygrabber import ProxyService
service = ProxyToMongo( MONGO_URI = MONGO_URI,
                        pool_limit = 1000,
                        length_proxy = 200,
                        database_name='proxy_pool',
                        collection_name_http='http',
                        collection_name_https='https',
                        country_code='DE'
                        )

This creates a service object.

Args
  • pool_limit = Total number of proxies to keep in mongo/pass None if you don't want to update
  • length_proxy = Number of proxies to fetch at once
  • database_name = Mongo Database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of regions supported

List of supported regions is:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Calling the ProxyToMongo grabber

service.get_quick_proxy()

Starting Grabber gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...

This will run forever and will push/update proxies in mongodb after every {update_time} seconds.

๐Ÿ‘‰ How to use Proxy Grabber Class?

import ProxyGrabber class
from greendeck_proxygrabber import ProxyGrabber
initialize ProxyGrabber object
grabber = ProxyGrabber(len_proxy_list, country_code, timeout)

Here default values of some arguments are,

len_proxy_list = 10
country_code = 'ALL'
timeout = 2

Currently the program only supports proxies of combined regions

Getting checked, running proxies

The grab_proxy grab_proxy() function helps to fetch the proxies.

grabber.grab_proxy()

This returns a dictionary of the following structure:

{
    'https': [< list of https proxies >],
    'http': [< list of http proxies >],
    'region': 'ALL' # default for now
}
Getting an unchecked list of proxies

The grab_proxy proxy_scraper() method of ScrapeProxy helps to fetch the proxies. This returns a list of 200 proxies of both type http and https.

from greendeck_proxygrabber import ScrapeProxy
proxies_http, proxies_https = ScrapeProxy.proxy_scraper()

This returns list of proxies of type http proxies followed by https proxies.

http_proxies = [< list of http proxies >]
https_proxies = [< list of https proxies >]
Filtering invalid proxies from a list of proxies

The proxy_checker_https and proxy_checker_http methods from ProxyChecker class helps to validate the proxies.

Given a list of proxies, it checks each of them to be valid or not, and returns a list of valid proxies from the proxies feeded to it.

from greendeck_proxygrabber import ProxyChecker
valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list = proxy_list_http, timeout = 2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list = proxy_list_https, timeout = 2)

๐Ÿ‘‰ How to build your own pip package

In the parent directory

  • python setup.py sdist bdist_wheel
  • twine upload dist/*

references

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for greendeck-proxygrabber, version 0.3.8
Filename, size File type Python version Upload date Hashes
Filename, size greendeck-proxygrabber-0.3.8.tar.gz (9.6 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page