Greendeck Proxy Grabber Package

greendeck-proxygrabber 🎭

This package is developed by Greendeck

Install from pip

https://pypi.org/project/greendeck-proxygrabber/

pip install greendeck-proxygrabber


WHAT'S NEW?

Added proxy grabbing support for 4 new regions to the proxy service, proxy grabber and proxy scraper.


👉 What is proxy service?

Proxy service is a long-running service that maintains a MongoDB database of the latest working proxies and keeps it up to date.

👉 How to use?

# import the service class
from greendeck_proxygrabber import ProxyService
service = ProxyService(MONGO_URI = 'mongodb://127.0.0.1:27017',
                       update_time = 300,
                       pool_limit = 1000,
                       update_count = 200,
                       database_name = 'proxy_pool',
                       collection_name_http = 'http',
                       collection_name_https = 'https',
                       country_code = 'ALL'
                       )

This creates a service object.

Args
  • MONGO_URI = MongoDB connection URI
  • update_time = Interval between proxy updates (in seconds)
  • pool_limit = Number of stored proxies after which the service switches from inserting to updating
  • update_count = Number of proxies to request from the grabber at a time
  • database_name = Mongo database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of the supported regions

The supported regions are:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN
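
Since country_code must be one of the codes listed above, it can help to validate it before constructing a service. The following is a minimal sketch; SUPPORTED_REGIONS and normalize_country_code are hypothetical helper names, not part of the package.

```python
# Hypothetical helper, not part of greendeck-proxygrabber:
# the region codes are copied from the list above.
SUPPORTED_REGIONS = {
    "ALL": "Combined Regions",
    "US": "United States",
    "DE": "Germany",
    "GB": "Great Britain",
    "FR": "France",
    "CZ": "Czech Republic",
    "NL": "Netherlands",
    "IN": "India",
}


def normalize_country_code(code):
    """Return the upper-cased region code, or raise ValueError if unsupported."""
    normalized = code.strip().upper()
    if normalized not in SUPPORTED_REGIONS:
        supported = ", ".join(sorted(SUPPORTED_REGIONS))
        raise ValueError(f"Unsupported region {code!r}; expected one of: {supported}")
    return normalized
```

The returned value can then be passed as country_code.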

Starting the service

service.start()

Starting service gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...

The service runs until interrupted and pushes/updates proxies in MongoDB every {update_time} seconds.
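
To illustrate how pool_limit could interact with each update cycle, here is a minimal in-memory sketch. It assumes one plausible reading of "insertion will change to updating": new proxies are inserted until the pool is full, after which each new proxy replaces the oldest one. The update_pool function and the deque standing in for a Mongo collection are assumptions for illustration, not the package's implementation.

```python
from collections import deque


def update_pool(pool, new_proxies, pool_limit):
    """Insert proxies until pool_limit is reached, then replace the oldest ones."""
    for proxy in new_proxies:
        if proxy in pool:
            continue  # already stored, nothing to do
        if len(pool) >= pool_limit:
            pool.popleft()  # pool is full: drop the oldest entry ("update" phase)
        pool.append(proxy)
    return pool


# Two simulated update cycles with a tiny limit (addresses are placeholders).
pool = deque()
update_pool(pool, ["192.0.2.1:80", "192.0.2.2:80"], pool_limit=3)
update_pool(pool, ["192.0.2.3:80", "192.0.2.4:80"], pool_limit=3)
print(list(pool))  # ['192.0.2.2:80', '192.0.2.3:80', '192.0.2.4:80']
```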

👉 What is proxy to mongo?

Proxy to mongo lets you grab a set of valid proxies from the Internet and store them in a MongoDB database of your choice. You can schedule it to insert or update a given set of proxies in your proxy pool database, e.g. by running it on Airflow or any other task scheduler.

👉 How to use?

# import the ProxyToMongo class
from greendeck_proxygrabber import ProxyToMongo
MONGO_URI = 'mongodb://127.0.0.1:27017'
service = ProxyToMongo( MONGO_URI = MONGO_URI,
                        pool_limit = 1000,
                        length_proxy = 200,
                        database_name='proxy_pool',
                        collection_name_http='http',
                        collection_name_https='https',
                        country_code='DE'
                        )

This creates a service object.

Args
  • pool_limit = Total number of proxies to keep in Mongo; pass None if you don't want existing proxies to be updated
  • length_proxy = Number of proxies to fetch at once
  • database_name = Mongo database name to store proxies in
  • collection_name_http = Collection name to store http proxies in
  • collection_name_https = Collection name to store https proxies in
  • country_code = ISO code of one of the supported regions

The supported regions are:

  • Combined Regions: ALL
  • United States: US
  • Germany: DE
  • Great Britain: GB
  • France: FR
  • Czech Republic: CZ
  • Netherlands: NL
  • India: IN

Calling the ProxyToMongo grabber

service.get_quick_proxy()

Starting the grabber gives the following output:

MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...

This runs until interrupted and pushes/updates proxies in MongoDB every {update_time} seconds.

👉 How to use Proxy Grabber Class?

# import ProxyGrabber class
from greendeck_proxygrabber import ProxyGrabber
# initialize ProxyGrabber object
grabber = ProxyGrabber(len_proxy_list = 10, country_code = 'ALL', timeout = 2)

The default values of these arguments are:

len_proxy_list = 10
country_code = 'ALL'
timeout = 2

Currently the program only supports proxies of combined regions

Getting checked, running proxies

The grab_proxy() method fetches proxies that have been checked and are running.

grabber.grab_proxy()

This returns a dictionary of the following structure:

{
    'https': [< list of https proxies >],
    'http': [< list of http proxies >],
    'region': 'ALL' # default for now
}
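
A dictionary of this shape can be turned into the scheme-to-URL mapping that HTTP clients commonly expect. The sketch below makes two assumptions that the text above does not confirm: each list entry is a 'host:port' string, and the proxies are reached over plain HTTP (so the URL prefix is http:// for both keys). build_proxies_mapping is a hypothetical helper, and the addresses are placeholders.

```python
import random


def build_proxies_mapping(grabbed, rng=random):
    """Pick one proxy per scheme from a grab_proxy()-style dictionary.

    Assumes every entry is a 'host:port' string (not confirmed by the package).
    """
    mapping = {}
    for scheme in ("http", "https"):
        candidates = grabbed.get(scheme) or []
        if candidates:
            # Assumption: the proxy itself is contacted over plain HTTP.
            mapping[scheme] = f"http://{rng.choice(candidates)}"
    return mapping


sample = {
    "https": ["192.0.2.10:3128"],
    "http": ["192.0.2.20:8080"],
    "region": "ALL",
}
proxies = build_proxies_mapping(sample)
```

A mapping like this is the form accepted by urllib.request.ProxyHandler in the standard library.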

Getting an unchecked list of proxies

The proxy_scraper() method of ScrapeProxy fetches proxies without checking them. It returns 200 proxies covering both types, http and https.

from greendeck_proxygrabber import ScrapeProxy
proxies_http, proxies_https = ScrapeProxy.proxy_scraper()

It returns the list of http proxies followed by the list of https proxies:

proxies_http = [< list of http proxies >]
proxies_https = [< list of https proxies >]

Filtering invalid proxies from a list of proxies

The proxy_checker_https and proxy_checker_http methods of the ProxyChecker class validate proxies.

Given a list of proxies, each method checks whether every proxy is valid and returns the list of valid proxies among those passed to it.

from greendeck_proxygrabber import ProxyChecker
valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list = proxy_list_http, timeout = 2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list = proxy_list_https, timeout = 2)
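
For intuition, a check of this kind can be sketched with the standard library alone. This is not the package's implementation: check_http_proxy, filter_valid, the test URL and the worker count are all assumptions for illustration, and proxies are assumed to be 'host:port' strings.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def check_http_proxy(proxy, timeout=2, test_url="http://example.com"):
    """Return True if a request routed through the proxy answers with HTTP 200."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException, ValueError):
        # Connection errors, timeouts and malformed responses count as invalid.
        return False


def filter_valid(proxy_list, check=check_http_proxy, max_workers=20):
    """Check proxies concurrently and keep the valid ones in their original order."""
    if not proxy_list:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(check, proxy_list))
    return [proxy for proxy, ok in zip(proxy_list, results) if ok]
```

Passing the check function as a parameter keeps the filtering logic testable without network access; threads are used because each check spends most of its time waiting on the network.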

👉 How to build your own pip package

In the parent directory, run:

  • python setup.py sdist bdist_wheel
  • twine upload dist/*
