greendeck-proxygrabber 🎭
This package is developed by Greendeck.

Install from pip: https://pypi.org/project/greendeck-proxygrabber/

```
pip install greendeck-proxygrabber
```
WHAT'S NEW?
Added proxy grabbing support for 4 new regions to the proxy service, proxy grabber, and proxy scraper.
👉 What is proxy service?
Proxy service is a service that keeps a MongoDB database updated with the latest up-and-running proxies.
👉 How to use?
Import the service class:

```python
from greendeck_proxygrabber import ProxyService

service = ProxyService(MONGO_URI='mongodb://127.0.0.1:27017',
                       update_time=300,
                       pool_limit=1000,
                       update_count=200,
                       database_name='proxy_pool',
                       collection_name_http='http',
                       collection_name_https='https',
                       country_code='ALL')
```
This creates a service object.
Args
- update_time = Time (in seconds) after which the proxies are refreshed
- pool_limit = Pool size after which insertion switches to updating existing entries
- update_count = Number of proxies to request from the grabber at a time
- database_name = Mongo database name to store proxies in
- collection_name_http = Collection name to store http proxies in
- collection_name_https = Collection name to store https proxies in
- country_code = ISO code of one of the supported regions
The list of supported regions is:
- Combined Regions: ALL
- United States: US
- Germany: DE
- Great Britain: GB
- France: FR
- Czech Republic: CZ
- Netherlands: NL
- India: IN
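The `pool_limit` behavior described above (plain insertion until the limit is reached, then switching to updating existing entries) can be sketched roughly as follows. This is an illustrative assumption about the logic, not the package's actual implementation:

```python
# Illustrative sketch (assumption): a capped proxy pool that inserts new
# proxies until pool_limit is reached, then replaces the oldest entries.
from collections import deque

def update_pool(pool, new_proxies, pool_limit):
    """Insert proxies while under the limit; afterwards, replace the oldest."""
    for proxy in new_proxies:
        if len(pool) < pool_limit:
            pool.append(proxy)   # plain insertion
        else:
            pool.popleft()       # drop the oldest proxy
            pool.append(proxy)   # replace it with the fresh one
    return pool

pool = deque()
update_pool(pool, [f"10.0.0.{i}:8080" for i in range(5)], pool_limit=3)
print(len(pool))  # → 3 (the pool never grows beyond pool_limit)
```

The real service presumably does something similar against the MongoDB collections rather than an in-memory deque.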
Starting the service:

```python
service.start()
```
Starting the service gives the following output:

```
MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...
```

This will run forever, pushing/updating proxies in MongoDB every `update_time` seconds.
👉 What is proxy to mongo?
Proxy to mongo is a functionality that lets you grab a set of valid proxies from the Internet and store them in the desired MongoDB database. You can schedule it to insert or update a given number of proxies in your database or pool, e.g. with Airflow or any other task scheduler.
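If you are not using a full scheduler, a minimal periodic-run loop around such a fetch-and-store step might look like the following. The `task` callable here is a hypothetical stand-in for the actual fetch-and-store call, and the interval is an arbitrary choice:

```python
import time

def run_periodically(task, interval_seconds, max_runs=None):
    """Call `task` every `interval_seconds`; stop after max_runs if given."""
    runs = 0
    while max_runs is None or runs < max_runs:
        task()                       # e.g. a fetch-and-store step
        runs += 1
        time.sleep(interval_seconds)
    return runs

# Hypothetical stand-in for the real fetch-and-store call:
calls = []
run_periodically(lambda: calls.append("fetched"), interval_seconds=0.01, max_runs=3)
print(calls)  # → ['fetched', 'fetched', 'fetched']
```

In practice a task scheduler (cron, Airflow, etc.) is more robust than a sleep loop, which is why the text above suggests one.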
👉 How to use?
Import the ProxyToMongo class:

```python
from greendeck_proxygrabber import ProxyToMongo

service = ProxyToMongo(MONGO_URI=MONGO_URI,
                       pool_limit=1000,
                       length_proxy=200,
                       database_name='proxy_pool',
                       collection_name_http='http',
                       collection_name_https='https',
                       country_code='DE')
```
This creates a service object.
Args
- pool_limit = Total number of proxies to keep in MongoDB (pass None if you don't want to update)
- length_proxy = Number of proxies to fetch at once
- database_name = Mongo database name to store proxies in
- collection_name_http = Collection name to store http proxies in
- collection_name_https = Collection name to store https proxies in
- country_code = ISO code of one of the supported regions
The list of supported regions is:
- Combined Regions: ALL
- United States: US
- Germany: DE
- Great Britain: GB
- France: FR
- Czech Republic: CZ
- Netherlands: NL
- India: IN
Calling the ProxyToMongo grabber:

```python
service.get_quick_proxy()
```
Starting the grabber gives the following output:

```
MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...
```

This fetches a batch of proxies and pushes/updates them in MongoDB; schedule repeated runs yourself if you want the pool refreshed periodically.
👉 How to use Proxy Grabber Class?
Import the ProxyGrabber class and initialize a ProxyGrabber object:

```python
from greendeck_proxygrabber import ProxyGrabber

grabber = ProxyGrabber(len_proxy_list, country_code, timeout)
```

Here the default values of the arguments are:

```python
len_proxy_list = 10
country_code = 'ALL'
timeout = 2
```
Note: currently the grabber only supports proxies of the combined regions.
Getting checked, running proxies
The `grab_proxy()` function helps to fetch the proxies:

```python
grabber.grab_proxy()
```

This returns a dictionary of the following structure:

```python
{
    'https': [< list of https proxies >],
    'http': [< list of http proxies >],
    'region': 'ALL'  # default for now
}
```
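A dictionary of this shape can be adapted to the `proxies` mapping that HTTP clients such as `requests` expect. A minimal sketch; the helper name and the sample proxy addresses are made up for illustration:

```python
import random

def pick_proxy(grabbed, scheme="https"):
    """Pick one proxy of the given scheme from a grab_proxy()-style dict
    and format it as a requests-compatible proxies mapping."""
    address = random.choice(grabbed[scheme])
    return {scheme: f"{scheme}://{address}"}

# Example dict with made-up addresses, shaped like grab_proxy()'s output:
grabbed = {
    "https": ["203.0.113.1:3128", "203.0.113.2:8080"],
    "http": ["203.0.113.3:80"],
    "region": "ALL",
}
proxies = pick_proxy(grabbed, "http")
print(proxies)  # → {'http': 'http://203.0.113.3:80'}
```

The resulting mapping can be passed as `requests.get(url, proxies=proxies)`.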
Getting an unchecked list of proxies
The `proxy_scraper()` method of `ScrapeProxy` helps to fetch the proxies. It returns a list of 200 proxies of both the http and https types:

```python
from greendeck_proxygrabber import ScrapeProxy

proxies_http, proxies_https = ScrapeProxy.proxy_scraper()
```

This returns the http proxies followed by the https proxies:

```python
http_proxies = [< list of http proxies >]
https_proxies = [< list of https proxies >]
```
Filtering invalid proxies from a list of proxies
The `proxy_checker_http` and `proxy_checker_https` methods of the `ProxyChecker` class help to validate proxies. Given a list of proxies, they check whether each one is valid and return the list of valid proxies from those fed to them:

```python
from greendeck_proxygrabber import ProxyChecker

valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list=proxy_list_http, timeout=2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list=proxy_list_https, timeout=2)
```
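Before handing a list to the checker, it can be worth discarding malformed entries up front. A small self-contained helper for that, assuming proxies are plain `ip:port` strings (this is an illustration, not part of the package):

```python
import ipaddress

def is_well_formed(proxy):
    """Return True if `proxy` looks like a valid 'ip:port' string."""
    host, sep, port = proxy.rpartition(":")
    if not sep or not port.isdigit() or not 0 < int(port) <= 65535:
        return False
    try:
        ipaddress.ip_address(host)   # raises ValueError for bad addresses
    except ValueError:
        return False
    return True

# Made-up sample entries: one valid, one garbage, one with a bad port.
candidates = ["203.0.113.7:8080", "not-a-proxy", "203.0.113.7:99999"]
print([p for p in candidates if is_well_formed(p)])  # → ['203.0.113.7:8080']
```

This only checks the format; the package's checker methods above are still needed to verify that a proxy actually responds.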
👉 How to build your own pip package
- Open an account at https://pypi.org/
- In the parent directory, run:

```
python setup.py sdist bdist_wheel
twine upload dist/*
```
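The two commands above assume a `setup.py` exists in the project root. A minimal sketch of one, with placeholder metadata (all values here are assumptions to be replaced with your project's own):

```python
# setup.py — minimal placeholder sketch; fill in your project's real metadata.
from setuptools import setup, find_packages

setup(
    name="your-package-name",    # placeholder: the name shown on PyPI
    version="0.1.0",             # placeholder: bump on every upload
    packages=find_packages(),    # auto-discover packages in this directory
    install_requires=[],         # list runtime dependencies here
)
```

`sdist bdist_wheel` then builds the source and wheel distributions into `dist/`, which `twine upload` pushes to PyPI.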
File details
Details for the file greendeck-proxygrabber-0.3.8.tar.gz.

File metadata
- Download URL: greendeck-proxygrabber-0.3.8.tar.gz
- Size: 9.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.22.0 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6

File hashes
Algorithm | Hash digest
---|---
SHA256 | 7e7129c6157e889bf144faaed65135a6146a97fe9cbe4266f5d174ab3ba208f3
MD5 | 5c49bf868e6745651ac55863f2663776
BLAKE2b-256 | 201ecbc8df19ead8834e882182aca5e5e6298432bbacd429a14211945ef33241