A package to get free proxies
# get_free_proxy
get_free_proxy is a tool for collecting free proxies from public websites.
install
pip install get-free-proxy
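To confirm that the install worked, a small check with the standard library's `importlib.metadata` (Python 3.8+) can report the installed version. The helper name `installed_version` is hypothetical; the distribution name defaults to `get_free_proxy`, which is assumed to match the installed wheel's metadata directory.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "get_free_proxy") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```

Calling `installed_version()` after `pip install get-free-proxy` should return a version string such as the one in the file names below; `None` means the package is not visible to the running interpreter.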
usage
get_free_proxy depends on gen_browser_header.
create gen_browser_header setting
```python
import gen_browser_header.setting.Setting as gbh_setting
import gen_browser_header.self.SelfEnum as gbh_self_enum

cur_gbh_setting = gbh_setting.GbhSetting()
# Fixed fallback proxy, used when no free proxy passes the first validation
cur_gbh_setting.proxy_ip = ['10.11.12.13:8080']
cur_gbh_setting.browser_type = {gbh_self_enum.BrowserType.All}
cur_gbh_setting.firefox_ver = {'min': 74, 'max': 75}
cur_gbh_setting.chrome_type = {gbh_self_enum.ChromeType.Stable}
cur_gbh_setting.chrome_max_release_year = 1
cur_gbh_setting.os_type = {gbh_self_enum.OsType.Win64}
```
create get_free_proxy setting
```python
import os
import tempfile

import get_free_proxy.self.SelfEnum as gfp_self_enum
import get_free_proxy.setting.Setting as gfp_setting

cur_gfp_setting = gfp_setting.GfpSetting()
cur_gfp_setting.proxy_type = {gfp_self_enum.ProxyType.HIGH_ANON}
cur_gfp_setting.protocol = {gfp_self_enum.ProtocolType.HTTP, gfp_self_enum.ProtocolType.HTTPS}
cur_gfp_setting.country = {gfp_self_enum.Country.All}
cur_gfp_setting.storage_type = {gfp_self_enum.StorageType.All}
cur_gfp_setting.mysql = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'pwd': '1234', 'db_name': 'db_proxy', 'tbl_name': 'tbl_proxy', 'charset': 'utf8mb4'}
cur_gfp_setting.redis = {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 0,  # 0~15
    'pwd': None,
}
cur_gfp_setting.result_file_path = os.path.join(tempfile.gettempdir(), 'result.json')
cur_gfp_setting.valid_time_in_db = 86400  # seconds
cur_gfp_setting.site_max_page_no = 2
cur_gfp_setting.site = {gfp_self_enum.SupportedWeb.Xici}
```
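The `storage_type` line above uses `StorageType.All`; according to the gfp_setting reference, `All` is replaced by every concrete value (Mysql+Redis+File), and `proxy_type`, `protocol` and `site` follow the same rule. A minimal sketch of that expansion, assuming an enum with an `All` member; the `StorageType` class and `expand_all` function here are hypothetical stand-ins, not the package's own code.

```python
from enum import Enum


# Hypothetical stand-in for gfp_self_enum.StorageType; the real enum's
# member values may differ.
class StorageType(Enum):
    All = 0
    Mysql = 1
    Redis = 2
    File = 3


def expand_all(selected, enum_cls):
    """Replace the special All member with every concrete member of enum_cls."""
    if enum_cls.All in selected:
        # Iterating an Enum yields each member once; skip All itself.
        return {member for member in enum_cls if member is not enum_cls.All}
    return set(selected)
```

For example, `expand_all({StorageType.All}, StorageType)` yields `{StorageType.Mysql, StorageType.Redis, StorageType.File}`, while a set without `All` is returned unchanged. Note that `Country.All` is documented differently: it disables country filtering rather than expanding to a list.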
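Start getting free proxies
```python
# MainOp is provided by get_free_proxy; its import path is not shown in this README.
mainOp = MainOp(cur_gfp_setting, cur_gbh_setting)

# First clear the database (every page is re-read anyway)
mainOp.del_proxy()

# Check whether each site URL has to be accessed through a proxy
mainOp.check_if_site_need_proxy()

# Get proxies from the sites that can be reached directly
tmp_proxies = mainOp.get_proxy_without_proxy()

# Validate which of these proxies actually work
first_validate_proxies = mainOp.async_validate_proxies(tmp_proxies, 'https://www.baidu.com')

# If some proxies work, use them to reach the proxy sites that require a proxy;
# otherwise, fall back to the fixed cur_gbh_setting.proxy_ip
if len(first_validate_proxies) > 0:
    tmp_proxies = mainOp.get_proxy_with_proxy(proxies=first_validate_proxies)
else:
    tmp_proxies = mainOp.get_proxy_with_proxy(proxies=None)

# Validate the newly obtained proxies as well
second_validate_proxies = mainOp.async_validate_proxies(tmp_proxies, 'https://www.baidu.com')

# Merge all working proxies
all_validate_proxies = first_validate_proxies + second_validate_proxies
print('Final valid proxies: %s' % all_validate_proxies)

# Save the proxies
mainOp.save_proxy(proxies=all_validate_proxies)
```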
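The two-round flow above (validate directly fetched proxies, use them to reach proxy-only sites, validate again, merge) can be sketched with plain `asyncio`. This is not the package's implementation: `validate_proxies`, `collect`, and the `fetch_direct`/`fetch_via`/`check` callables are hypothetical, and `check` stands in for whatever request the real tool sends through each proxy (the README uses `https://www.baidu.com`). No network I/O happens here.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional

# In this sketch a proxy is a plain "ip:port" string.
Check = Callable[[str], Awaitable[bool]]


async def validate_proxies(proxies: Iterable[str], check: Check) -> List[str]:
    """Run check() on every proxy concurrently and keep the ones that pass."""
    candidates = list(proxies)
    results = await asyncio.gather(*(check(p) for p in candidates))
    return [p for p, ok in zip(candidates, results) if ok]


async def collect(
    fetch_direct: Callable[[], List[str]],
    fetch_via: Callable[[Optional[List[str]]], List[str]],
    check: Check,
) -> List[str]:
    # Round 1: proxies scraped from sites reachable without a proxy.
    first = await validate_proxies(fetch_direct(), check)
    # Round 2: pass the working proxies, or None to mean "use the fixed fallback".
    second = await validate_proxies(fetch_via(first or None), check)
    # Merge both rounds, dropping duplicates while keeping the original order.
    return list(dict.fromkeys(first + second))
```

The README merges with a plain `+`, which keeps a proxy twice if both rounds return it; the `dict.fromkeys` step is one way to avoid that, chosen here as an assumption rather than documented behaviour.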
gfp_setting
- proxy_type
type: set, element is enum=>gfp_self_enum.ProxyType
default: {gfp_self_enum.ProxyType.HIGH_ANON}
description: a proxy has one of 3 anonymity types: transparent/anonymous/high_anonymous (TRANS/ANON/HIGH_ANON). There is an additional value, All; if set, it is replaced by TRANS+ANON+HIGH_ANON.
- protocol
type: set, element is enum=>gfp_self_enum.ProtocolType
default: {gfp_self_enum.ProtocolType.HTTP, gfp_self_enum.ProtocolType.HTTPS}
description: the proxy protocol has 4 types: HTTP, HTTPS, SOCK4, SOCK5. There is an additional value, All; if set, it is replaced by HTTP+HTTPS+SOCK4+SOCK5.
- country
type: set, element is enum=>gfp_self_enum.Country
default: {gfp_self_enum.Country.China}
description: some websites provide proxies from all countries; this parameter filters them by country. There is an additional value, All; if set, the country is ignored.
- storage_type
type: set, element is enum=>gfp_self_enum.StorageType
default: {gfp_self_enum.StorageType.All}
description: 3 storage types are currently supported: Mysql/Redis/File. There is an additional value, All; if set, it is replaced by Mysql+Redis+File.
- mysql
type: dict
default:
{
'host': '127.0.0.1',
'port': 3306,
'user': 'root',
'pwd': '1234',
'db_name': 'db_proxy',
'tbl_name': 'tbl_proxy',
'charset': 'utf8mb4'
}
description: if storage_type includes Mysql, set this parameter to connect to MySQL.
- redis
type: dict
default:
{
'host': '127.0.0.1',
'port': 6379,
'db': 0, # 0~15
'pwd': None
}
description: if storage_type includes Redis, set this parameter to connect to Redis.
- result_file_path
type: string
default: os.path.join(tempfile.gettempdir(), 'result.json')
description: if storage_type includes File, all collected proxies are saved into the file given by result_file_path.
- valid_time_in_db
type: int
default: 86400
unit: second
description: because all collected proxies are free, there is no way to know when they will expire. This parameter sets how long a stored proxy is considered valid; once a proxy is older than this duration, it is deleted or no longer chosen.
- site_max_page_no
type: int
default: 2
description: min: 2, max: 9. Websites that provide free proxies split their lists across pages; this parameter determines how many pages are processed to extract proxies.
- site
type: set, element is enum=>gfp_self_enum.SupportedWeb
default: {gfp_self_enum.SupportedWeb.Xici}
description: this parameter determines which sites are used to extract proxies. Currently only 4 sites are supported: https://www.xicidaili.com, https://www.kuaidaili.com/free, https://hidemy.name/en/proxy-list/#list, https://proxy-list.org/english. If All is set, it is replaced by the 4 sites above.
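Two parameters in the list above describe behaviour rather than connection details: valid_time_in_db (how long a stored proxy stays usable) and site_max_page_no (bounded to 2..9). A minimal sketch of both, using the File storage at result_file_path as the example backend. All helper names are hypothetical, the JSON record layout is an assumption (the README does not document the format of result.json), and clamping out-of-range page counts is one possible choice; rejecting them with an error would be equally valid.

```python
import json
import os
import tempfile
import time
from typing import List, Optional

# Hypothetical helpers that mimic the documented behaviour of
# valid_time_in_db, site_max_page_no and result_file_path.
# The JSON record layout below is an assumption, not the package's format.

VALID_TIME_IN_DB = 86400  # seconds, the documented default


def default_result_path() -> str:
    """Mirror the documented default of result_file_path."""
    return os.path.join(tempfile.gettempdir(), "result.json")


def clamp_page_no(site_max_page_no: int) -> int:
    """Force the page count into the documented range 2..9."""
    return max(2, min(9, site_max_page_no))


def save_proxies(path: str, proxies: List[str], now: Optional[float] = None) -> None:
    """Write proxies as JSON records stamped with the time they were saved."""
    saved_at = time.time() if now is None else now
    records = [{"proxy": p, "saved_at": saved_at} for p in proxies]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f)


def load_fresh_proxies(
    path: str, valid_time: int = VALID_TIME_IN_DB, now: Optional[float] = None
) -> List[str]:
    """Return only proxies saved less than valid_time seconds ago."""
    current = time.time() if now is None else now
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [r["proxy"] for r in records if current - r["saved_at"] < valid_time]
```

Storing a timestamp per proxy means expiry can be decided at read time, so stale entries are simply skipped without a separate cleanup pass.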
change history
0.1.0: use requests-html instead of requests
0.1.1: match gen_browser 0.1.3: when generating the header, add Host based on the url parameter
0.1.2: add support for zh-cn in setup.py by adding encoding="utf-8"
Download files
File details
Details for the file get_free_proxy-0.1.2.tar.gz.
File metadata
- Download URL: get_free_proxy-0.1.2.tar.gz
- Upload date:
- Size: 25.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7746898b2ed3ab5480e1d6abc9d63f1d0f634d25fcd327d9df7c91c816975498 |
| MD5 | fa477d60ab6418ff62e2621924662666 |
| BLAKE2b-256 | 3c34ba6a83a7d8c6fd23440f645f22a0de87abd9e5ff3f0a0d69e0a84f4f9c32 |
File details
Details for the file get_free_proxy-0.1.2-py3-none-any.whl.
File metadata
- Download URL: get_free_proxy-0.1.2-py3-none-any.whl
- Upload date:
- Size: 30.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7dbb15d73d6099322affc16b4ff3ba2ad063d81f1ad4a3ad8bd5ca05abbf899 |
| MD5 | 30addd2bfb391a6977b6b9d7544c0630 |
| BLAKE2b-256 | d2928dcf96744db30217b1f3fdc8b17dafd24026aa952fd7ce119e11eb6cee52 |
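The SHA256 values in the File hashes tables let you check a downloaded file before installing it. A minimal sketch using the standard `hashlib` module; the function names are hypothetical, and only SHA256 is handled here.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_sha256: str) -> bool:
    """Compare against a SHA256 value copied from a File hashes table."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 of get_free_proxy-0.1.2-py3-none-any.whl, from its File hashes table.
WHEEL_SHA256 = "c7dbb15d73d6099322affc16b4ff3ba2ad063d81f1ad4a3ad8bd5ca05abbf899"
```

For example, `matches_published_digest("get_free_proxy-0.1.2-py3-none-any.whl", WHEEL_SHA256)` should return `True` for an intact download. pip can enforce the same check itself in hash-checking mode, via a `--hash=sha256:...` option on the requirement line.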