
A package to get free proxies

Project description

get_free_proxy

get_free_proxy is a tool that collects free proxies from public proxy-list websites.

install

pip install get-free-proxy

usage

get_free_proxy depends on gen_browser_header.

create gen_browser_header setting

import gen_browser_header.setting.Setting as gbh_setting
import gen_browser_header.self.SelfEnum as gbh_self_enum
cur_gbh_setting = gbh_setting.GbhSetting()
cur_gbh_setting.proxy_ip = ['10.11.12.13:8080']
cur_gbh_setting.browser_type = {gbh_self_enum.BrowserType.All}
cur_gbh_setting.firefox_ver = {'min': 74, 'max': 75}
cur_gbh_setting.chrome_type = {gbh_self_enum.ChromeType.Stable}
cur_gbh_setting.chrome_max_release_year = 1
cur_gbh_setting.os_type = {gbh_self_enum.OsType.Win64}

create get_free_proxy setting

import os
import tempfile
import get_free_proxy.self.SelfEnum as gfp_self_enum
import get_free_proxy.setting.Setting as gfp_setting
cur_gfp_setting = gfp_setting.GfpSetting()
cur_gfp_setting.proxy_type = {gfp_self_enum.ProxyType.HIGH_ANON}
cur_gfp_setting.protocol = {gfp_self_enum.ProtocolType.HTTP, gfp_self_enum.ProtocolType.HTTPS}
cur_gfp_setting.country = {gfp_self_enum.Country.All}
cur_gfp_setting.storage_type = {gfp_self_enum.StorageType.All}
cur_gfp_setting.mysql = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'pwd': '1234', 'db_name': 'db_proxy', 'tbl_name': 'tbl_proxy', 'charset': 'utf8mb4'}
cur_gfp_setting.redis = {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 0,  # 0~15
    'pwd': None
}
cur_gfp_setting.result_file_path = os.path.join(tempfile.gettempdir(), 'result.json')
cur_gfp_setting.valid_time_in_db = 86400
cur_gfp_setting.site_max_page_no = 2
cur_gfp_setting.site = {gfp_self_enum.SupportedWeb.Xici}
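The File storage type together with result_file_path and valid_time_in_db can be illustrated with a minimal sketch. It assumes a hypothetical JSON layout in which each proxy is stored with the Unix time it was saved; load_records, save_proxies and fresh_proxies are illustrative names, not part of get_free_proxy, and the library's real file format may differ.

```python
import json
import os
import tempfile
import time

# Mirrors the documented default of result_file_path.
DEFAULT_PATH = os.path.join(tempfile.gettempdir(), "result.json")

# Hypothetical file layout (not necessarily get_free_proxy's):
# {"proxies": [{"proxy": "ip:port", "saved_at": <unix seconds>}, ...]}


def load_records(path=DEFAULT_PATH):
    """Return the stored records, or an empty list if the file does not exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)["proxies"]


def save_proxies(path, proxies, now=None):
    """Add new proxies to the file, stamping each with its save time.

    Proxies already in the file are skipped and keep their original timestamp.
    """
    now = time.time() if now is None else now
    records = load_records(path)
    known = {r["proxy"] for r in records}
    for proxy in proxies:
        if proxy not in known:
            records.append({"proxy": proxy, "saved_at": now})
            known.add(proxy)
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"proxies": records}, f)


def fresh_proxies(path=DEFAULT_PATH, valid_time_in_db=86400, now=None):
    """Return proxies saved less than valid_time_in_db seconds ago."""
    now = time.time() if now is None else now
    return [r["proxy"] for r in load_records(path)
            if now - r["saved_at"] < valid_time_in_db]
```

With the settings above, path would be cur_gfp_setting.result_file_path and valid_time_in_db would be cur_gfp_setting.valid_time_in_db; anything older than that window is simply not returned.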

start getting free proxies

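mainOp = MainOp(cur_gfp_setting, cur_gbh_setting)  # MainOp comes from get_free_proxy; its import path is not shown here
# First clear the database (all proxy-list pages are re-read anyway)
mainOp.del_proxy()
# Check whether each site URL needs a proxy to be reached
mainOp.check_if_site_need_proxy()
# Get proxies from the sites that can be reached directly
tmp_proxies = mainOp.get_proxy_without_proxy()
# Validate whether these proxies are usable
first_validate_proxies = mainOp.async_validate_proxies(tmp_proxies, 'https://www.baidu.com')
# If any proxies are usable, use them to reach the sites that require a proxy;
# otherwise, fall back to the fixed cur_gbh_setting.proxy_ip
if len(first_validate_proxies) > 0:
    tmp_proxies = mainOp.get_proxy_with_proxy(proxies=first_validate_proxies)
else:
    tmp_proxies = mainOp.get_proxy_with_proxy(proxies=None)
# Validate the newly obtained proxies again to check whether they are usable
second_validate_proxies = mainOp.async_validate_proxies(tmp_proxies, 'https://www.baidu.com')
# Merge all usable proxies
all_validate_proxies = first_validate_proxies + second_validate_proxies
print('Final valid proxies: %s' % all_validate_proxies)
# Save the proxies
mainOp.save_proxy(proxies=all_validate_proxies)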
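The two validation steps above check many proxies against a test URL. As a rough idea of how such a check can run concurrently, here is a minimal sketch using only the standard library. It is not get_free_proxy's async_validate_proxies; check_proxy, validate_proxies and merge_unique are hypothetical names, and it assumes HTTP proxies given as "ip:port" strings and Python 3.9+ (for asyncio.to_thread).

```python
import asyncio
import urllib.request


def check_proxy(proxy, test_url, timeout=5.0):
    """Blocking check: fetch test_url through the HTTP proxy "ip:port".

    Returns True only if the request succeeds with HTTP 200; any error
    (timeout, refused connection, HTTP error) counts as a dead proxy.
    """
    handler = urllib.request.ProxyHandler(
        {"http": f"http://{proxy}", "https": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False


async def validate_proxies(proxies, test_url, checker=check_proxy, limit=20):
    """Run `checker` on all proxies concurrently, at most `limit` at a time.

    Each blocking check runs in a worker thread; the proxies that pass are
    returned in their original order.
    """
    sem = asyncio.Semaphore(limit)

    async def run(proxy):
        async with sem:
            return await asyncio.to_thread(checker, proxy, test_url)

    results = await asyncio.gather(*(run(p) for p in proxies))
    return [p for p, ok in zip(proxies, results) if ok]


def merge_unique(*groups):
    """Concatenate proxy lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(p for group in groups for p in group))
```

Called as asyncio.run(validate_proxies(tmp_proxies, 'https://www.baidu.com')), it plays the role of one validation step. Since a proxy can be found both with and without a proxy, merge_unique(first, second) is a duplicate-free alternative to first + second.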

gfp_setting

  1. proxy_type
    type: set, element is enum=>gfp_self_enum.ProxyType
    default: {gfp_self_enum.ProxyType.HIGH_ANON}
    description: there are 3 proxy types: transparent, anonymous and high-anonymity (TRANS/ANON/HIGH_ANON). There is an additional value, All; if set, it is replaced by TRANS+ANON+HIGH_ANON.
  2. protocol
    type: set, element is enum=>gfp_self_enum.ProtocolType
    default: {gfp_self_enum.ProtocolType.HTTP, gfp_self_enum.ProtocolType.HTTPS}
    description: there are 4 proxy protocols: HTTP, HTTPS, SOCK4, SOCK5. There is an additional value, All; if set, it is replaced by HTTP+HTTPS+SOCK4+SOCK5.
  3. country
    type: set, element is enum=>gfp_self_enum.Country
    default: {gfp_self_enum.Country.China}
    description: some sites provide proxies from all countries; this parameter filters them by country. There is an additional value, All; if set, the country filter is ignored.
  4. storage_type
    type: set, element is enum=>gfp_self_enum.StorageType
    default: {gfp_self_enum.StorageType.All}
    description: 3 storage types are currently supported: Mysql/Redis/File. There is an additional value, All; if set, it is replaced by Mysql+Redis+File.
  5. mysql
    type: dict
    default:
    {
    'host': '127.0.0.1',
    'port': 3306,
    'user': 'root',
    'pwd': '1234',
    'db_name': 'db_proxy',
    'tbl_name': 'tbl_proxy',
    'charset': 'utf8mb4'
    }
    description: if storage_type includes Mysql, set this parameter to connect to MySQL.
  6. redis
    type: dict
    default:
    {
    'host': '127.0.0.1',
    'port': 6379,
    'db': 0, # 0~15
    'pwd': None
    }
    description: if storage_type includes Redis, set this parameter to connect to Redis.
  7. result_file_path
    type: string
    default: os.path.join(tempfile.gettempdir(), 'result.json')
    description: if storage_type includes File, all obtained proxies are saved into the file given by result_file_path.
  8. valid_time_in_db
    type: int
    default: 86400
    unit: second
    description: since all obtained proxies are free, there is no way to know when they will expire. This parameter sets how long a stored proxy is considered valid; a proxy stored longer than this duration is deleted or no longer chosen.
  9. site_max_page_no
    type: int
    default: 2
    description: min: 2, max: 9. The websites that provide free proxies paginate their content; this parameter determines how many pages are processed to extract proxies.
  10. site
    type: set, element is enum=>gfp_self_enum.SupportedWeb
    default: {gfp_self_enum.SupportedWeb.Xici}
    description: this parameter determines which sites are used to extract proxies. Currently only 4 sites are supported: https://www.xicidaili.com, https://www.kuaidaili.com/free, https://hidemy.name/en/proxy-list/#list, https://proxy-list.org/english. If All is set, it is replaced by all 4 sites.
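Several of the set-valued settings above (proxy_type, protocol, storage_type, site) accept an All member that is replaced by every concrete member. A minimal sketch of that expansion rule follows. ProxyType and StorageType here are stand-in enums that only mirror the documented member names (their values are made up); they are not imported from get_free_proxy, and expand_all is a hypothetical helper. Note that country behaves differently: per its description, Country.All disables the country filter rather than expanding.

```python
from enum import Enum


# Stand-ins mirroring the documented member names; values are arbitrary.
class ProxyType(Enum):
    TRANS = "transparent"
    ANON = "anonymous"
    HIGH_ANON = "high_anonymous"
    All = "all"


class StorageType(Enum):
    Mysql = "mysql"
    Redis = "redis"
    File = "file"
    All = "all"


def expand_all(selected, enum_cls):
    """Return `selected` with enum_cls.All replaced by every other member."""
    if enum_cls.All in selected:
        return {member for member in enum_cls if member is not enum_cls.All}
    # No All present: return a copy of the selection unchanged.
    return set(selected)
```

For example, expand_all({ProxyType.All}, ProxyType) yields the three concrete proxy types, while a set without All is returned as-is.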

change history

0.1.0 use requests-html instead of requests
0.1.1 match gen_browser 0.1.3: when generating the header, add Host based on the url parameter
0.1.2 add support for zh-cn in setup.py by adding encoding="utf-8"



Download files


Source Distribution

get_free_proxy-0.1.2.tar.gz (25.5 kB)

Uploaded Source

Built Distribution


get_free_proxy-0.1.2-py3-none-any.whl (30.0 kB)

Uploaded Python 3

File details

Details for the file get_free_proxy-0.1.2.tar.gz.

File metadata

  • Download URL: get_free_proxy-0.1.2.tar.gz
  • Upload date:
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for get_free_proxy-0.1.2.tar.gz
Algorithm Hash digest
SHA256 7746898b2ed3ab5480e1d6abc9d63f1d0f634d25fcd327d9df7c91c816975498
MD5 fa477d60ab6418ff62e2621924662666
BLAKE2b-256 3c34ba6a83a7d8c6fd23440f645f22a0de87abd9e5ff3f0a0d69e0a84f4f9c32


File details

Details for the file get_free_proxy-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: get_free_proxy-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 30.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.0

File hashes

Hashes for get_free_proxy-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c7dbb15d73d6099322affc16b4ff3ba2ad063d81f1ad4a3ad8bd5ca05abbf899
MD5 30addd2bfb391a6977b6b9d7544c0630
BLAKE2b-256 d2928dcf96744db30217b1f3fdc8b17dafd24026aa952fd7ce119e11eb6cee52

