Proxy your spider and crawl the galaxy.

Project description

Introduction

Getting banned by websites can be really annoying, and the free proxy IPs available on the Internet are generally hard to use. This project aims to provide a library that lets spider programmers easily mask the IP address of their spider.

Features

ProxyYourSpider collects proxies from http://proxydb.net/?protocol=http&protocol=https&anonlvl=4. The number of proxies collected is determined by the argument passed when launching run.py. Any proxy that is non-anonymous or takes more than 2 seconds to respond is discarded.
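
The filtering rule above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the Candidate record, the keep_usable function and the sample addresses are all made up for the example, and it assumes the response time of each proxy has already been measured.

```python
from collections import namedtuple

# Hypothetical record for one scraped proxy; the real project may store
# its proxies in a different shape.
Candidate = namedtuple("Candidate", ["address", "anonymous", "response_seconds"])

MAX_RESPONSE_SECONDS = 2.0  # the 2-second limit described above


def keep_usable(candidates, wanted, max_seconds=MAX_RESPONSE_SECONDS):
    """Return up to `wanted` proxies that are anonymous and fast enough."""
    usable = []
    for candidate in candidates:
        if not candidate.anonymous:
            continue  # non-anonymous proxies are discarded
        if candidate.response_seconds is None:
            continue  # no answer at all counts as a failure
        if candidate.response_seconds > max_seconds:
            continue  # slower than the limit
        usable.append(candidate)
        if len(usable) >= wanted:
            break  # enough proxies collected
    return usable


# Made-up sample data, only to show how the rule behaves.
sample = [
    Candidate("10.0.0.1:8080", True, 0.8),
    Candidate("10.0.0.2:3128", False, 0.5),
    Candidate("10.0.0.3:8080", True, 3.1),
    Candidate("10.0.0.4:80", True, 2.0),
]
usable = keep_usable(sample, wanted=20)
```

With this sample, the second entry is dropped for not being anonymous and the third for being too slow.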

run.py stays alive and refills the proxy pool whenever the number of available proxies falls below the given threshold, which keeps the pool stable.
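
A minimal sketch of such a refill check is shown below. It does not reproduce run.py: the function names, the target and threshold parameters and the two callables passed in are assumptions chosen for the example, and the real script may decide how many proxies to collect differently.

```python
import time


def proxies_to_fetch(available, target, threshold):
    """Return how many proxies to collect; 0 while the pool is healthy."""
    if available >= threshold:
        return 0
    return max(target - available, 0)


def run_checks(count_available, collect, target, threshold, rounds, pause_seconds=0):
    """Check the pool `rounds` times and refill it when it runs low.

    `count_available` and `collect` are hypothetical callables standing in
    for whatever the project uses to count and gather proxies.
    """
    fetched = 0
    for _ in range(rounds):
        missing = proxies_to_fetch(count_available(), target, threshold)
        if missing:
            collect(missing)
            fetched += missing
        time.sleep(pause_seconds)  # wait before the next check
    return fetched
```

Passing the counting and collecting steps in as callables keeps the check itself easy to try out without a running proxy pool.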

To get a proxy, just call the fetchProxy method with a proxy type, and you will get a proxy that has been verified before it is handed to you :)

Usage

Generally, you only need to call the fetchProxy method to fetch the proxy you want.

https://github.com/Rafael-Cheng/ProxyYourSpider/blob/master/usage.gif
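
Once you have a proxy, you still need to hand it to your HTTP client. The sketch below shows one way to do that with Requests. It assumes the proxy arrives as a plain "host:port" string, which this document does not confirm, and it assumes the proxy itself is reached over plain HTTP. The helper name to_requests_proxies is made up for the example and is not part of ProxyYourSpider.

```python
def to_requests_proxies(proxy_type, address):
    """Build the mapping that Requests accepts in its `proxies` argument.

    `address` is assumed to be a "host:port" string.
    """
    scheme = proxy_type.lower()
    if scheme not in ("http", "https"):
        raise ValueError("unsupported proxy type: %r" % proxy_type)
    # The key is the scheme of the target URL; the value is the proxy URL.
    return {scheme: "http://" + address}


# Example with a made-up address:
proxies = to_requests_proxies("http", "10.0.0.1:8080")
# The mapping can then be passed to Requests, for instance:
#   requests.get("http://example.com", proxies=proxies, timeout=5)
```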

However, bear in mind that run.py must be started first and kept running while you use ProxyYourSpider.

Just execute run.py like this:

$ python run.py 20

Here, 20 is the number of proxies you need.

https://github.com/Rafael-Cheng/ProxyYourSpider/blob/master/run_launch.png

Also worth mentioning: run.py must be kept alive while you use ProxyYourSpider, because it continuously checks whether the proxy pool still holds enough available proxies.

https://github.com/Rafael-Cheng/ProxyYourSpider/blob/master/run_update.png

Note: You also need to keep Redis running, since it is the database in which the proxies are stored.

https://github.com/Rafael-Cheng/ProxyYourSpider/blob/master/redis.png
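
If you want your spider to check that Redis is up before asking for a proxy, a rough sketch like the one below may help. It only tests whether something accepts TCP connections on the given port, not whether it is really Redis. The host and port are assumptions: 6379 is the Redis default, but this document does not say which address the project connects to. The function name is made up for the example.

```python
import socket


def redis_is_reachable(host="127.0.0.1", port=6379, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Refused, unreachable or timed out: treat Redis as down.
        return False
    conn.close()
    return True
```

Calling redis_is_reachable() at the start of your spider lets you fail early with a clear message instead of an error deep inside the crawl.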

Requirements

  • Python 2.7

  • Redis

  • Requests

  • BeautifulSoup

  • Works on Linux, Mac OSX, Windows, BSD

Installation

$ git clone https://github.com/Rafael-Cheng/ProxyYourSpider.git

or just download the zip archive. Alternatively, install it with pip:

$ pip install ProxyYourSpider

License

This project is licensed under the GPL.

Download files

Source Distribution

ProxyYourSpider-1.0.2.tar.gz (3.9 kB)

Hashes for ProxyYourSpider-1.0.2.tar.gz

Algorithm     Hash digest
SHA256        b7ec92733f746c643499d0ad51195a255396fe4c3be4445babe773ac560881a5
MD5           822a98c5e7de804be7310302936572ab
BLAKE2b-256   a21b3e9fb12420bdaecb9c51e7caac727e7ddeed62fe272f5af3daac02a7c83d