# IP Proxy Pool
## Installation

```
pip install stand
```
## Starting

```
stand
```
## Usage

```python
>>> from stand import get_proxy
>>> proxy = get_proxy()
>>> print(proxy)
'103.133.222.151:8080'
```
## Using stand as the proxy source in Scrapy

```python
import scrapy
from scrapy.crawler import CrawlerProcess


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://api.ip.sb/ip']

    def parse(self, response):
        print(response.meta['proxy'])
        print(response.text)


DOWNLOADER_MIDDLEWARES = {
    'stand.UserAgentMiddleware': 543,
    'stand.ProxyMiddleware': 600,
}

settings = dict(
    LOG_ENABLED=False,
    DOWNLOAD_TIMEOUT=30,
    DOWNLOADER_MIDDLEWARES=DOWNLOADER_MIDDLEWARES,
)


def run():
    process = CrawlerProcess(settings)
    process.crawl(TestSpider)
    process.start()


if __name__ == "__main__":
    run()
```
## Project notes

- When `stand` starts, it first runs the `crawl` function to scrape proxy IPs from proxy websites and stores the results in an SQLite database named stand.db (the save directory can be set via the `STAND_DIR` environment variable). Each IP starts with a score of 2.
- It then runs the `validate` function to check each proxy IP: on success the score is set to the maximum of 3, on failure the score is decremented by 1, and when the score reaches 0 the IP is deleted.
- After that, `crawl` and `validate` run on a schedule, scraping IPs every 20 minutes and validating them every 60 minutes.