IP Proxy Pool
Installation
pip install stand
Starting
stand
Usage
>>> from stand import get_proxy
>>> proxy = get_proxy()
>>> print(proxy)
103.133.222.151:8080
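The value returned by get_proxy is a bare host:port string. As a minimal sketch, the helper below turns such a string into the scheme-to-URL mapping that HTTP clients such as requests accept. The name build_proxies is hypothetical and not part of stand, and the code assumes the proxy speaks plain HTTP.

```python
def build_proxies(address, scheme="http"):
    """Turn a bare 'host:port' string into a scheme-to-URL proxy mapping.

    Hypothetical helper, not part of stand. Assumes an HTTP proxy.
    """
    host, sep, port = address.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {address!r}")
    proxy_url = f"{scheme}://{host}:{port}"
    # Route both plain and TLS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


# In practice the address would come from stand.get_proxy().
proxies = build_proxies("103.133.222.151:8080")
print(proxies["http"])
```

With requests installed, the mapping can be passed as requests.get(url, proxies=proxies, timeout=10).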
Using stand as a proxy in Scrapy
import scrapy
from scrapy.crawler import CrawlerProcess


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://api.ip.sb/ip']

    def parse(self, response):
        print(response.meta['proxy'])
        print(response.text)


DOWNLOADER_MIDDLEWARES = {
    'stand.UserAgentMiddleware': 543,
    'stand.ProxyMiddleware': 600,
}

settings = dict(
    LOG_ENABLED=False,
    DOWNLOAD_TIMEOUT=30,
    DOWNLOADER_MIDDLEWARES=DOWNLOADER_MIDDLEWARES,
)


def run():
    process = CrawlerProcess(settings)
    process.crawl(TestSpider)
    process.start()


if __name__ == "__main__":
    run()
How it works
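- When stand starts, it first runs the crawl function to scrape proxy IPs from proxy websites and stores the results in a SQLite database named stand.db (the directory it is saved in can be set with the STAND_DIR environment variable). Each IP starts with a score of 2.
- It then runs the validate function to check whether each proxy IP works. An IP that passes has its score set to the maximum value of 3; an IP that fails has its score reduced by 1; an IP whose score reaches 0 is deleted.
- After that, the crawl and validate functions run on a schedule: IPs are crawled every 20 minutes and validated every 60 minutes.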
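The scoring rule above can be sketched with Python's built-in sqlite3 module. This is an illustration only, not stand's actual code: the table name proxy, its columns, and the function names are assumptions, and an in-memory database is used in place of stand.db.

```python
import sqlite3

INITIAL_SCORE = 2  # score given to a freshly crawled IP
MAX_SCORE = 3      # score assigned after a successful validation


def open_pool(path=":memory:"):
    """Open a pool database with a hypothetical schema (not stand's own)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy ("
        "address TEXT PRIMARY KEY, score INTEGER NOT NULL)"
    )
    return conn


def add_proxy(conn, address):
    # Skip addresses already in the pool so their current score is kept.
    conn.execute(
        "INSERT OR IGNORE INTO proxy (address, score) VALUES (?, ?)",
        (address, INITIAL_SCORE),
    )


def record_result(conn, address, ok):
    """Apply the rule: pass sets the score to MAX_SCORE, fail subtracts 1."""
    if ok:
        conn.execute(
            "UPDATE proxy SET score = ? WHERE address = ?", (MAX_SCORE, address)
        )
    else:
        conn.execute(
            "UPDATE proxy SET score = score - 1 WHERE address = ?", (address,)
        )
        # Drop the IP once its score has fallen to 0.
        conn.execute(
            "DELETE FROM proxy WHERE address = ? AND score <= 0", (address,)
        )


def get_score(conn, address):
    row = conn.execute(
        "SELECT score FROM proxy WHERE address = ?", (address,)
    ).fetchone()
    return row[0] if row else None


conn = open_pool()
add_proxy(conn, "10.0.0.1:8080")
record_result(conn, "10.0.0.1:8080", ok=False)  # 2 -> 1
record_result(conn, "10.0.0.1:8080", ok=True)   # 1 -> 3
print(get_score(conn, "10.0.0.1:8080"))
```

Because a new IP starts at 2, it survives one failed validation and is removed after two failures in a row, while a validated IP at 3 tolerates two failures before being removed on the third.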