IP Proxy Pool

Installation

pip install stand

Start

stand

Usage

>>> from stand import get_proxy
>>> proxy = get_proxy()
>>> print(proxy)
103.133.222.151:8080
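
As a hedged illustration of how the returned string might be used outside Scrapy, the sketch below turns a host:port value into the proxies mapping accepted by the requests library. The helper name to_requests_proxies is hypothetical and not part of stand. It assumes get_proxy() returns a bare host:port string for a plain HTTP proxy, as in the output above.

```python
def to_requests_proxies(proxy: str) -> dict:
    """Build a requests-style proxies mapping from a 'host:port' string.

    Hypothetical helper, not part of stand. Assumes the proxy speaks plain
    HTTP, so both URL schemes are routed through an http:// proxy URL.
    """
    proxy_url = f"http://{proxy}"
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # With stand installed and running, the string would come from get_proxy().
    proxies = to_requests_proxies("103.133.222.151:8080")
    print(proxies)
```

The resulting mapping could then be passed along as requests.get(url, proxies=proxies, timeout=30).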

Using stand as a proxy in Scrapy

import scrapy
from scrapy.crawler import CrawlerProcess


class TestSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://api.ip.sb/ip']

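    def parse(self, response):
        # response.meta is the meta dict of the request; 'proxy' holds the proxy used
        print(response.meta['proxy'])
        # Body returned by the start URL
        print(response.text)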


DOWNLOADER_MIDDLEWARES = {
    'stand.UserAgentMiddleware': 543,
    'stand.ProxyMiddleware': 600,
}
settings = dict(
    LOG_ENABLED=False,
    DOWNLOAD_TIMEOUT=30,
    DOWNLOADER_MIDDLEWARES=DOWNLOADER_MIDDLEWARES,
)


def run():
    process = CrawlerProcess(settings)
    process.crawl(TestSpider)
    process.start()


if __name__ == "__main__":
    run()
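
To make the role of a proxy middleware more concrete, here is a rough sketch of what such a downloader middleware typically does: it sets request.meta['proxy'], the key Scrapy reads the proxy URL from. This is not stand's actual implementation. The class names SketchProxyMiddleware and FakeRequest are hypothetical, and the http:// scheme prefix is an assumption.

```python
class SketchProxyMiddleware:
    """Illustrative downloader middleware; NOT stand's actual implementation."""

    def __init__(self, get_proxy):
        # get_proxy is any zero-argument callable returning 'host:port'.
        self.get_proxy = get_proxy

    def process_request(self, request, spider):
        # Scrapy reads the proxy URL for a request from request.meta['proxy'].
        request.meta["proxy"] = f"http://{self.get_proxy()}"
        # Returning None tells Scrapy to continue processing the request.
        return None


class FakeRequest:
    """Stand-in for scrapy.Request so the sketch runs without Scrapy."""

    def __init__(self):
        self.meta = {}


request = FakeRequest()
middleware = SketchProxyMiddleware(lambda: "103.133.222.151:8080")
middleware.process_request(request, spider=None)
print(request.meta["proxy"])
```

Taking the proxy source as a constructor argument keeps the sketch runnable without stand installed. In a real project the callable would be stand's get_proxy.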

How it works

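  1. When stand starts, it first runs the crawl function to scrape proxy IPs from proxy-listing websites and stores the results in a SQLite database named stand.db (the directory it is saved in can be set with the STAND_DIR environment variable). Each IP starts with a score of 2.
  2. It then runs the validate function to check whether each proxy IP works. An IP that passes validation has its score set to the maximum of 3; an IP that fails has its score reduced by 1, and it is deleted once its score reaches 0.
  3. After that, the crawl and validate functions run on a schedule: IPs are crawled every 20 minutes and validated every 60 minutes.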
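
The scoring rules above can be sketched with an in-memory SQLite database. This is only an illustration of the described behavior, not stand's code. The table name, column names, and function names are assumptions, as is the choice to leave an already-known IP untouched when it is crawled again.

```python
import sqlite3

INIT_SCORE = 2  # score given to a newly crawled IP (from the description)
MAX_SCORE = 3   # score assigned when validation succeeds


def create_pool(conn):
    # Table and column names are assumptions made for this illustration.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy (ip TEXT PRIMARY KEY, score INTEGER)"
    )


def add_proxy(conn, ip):
    # New IPs start at INIT_SCORE; known IPs are left alone (an assumption).
    conn.execute(
        "INSERT OR IGNORE INTO proxy (ip, score) VALUES (?, ?)", (ip, INIT_SCORE)
    )


def record_validation(conn, ip, ok):
    if ok:
        # A passing check resets the score to the maximum.
        conn.execute("UPDATE proxy SET score = ? WHERE ip = ?", (MAX_SCORE, ip))
    else:
        # A failing check costs one point; the IP is dropped at zero.
        conn.execute("UPDATE proxy SET score = score - 1 WHERE ip = ?", (ip,))
        conn.execute("DELETE FROM proxy WHERE ip = ? AND score <= 0", (ip,))


def scores(conn):
    # Return the whole pool as an {ip: score} mapping.
    return dict(conn.execute("SELECT ip, score FROM proxy"))


conn = sqlite3.connect(":memory:")
create_pool(conn)
add_proxy(conn, "10.0.0.1:8080")
add_proxy(conn, "10.0.0.2:3128")
record_validation(conn, "10.0.0.1:8080", ok=True)   # 2 -> 3
record_validation(conn, "10.0.0.2:3128", ok=False)  # 2 -> 1
print(scores(conn))
record_validation(conn, "10.0.0.2:3128", ok=False)  # 1 -> 0, row deleted
print(scores(conn))
```

Under these rules a fresh IP survives one failed check but not two in a row, while a validated IP tolerates up to two consecutive failures before being removed on the third.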
