A distributed, multithreaded, general-purpose crawler

Project description

Spipy - a lightweight multithreaded crawler

A minimal approach to writing crawlers

Features

  • Distributed
  • General-purpose
  • Multithreaded

Usage

You only need to implement three functions:

A function that generates the URLs to crawl

def gen_url():
    for each_id in range(100):
        yield "https://www.biliob.com/api/video/{}".format(each_id)
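
Because gen_url is a generator, the URLs are produced lazily, one at a time. A quick way to inspect its output before starting a crawl is to take the first few values with itertools.islice. This sketch only repeats the function above and makes no network requests; the name first_urls is introduced here for illustration.

```python
from itertools import islice


def gen_url():
    for each_id in range(100):
        yield "https://www.biliob.com/api/video/{}".format(each_id)


# Take only the first three URLs; the rest are never generated.
first_urls = list(islice(gen_url(), 3))
for url in first_urls:
    print(url)
```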

A function that parses the response for each URL

def parse(response, key=None):
    return response.xpath('//meta[@name="title"]/@content')[0]
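
The parse example indexes the XPath result with [0], which raises IndexError when nothing matches. The sketch below assumes the response exposes an xpath() method that returns a list of matches, as the example suggests. FakeResponse, safe_parse, and TITLE_XPATH are hypothetical names used only to exercise the idea offline; they are not part of Spipy.

```python
TITLE_XPATH = '//meta[@name="title"]/@content'


class FakeResponse:
    """Hypothetical stand-in for the object passed to parse(); not Spipy's class."""

    def __init__(self, results):
        self._results = results  # maps an XPath expression to a list of matches

    def xpath(self, expression):
        return self._results.get(expression, [])


def safe_parse(response, key=None):
    # Return the first match, or None when the page has no title meta tag.
    matches = response.xpath(TITLE_XPATH)
    return matches[0] if matches else None


with_title = FakeResponse({TITLE_XPATH: ["Example video"]})
without_title = FakeResponse({})
print(safe_parse(with_title))
print(safe_parse(without_title))
```

Whether Spipy skips None items is not stated here, so a save function paired with this variant may need to handle None itself.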

A function that exports the parsed data

def save(item):
    print(item)
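
Since the crawler is multithreaded, save might be called from several threads at once (an assumption; the description does not say which thread calls it). A minimal sketch of an export function that serializes writes with a lock and appends each item as one JSON line follows. The helper name make_saver and the demo file name are hypothetical.

```python
import json
import os
import tempfile
import threading


def make_saver(path):
    """Build a save(item) function that appends one JSON line per item to `path`."""
    lock = threading.Lock()

    def save(item):
        line = json.dumps(item, ensure_ascii=False)
        with lock:  # only one thread writes at a time
            with open(path, "a", encoding="utf-8") as f:
                f.write(line + "\n")

    return save


# Demo: write two items to a temporary file and read them back.
demo_path = os.path.join(tempfile.mkdtemp(), "items.jsonl")
save = make_saver(demo_path)
save("first title")
save("second title")
with open(demo_path, encoding="utf-8") as f:
    saved_lines = f.read().splitlines()
print(saved_lines)
```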

Then assemble these components into a Spider:

s = Spider(gen_url, parse, save, name="DEMO")  # assemble through the constructor
# Or
s = Spider()
s.assemble(gen_url, parse, save, name="DEMO")  # create the spider first, then attach each component

After that, start the crawl:

s.run()
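
To make the division of labor concrete, here is a minimal sketch of how three such functions could be wired together with a thread pool. It is not Spipy's implementation: MiniSpider, fake_fetch, and the max_workers parameter are hypothetical, the fetch step is stubbed so the example runs without network access, and parse works on a plain string rather than an XPath-capable response.

```python
from concurrent.futures import ThreadPoolExecutor


class MiniSpider:
    """Hypothetical stand-in showing a gen_url -> fetch -> parse -> save flow."""

    def __init__(self, gen_url, fetch, parse, save, max_workers=4):
        self.gen_url = gen_url
        self.fetch = fetch
        self.parse = parse
        self.save = save
        self.max_workers = max_workers

    def _crawl_one(self, url):
        response = self.fetch(url)
        return self.parse(response)

    def run(self):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # Fetching and parsing run concurrently; map() yields results in URL order.
            for item in pool.map(self._crawl_one, self.gen_url()):
                self.save(item)  # in this sketch, save runs on the calling thread


def gen_url():
    for each_id in range(5):
        yield "https://www.biliob.com/api/video/{}".format(each_id)


def fake_fetch(url):
    # Stub: returns a canned string instead of making an HTTP request.
    return "<response for {}>".format(url)


def parse(response, key=None):
    return response.upper()


collected = []


def save(item):
    collected.append(item)


MiniSpider(gen_url, fake_fetch, parse, save).run()
print(len(collected))
```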

Download files

Source Distribution

simpyder-0.0.1.tar.gz (3.3 kB)
