
A Scrapy project for crawling product pictures and information.


ProductCrawler


Introduction

ProductCrawler is a Scrapy project that collects product information from shopping websites. It contains a series of spiders, each named after the brand whose products it crawls.

This project was created out of personal interest, as a way to learn the Scrapy framework, so it may contain a few bugs and some puzzling code...

Spiders currently used in production:

Spiders that are no longer maintained and may or may not still work:

Spiders under development:

Brands awaiting support:

  • humanmade
  • wtaps

Dependencies

  • Python 3.7+

  • To use the nike spider, you also need the Chrome browser and a matching version of ChromeDriver. Their absence does not affect the other spiders.
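Before running the nike spider, you can check whether the required binaries are on your PATH. The sketch below uses only the standard library; the binary names are assumptions (depending on your platform, Chrome may be installed as chrome or chromium instead of google-chrome):

```python
import shutil

def check_nike_dependencies():
    """Return a list of binaries needed by the nike spider that are missing.

    The names below are assumptions; adjust them for your platform.
    """
    required = ["chromedriver", "google-chrome"]
    return [name for name in required if shutil.which(name) is None]

missing = check_nike_dependencies()
if missing:
    print("nike spider dependencies missing:", ", ".join(missing))
else:
    print("Chrome and ChromeDriver found on PATH")
```

Note that the installed ChromeDriver must also match your Chrome version; this check only verifies the binaries exist.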

Installation

pip install products_crawler

Usage

crawl -h
usage: crawl [-h] {bearbrick,glld,kapital,nike,supreme,ts,uastore} start_urls [start_urls ...]

positional arguments:
  {bearbrick,glld,kapital,nike,supreme,ts,uastore}
  start_urls

optional arguments:
  -h, --help            show this help message and exit

Try the command below. A product directory will be created under the current working directory, and all crawled product pictures and information will be saved inside it.

crawl supreme https://www.supremecommunity.com/season/spring-summer2020/droplist/2020-02-27/

Examples

Supreme

Crawl the products of every week in a season

crawl supreme https://www.supremecommunity.com/season/spring-summer2020/droplists/

Crawl all products of a single week

crawl supreme https://www.supremecommunity.com/season/spring-summer2020/droplist/2020-02-27/

Kapital

Crawl all products in a category

crawl kapital https://www.kapital-webshop.jp/category/W_COAT/

Nike

Crawl the products matching the current search (including all colorways)

crawl nike https://www.nike.com/cn/w?q=CU6525&vst=CU6525

BearBrick

Crawl all products in the current category

crawl bearbrick http://www.bearbrick.com/product/12_0

Known issue: BearBrickLoader's category_in does not behave as expected.

United Arrows Online Shop

Crawl a single product

crawl uastore https://store.united-arrows.co.jp/shop/mt/goods.html?gid=52711245

Travis Scott

Crawl all products

crawl ts https://shop.travisscott.com/
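To run several of the example crawls above in one batch, a small wrapper can assemble the crawl command lines. The brand-to-URL mapping below is copied from the examples in this README; extend it with the seasons or categories you actually need:

```python
# Hypothetical batch helper for the `crawl` CLI described above.
# The JOBS mapping is taken from the README examples; it is not part of
# the products_crawler package itself.
JOBS = {
    "supreme": ["https://www.supremecommunity.com/season/spring-summer2020/droplists/"],
    "kapital": ["https://www.kapital-webshop.jp/category/W_COAT/"],
}

def build_command(brand, start_urls):
    """Assemble the argv list for one `crawl` invocation."""
    return ["crawl", brand] + list(start_urls)

for brand, urls in JOBS.items():
    # Print the commands; pass each list to subprocess.run to execute.
    print(" ".join(build_command(brand, urls)))
```

Each list returned by build_command can be handed directly to subprocess.run if you prefer to launch the crawls from Python rather than the shell.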
