A Scrapy project for crawling product pictures and information.
Project description
ProductCrawler
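Introduction
ProductCrawler is a Scrapy project for collecting product information from shopping websites. It contains a set of spiders, each named after the brand whose products it crawls.
The project was started out of personal interest as a way to learn the Scrapy framework, so it may contain some bugs and some baffling code...
Spiders currently used in production:
Spiders that are no longer maintained and may or may not still work:
Spiders under development:
Brands awaiting support: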
- humanmade
- wtaps
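Dependencies
- Python 3.7+
- To use the nike spider, you also need the Chrome browser and the ChromeDriver release that matches your Chrome version. The other spiders work without them.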
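Before running the nike spider, it can help to confirm that ChromeDriver is reachable. The sketch below is only an illustration: it assumes the driver executable is named `chromedriver` and is located through `PATH`, which this README does not state. The helper name `chromedriver_on_path` is hypothetical and not part of products_crawler.

```python
import shutil


def chromedriver_on_path(which=shutil.which):
    """Return True if an executable named 'chromedriver' is found on PATH.

    Assumption: the nike spider locates ChromeDriver through PATH.
    The `which` parameter exists only so the lookup can be swapped out in tests.
    """
    return which("chromedriver") is not None
```

Calling `chromedriver_on_path()` with no arguments performs the real lookup. It does not check that the driver version matches the installed Chrome version.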
Installation
pip install products_crawler
Usage
crawl -h
usage: crawl [-h] {bearbrick,glld,kapital,nike,supreme,ts,uastore} start_urls [start_urls ...]
positional arguments:
{bearbrick,glld,kapital,nike,supreme,ts,uastore}
start_urls
optional arguments:
-h, --help show this help message and exit
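The help text above describes one positional argument restricted to a fixed set of spider names, followed by one or more start URLs. As a rough illustration of that interface, and not the package's actual source, an equivalent parser could be sketched with `argparse`. The destination name `spider` and the function name `build_parser` are assumptions made for this example.

```python
import argparse

# Spider names copied from the usage line above.
SPIDERS = ["bearbrick", "glld", "kapital", "nike", "supreme", "ts", "uastore"]


def build_parser():
    """Build a parser that mirrors the documented `crawl` interface."""
    parser = argparse.ArgumentParser(prog="crawl")
    # First positional: must be one of the known spider names.
    parser.add_argument("spider", choices=SPIDERS)
    # Remaining positionals: at least one URL is required.
    parser.add_argument("start_urls", nargs="+")
    return parser
```

With this sketch, `build_parser().parse_args(["supreme", "https://example.com/"])` yields a namespace whose `start_urls` is a list, and an unknown spider name makes `argparse` exit with an error.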
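Try running the command below. A product directory is created in the current working directory, and all crawled product images and information are saved there.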
crawl supreme https://www.supremecommunity.com/season/spring-summer2020/droplist/2020-02-27/
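If you prefer to drive the crawler from a script, the command can be wrapped with `subprocess`. This is a minimal sketch under two assumptions: the `crawl` executable is on `PATH` after installation, and results land in a directory named `product` under the working directory, as described above. The helper names are hypothetical, and the internal layout of the output directory is not assumed.

```python
import subprocess
from pathlib import Path


def build_crawl_command(spider, start_urls):
    """Return the argument list for one `crawl` invocation."""
    if not start_urls:
        raise ValueError("at least one start URL is required")
    return ["crawl", spider, *start_urls]


def run_crawl(spider, start_urls):
    """Run the crawler and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_crawl_command(spider, start_urls), check=True)


def list_outputs(root="product"):
    """Return every file under the output directory, or [] if it is missing."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p for p in root_path.rglob("*") if p.is_file())
```

For example, call `run_crawl("supreme", [url])` with the URL from the command above, then iterate over `list_outputs()` to see what was saved.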
Examples
Supreme
Crawl the products from every week of a season
crawl supreme https://www.supremecommunity.com/season/spring-summer2020/droplists/
Crawl all products from a single week
crawl supreme https://www.supremecommunity.com/season/spring-summer2020/droplist/2020-02-27/
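The two Supreme commands differ only in the URL path: `droplists/` for a whole season and `droplist/<date>/` for one week. The helpers below build both forms. The pattern is inferred solely from the two example URLs above, so treat it as an assumption that may not hold for other seasons or after site changes. The function names are hypothetical.

```python
from datetime import date

SEASON_BASE = "https://www.supremecommunity.com/season"


def season_url(season):
    """URL listing every weekly drop of a season, e.g. 'spring-summer2020'."""
    return f"{SEASON_BASE}/{season}/droplists/"


def week_url(season, drop_date):
    """URL for a single weekly drop; drop_date must be 'YYYY-MM-DD'."""
    # fromisoformat raises ValueError on malformed dates, catching typos early.
    date.fromisoformat(drop_date)
    return f"{SEASON_BASE}/{season}/droplist/{drop_date}/"
```

Either result can be passed to `crawl supreme` as a start URL.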
Kapital
Crawl all products in a category
crawl kapital https://www.kapital-webshop.jp/category/W_COAT/
Nike
Crawl the products matching the searched style (all colors included)
crawl nike https://www.nike.com/cn/w?q=CU6525&vst=CU6525
BearBrick
Crawl all products in the current category
crawl bearbrick http://www.bearbrick.com/product/12_0
Known issue: BearBrickLoader's category_in does not behave as expected.
United Arrows Online Shop
Crawl the current product
crawl uastore https://store.united-arrows.co.jp/shop/mt/goods.html?gid=52711245
Travis Scott
Crawl all products
crawl ts https://shop.travisscott.com/