A crawler for mindfactory.de
Mindfactory.de Crawler
This repository contains a crawler for Mindfactory, a German e-commerce shop for computer hardware. The crawler extracts the data from each product page and stores the scraped products and reviews in a SQLite database with two tables: one for products and one for reviews.
Each product has the following properties:
- ID (SQLite identifier)
- URL
- Product name
- Brand name
- Category (e.g. CPU)
- EAN
- SKU
- Items sold (Count)
- People watching (Count)
- RMA rate (in percent)
- Average rating (from 1.0 to 5.0)
- Shipping (information on availability)
- Price (in Euro)
In addition, all reviews of each product are collected and stored in a separate SQLite table. Each entry in this table has the following properties:
- Product ID (Reference to the corresponding ID in the product table)
- Stars (Rating, from 1 to 5)
- Text
- Author
- Date (YYYY-MM-DD)
- Verified (whether the reviewer actually bought the product at Mindfactory)
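The two tables described above could be laid out roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the table names (`products`, `reviews`), the column names and the column types are assumptions made for illustration and may differ from the schema the crawler actually creates. The inserted rows are made-up placeholders.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions,
# not necessarily the ones the crawler uses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    id              INTEGER PRIMARY KEY,
    url             TEXT,
    name            TEXT,
    brand           TEXT,
    category        TEXT,
    ean             TEXT,
    sku             TEXT,
    count_sold      INTEGER,
    people_watching INTEGER,
    rma_percent     REAL,
    avg_rating      REAL,
    shipping        TEXT,
    price_eur       REAL
);
CREATE TABLE IF NOT EXISTS reviews (
    product_id INTEGER REFERENCES products(id),
    stars      INTEGER,
    text       TEXT,
    author     TEXT,
    date       TEXT,     -- ISO format, YYYY-MM-DD
    verified   INTEGER   -- 0 or 1
);
"""


def create_schema(conn):
    """Create both tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def reviews_for_product(conn, product_id):
    """Return (stars, author, date, verified) rows for one product, oldest first."""
    cur = conn.execute(
        "SELECT stars, author, date, verified FROM reviews "
        "WHERE product_id = ? ORDER BY date",
        (product_id,),
    )
    return cur.fetchall()


# Made-up example rows, only to exercise the sketch.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute(
    "INSERT INTO products (id, name, category) VALUES (1, 'Example CPU', 'CPU')"
)
conn.execute(
    "INSERT INTO reviews VALUES (1, 5, 'Works well', 'alice', '2020-01-31', 1)"
)
rows = reviews_for_product(conn, 1)
```

Storing the date as ISO-formatted text (YYYY-MM-DD) keeps it sortable with a plain `ORDER BY`, which is why the sketch uses a `TEXT` column for it.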
Prerequisites
- Python 3
- scrapy
- SQLite3
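Whether these prerequisites are available can be checked from Python itself. The following is a small sketch, not part of the project; the helper name `check_prerequisites` is hypothetical. It only relies on the standard library, so it also runs when Scrapy is missing.

```python
import importlib.util
import sqlite3
import sys


def check_prerequisites():
    """Report which of the listed prerequisites are available."""
    return {
        # True when the running interpreter is Python 3.
        "python3": sys.version_info.major == 3,
        # find_spec returns None when the package cannot be imported.
        "scrapy": importlib.util.find_spec("scrapy") is not None,
        # Version of the SQLite library bundled with this Python.
        "sqlite3": sqlite3.sqlite_version,
    }


for name, status in check_prerequisites().items():
    print(f"{name}: {status}")
```

If the `scrapy` entry is reported as `False`, the package can usually be installed with `pip install scrapy`.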
Run the scraper
scrapy crawl mindfactory_products
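Once the crawl has finished, the resulting database can be inspected with Python's built-in `sqlite3` module. The sketch below lists every table together with its row count; the helper name `table_row_counts` is hypothetical, and the path of the database file written by the crawler is not stated here, so it has to be supplied by the caller.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in the database."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        # The names are read from sqlite_master itself, not from user input,
        # so interpolating them (quoted) into the query is acceptable here.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

A call such as `table_row_counts("path/to/database.db")` (placeholder path) should then report two tables, one for products and one for reviews. Note that `sqlite3.connect` creates an empty database file if the given path does not exist, so an empty result usually means the path is wrong.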
Hashes for mindfactory_crawling-1.0.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 67a18d583b64f9609234e4bd9152d21090df3d8df17d4fb2fa3bf15b0f6ee7e5
MD5 | edbfbc7a26353fd8f26c462935614df2
BLAKE2b-256 | 5799278e355ba6e99dae949b6890c6f04fdf3218696c4f932a31b4fd66e66963

Hashes for mindfactory_crawling-1.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cf81026e1668d1a487cd09abd3c742626abb63d0d7702eca7607e5be1d6e6934
MD5 | 7b8a6318a28143d866bf72ea9f84fe93
BLAKE2b-256 | 814781f46033cf43c985910fa994fa3069e601d0277834876d6506c1ab61efef