Scraper API for extracting product data from kupi.cz
Kupi API
This is a lightweight www.kupi.cz web scraper that scrapes sales and recipes into JSON. It requires only the bs4 and requests libraries, plus the standard-library json module. The API provides several methods for downloading content (recipes and discounts) from the kupi website (see below).
There are two main parts (classes): KupiScraper for scraping data about sales, and KupiRecipes for downloading recipes published on kupi.cz. The output of the library's methods can be used in further machine-processing tasks.
Installation via pip
pip install kupiapi
PyPi page of the library: https://pypi.org/project/kupiapi/
GitHub page of the library: https://github.com/vorava/kupiapi
Usage
import kupiapi.scraper # imports KupiScraper() class
import kupiapi.recipes # imports KupiRecipes() class
Methods - Kupi scraper
All methods return JSON-formatted data unless stated otherwise. Every discount method takes a max_pages parameter, which sets how many pages of discounts to scrape. The default value is 0, which means "scrape all pages".
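The following is a minimal sketch of limiting a scrape with max_pages and reading the result in Python. The text above does not say whether the methods return a JSON string or already-parsed objects, so the hypothetical helper to_python() accepts both; search_first_page() is also an illustrative name, not part of the library.

```python
import json


def to_python(result):
    """Parse `result` if it is a JSON string; otherwise return it unchanged."""
    # Assumption: the library may return either JSON text or parsed objects.
    if isinstance(result, (str, bytes, bytearray)):
        return json.loads(result)
    return result


def search_first_page(term):
    """Scrape only the first page of discounted results for `term`."""
    # Imported lazily so to_python() can be used without kupiapi installed.
    import kupiapi.scraper

    scraper = kupiapi.scraper.KupiScraper()
    # max_pages=1 stops after one page; the default 0 scrapes every page.
    raw = scraper.get_discounts_by_search(term, max_pages=1)
    return to_python(raw)
```

Calling search_first_page('pivo') needs network access. Keeping max_pages small is a reasonable choice while experimenting, since the default scrapes all pages.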
get_discounts_by_category(category, max_pages=0)
Takes the name of the category as a string and scrapes the discounts in that category. Discounts are scraped from the URL kupi.cz/slevy/category, where category is the given name. The list of main categories can be obtained with the get_categories() method.
get_discounts_by_search(search, max_pages=0)
Scrapes discounts matching a search term, which can be any string. This method searches only discounted goods (by adding the parameter &vse=0 to the URL string).
get_discounts_by_shop(shop, max_pages=0)
Returns discounts from a specific shop, given by the shop argument.
get_discounts_by_category_shop(category, shop, max_pages=0)
Combines search by shop name and by category.
get_categories()
Returns the list of main categories that can be used in the method get_discounts_by_category(category, max_pages=0).
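Since the scraper output is meant for further machine processing, one option is to store it in a file. The sketch below is only an illustration: save_discounts() is a hypothetical helper, and it assumes the result is either a JSON string or an already-parsed list or dict.

```python
import json
from pathlib import Path


def save_discounts(result, path):
    """Write scraper output to `path` as UTF-8 JSON.

    Returns the number of top-level items written.
    """
    # Assumption: `result` is JSON text or an already-parsed list/dict.
    data = json.loads(result) if isinstance(result, str) else result
    # ensure_ascii=False keeps Czech characters readable in the file.
    Path(path).write_text(
        json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(data)
```

For example, save_discounts(sc.get_discounts_by_search('pivo', max_pages=1), 'pivo.json') would store one page of search results, where sc is a KupiScraper instance.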
Methods - Kupi recipes
All methods return JSON-formatted data. The full parameter is a boolean: if True, the full recipe info is scraped (the output is significantly larger).
get_recipes_by_category(category, full=False)
Scrapes recipes in the given category (string value). Categories can be obtained by calling the get_categories() method.
get_all_recipes(full=False)
Scrapes all recipes available at kupi.cz.
get_recipe_by_search(search, full=False)
Gets a recipe by string search.
get_recipe_detail(recipe_url)
Gets the details of a recipe by its URL (string value). A correct recipe URL must be provided.
get_categories()
Returns all possible categories of recipes.
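Because full=True produces much larger output, one possible pattern is to fetch a short listing first and request details only for selected recipes. The sketch below rests on assumptions that the text above does not confirm: that a listing is a JSON array of objects, and that each object stores its address under a "url" key. The helper names recipe_urls() and details_for_category() are hypothetical.

```python
import json


def recipe_urls(listing, key="url"):
    """Collect recipe URLs from a listing; `key` is an assumed field name."""
    data = json.loads(listing) if isinstance(listing, str) else listing
    # Skip entries that are not dicts or that lack the assumed key.
    return [item[key] for item in data if isinstance(item, dict) and key in item]


def details_for_category(category, limit=3):
    """Fetch full details for the first `limit` recipes of a category."""
    import kupiapi.recipes  # lazy import; requires kupiapi and network access

    recipes = kupiapi.recipes.KupiRecipes()
    # full=False keeps the listing small; details are fetched one by one.
    listing = recipes.get_recipes_by_category(category, full=False)
    urls = recipe_urls(listing)[:limit]
    return [recipes.get_recipe_detail(url) for url in urls]
```

If the real output uses a different field name, pass it as key, or inspect one listing entry first to see which fields it contains.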
Examples
import kupiapi.recipes
kr = kupiapi.recipes.KupiRecipes()
print(kr.get_categories())
import kupiapi.scraper
import kupiapi.recipes
sc = kupiapi.scraper.KupiScraper()
rc = kupiapi.recipes.KupiRecipes()
print(sc.get_discounts_by_search('pivo'))
print(rc.get_recipes_by_category('hlavni-jidla', full=False))
Download files
File details
Details for the file kupiapi-1.0.11.tar.gz.
File metadata
- Download URL: kupiapi-1.0.11.tar.gz
- Upload date:
- Size: 7.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c04ed890fc35593a21942549d7ab1ff2c4971f19df654c1b491e3f1c815fccca |
| MD5 | 08cf3848aff33f939fdb21c6eaeff51a |
| BLAKE2b-256 | 6bc4ee88a5dfe46c8255e289d1197d612ba1205a69afd14b3fc0ecc32d9cf025 |
File details
Details for the file kupiapi-1.0.11-py3-none-any.whl.
File metadata
- Download URL: kupiapi-1.0.11-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5b37b7b644b1e2c5866761bafece3f436d0fca3b3a24d8ca56f24dab2bd295fa |
| MD5 | 52ef505edc32f999386a6f79ceb7788b |
| BLAKE2b-256 | d1cb790abbf716ba00f4eb89778631901e718a309acac6255f518e6efafc0501 |