Scraper API for extracting product data from kupi.cz

Kupi API

This is a lightweight web scraper for www.kupi.cz that exports sales and recipes as JSON. It requires only the bs4 and requests libraries in addition to Python's built-in json module. The API provides several methods for downloading content (recipes and discounts) from the kupi website (see below).

The library has two main classes: KupiScraper for scraping data about sales, and KupiRecipes for downloading recipes published on kupi.cz. The output of the library's methods is suitable for further machine processing.

Installation via pip

pip install kupiapi

PyPI page of the library: https://pypi.org/project/kupiapi/

GitHub page of the library: https://github.com/vorava/kupiapi

Usage

import kupiapi.scraper # module that provides the KupiScraper class
import kupiapi.recipes # module that provides the KupiRecipes class

Methods - Kupi scraper

All methods return JSON-formatted data unless stated otherwise. All discount-scraping methods take a max_pages parameter, which sets how many pages of discounts should be scraped. The default value is 0, which means "scrape all pages".
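The max_pages convention can be illustrated with a small, self-contained sketch. This is not the library's implementation: scrape_pages, fetch_page and the fake data are hypothetical names made up for this example, and it assumes that an empty page marks the end of a listing.

```python
from typing import Callable, List


def scrape_pages(fetch_page: Callable[[int], List[dict]], max_pages: int = 0) -> List[dict]:
    """Collect items page by page; max_pages=0 means no page limit."""
    items: List[dict] = []
    page = 1
    while max_pages == 0 or page <= max_pages:
        batch = fetch_page(page)  # hypothetical callback returning one page of items
        if not batch:  # assumption: an empty page means the listing has ended
            break
        items.extend(batch)
        page += 1
    return items


# Stand-in for a real page fetcher: three pages of fake data, then nothing.
fake_site = {1: [{"name": "a"}], 2: [{"name": "b"}], 3: [{"name": "c"}]}


def fake_fetch(page: int) -> List[dict]:
    return fake_site.get(page, [])


print(len(scrape_pages(fake_fetch)))               # all three pages
print(len(scrape_pages(fake_fetch, max_pages=2)))  # first two pages only
```

Using 0 as the "no limit" value keeps the parameter a plain integer, at the cost of not being able to request zero pages.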

get_discounts_by_category(category, max_pages=0)

Takes the name of the category as a string and scrapes the discounts in that category. Discounts are scraped from the URL kupi.cz/slevy/category, where category is the given name. The list of main categories can be obtained with the get_categories() method.
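As a rough illustration of the URL pattern described above, the following sketch builds a category URL. The helper name category_url, the https scheme, the www prefix and the category name 'pivo' are assumptions for the example; the library may construct its URLs differently.

```python
from urllib.parse import quote

BASE_URL = "https://www.kupi.cz"  # assumption: scheme and "www" prefix


def category_url(category: str) -> str:
    """Build a listing URL of the form <base>/slevy/<category>."""
    # Percent-encode the name so spaces or slashes cannot alter the path.
    return f"{BASE_URL}/slevy/{quote(category.strip('/'), safe='')}"


print(category_url("pivo"))  # 'pivo' is only a placeholder category name
```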

get_discounts_by_search(search, max_pages=0)

Scrapes discounts matching a search term, which can be any string. This method searches only discounted goods (by adding the parameter &vse=0 to the URL).
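How a vse=0 parameter might be appended to a URL can be sketched with the standard library. The function name only_discounted and the example.invalid address are placeholders; the actual search URL used by the library is not shown in this description, so none is assumed here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def only_discounted(url: str) -> str:
    """Return url with the query parameter vse=0 set, replacing any existing vse."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous "vse" value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "vse"
    ]
    query.append(("vse", "0"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL, not the real kupi.cz search address.
print(only_discounted("https://example.invalid/search?q=pivo"))
```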

get_discounts_by_shop(shop, max_pages=0)

Returns discounts from a specific shop, defined by the shop argument.

get_discounts_by_category_shop(category, shop, max_pages=0)

Combines search by shop name and by category.

get_categories()

Returns the list of main categories that can be used with get_discounts_by_category(category, max_pages=0).

Methods - Kupi recipes

All methods return JSON-formatted data. The full parameter is a boolean. If True, the full recipe info is scraped (the output is significantly larger).

get_recipes_by_category(category, full=False)

Scrapes recipes in the given category (string value). Categories can be obtained by calling the get_categories() method.

get_all_recipes(full=False)

Scrapes all recipes available at kupi.cz.

get_recipe_by_search(search, full=False)

Gets recipes by search string.

get_recipe_detail(recipe_url)

Gets the details of a recipe by its URL (string value). A correct recipe URL is mandatory.
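Since a correct URL is required, a caller may want a quick sanity check before calling get_recipe_detail(). The following is a minimal sketch of such a check; looks_like_kupi_url is a hypothetical helper, not part of the library, and it only verifies the scheme and host, not that the page is actually a recipe. The example paths are placeholders.

```python
from urllib.parse import urlsplit


def looks_like_kupi_url(url: str) -> bool:
    """Return True for http(s) URLs whose host is kupi.cz or a subdomain of it."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_kupi = host == "kupi.cz" or host.endswith(".kupi.cz")
    return parts.scheme in ("http", "https") and on_kupi


print(looks_like_kupi_url("https://www.kupi.cz/some-recipe"))  # placeholder path
print(looks_like_kupi_url("kupi.cz/some-recipe"))              # scheme is missing
```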

get_categories()

Returns all possible categories of recipes.

Examples

import kupiapi.recipes
kr = kupiapi.recipes.KupiRecipes()

print(kr.get_categories())

import kupiapi.scraper
import kupiapi.recipes

sc = kupiapi.scraper.KupiScraper()
rc = kupiapi.recipes.KupiRecipes()

print(sc.get_discounts_by_search('pivo'))
print(rc.get_recipes_by_category('hlavni-jidla',full=False))
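Instead of printing, the results can be stored for later processing. The sketch below is one way to do this, assuming the returned value is either JSON text or an already-parsed Python object; save_result and the sample data are made up for the example, and the sample string stands in for what a call such as sc.get_discounts_by_search('pivo') might return.

```python
import json
import tempfile
from pathlib import Path


def save_result(result, path):
    """Write a scraper result to path as pretty-printed UTF-8 JSON and return the data."""
    # Accept both JSON text and already-parsed lists or dicts.
    data = json.loads(result) if isinstance(result, (str, bytes)) else result
    text = json.dumps(data, ensure_ascii=False, indent=2)  # keep Czech characters readable
    Path(path).write_text(text, encoding="utf-8")
    return data


sample = '[{"name": "Plzeň 0,5 l"}]'  # made-up sample, not real scraper output

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "discounts.json"
    save_result(sample, out)
    print(out.read_text(encoding="utf-8"))
```

Setting ensure_ascii=False keeps diacritics as-is in the file instead of escaping them, which is convenient for Czech product names.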
