A Python package that implements scraping of Israeli supermarket data

Project description

Israel Supermarket Scraper: Clients to download the data published by the supermarkets.

This is a scraper for ALL the supermarket chains listed on the GOV.IL site.

Price transparency (price comparison) - https://www.gov.il/he/departments/legalInfo/cpfta_prices_regulations

🤗 Want to support my work?

Buy Me A Coffee

Scheduled Automatic Testing

The test suite is scheduled to run every three days, so you can see whether the supermarket chains have changed something in their interface that would stop the package from working properly.

Status: Scheduled Tests

Notice:

  • Berekt and Quik are flaky! Their failures do not fail the test suite, but you can still use them.
  • Some of the scraped sites block access from outside Israel.

Got a question?

You can email me at erlichsefi@gmail.com

If you think you've found a bug:

  • Search the issue tracker to see if it has already been reported, and create an issue if it has not.
  • Please consider solving the issue yourself and creating a pull request.

What is il_supermarket_scarper?

There are a lot of projects on GitHub that try to scrape the supermarket data, but most of them are unstable or haven't been updated in a while. It's about time there was one codebase that does the job completely.

You only need to run the following code to get all the data currently shared by the supermarkets.

from il_supermarket_scarper import MainScrapperRunner

scraper = MainScrapperRunner()
scraper.run()

Please note! Since new files are constantly uploaded by the supermarkets to their sites, you will only get the current snapshot. To keep getting data, you will need to run this code more than once to pick up the newly uploaded files.
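
One way to keep collecting newly uploaded files is to call the scraper at a fixed interval. The sketch below is only an illustration and is not part of the package: `run_periodically` and its parameters are hypothetical names, and the scraper is passed in as a plain callable so the loop does not depend on the package itself.

```python
import time
from typing import Callable


def run_periodically(
    run_once: Callable[[], object],
    interval_seconds: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once `iterations` times, sleeping between calls.

    Returns the number of runs that completed without raising.
    """
    completed = 0
    for i in range(iterations):
        try:
            run_once()
            completed += 1
        except Exception as exc:  # keep the loop alive if one run fails
            print(f"run {i + 1} failed: {exc}")
        # Sleep only between runs, not after the last one.
        if i < iterations - 1:
            sleep(interval_seconds)
    return completed
```

With the package installed, something like `run_periodically(MainScrapperRunner().run, interval_seconds=6 * 60 * 60, iterations=4)` would take four snapshots, six hours apart.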

Quick start

il_supermarket_scarper can be installed using pip:

python3 -m pip install il-supermarket-scraper

If you want to run the latest version of the code, you can install from the repo directly:

python3 -m pip install -U git+https://github.com/erlichsefi/israeli-supermarket-scarpers.git
# or if you don't have 'git' installed
python3 -m pip install -U https://github.com/erlichsefi/israeli-supermarket-scarpers/archive/master.zip
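
To confirm which version ended up installed, the standard library can be queried from Python. This is a minimal sketch: `installed_version` is a hypothetical helper name, and it assumes the distribution name is the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string, or None when the package is not installed.
    print(installed_version("il-supermarket-scraper"))
```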

Running Docker

The Docker image is designed to run the scraper every 6 hours (you can change the cron expression if you like; see the file 'crontab'). In every iteration, the scraper collects the files available for download and checks whether each file already exists before fetching it, either by scanning the dump folder or by checking MongoDB.

docker-compose up -d

or if you want to use the existing image from docker hub:

docker pull erlichsefi/israeli-supermarket-scarpers:latest
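
The "check if the file already exists before fetching it" step can be pictured with a small sketch of the dump-folder variant. This is not the package's implementation: `files_to_fetch`, the folder layout, and matching by file name alone are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def files_to_fetch(available: Iterable[str], dump_folder: str) -> List[str]:
    """Return the available file names not yet present in dump_folder."""
    folder = Path(dump_folder)
    # A missing folder simply means nothing has been downloaded yet.
    existing = {p.name for p in folder.iterdir()} if folder.is_dir() else set()
    return [name for name in available if name not in existing]
```

Only the names returned by `files_to_fetch` would then need to be downloaded in that iteration.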

Contributing

Help in testing, development, documentation and other tasks is highly appreciated and useful to the project. There are tasks for contributors of all experience levels.

If you need help getting started, don't hesitate to contact me.

Development status

IL SuperMarket Scraper is beta software. As far as I can see, development has stopped until new issues are found.

Download files

Source Distribution

il_supermarket_scraper-0.4.0.tar.gz (32.0 kB)

Uploaded Source

Built Distribution

il_supermarket_scraper-0.4.0-py3-none-any.whl (47.7 kB)

Uploaded Python 3

File details

Details for the file il_supermarket_scraper-0.4.0.tar.gz.

File hashes

Hashes for il_supermarket_scraper-0.4.0.tar.gz
Algorithm Hash digest
SHA256 e3b8bd91f0aeeff078cc129385e7b710158f1bc95dd1ba0eca896c9319c9046a
MD5 02930a73d0e5c82a355c3c3ccf98b1f9
BLAKE2b-256 286a7c73933ecc031d55801d8d694541721eb51189387c366df70071ec560418
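
A downloaded file can be checked against the SHA256 digest listed above with the standard `hashlib` module. The sketch below is illustrative: `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("il_supermarket_scraper-0.4.0.tar.gz", "e3b8bd91f0aeeff078cc129385e7b710158f1bc95dd1ba0eca896c9319c9046a")` should return `True` for an intact copy of the source distribution.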

File details

Details for the file il_supermarket_scraper-0.4.0-py3-none-any.whl.

File hashes

Hashes for il_supermarket_scraper-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c1ec2ecbe6bbf776a8bacee97118398b8762b4c1b70f44e91f6571ccc0a17fbe
MD5 28afa92ed63e7cf1e927f1cec6946d76
BLAKE2b-256 f21222108fdce6e04bafb88c1b9288d638a53c5e38fd6f76554d7f1c9c5ab35e
