
edgar_k_mod1_atsiskaitymas

The source code for the first module of the Python Crash Course project.

WebCrawling Package

Overview

The WebCrawling package provides a solution for scraping structured data from two popular Lithuanian e-commerce websites: varle.lt and camelia.lt.

It allows users to extract product names, prices, discounts, and images, and to save the extracted data in either .txt or .csv format.


Features

  • Scraping: Extract product details, including names, prices, discounts, and images.
  • Multiple Formats: Save the scraped data as .txt or .csv files.
  • Image Downloading: Automatically download product images and store them in an "images" folder.
  • Pagination Support: The crawler supports scraping multiple pages of products.
  • Custom Time Limit: Users can set a time limit for the crawling process to avoid overloading the website.

Installation

You can install this package from PyPI:

pip install edgar_k_mod1_atsiskaitymas

Alternatively, you can clone the repository and run the crawler from your local machine:

git clone https://github.com/Edarjak/edgar_k_mod1_atsiskaitymas.git
cd edgar_k_mod1_atsiskaitymas
touch main.py
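Either way, a quick smoke test is to import the crawl entry point used in Usage below. This is only a sanity check and assumes it runs in an environment where the package is importable (after pip install, or from the cloned repository root):

# smoke test: fails with ImportError if the package cannot be found
from edgar_k_mod1_atsiskaitymas.web_crawling import crawl

print("crawl imported:", crawl)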

Usage

After installing the package, you can use the crawl function to start scraping data. It takes the following parameters:

  • time_limit: The time limit (in seconds) for the crawler to run. Valid options: any positive integer.
  • source: The website to scrape. Valid options: "varle" (scrape data from varle.lt) or "camelia" (scrape data from camelia.lt).
  • return_format: The format in which to save the scraped data. Valid options: "txt" (save the data in a .txt file) or "csv" (save the data in a .csv file).

An example main.py:

from edgar_k_mod1_atsiskaitymas.web_crawling import crawl

# Start scraping data from Varle.lt with a 5-second time limit and save results in CSV format
crawl(5, "varle", "csv")

# Start scraping data from Camelia.lt with a 10-second time limit and save results in TXT format
crawl(10, "camelia", "txt")

Limitations

The package collects images only from cameliavaistine.lt.

Output

After running the script, the extracted data is saved to files in the root directory of your project (a short snippet for inspecting the CSV output follows the list). These include:

  • varle_rezultatas.txt / varle_rezultatas.csv: Data scraped from varle.lt.
  • camelia_rezultatas.txt / camelia_rezultatas.csv: Data scraped from camelia.lt.
  • images/: A directory containing the product images downloaded from cameliavaistine.lt during scraping.
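The .csv output can be inspected with Python's standard csv module. A minimal sketch (it assumes a standard comma-separated file and makes no assumptions about column names, which depend on the scraped site):

import csv

# Print every row of the Varle results file produced by crawl(..., "varle", "csv")
with open("varle_rezultatas.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)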

Example Data Files

You can view example output files in the /examples directory of this repository. These files contain sample scraped data and images for camelia.lt.

PyPI Link

This package is available for installation from PyPI: https://pypi.org/project/edgar_k_mod1_atsiskaitymas/

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz (4.0 kB, source)

Built Distribution

edgar_k_mod1_atsiskaitymas-0.1.0-py3-none-any.whl (4.7 kB, Python 3)

File details

Details for the file edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz.

File metadata

  • Download URL: edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/23.6.0

File hashes

Hashes for edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz:

  • SHA256: a6234e812b779c2eb3b286cf4fdf44b51793563d3652739518cfce5feba84251
  • MD5: 54c8e1748f278e1b8bf0ed77ecf168b3
  • BLAKE2b-256: b282385b6b3f42ecda98216045ba727db0afe81123a5e3a81abe796e2c14f39c

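If you want to check a downloaded archive against the published digests, Python's standard hashlib is enough. The sketch below verifies the SHA256 of the source distribution (the file path assumes you saved it under its original name):

import hashlib

EXPECTED_SHA256 = "a6234e812b779c2eb3b286cf4fdf44b51793563d3652739518cfce5feba84251"

# Hash the downloaded sdist and compare it with the digest listed above
with open("edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == EXPECTED_SHA256 else "MISMATCH")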

File details

Details for the file edgar_k_mod1_atsiskaitymas-0.1.0-py3-none-any.whl.

File hashes

Hashes for edgar_k_mod1_atsiskaitymas-0.1.0-py3-none-any.whl:

  • SHA256: 2d8f71058fa5d02550f974eb72ac29fcda306397aad8287f9bc05e1c6edc23cc
  • MD5: bc2430a5d328c54d77573cb5f2484787
  • BLAKE2b-256: 57628bb9cd6cfeab504309d00d66c3995827bdfb97902e8e080e38e72274cb12

