edgar_k_mod1_atsiskaitymas
The source code for the first module of the Python Crash Course project.
WebCrawling Package
Overview
The WebCrawling package provides a solution for scraping structured data from two popular Lithuanian e-commerce websites: varle.lt and camelia.lt.
It allows users to extract product names, prices, discounts, and images, and to save the extracted data in either .txt or .csv format.
Features
- Scraping: Extract product details, including names, prices, discounts, and images.
- Multiple Formats: Save the scraped data as .txt or .csv files.
- Image Downloading: Automatically download product images and store them in an "images" folder.
- Pagination Support: The crawler supports scraping multiple pages of products.
- Custom Time Limit: Users can set a time limit for the crawling process to avoid overloading the website.
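The last two features combine naturally: the crawler keeps requesting the next page of products until either the pages run out or the time limit expires. The package's internals are not shown here, so the following is only a minimal sketch of that pattern; `fetch_page` is a hypothetical stand-in for a real HTTP fetch.

```python
import time

def crawl_pages(fetch_page, time_limit):
    """Fetch numbered pages until fetch_page returns None or the limit expires.

    fetch_page is a hypothetical callable standing in for a real HTTP fetch;
    it returns a list of items for a page, or None when there are no more pages.
    """
    deadline = time.monotonic() + time_limit
    items = []
    page = 1
    while time.monotonic() < deadline:
        batch = fetch_page(page)
        if batch is None:  # no more pages to scrape
            break
        items.extend(batch)
        page += 1
    return items

# Simulated site with three pages of products
pages = {1: ["item A", "item B"], 2: ["item C"], 3: ["item D"]}
result = crawl_pages(lambda p: pages.get(p), time_limit=2)
print(result)  # all four items: three pages finish well inside 2 seconds
```

Checking the clock before each page request (rather than killing the process mid-request) keeps partial results intact when the limit is reached.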
Installation
You can easily install this package via PyPI.
pip install edgar_k_mod1_atsiskaitymas
or you can clone the repository and crawl pages from your local machine:
git clone https://github.com/Edarjak/edgar_k_mod1_atsiskaitymas.git
cd edgar_k_mod1_atsiskaitymas
touch main.py
Usage
After installing the package, you can use the crawl function to start scraping data. The crawl function accepts the following parameters:
Parameter | Description | Valid Options
---|---|---
time_limit | The time limit (in seconds) for the crawler to run. | Any positive integer
source | The website to scrape. | "varle" - scrape data from varle.lt; "camelia" - scrape data from camelia.lt
return_format | The format in which to save the scraped data. | "txt" - save the data in a .txt file; "csv" - save the data in a .csv file
An example main.py:
from edgar_k_mod1_atsiskaitymas.web_crawling import crawl
# Start scraping data from Varle.lt with a 5-second time limit and save results in CSV format
crawl(5, "varle", "csv")
# Start scraping data from Camelia.lt with a 10-second time limit and save results in TXT format
crawl(10, "camelia", "txt")
Limitations
The package collects images only from cameliavaistine.lt.
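Downloaded images are stored under an "images" folder, named after the image URL. How the package does this internally is not documented here, so this is only an illustrative sketch; `save_image` and its arguments are hypothetical, and the raw bytes stand in for what an HTTP client would return.

```python
import os
from urllib.parse import urlparse

def save_image(url, data, out_dir="images"):
    """Store raw image bytes under out_dir, named after the URL path.

    data stands in for the bytes an HTTP client would download;
    the real package presumably fetches them itself.
    """
    os.makedirs(out_dir, exist_ok=True)  # create images/ on first use
    name = os.path.basename(urlparse(url).path) or "image.jpg"
    path = os.path.join(out_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

path = save_image("https://example.com/img/product.jpg", b"\x89PNG fake bytes")
print(path)  # e.g. images/product.jpg
```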
Output
After running the script, the extracted data will be saved to files in the root directory of your project. These include:
- varle_rezultatas.txt / varle_rezultatas.csv: data scraped from varle.lt.
- camelia_rezultatas.txt / camelia_rezultatas.csv: data scraped from camelia.lt.
- images/: a directory containing the product images downloaded from cameliavaistine.lt during scraping.
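The .csv output can be processed further with Python's standard csv module. The exact column names in the result files are not documented here, so the header row below (name, price, discount) is only an assumed layout for illustration:

```python
import csv
import io

# Assumed column layout; the real varle_rezultatas.csv headers may differ.
sample = "name,price,discount\nLaptop,499.99,10%\nMouse,9.99,\n"

# csv.DictReader maps each row to a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["name"], row["price"])
```

With a real result file, replace `io.StringIO(sample)` with `open("varle_rezultatas.csv", newline="")`.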
Example Data Files
You can view example output files in the /examples directory of this repository. These files contain sample scraped data and images for camelia.lt.
PyPI Link
This package is available for installation from PyPI.