
edgar_k_mod1_atsiskaitymas

The source code for the first module of the Python Crash Course project.

WebCrawling Package

Overview

The WebCrawling package provides a solution for scraping structured data from two popular Lithuanian e-commerce websites: varle.lt and camelia.lt.

It allows users to extract product names, prices, discounts, and images, and to save the extracted data in either .txt or .csv format.


Features

  • Scraping: Extract product details, including names, prices, discounts, and images.
  • Multiple Formats: Save the scraped data as .txt or .csv files.
  • Image Downloading: Automatically download product images and store them in an "images" folder.
  • Pagination Support: The crawler supports scraping multiple pages of products.
  • Custom Time Limit: Users can set a time limit for the crawling process to avoid overloading the website.

Installation

You can install this package from PyPI:

pip install edgar_k_mod1_atsiskaitymas

Alternatively, you can clone the repository and run the crawler from your local machine:

git clone https://github.com/Edarjak/edgar_k_mod1_atsiskaitymas.git
cd edgar_k_mod1_atsiskaitymas
touch main.py
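Either way, a quick smoke test is to import the crawl entry point used in Usage below. This is only a sanity check and assumes it runs in an environment where the package is importable (after pip install, or from the cloned repository root):

# smoke test: fails with ImportError if the package cannot be found
from edgar_k_mod1_atsiskaitymas.web_crawling import crawl

print("crawl imported:", crawl)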

Usage

After installing the package, you can use the crawl function to start scraping data. It takes the following parameters:

  • time_limit: The time limit (in seconds) for the crawler to run. Valid options: any positive integer.
  • source: The website to scrape. Valid options: "varle" (scrape data from varle.lt) or "camelia" (scrape data from camelia.lt).
  • return_format: The format in which to save the scraped data. Valid options: "txt" (save the data in a .txt file) or "csv" (save the data in a .csv file).

An example main.py:

from edgar_k_mod1_atsiskaitymas.web_crawling import crawl

# Start scraping data from Varle.lt with a 5-second time limit and save results in CSV format
crawl(5, "varle", "csv")

# Start scraping data from Camelia.lt with a 10-second time limit and save results in TXT format
crawl(10, "camelia", "txt")

Limitations

The package collects images only from cameliavaistine.lt.

Output

After running the script, the extracted data is saved to files in the root directory of your project (a short snippet for inspecting the CSV output follows the list). These include:

  • varle_rezultatas.txt / varle_rezultatas.csv: Data scraped from varle.lt.
  • camelia_rezultatas.txt / camelia_rezultatas.csv: Data scraped from camelia.lt.
  • images/: A directory containing the product images downloaded from cameliavaistine.lt during scraping.
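The .csv output can be inspected with Python's standard csv module. A minimal sketch (it assumes a standard comma-separated file and makes no assumptions about column names, which depend on the scraped site):

import csv

# Print every row of the Varle results file produced by crawl(..., "varle", "csv")
with open("varle_rezultatas.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)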

Example Data Files

You can view example output files in the /examples directory of this repository. These files contain sample scraped data and images for camelia.lt.

PyPI Link

This package is available for installation from PyPI: https://pypi.org/project/edgar_k_mod1_atsiskaitymas/

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz (4.0 kB, source)

Built Distribution

edgar_k_mod1_atsiskaitymas-0.1.0-py3-none-any.whl (4.7 kB, Python 3)

File details

Details for the file edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz.

File metadata

  • Download URL: edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.13.0 Darwin/23.6.0

File hashes

Hashes for edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz:

  • SHA256: a6234e812b779c2eb3b286cf4fdf44b51793563d3652739518cfce5feba84251
  • MD5: 54c8e1748f278e1b8bf0ed77ecf168b3
  • BLAKE2b-256: b282385b6b3f42ecda98216045ba727db0afe81123a5e3a81abe796e2c14f39c

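If you want to check a downloaded archive against the published digests, Python's standard hashlib is enough. The sketch below verifies the SHA256 of the source distribution (the file path assumes you saved it under its original name):

import hashlib

EXPECTED_SHA256 = "a6234e812b779c2eb3b286cf4fdf44b51793563d3652739518cfce5feba84251"

# Hash the downloaded sdist and compare it with the digest listed above
with open("edgar_k_mod1_atsiskaitymas-0.1.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == EXPECTED_SHA256 else "MISMATCH")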

File details

Details for the file edgar_k_mod1_atsiskaitymas-0.1.0-py3-none-any.whl.

File hashes

Hashes for edgar_k_mod1_atsiskaitymas-0.1.0-py3-none-any.whl:

  • SHA256: 2d8f71058fa5d02550f974eb72ac29fcda306397aad8287f9bc05e1c6edc23cc
  • MD5: bc2430a5d328c54d77573cb5f2484787
  • BLAKE2b-256: 57628bb9cd6cfeab504309d00d66c3995827bdfb97902e8e080e38e72274cb12

