A CLI tool for web scraping
Project description
PyScrap Tool
pyscrap-tool is a Python-based web scraping utility that extracts data from the web pages you specify. It can scrape a chosen HTML tag and presents the results in a structured format, with the option to save them to a CSV file.
Features
- Command-Line Interface (CLI): Easily scrape data directly from the terminal using command-line arguments.
- Custom HTML Tag Scraping: Specify which HTML tag to scrape from the webpage, allowing for flexible data extraction.
- Data Output: Print scraped data to the console and save it to a CSV file for further analysis.
- Versioning: Check the version of the tool using command-line options.
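To make the tag-scraping and CSV features concrete, here is a minimal sketch that uses only the Python standard library. It is not the tool's actual implementation, which this page does not show; the names `TagTextCollector`, `extract_tag_text`, and `write_csv`, as well as the CSV column layout, are assumptions made for illustration. It parses an in-memory HTML string rather than fetching a URL.

```python
import csv
import io
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the target tag
        self.buffer = []    # text pieces of the element currently being read
        self.results = []   # one cleaned string per matched element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.results.append(" ".join("".join(self.buffer).split()))
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_tag_text(html, tag):
    """Return the text of each outermost <tag> element found in html."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results


def write_csv(rows, tag, stream):
    """Write one row per scraped element (assumed layout: index, text)."""
    writer = csv.writer(stream)
    writer.writerow(["index", tag])
    for i, text in enumerate(rows, start=1):
        writer.writerow([i, text])


sample = (
    "<html><body><article>First <b>post</b></article>"
    "<article>Second post</article></body></html>"
)
rows = extract_tag_text(sample, "article")
out = io.StringIO()
write_csv(rows, "article", out)
print(out.getvalue())
```

Nested elements with the same tag name are captured as a single outer element in this sketch, and void tags such as img are not handled.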
Installation
Option 1: Install from PyPI
To install pyscrap-tool directly from PyPI:
pip install pyscrap-tool
Option 2: Build from Source
For those who prefer to build it themselves:
- Clone the repository and navigate to the project directory:
  git clone https://github.com/h471x/web_scraper.git
  cd web_scraper
- Build the package:
  python setup.py sdist bdist_wheel
- Install the package:
  pip install dist/*.whl
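After either installation route, one way to confirm that the package is visible to the current Python environment is to query its metadata. This is a small sketch using the standard `importlib.metadata` module; the helper name `installed_version` is hypothetical, and the distribution name `pyscrap-tool` is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("pyscrap-tool")
print(found if found else "pyscrap-tool is not installed in this environment")
```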
Usage
Once the package is installed, you can use the pyscrap command from the terminal. The script accepts the following command-line arguments:
- URL: -l or --link: Specify the URL of the webpage to scrape.
- HTML Tag: -t or --tag: Specify the HTML tag to scrape (e.g., article, div).
- Version: -v or --version: Display the version of the tool.
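The options listed above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the tool's source: the function name `build_parser`, the help strings, and the `VERSION` constant (set to 0.1 to match the release file names) are illustrative, and the real tool may parse its arguments differently.

```python
import argparse

VERSION = "0.1"  # assumption: matches the 0.1 release files; the tool may store it elsewhere


def build_parser():
    """Build a parser exposing -l/--link, -t/--tag and -v/--version."""
    parser = argparse.ArgumentParser(
        prog="pyscrap", description="A CLI tool for web scraping"
    )
    parser.add_argument("-l", "--link", help="URL of the webpage to scrape")
    parser.add_argument("-t", "--tag", help="HTML tag to scrape, e.g. article or div")
    # The built-in "version" action prints the string and exits.
    parser.add_argument(
        "-v", "--version", action="version", version=f"%(prog)s {VERSION}"
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-l", "https://example.com", "-t", "article"])
print(args.link, args.tag)
```

`argparse` adds the -h/--help option automatically, so it does not need to be declared.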
Example Usage
- Basic Scrape:
  pyscrap -l https://example.com -t article
- Display Version:
  pyscrap -v
- Help Option: For help with command-line options, use:
  pyscrap -h
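If you want to drive the command from a Python script, for example to scrape several pages in a loop, a sketch along these lines may help. The helper `build_command` is hypothetical; only the `-l` and `-t` flags come from the usage notes above. The command is executed only when `pyscrap` is found on PATH.

```python
import shutil
import subprocess


def build_command(url, tag):
    """Assemble the argument list for one scrape (hypothetical helper)."""
    return ["pyscrap", "-l", url, "-t", tag]


cmd = build_command("https://example.com", "article")
if shutil.which(cmd[0]):
    # check=False: leave it to the caller to inspect the return code.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    print(result.stdout)
else:
    print("pyscrap not found on PATH; would run:", " ".join(cmd))
```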
Development
To modify or extend the tool, first make sure the required dependencies are installed. You can then add new features to the CLI as needed.
Contributing
Feel free to fork this repository, open issues, or submit pull requests with improvements or bug fixes. Your contributions help make the PyScrap Tool better!
Download files
File details
Details for the file pyscrap-tool-0.1.tar.gz.
File metadata
- Download URL: pyscrap-tool-0.1.tar.gz
- Upload date:
- Size: 4.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2911267633cd381db52529dadaf86b8612bc22391225a56e4d93d82b07b81e91
MD5 | 23c83a6a690c5dfdc8a12ad57c6a276a
BLAKE2b-256 | f43557ee9d1dcdbece6a82a7378a1ebfaa7e6bcd9efc49af4e3087d24e9e48f0
File details
Details for the file pyscrap_tool-0.1-py3-none-any.whl.
File metadata
- Download URL: pyscrap_tool-0.1-py3-none-any.whl
- Upload date:
- Size: 4.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8b5a08e519eb523d1c39d2ef6eb9246a36343ad90cac8a1e0d49a44cebb40116
MD5 | fc0852b25eefb232e0ff048e9e5522e6
BLAKE2b-256 | 88fd4ecff7e5dd3320a0535a0c707dc8fa74a3de1d5c0b33dc2a270c14d290da
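
The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch with `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the script assumes the files were saved to the current directory under their published names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "pyscrap-tool-0.1.tar.gz":
        "2911267633cd381db52529dadaf86b8612bc22391225a56e4d93d82b07b81e91",
    "pyscrap_tool-0.1-py3-none-any.whl":
        "8b5a08e519eb523d1c39d2ef6eb9246a36343ad90cac8a1e0d49a44cebb40116",
}


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected


for name in EXPECTED_SHA256:
    if Path(name).exists():
        print(name, "OK" if verify(name) else "MISMATCH")
    else:
        print(name, "not found in the current directory")
```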