A CLI tool for web scraping

PyScrap Tool

pyscrap-tool is a Python command-line utility for extracting data from a web page. You specify the page URL and the HTML tag to scrape, and the tool presents the matching data in a structured format, including the ability to save results to a CSV file.

Features

  • Command-Line Interface (CLI): Easily scrape data directly from the terminal using command-line arguments.
  • Custom HTML Tag Scraping: Specify which HTML tag to scrape from the webpage, allowing for flexible data extraction.
  • Data Output: Print scraped data to the console and save it to a CSV file for further analysis.
  • Versioning: Check the version of the tool using command-line options.
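
To make the tag-scraping and CSV features concrete, here is a minimal sketch of the general idea using only the Python standard library. It is not pyscrap-tool's actual implementation: the names TagTextCollector, scrape_tag, and to_csv, the two-column CSV layout, and the sample HTML are all assumptions chosen for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text content of every occurrence of one HTML tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # greater than 0 while inside the target tag
        self.buffer = []    # text fragments of the element being read
        self.results = []   # one string per matched top-level element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append(" ".join(self.buffer))
                self.buffer = []

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.buffer.append(text)


def scrape_tag(html, tag):
    """Return the text of each <tag> element found in the HTML string."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.results


def to_csv(rows, tag):
    """Serialize the scraped strings as CSV text (hypothetical layout)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tag", "text"])
    for text in rows:
        writer.writerow([tag, text])
    return out.getvalue()


# Sample input stands in for a downloaded page.
SAMPLE = (
    "<html><body><article>First <b>post</b></article>"
    "<div>skip</div><article>Second</article></body></html>"
)
rows = scrape_tag(SAMPLE, "article")
print(rows)
print(to_csv(rows, "article"))
```

The sketch parses an in-memory string so it runs without network access; a real run would first fetch the page at the given URL.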

Installation

Option 1: Install from PyPI

To install pyscrap-tool directly from PyPI:

pip install pyscrap-tool

Option 2: Build from Source

For those who prefer to build it themselves:

  1. Clone the repository and navigate to the project directory:

    git clone https://github.com/h471x/web_scraper.git
    cd web_scraper
    
  2. Build the package (the source archive and the wheel are written to dist/):

    python setup.py sdist bdist_wheel
    
  3. Install the package:

    pip install dist/*.whl
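
After either installation route, one way to confirm that the distribution is visible to the current interpreter is to query its metadata. The following is a small sketch using the standard library's importlib.metadata; the helper name installed_version is illustrative and not part of pyscrap-tool.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version if pyscrap-tool is installed, otherwise None.
print(installed_version("pyscrap-tool"))
```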
    

Usage

Once the package is installed, the pyscrap command is available from the terminal (note that the command is pyscrap, not pyscrap-tool). It accepts the following command-line arguments:

  • -l or --link: Specify the URL of the webpage to scrape.
  • -t or --tag: Specify the HTML tag to scrape (e.g., article, div).
  • -v or --version: Display the version of the tool.
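
For readers who want to extend the CLI, the options above could be declared with argparse roughly as follows. This is a sketch under assumptions, not the tool's source: the function name build_parser and the help strings are invented here, while the flag names and the 0.1 version number come from this page.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="pyscrap",
        description="A CLI tool for web scraping",
    )
    parser.add_argument("-l", "--link", help="URL of the webpage to scrape")
    parser.add_argument("-t", "--tag", help="HTML tag to scrape, e.g. article or div")
    # argparse prints the version string and exits when -v is given.
    parser.add_argument("-v", "--version", action="version", version="%(prog)s 0.1")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["-l", "https://example.com", "-t", "article"])
print(args.link, args.tag)
```

argparse adds -h and --help automatically, so no separate declaration is needed for the help option.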

Example Usage

  1. Basic Scrape:

    pyscrap -l https://example.com -t article
    
  2. Display Version:

    pyscrap -v
    
  3. Show Help: To list all command-line options, use:

    pyscrap -h
    

Development

To modify or extend the tool, first make sure the project's dependencies are installed. New features can then be added to the CLI as needed.

Contributing

Feel free to fork this repository, open issues, or submit pull requests with improvements or bug fixes. Your contributions help make the PyScrap Tool better!

This version

0.1

Source Distribution

pyscrap-tool-0.1.tar.gz (4.4 kB)

Built Distribution

pyscrap_tool-0.1-py3-none-any.whl (4.9 kB)

File details

Details for the file pyscrap-tool-0.1.tar.gz.

File metadata

  • Download URL: pyscrap-tool-0.1.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.9

File hashes

Hashes for pyscrap-tool-0.1.tar.gz

Algorithm    Hash digest
SHA256       2911267633cd381db52529dadaf86b8612bc22391225a56e4d93d82b07b81e91
MD5          23c83a6a690c5dfdc8a12ad57c6a276a
BLAKE2b-256  f43557ee9d1dcdbece6a82a7378a1ebfaa7e6bcd9efc49af4e3087d24e9e48f0
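
The SHA256 digest listed for the source archive can be used to check a downloaded copy. Below is a minimal sketch with the standard library's hashlib; the helper names and the chunk size are illustrative choices, and the file path you pass in is whatever location you saved the archive to.

```python
import hashlib
import hmac

# SHA256 digest listed on this page for pyscrap-tool-0.1.tar.gz.
EXPECTED_SHA256 = "2911267633cd381db52529dadaf86b8612bc22391225a56e4d93d82b07b81e91"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

Calling matches_expected with the path of the downloaded pyscrap-tool-0.1.tar.gz should return True for an intact copy of the file.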

File details

Details for the file pyscrap_tool-0.1-py3-none-any.whl.

File metadata

  • Download URL: pyscrap_tool-0.1-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.9

File hashes

Hashes for pyscrap_tool-0.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       8b5a08e519eb523d1c39d2ef6eb9246a36343ad90cac8a1e0d49a44cebb40116
MD5          fc0852b25eefb232e0ff048e9e5522e6
BLAKE2b-256  88fd4ecff7e5dd3320a0535a0c707dc8fa74a3de1d5c0b33dc2a270c14d290da
