PyWebScrapr


An open-source Python library for web scraping tasks. Includes support for both text and image scraping.

Changes in 0.1.6:

Added progress indicators to both scrape_images and scrape_text to provide real-time feedback on scraping progress.
Implemented multithreading to improve performance by scraping multiple pages concurrently.
Added a rate_limit parameter to both scraping functions to control the request frequency and prevent server overload.
Refactored the concurrency model to ensure that child links are also scraped concurrently.
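
To illustrate how concurrent scraping, a request rate limit, and progress output can fit together, here is a minimal standard-library sketch. It is not PyWebScrapr's implementation: `RateLimiter`, `fetch_all`, `min_interval`, and `max_workers` are hypothetical names chosen for this example, and the `fetch` callable stands in for whatever actually downloads a page.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Space out calls so that wait() returns at most once per min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def fetch_all(urls, fetch, min_interval=0.5, max_workers=4):
    """Call fetch(url) for every URL on a thread pool, spacing request starts."""
    limiter = RateLimiter(min_interval)
    done = 0
    done_lock = threading.Lock()

    def worker(url):
        nonlocal done
        limiter.wait()
        result = fetch(url)
        with done_lock:
            done += 1
            # Simple progress indicator: completed count out of total.
            print(f"[{done}/{len(urls)}] {url}")
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(worker, urls))
```

Sharing one limiter across all workers keeps the overall request rate bounded no matter how many threads are running, which is the point of pairing a rate limit with multithreading.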

Changes in 0.1.5:

Added new parameters to both scrape_images and scrape_text for following child links and for setting a maximum number of child links to follow.
Added a JSON export format for text scraping, with improvements to exporting.
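
As a rough picture of what following child links with a cap involves, the sketch below collects links from a page, resolves them against the page URL, and stops at a maximum count. It also shows a plain JSON export of scraped records. This is an illustration using only the standard library, not the library's own code: `LinkCollector`, `child_links`, `max_links`, and `to_json` are hypothetical names, and restricting links to the same host is an assumption made for this example.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def child_links(page_url, html, max_links=5):
    """Return up to max_links unique, same-host links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for href in parser.hrefs:
        # Resolve relative links against the page they were found on.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
        if len(found) >= max_links:
            break
    return found


def to_json(records):
    """Serialize scraped records (a list of dicts) as readable JSON."""
    return json.dumps(records, indent=2, ensure_ascii=False)
```

A cap on followed links matters because each followed page can itself contain many links, so an uncapped crawl can grow very quickly.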

Tip: We recommend disabling remove_duplicates on large sites to allow faster text scraping (this can improve speed by 4x). It also may not work well with follow_child_links enabled, as it may remove similar content from scraped child links.

Changes in 0.1.4:

Added new parameters to scrape_text to allow automatic removal of duplicates or similar text, and another to specify the type of textual content to scrape (text, content, unseen, links).
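
One common way to drop duplicate or near-duplicate text is to compare each new line against the lines already kept and skip it when the similarity is high. The sketch below does this with `difflib` from the standard library. It is an assumption-based illustration, not the method PyWebScrapr uses; `remove_similar` and `threshold` are hypothetical names.

```python
from difflib import SequenceMatcher


def remove_similar(lines, threshold=0.9):
    """Keep the first occurrence of each line, dropping later near-duplicates."""
    kept = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        # Compare against every line kept so far; ratio() is 1.0 for identical text.
        if any(SequenceMatcher(None, text, k).ratio() >= threshold for k in kept):
            continue
        kept.append(text)
    return kept
```

Because every new line is compared with every kept line, the cost of this approach grows roughly quadratically with the amount of text, which is one plausible reason similarity-based deduplication slows down on large sites.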

Changes in 0.1.3:

Added support for handling different types of images on websites. Invalid images are now detected, with added error handling.
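
A simple way to reject invalid images is to check the first bytes of a download against known file signatures before saving it. The following sketch shows that idea for a few common formats. It is only an illustration of the technique: `SIGNATURES` and `detect_image_type` are hypothetical names, and how PyWebScrapr actually validates images is not described here.

```python
# File signatures ("magic numbers") for a few common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def detect_image_type(data):
    """Return a format name if data starts like a known image, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    # WebP is a RIFF container with the tag 'WEBP' at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A response that returns `None` here (for example an HTML error page served at an image URL) can be skipped instead of being written to disk.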

Changes in 0.1.2:

min and max width and height parameters can now be specified when working with image scraping, allowing you to quickly exclude smaller resolution images, or images that are extremely large and take up too much space.
PyWebScrapr now uses BeautifulSoup4's SoupStrainer, making extracting content from webpages much faster.
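
The sketch below shows what filtering images by minimum and maximum dimensions can look like. It reads the width and height from a PNG header and checks them against optional bounds. The function names and the keyword names (`min_width`, `max_width`, `min_height`, `max_height`) are hypothetical and chosen for this example; check the official documentation for the parameter names that scrape_images actually accepts.

```python
import struct

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_MAGIC or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def within_bounds(width, height, min_width=None, min_height=None,
                  max_width=None, max_height=None):
    """Return True if the size satisfies every bound that was given."""
    if min_width is not None and width < min_width:
        return False
    if min_height is not None and height < min_height:
        return False
    if max_width is not None and width > max_width:
        return False
    if max_height is not None and height > max_height:
        return False
    return True
```

Checking the header first means small icons or very large images can be rejected without decoding the full image.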

Installation

You can install PyWebScrapr using pip:

pip install pywebscrapr

Supported Python Versions

PyWebScrapr supports the following Python versions:

Python 3.6
Python 3.7
Python 3.8
Python 3.9
Python 3.10
Python 3.11
Python 3.12 or later (preferred)

Please ensure that you have one of these Python versions installed before using PyWebScrapr. PyWebScrapr may not work as expected on Python versions older than those listed.

Features

Text Scraping: Extract textual content from specified URLs.
Image Scraping: Download images from specified URLs.

*For a full list, check out the official PyWebScrapr documentation.

Usage

Text Scraping

from pywebscrapr import scrape_text

# Specify links in a file or list
links_file = 'links.txt'
links_array = ['https://example.com/page1', 'https://example.com/page2']

# Scrape text and save to the 'output.txt' file
scrape_text(links_file=links_file, links_array=links_array, output_file='output.txt')
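
If you build the links file programmatically, a small helper like the one below can filter out malformed or repeated URLs before scraping. This sketch assumes the links file is plain text with one URL per line, which is not confirmed here; `write_links_file` is a hypothetical helper and not part of PyWebScrapr.

```python
from pathlib import Path
from urllib.parse import urlparse


def write_links_file(urls, path="links.txt"):
    """Write unique http(s) URLs to path, one per line, and return them."""
    seen = set()
    valid = []
    for url in urls:
        url = url.strip()
        parts = urlparse(url)
        # Keep only absolute http/https URLs that have not been seen yet.
        if parts.scheme in ("http", "https") and parts.netloc and url not in seen:
            seen.add(url)
            valid.append(url)
    Path(path).write_text("\n".join(valid) + "\n", encoding="utf-8")
    return valid
```

The returned list can also be passed as links_array if you prefer not to go through a file.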

Image Scraping

from pywebscrapr import scrape_images

# Specify links in a file or list
links_file = 'image_links.txt'
links_array = ['https://example.com/image1.jpg', 'https://example.com/image2.png']

# Scrape images and save to the 'images' folder
scrape_images(links_file=links_file, links_array=links_array, save_folder='images')

Contributing

Contributions are welcome! If you encounter any issues, have suggestions, or want to contribute to PyWebScrapr, please open an issue or submit a pull request on GitHub.

License

PyWebScrapr is released under the terms of the MIT License (Modified). Please see the LICENSE file for the full text.

Modified License Clause

The modified license clause grants users the permission to make derivative works based on the PyWebScrapr software. However, it requires any substantial changes to the software to be clearly distinguished from the original work and distributed under a different name.

Release history

0.1.6 (this version): Nov 16, 2025
0.1.5: Feb 24, 2025
0.1.4: Dec 18, 2024
0.1.3: Dec 11, 2024
0.1.2: Sep 26, 2024
0.1.1: Jul 3, 2024
0.1.0: Feb 2, 2024

Download files

Source Distribution

pywebscrapr-0.1.6.tar.gz (8.0 kB), uploaded Nov 16, 2025, Source

Built Distribution

pywebscrapr-0.1.6-py3-none-any.whl (8.0 kB), uploaded Nov 16, 2025, Python 3

File details

Details for the file pywebscrapr-0.1.6.tar.gz.

File metadata

Download URL: pywebscrapr-0.1.6.tar.gz
Upload date: Nov 16, 2025
Size: 8.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for pywebscrapr-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`8b11df387fafd8700aef5d80cf78677f0a9bc5f87a33a890772fbb5d516fa154`
MD5	`db9c385b1dac41c47ad07192c3d2c9be`
BLAKE2b-256	`7e53f74efbf4e3971a0d7f7156b6a428acd6d0920827e9a6b6dfdd72ba49a671`

File details

Details for the file pywebscrapr-0.1.6-py3-none-any.whl.

File metadata

Download URL: pywebscrapr-0.1.6-py3-none-any.whl
Upload date: Nov 16, 2025
Size: 8.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for pywebscrapr-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`515c41b3c77eb1220a490cbf313ce4c36a60421a8f54ace564554c077ddb076b`
MD5	`f7098ecac7fd7536d01284430989ff50`
BLAKE2b-256	`461a089276a4d4f18c025a948dde579a8b8a2ba21eca4ac14ceb497731a62426`
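
A downloaded file can be checked against the SHA256 digests listed above with `hashlib` from the standard library. The sketch below is a generic verification helper: `sha256_of` and `verify` are names chosen for this example, and the local file path is whatever location you downloaded the file to.

```python
import hashlib

# SHA256 digest listed above for pywebscrapr-0.1.6.tar.gz.
EXPECTED_SDIST_SHA256 = "8b11df387fafd8700aef5d80cf78677f0a9bc5f87a33a890772fbb5d516fa154"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches expected_hex."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("pywebscrapr-0.1.6.tar.gz", EXPECTED_SDIST_SHA256)` should return True for an unmodified copy of the source distribution.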
