A library to scrape text from websites and web pages
Project description
A library to parse and scrape text from websites
This library provides three ways to scrape text from a website:
- The first method scrapes all text from a single web page.
- The second method scrapes text from the whole website, including the pages listed in its sitemaps.
- The third method scrapes text from a specified list of pages. You can also specify a target element (by CSS selector) to scrape only the intended parts of a web page.
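The sketch below illustrates the general ideas behind these three methods using only the Python standard library. It is not text_thief's API, which this page does not document: `extract_text`, `sitemap_urls`, `scrape_pages`, and `fetch_html` are hypothetical names chosen for illustration. It assumes the "specified list" is a list of URLs, and its selector support is deliberately limited to a bare tag name, `#id`, or `.class` rather than full CSS.

```python
from html.parser import HTMLParser
from urllib.request import urlopen
import xml.etree.ElementTree as ET

SKIPPED_TAGS = {"script", "style"}  # their content is never visible text
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collect visible text, optionally only inside elements matching a selector."""

    def __init__(self, selector=None):
        super().__init__()
        self.selector = selector  # simplified: "tag", "#id", or ".class" only
        self.stack = []           # (tag, matched) for every currently open element
        self.chunks = []

    def _matches(self, tag, attrs):
        attrs = dict(attrs)
        if self.selector is None:
            return True
        if self.selector.startswith("#"):
            return attrs.get("id") == self.selector[1:]
        if self.selector.startswith("."):
            return self.selector[1:] in (attrs.get("class") or "").split()
        return tag == self.selector

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self.stack.append((tag, self._matches(tag, attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, which also
        # discards unclosed children such as a dangling <p>.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text or any(t in SKIPPED_TAGS for t, _ in self.stack):
            return
        if self.selector is None or any(m for _, m in self.stack):
            self.chunks.append(text)


def extract_text(html, selector=None):
    """Method 1: all text from one page, or only from the target element."""
    parser = TextExtractor(selector)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def sitemap_urls(sitemap_xml):
    """Method 2 helper: list the <loc> URLs found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [el.text.strip() for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "loc" and el.text]


def fetch_html(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_pages(urls, selector=None, fetch=fetch_html):
    """Method 3: scrape each URL in a list, returning {url: text}."""
    return {url: extract_text(fetch(url), selector) for url in urls}
```

Under these assumptions, a whole-site scrape amounts to `scrape_pages(sitemap_urls(fetch_html(sitemap_url)))`, where `sitemap_url` is a placeholder for the site's sitemap address. The `fetch` parameter is injectable so the parsing logic can be exercised without network access.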
Download files
Download the file for your platform.
Source Distribution
text_thief-0.0.2.tar.gz (12.5 MB)
Built Distribution
text_thief-0.0.2-py3-none-any.whl (15.7 kB)
File details
Details for the file text_thief-0.0.2.tar.gz.
File metadata
- Download URL: text_thief-0.0.2.tar.gz
- Upload date:
- Size: 12.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f97326a986cdf5b501f12b053b72d827cbfacf0f2d4b2b39f9dcf527bb9516e |
| MD5 | cc38be12e61c18643a903cd53e739499 |
| BLAKE2b-256 | 76f6af80214ebede5a40e0072d3cc1757a563ec652e702a1da2ba567476f93fd |
File details
Details for the file text_thief-0.0.2-py3-none-any.whl.
File metadata
- Download URL: text_thief-0.0.2-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 503378ae719e0d1b841eb4b8feb829a993b9c1243d10edfac1d34eb0d6402e1f |
| MD5 | 741aa62c9455df39ddbc203dcd691066 |
| BLAKE2b-256 | c32e5099ae753ab8666aca5d5343c36bd495933f96d1358b0bdb3005fcad1aa9 |