A library to scrape text from websites and web pages
Project description
A library to parse and scrape text from websites
This library provides three ways to scrape text from a website:
- Scrape all text from a single web page.
- Scrape text from a whole website, including the pages listed in its sitemaps.
- Scrape text from a specified list of pages. You can also specify a target element (by CSS selector) to scrape only the intended parts of each page.
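The library's own API is not documented here, so the following is only a minimal sketch of the idea behind the first and third methods: pulling the visible text out of one page, optionally limited to a target element. It uses Python's standard `html.parser` module. The names `TextExtractor` and `extract_text` are hypothetical and are not text_thief functions, and the selector support is deliberately limited to a bare tag name, `.class`, or `#id` rather than full CSS.

```python
from html.parser import HTMLParser

# Elements whose content is never visible text.
SKIPPED_TAGS = {"script", "style", "noscript", "template"}
# Void elements have no closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextExtractor(HTMLParser):
    """Collect visible text, optionally only inside elements matching
    a simple selector (hypothetical helper, not part of text_thief)."""

    def __init__(self, selector=None):
        super().__init__(convert_charrefs=True)
        self.selector = selector
        self.chunks = []
        # One entry per open element: (tag, inside_target, inside_skipped).
        self.stack = []

    def _matches(self, tag, attrs):
        if self.selector is None:
            return True
        attrs = dict(attrs)
        if self.selector.startswith("#"):
            return attrs.get("id") == self.selector[1:]
        if self.selector.startswith("."):
            return self.selector[1:] in (attrs.get("class") or "").split()
        return tag == self.selector

    def _state(self):
        # Target/skipped flags of the innermost open element.
        if self.stack:
            return self.stack[-1][1], self.stack[-1][2]
        return self.selector is None, False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        in_target, skipped = self._state()
        self.stack.append((tag,
                           in_target or self._matches(tag, attrs),
                           skipped or tag in SKIPPED_TAGS))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        in_target, skipped = self._state()
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and in_target and not skipped:
            self.chunks.append(text)


def extract_text(html, selector=None):
    """Return the visible text of an HTML string, one chunk per line."""
    parser = TextExtractor(selector)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Calling `extract_text(page_html)` returns the text of the whole page, while `extract_text(page_html, "#main")` keeps only text nested inside the element whose `id` is `main`. Fetching the HTML itself (for example with `urllib.request.urlopen`) is left out so the sketch stays independent of any network access.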
Download files
Source Distribution
text_thief-0.0.1.tar.gz (12.5 MB)
Built Distribution
text_thief-0.0.1-py3-none-any.whl (15.6 kB)
File details
Details for the file text_thief-0.0.1.tar.gz.
File metadata
- Download URL: text_thief-0.0.1.tar.gz
- Upload date:
- Size: 12.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 59c6e3d3a8a95cefa1b54727ec0e0662c688e6450545174f1ce90c978495fb14 |
| MD5 | b8011e19707e3b8de71e22b6c97a8f44 |
| BLAKE2b-256 | 9469c0252e70c5357f2fd8d6afaa5588a9db4defa2c19c0af4bca382291b1a65 |
File details
Details for the file text_thief-0.0.1-py3-none-any.whl.
File metadata
- Download URL: text_thief-0.0.1-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e3e46f495f56f84bd0ee88f97d66b13e4f9f2b88a1be61f54f67d5222451e7b |
| MD5 | 3be806d0946fdf4679fb3d4dd8f1bec9 |
| BLAKE2b-256 | dfbac0d66879cf650620dde5fc4e57a23e7e6f5e3cd1d13cbc60b0f92b6aeed4 |