Useful tool for extracting text and sentences from HTML
Project description
pyhtmltext
pyhtmltext is a useful and flexible tool for extracting text from HTML.
Help
See documentation for more details.
Installation
pip install pyhtmltext
Simple usage
from pyhtmltext import Extractor
html_string = '''<h2 class="widget-title"><span aria-hidden="true" class="icon-get-started"></span>Getting Started</h2><p>Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!</p>'''
extractor = Extractor(html=html_string)
# Extracting whole text from html with separator
extractor.extract_text()
#> "Getting Started|separator|Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!"
# Extracting sentences from html
extractor.extract_sentences()
#> ['Getting Started', "Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages.", 'The following pages are a useful first step to get on your way writing programs with Python!']
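The usage above shows two behaviors: joining the text of each HTML element with a separator, and splitting that text into sentences. The source does not describe how pyhtmltext implements them, so the following is only a minimal standard-library sketch of the same idea. The names `TextCollector`, `extract_text`, and `extract_sentences` below are hypothetical stand-ins, not pyhtmltext's API, and the regex-based sentence split is an assumption that will mishandle abbreviations such as "e.g.".

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects non-empty text nodes from HTML."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called by HTMLParser for each run of text between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def _collect(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def extract_text(html, separator="|separator|"):
    """Join all text nodes with a separator (assumed default shown)."""
    return separator.join(_collect(html))


def extract_sentences(html):
    """Split each text node into sentences with a naive regex."""
    sentences = []
    for chunk in _collect(html):
        # Split after '.', '!' or '?' when followed by whitespace.
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", chunk) if s)
    return sentences
```

Calling `extract_sentences('<h2>Getting Started</h2><p>Python is easy. Try it today!</p>')` with this sketch yields one entry for the heading and one per sentence of the paragraph. Real-world HTML (scripts, styles, nested inline tags) needs more care than this sketch takes.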
Project details
Download files
Download the file for your platform.
Source Distribution
pyhtmltext-0.1.tar.gz (4.5 kB)
Built Distribution
pyhtmltext-0.1-py3-none-any.whl (5.2 kB)
File details
Details for the file pyhtmltext-0.1.tar.gz.
File metadata
- Download URL: pyhtmltext-0.1.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7dfd1cf60227b3072a6cbd73e7d67085d7ca8022e42f95ae16a39911053927d |
| MD5 | 77f3e76aa4e016afb24ed286b04755bc |
| BLAKE2b-256 | 6a4ecbccf033f2a8b7c4f77d85dd4d14e450a4722d8c49753d79d95e6a6f8eb9 |
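The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the usage below assumes the archive has been saved to the current working directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size blocks so large files are not loaded at once.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_published_digest("pyhtmltext-0.1.tar.gz", "b7dfd1cf60227b3072a6cbd73e7d67085d7ca8022e42f95ae16a39911053927d")` should return `True` if the download matches the SHA256 row listed above.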
File details
Details for the file pyhtmltext-0.1-py3-none-any.whl.
File metadata
- Download URL: pyhtmltext-0.1-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18b1f4d528ca9eb4bf09d789f82cfecc82e35436961284228b5639d39e1e588d |
| MD5 | 5c1f97afc56a4e90a93d87eebcaa672a |
| BLAKE2b-256 | 085d0feba8103f73308409baba425f471f763087d281700a5384cd181bc7709d |