Useful tool for extracting text and sentences from HTML
pyhtmltext
pyhtmltext is a useful and flexible tool for extracting text and sentences from HTML.
Help
See the documentation for more details.
Installation
pip install pyhtmltext
Simple usage
from pyhtmltext import Extractor
html_string = '''<h2 class="widget-title"><span aria-hidden="true" class="icon-get-started"></span>Getting Started</h2><p>Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!</p>'''
extractor = Extractor(html=html_string)
# Extract all text from the HTML as one string, with text blocks joined by a separator
extractor.extract_text()
#> "Getting Started|separator|Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!"
# Extract the text from the HTML as a list of sentences
extractor.extract_sentences()
#> ['Getting Started', "Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages.", 'The following pages are a useful first step to get on your way writing programs with Python!']
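To give a rough idea of what this kind of extraction involves, here is a minimal sketch built only on Python's standard library (`html.parser` and `re`). It is not pyhtmltext's implementation. The list of block-level tags, the skipped `script`/`style` tags, the sentence rule (split on whitespace that follows `.`, `!` or `?`), and the helper names `html_to_text` and `html_to_sentences` are all hypothetical. The `|separator|` default only mirrors the sample output above.

```python
import re
from html.parser import HTMLParser

# Hypothetical choices for this sketch, not taken from pyhtmltext:
# tags that start a new text block, and tags whose content is dropped.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class BlockTextParser(HTMLParser):
    """Collect visible text, starting a new block at block-level tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities such as &amp;
        self.blocks = []
        self._pending = []
        self._skip_depth = 0

    def _flush(self):
        # Join pending fragments, collapse whitespace, keep non-empty blocks.
        text = " ".join("".join(self._pending).split())
        if text:
            self.blocks.append(text)
        self._pending = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Ignore text inside script/style elements.
        if self._skip_depth == 0:
            self._pending.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any text left after the last tag


def _blocks(html_string):
    parser = BlockTextParser()
    parser.feed(html_string)
    parser.close()
    return parser.blocks


def html_to_text(html_string, separator="|separator|"):
    """Join text blocks with a separator (hypothetical helper)."""
    return separator.join(_blocks(html_string))


def html_to_sentences(html_string):
    """Split each text block into sentences (hypothetical helper)."""
    return [s for block in _blocks(html_string)
            for s in SENTENCE_END.split(block) if s]


sample = "<h2>Getting Started</h2><p>Python is easy. Try it!</p>"
print(html_to_text(sample))       # Getting Started|separator|Python is easy. Try it!
print(html_to_sentences(sample))  # ['Getting Started', 'Python is easy.', 'Try it!']
```

The sentence rule is deliberately simple: it splits after abbreviations such as "e.g." and never splits text that lacks whitespace after the punctuation, so a dedicated sentence segmenter would be more robust.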
Download files
Source Distribution
pyhtmltext-0.1.tar.gz (4.5 kB)