
Useful tool for extracting text and sentences from HTML

Project description

pyhtmltext

pyhtmltext is a useful and flexible tool for extracting text and sentences from HTML.

Help

See the documentation for more details.

Installation

  pip install pyhtmltext

Simple usage

  from pyhtmltext import Extractor


  html_string = '''<h2 class="widget-title"><span aria-hidden="true" class="icon-get-started"></span>Getting Started</h2><p>Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!</p>'''

  extractor = Extractor(html=html_string)

  # Extracting whole text from html with separator
  extractor.extract_text()
  #> "Getting Started|separator|Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages. The following pages are a useful first step to get on your way writing programs with Python!"

  # Extracting sentences from html
  extractor.extract_sentences()
  #> ['Getting Started', "Python can be easy to pick up whether you're a first time programmer or you're experienced with other languages.", 'The following pages are a useful first step to get on your way writing programs with Python!']
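
To make the two outputs above easier to reason about, here is a minimal standard-library sketch of the same idea: collect the text nodes of the HTML, join them with a separator, and split them into sentences. This is not pyhtmltext's implementation. The names `TextCollector`, `sketch_extract_text` and `sketch_extract_sentences` are hypothetical, and the sentence splitter is a naive assumption (split after `.`, `!` or `?` followed by whitespace).

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML fragment, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def _collect_chunks(html):
    # Hypothetical helper: parse the fragment and return its text chunks.
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


def sketch_extract_text(html, separator="|separator|"):
    # Join the text chunks with a visible separator (assumed default).
    return separator.join(_collect_chunks(html))


def sketch_extract_sentences(html):
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = []
    for chunk in _collect_chunks(html):
        sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", chunk) if s)
    return sentences


if __name__ == "__main__":
    sample = "<h2>Getting Started</h2><p>Python is easy. Try it now!</p>"
    print(sketch_extract_text(sample))
    print(sketch_extract_sentences(sample))
```

A regex splitter like this will mis-handle abbreviations such as "e.g." and does not skip `<script>` or `<style>` content, so treat it only as an illustration of the output shape.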

Project details



This version

0.1

Download files


Source Distribution

pyhtmltext-0.1.tar.gz (4.5 kB)

Uploaded Source

Built Distribution


pyhtmltext-0.1-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file pyhtmltext-0.1.tar.gz.

File metadata

  • Download URL: pyhtmltext-0.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for pyhtmltext-0.1.tar.gz
Algorithm    Hash digest
SHA256       b7dfd1cf60227b3072a6cbd73e7d67085d7ca8022e42f95ae16a39911053927d
MD5          77f3e76aa4e016afb24ed286b04755bc
BLAKE2b-256  6a4ecbccf033f2a8b7c4f77d85dd4d14e450a4722d8c49753d79d95e6a6f8eb9


File details

Details for the file pyhtmltext-0.1-py3-none-any.whl.

File metadata

  • Download URL: pyhtmltext-0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for pyhtmltext-0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       18b1f4d528ca9eb4bf09d789f82cfecc82e35436961284228b5639d39e1e588d
MD5          5c1f97afc56a4e90a93d87eebcaa672a
BLAKE2b-256  085d0feba8103f73308409baba425f471f763087d281700a5384cd181bc7709d

