
A Nifty HTML Parser written in Python

Project description

Pyarser is a simple, straightforward HTML parser that lets you harvest the text inside an HTML document, given a link to the page. Its functions include:

get_site_HTML(link): returns a string of HTML content from a link

get_site_text(link): returns a string of text from a link. All HTML tags <>, along with their contents, are removed from this string.

search_by_phrase(phrase, link): returns the fragments of text from a link that contain phrase as one continuous string.

search_for_words(words, link): returns the fragments of text from a link that contain ANY of the strings in words.

word_count(link): counts the number of text words from a link.

get_HTML_tags(link): returns a list of the tags used in an HTML document from a link.

HTML_to_TXT(link, name): writes a TXT file with the text content from a link. All HTML brackets and tags are removed.
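
The descriptions above do not show how Pyarser implements these functions, so the following is only a minimal sketch of the same ideas using the Python standard library. Every name in it (fetch_html, strip_tags, fragments_with_phrase, fragments_with_any, count_words, list_tags) is hypothetical and not part of Pyarser's API, and splitting text into sentence-like "fragments" is an assumption about what a fragment means.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(link):
    """Download a page and return its HTML as a string (needs network access)."""
    with urlopen(link) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def strip_tags(html):
    """Remove everything between '<' and '>' and collapse leftover whitespace.

    This is a naive regex approach: text inside <script> or <style>
    elements is kept, and a '>' inside an attribute value would confuse it.
    """
    text = re.sub(r"<[^>]*>", " ", html)
    return re.sub(r"\s+", " ", text).strip()


def _fragments(text):
    # Assumption: a "fragment" is a sentence ending in '.', '!' or '?'.
    return re.split(r"(?<=[.!?])\s+", text)


def fragments_with_phrase(text, phrase):
    """Return fragments containing the whole phrase (case-insensitive)."""
    return [f for f in _fragments(text) if phrase.lower() in f.lower()]


def fragments_with_any(text, words):
    """Return fragments containing at least one of the given words."""
    lowered = [w.lower() for w in words]
    return [f for f in _fragments(text) if any(w in f.lower() for w in lowered)]


def count_words(text):
    """Count whitespace-separated words in already tag-free text."""
    return len(text.split())


class _TagCollector(HTMLParser):
    """Records each distinct opening tag in the order it first appears."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            self.tags.append(tag)


def list_tags(html):
    """Return the distinct tag names used in an HTML string."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags
```

Only fetch_html touches the network; the other helpers take a string, so they can be tried on a literal such as "<p>Cats purr. Dogs bark loudly!</p>" without downloading anything.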

Download files


Source Distribution

Pyarser-0.1.0.tar.gz (3.0 kB)

Uploaded Source

File details

Details for the file Pyarser-0.1.0.tar.gz.

File metadata

  • Download URL: Pyarser-0.1.0.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for Pyarser-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        f81f712c91afa65776bacc2c3a4bf202b8a2b58b11f62e2baed0d615bdba5852
MD5           5f461db3628dc6de394e3746a057d720
BLAKE2b-256   121c7d8be22ce437e1a9f7340b6b39aa2d33899ed7290c31bf091b06ac6e14c8

