A Nifty HTML Parser written in Python

Project description

Pyarser is a simple, straightforward HTML parser that lets you easily harvest the text inside an HTML document, given a link to that website. Examples:

get_site_HTML(link): returns a string of HTML content from a link

get_site_text(link): returns a string of text from a link. This string has all the HTML tags <> removed, along with their contents.

search_by_phrase(phrase, link): returns the fragments of text from a link that contain the continuous string phrase.

search_for_words(words, link): returns the fragments of text from a link that contain ANY of the strings in words.

word_count(link): counts the number of text words from a link.

get_HTML_tags(link): returns a list of the tags used in an HTML document from a link.

HTML_to_TXT(link, name): writes a TXT file with the text content from a link. All HTML brackets and tags are removed.
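
To make the behaviour described above concrete, here is a minimal standard-library sketch of the same ideas: stripping tags, searching text fragments, counting words, and listing tags. It is not Pyarser's implementation, and it does not show how Pyarser is imported. All names in it (fetch_html, text_fragments, fragments_with_phrase, fragments_with_any, count_words, tag_names) are hypothetical. It assumes a "fragment" is one run of text between tags, and the helpers take an HTML string rather than a link so the example runs offline.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(link):
    """Hypothetical helper: download a page and return its HTML as a string."""
    with urlopen(link) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class _Collector(HTMLParser):
    """Records tag names and text runs, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.fragments = []
        self.tags = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.fragments.append(text)


def _parse(html):
    collector = _Collector()
    collector.feed(html)
    collector.close()
    return collector


def text_fragments(html):
    """Text with every tag removed, as a list of fragments."""
    return _parse(html).fragments


def tag_names(html):
    """Names of the opening tags, in document order."""
    return _parse(html).tags


def fragments_with_phrase(phrase, html):
    """Fragments that contain the phrase as one continuous string."""
    return [f for f in text_fragments(html) if phrase in f]


def fragments_with_any(words, html):
    """Fragments that contain at least one of the given strings."""
    return [f for f in text_fragments(html) if any(w in f for w in words)]


def count_words(html):
    """Number of whitespace-separated words in the visible text."""
    return sum(len(f.split()) for f in text_fragments(html))


SAMPLE = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Hello world</h1><p>Parsing HTML is fun.</p></body></html>"
)

if __name__ == "__main__":
    print(text_fragments(SAMPLE))
    print(fragments_with_phrase("HTML is", SAMPLE))
    print(count_words(SAMPLE))
```

To work from a link, as the functions listed above do, the string returned by fetch_html(link) could be passed to any of these helpers. Writing the text to a file would then be a matter of joining the fragments and saving them with open(name, "w").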

Download files

Source Distribution

Pyarser-0.1.1.tar.gz (3.0 kB view details)

Uploaded Source

File details

Details for the file Pyarser-0.1.1.tar.gz.

File metadata

  • Download URL: Pyarser-0.1.1.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for Pyarser-0.1.1.tar.gz
Algorithm Hash digest
SHA256 6c0c35d354d2f80da1232e2bd4d14b9317bf7e141a353738e825b3e9f3758b83
MD5 7d0f41984e81769261d272c38cdd3f55
BLAKE2b-256 a4e7ce8ed3d1403151569b62c3caf0ba9d6135dce3422e27956e757146061933

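
One way to use the published digests is to hash the downloaded archive locally and compare the result. The sketch below does this with Python's hashlib. The function names and the assumption that the archive sits at a local path are illustrative only; the expected value is the SHA256 digest listed above.

```python
import hashlib

# SHA256 digest published above for Pyarser-0.1.1.tar.gz.
EXPECTED_SHA256 = "6c0c35d354d2f80da1232e2bd4d14b9317bf7e141a353738e825b3e9f3758b83"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file at `path` (hypothetical local download) has the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash("Pyarser-0.1.1.tar.gz") would return True only if the local copy is byte-for-byte the file described here.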