A Nifty HTML Parser written in Python
Project description
Pyarser is a simple, straightforward HTML parser that lets you easily harvest the text
inside an HTML document, given a link to the website. Examples:
get_site_HTML(link): returns a string of HTML content from a link
get_site_text(link): returns a string of text from a link. This string has all the HTML tags <> removed, along with their contents.
search_by_phrase(phrase, link): returns the fragments of text from a link that contain the continuous string phrase.
search_for_words(words, link): returns the fragments of text from a link that contain ANY of the strings in words.
word_count(link): counts the number of words in the text from a link.
get_HTML_tags(link): returns a list of the tags used in an HTML document from a link.
HTML_to_TXT(link, name): writes a TXT file with the text content from a link. All HTML brackets and tags are removed.
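
To give a feel for what functions like these do, here is a minimal sketch built only on Python's standard library (html.parser). It is not Pyarser's actual implementation, which is not shown here. The names parse_html, fragments_with_phrase and count_words are hypothetical, and the sketch works on an HTML string that has already been fetched rather than on a link.

```python
from html.parser import HTMLParser


class TextAndTagCollector(HTMLParser):
    """Collects non-empty text fragments and start-tag names while parsing."""

    def __init__(self):
        super().__init__()
        self.fragments = []  # text found between tags, whitespace-trimmed
        self.tags = []       # start-tag names in document order

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Note: the contents of <script> and <style> also arrive here;
        # this sketch does not filter them out.
        stripped = data.strip()
        if stripped:
            self.fragments.append(stripped)


def parse_html(html):
    """Hypothetical helper: return (text fragments, tag names) for an HTML string."""
    collector = TextAndTagCollector()
    collector.feed(html)
    collector.close()
    return collector.fragments, collector.tags


def fragments_with_phrase(html, phrase):
    """Hypothetical helper: text fragments that contain the exact phrase."""
    fragments, _ = parse_html(html)
    return [fragment for fragment in fragments if phrase in fragment]


def count_words(html):
    """Hypothetical helper: number of whitespace-separated words in the text."""
    fragments, _ = parse_html(html)
    return sum(len(fragment.split()) for fragment in fragments)
```

To work from a link, as the functions listed above do, the HTML string would first have to be downloaded, for example with urllib.request.urlopen from the standard library.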