A simple set of scripts for handling, formatting, and converting HTML formatted text.

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

HTML-String-Tools

HTML-String-Tools is a simple set of Python scripts for formatting and converting HTML text. It's mainly intended for use as a library, but it also includes a couple of command line utilities for converting between HTML and plain text.

Installation
Functions
CLI

Installation

HTML-String-Tools can be downloaded from its PyPI package using pip:

pip3 install HTML-String-Tools

HTML-String-Tools uses only the standard library included with Python 3, so no additional packages are required when building from source.

Functions

There are several functions in HTML-String-Tools you can use in your Python projects.

get_extension

Returns the extension of a given URI or file path, ignoring anything after the file name such as a query string.

import html_string_tools

extension = html_string_tools.get_extension("path/to/resource/image.png?1234")
# extension will be ".png"
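
For comparison, a similar result can be had from the standard library alone. The sketch below is only an illustration: `extension_of` is a hypothetical helper, not the library's `get_extension`, and it assumes the extension is whatever follows the last dot in the path portion of the URI.

```python
import posixpath
from urllib.parse import urlparse


def extension_of(uri: str) -> str:
    """Return the extension (with its leading dot) of a URI or file path.

    Hypothetical helper for illustration; not the library's implementation.
    """
    # urlparse separates the path from any query string or fragment.
    path = urlparse(uri).path
    # splitext returns (root, ext); ext is "" when there is no extension.
    return posixpath.splitext(path)[1]


print(extension_of("path/to/resource/image.png?1234"))  # .png
```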

entity_to_character

Returns a single unicode character corresponding to a given HTML escape entity.

import html_string_tools

character = html_string_tools.entity_to_character("&#38;")
# character will be "&"

character = html_string_tools.entity_to_character("&gt;")
# character will be ">"
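
To show what decoding a single entity involves, here is a minimal sketch built on the standard library's `html.entities` table. `decode_entity` is a hypothetical name, the function assumes a well-formed entity that starts with `&` and ends with `;`, and it is not taken from the library's source.

```python
from html.entities import name2codepoint


def decode_entity(entity: str) -> str:
    """Decode one entity such as "&#38;", "&#x26;" or "&gt;".

    Illustrative only; raises KeyError for unknown entity names.
    """
    body = entity[1:-1]  # drop the leading "&" and the trailing ";"
    if body.startswith(("#x", "#X")):
        return chr(int(body[2:], 16))  # hexadecimal numeric entity
    if body.startswith("#"):
        return chr(int(body[1:]))  # decimal numeric entity
    return chr(name2codepoint[body])  # named entity


print(decode_entity("&#38;"))  # &
print(decode_entity("&gt;"))   # >
```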

character_to_entity

Converts a single unicode character into a numeric HTML escape entity.

import html_string_tools

entity = html_string_tools.character_to_entity("&")
# entity will be "&#38;"

replace_entities

Replaces all HTML escape entities in a given string with their corresponding unicode characters.

import html_string_tools

string = html_string_tools.replace_entities("This &amp; That &#60;3")
# string will be "This & That <3"
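
The standard library offers a comparable operation in `html.unescape`, shown here as a reference point rather than as this package's implementation. Note that the standard `html.escape` goes the other way using named entities, whereas the examples in this README show numeric ones.

```python
import html

# Decode both named and numeric entities in one pass.
decoded = html.unescape("This &amp; That &#60;3")
print(decoded)  # This & That <3

# The stdlib inverse produces named entities rather than numeric ones.
escaped = html.escape("This & That <3", quote=False)
print(escaped)  # This &amp; That &lt;3
```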

replace_reserved_characters

Replaces all reserved HTML characters in a string with HTML escape entities.

import html_string_tools

string = html_string_tools.replace_reserved_characters("This & That <3")
# string will be "This &#38; That &#60;3"

This function also has an optional escape_non_ascii bool parameter. When True, it replaces every non-ASCII character in the string with an HTML escape entity, not just the characters reserved for HTML.

import html_string_tools

string = html_string_tools.replace_reserved_characters("<Tést>", escape_non_ascii=True)
# string will be "&#60;T&#233;st&#62;"
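
The behavior described above can be approximated in a few lines. This is a sketch under stated assumptions: `escape_text` and `to_numeric_entity` are hypothetical names, and the reserved set is assumed to be `&`, `<` and `>` because those are the only characters the examples show being escaped. The library may treat more characters as reserved.

```python
RESERVED = {"&", "<", ">"}  # assumption based on the README examples


def to_numeric_entity(character: str) -> str:
    """Turn one character into a decimal entity, e.g. "&" -> "&#38;"."""
    return f"&#{ord(character)};"


def escape_text(text: str, escape_non_ascii: bool = False) -> str:
    """Escape reserved characters, and optionally every non-ASCII one."""
    parts = []
    for ch in text:
        if ch in RESERVED or (escape_non_ascii and ord(ch) > 127):
            parts.append(to_numeric_entity(ch))
        else:
            parts.append(ch)
    return "".join(parts)


print(escape_text("This & That <3"))                      # This &#38; That &#60;3
print(escape_text("<T\u00e9st>", escape_non_ascii=True))  # &#60;T&#233;st&#62;
```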

replace_reserved_in_html

Attempts to replace reserved HTML characters with HTML escape entities in a string that already contains HTML syntax. HTML tags and attributes are kept intact, while reserved characters in the user-readable text are replaced.

import html_string_tools

string = html_string_tools.replace_reserved_in_html("<i>Text <3</i>")
# string will be "<i>Text &#60;3</i>"

Like replace_reserved_characters, this function has an optional escape_non_ascii bool parameter. When True, it replaces every non-ASCII character in the readable text with an HTML escape entity. Text that is part of an HTML tag or attribute is NOT affected.

import html_string_tools

string = html_string_tools.replace_reserved_in_html("<span id='á'>á</span>", escape_non_ascii=True)
# string will be "<span id='á'>&#225;</span>"
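
One naive way to get this effect is to split the string on anything that looks like a tag and escape only the pieces in between. The sketch below assumes a tag is `<` or `</` followed by a letter, which is how it tells `<i>` apart from `<3`. `escape_outside_tags` is a hypothetical name, and the approach will re-escape the `&` of entities already present in the text, so it is an illustration rather than a replacement for the library function.

```python
import re

# A capturing group makes re.split keep the tags in the result list.
TAG = re.compile(r"(</?[A-Za-z][^<>]*>)")


def escape_outside_tags(markup: str, escape_non_ascii: bool = False) -> str:
    """Escape &, < and > (optionally non-ASCII) in text, but not in tags."""

    def escape(text: str) -> str:
        return "".join(
            f"&#{ord(ch)};"
            if ch in "&<>" or (escape_non_ascii and ord(ch) > 127)
            else ch
            for ch in text
        )

    # With one capturing group, split() alternates text, tag, text, tag, ...
    pieces = TAG.split(markup)
    return "".join(p if i % 2 else escape(p) for i, p in enumerate(pieces))


print(escape_outside_tags("<i>Text <3</i>"))  # <i>Text &#60;3</i>
```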

text_to_paragraphs

Breaks up plain text into a series of HTML paragraphs enclosed in <p> tags. Whether text on different lines belongs to the same paragraph is determined by the number of new lines, indentation, and lines starting with quotes.

This includes a contains_html bool parameter that determines how to escape characters that are reserved for HTML. If False, all reserved characters in the text are escaped. If True, HTML tags and attributes are left intact while readable text is escaped. Defaults to False.

import html_string_tools

string = html_string_tools.text_to_paragraphs("Line 1\n\nLine 2")
# string will be "<p>Line 1</p><p>Line 2</p>"
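
A much simplified version of the idea is sketched below. It assumes paragraphs are separated only by blank lines and ignores the indentation and quote heuristics mentioned above. `to_paragraphs` is a hypothetical name, and it escapes with the standard `html.escape`, which emits named entities.

```python
import html
import re


def to_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks of plain text in <p> tags."""
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join the lines of one block into a single line of text.
        joined = " ".join(line.strip() for line in block.splitlines())
        if joined:
            paragraphs.append(f"<p>{html.escape(joined, quote=False)}</p>")
    return "".join(paragraphs)


print(to_paragraphs("Line 1\n\nLine 2"))  # <p>Line 1</p><p>Line 2</p>
```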

html_to_text

Converts a string with HTML formatting into simple plain text. HTML tags are stripped, and comments and <script> elements are removed along with their contents. New lines are inserted into the text to reflect how the content was separated in the original HTML.

import html_string_tools

string = html_string_tools.html_to_text("<p>Line 1</p><p>Line 2</p>")
# string will be "Line 1\n\nLine 2"

There is also a keep_tags bool parameter that defaults to False. When True, most HTML elements are removed as normal, but images, links, and basic formatting like italic and bold tags remain intact. This is intended to drastically simplify HTML, and can be used in conjunction with text_to_paragraphs to create HTML suited for reader mode.
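
The basic conversion (without keep_tags) can be approximated with the standard library's `HTMLParser`. This is a sketch, not the library's code: `TextExtractor` and `to_text` are hypothetical names, and the set of block-level tags that trigger a paragraph break is an assumption.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text, skipping <script> and <style> contents."""

    BLOCK = {"p", "div", "li", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n\n")  # paragraph break after a block element

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

    # handle_comment is not overridden, so comments are dropped.


def to_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse runs of blank lines and trim the ends.
    return re.sub(r"\n{3,}", "\n\n", "".join(parser.parts)).strip()


print(repr(to_text("<p>Line 1</p><p>Line 2</p>")))  # 'Line 1\n\nLine 2'
```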

add_smart_quotes_to_element

Attempts to add smart quotes (separate left and right style for single and double quotes) to the text in an HTML element. All quotes used for HTML syntax are left as standard straight quotes.

import html_string_tools

string = "<div id='id'>This is a 'quote'</div>"
string = html_string_tools.add_smart_quotes_to_element(string)
# string will be "<div id='id'>This is a ‘quote’</div>"

This function is not recommended for formatting an entire HTML document, because it will try to pair opening and closing quotes across separate elements. For formatting documents, use the add_smart_quotes_to_paragraphs function instead.

add_smart_quotes_to_paragraphs

Attempts to add smart quotes (separate left and right style for single and double quotes) to all the paragraph tags (<p>) in an HTML document. Unlike add_smart_quotes_to_element, different paragraph blocks will be treated separately, so the direction of quotes will not carry over between elements. All text outside of paragraph tags will not be affected.

import html_string_tools

string = "<div>'Not Altered'</div><p>'One quote</p><p>'Two quotes'</p>"
string = html_string_tools.add_smart_quotes_to_paragraphs(string)
# string will be "<div>'Not Altered'</div><p>‘One quote</p><p>‘Two quotes’</p>"
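
To make the per-paragraph behavior concrete, here is a rough sketch. It uses a stateless heuristic (a quote is treated as opening when it starts the text or follows whitespace or an opening bracket), which is an assumption and may differ from how the library decides quote direction. `curl_quotes` and `curl_paragraphs` are hypothetical names, and the sketch does not protect quotes inside tags nested within a paragraph.

```python
import re


def curl_quotes(text: str) -> str:
    """Replace straight quotes in plain text with left/right smart quotes."""
    out = []
    for i, ch in enumerate(text):
        if ch in "'\"":
            opening = i == 0 or text[i - 1].isspace() or text[i - 1] in "([{"
            if ch == "'":
                out.append("\u2018" if opening else "\u2019")
            else:
                out.append("\u201c" if opening else "\u201d")
        else:
            out.append(ch)
    return "".join(out)


def curl_paragraphs(markup: str) -> str:
    """Apply curl_quotes to the contents of each <p> element separately."""
    return re.sub(
        r"(<p\b[^>]*>)(.*?)(</p>)",
        lambda m: m.group(1) + curl_quotes(m.group(2)) + m.group(3),
        markup,
        flags=re.DOTALL,
    )


result = curl_paragraphs("<div>'Not Altered'</div><p>'One quote</p><p>'Two quotes'</p>")
print(result)
```

Because each paragraph is processed on its own, the unclosed quote in the first paragraph has no effect on the second.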

CLI

There are two command line scripts for converting between text files and HTML files.

Text to HTML

Use the text-to-html command to convert a plain text file into an HTML file. It is built on the text_to_paragraphs function described above.

text-to-html -i input.txt -o output.html

HTML to Text

Use the html-to-text command to convert an HTML file into plain text. It is built on the html_to_text function described above.

html-to-text -i input.htm -o output.txt
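
For readers curious how a command of this shape can be wired up, the sketch below shows one plausible structure using `argparse`. Only the `-i` and `-o` flags come from the commands above. `build_parser`, `run`, and the `convert` callback are hypothetical, and nothing here is taken from the package's actual scripts.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert a file (illustrative sketch).")
    parser.add_argument("-i", dest="input", required=True, help="input file")
    parser.add_argument("-o", dest="output", required=True, help="output file")
    return parser


def run(argv, convert) -> None:
    """Read the input file, apply convert(str) -> str, write the output file."""
    args = build_parser().parse_args(argv)
    source = Path(args.input).read_text(encoding="utf-8")
    Path(args.output).write_text(convert(source), encoding="utf-8")
```

Passing the converter in as a function keeps the file handling identical for both directions; a text-to-HTML script and an HTML-to-text script would differ only in the callback they supply.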


Release history

- 0.4.2 (this version): May 20, 2025
- 0.4.1: May 20, 2025
- 0.4.0: May 19, 2025
- 0.3.0: Nov 13, 2024
- 0.2.3: Oct 21, 2024
- 0.2.2: Jul 10, 2024
- 0.2.1: Oct 31, 2023
- 0.2.0: Jul 30, 2023
- 0.1.0: May 9, 2022

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

html_string_tools-0.4.2-py3-none-any.whl (25.7 kB)

Uploaded May 20, 2025 Python 3

File details

Details for the file html_string_tools-0.4.2-py3-none-any.whl.

File metadata

Download URL: html_string_tools-0.4.2-py3-none-any.whl
Upload date: May 20, 2025
Size: 25.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.2

File hashes

Hashes for html_string_tools-0.4.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6546ff76cbbed450450926fdd42e6d9bb0f5b4a4f42d1fabf71f33f263afe7b2`
MD5	`e216b583535e61e518054835b4c188b9`
BLAKE2b-256	`eb31734f7da23535ca8a240907c5369a4385a36b87821ea61e7c2bfe98a6737e`
