Easy conversion between Unicode characters, numeric HTML entities, and named HTML entities.

Project description

When reading HTML, named entities are often neater and easier to comprehend than numeric entities, Unicode (or other charset) characters, or a mixture of all of the above. Because they fall within the ASCII range, entities are also much safer to use in multiple contexts than Unicode and its various encodings (UTF-8 and such).

This module helps convert from numerical HTML entities and Unicode characters that fall outside the normal ASCII range into named entities. Or, if you prefer, it will help you go the other way, mapping all entities into Unicode. And if you decide you want entities of the counting type, it will even help you go numeric. Decimal or hexadecimal.

Usage

Python 2:

from namedentities import *

u = u'both em\u2014and&#x2013;dashes&hellip;'

print "named:  ", repr(named_entities(u))
print "numeric:", repr(numeric_entities(u))
print "hex:    ", repr(hex_entities(u))
print "unicode:", repr(unicode_entities(u))

yields:

named:   'both em&mdash;and&ndash;dashes&hellip;'
numeric: 'both em&#8212;and&#8211;dashes&#8230;'
hex:     'both em&#x2014;and&#x2013;dashes&#x2026;'
unicode: u'both em\u2014and\u2013dashes\u2026'

You can do just about the same thing in Python 3, but you have to use the print function rather than a print statement, and prior to 3.3, you have to skip the u prefix that in Python 2 marks string literals as being Unicode literals. In Python 3.3 and later, however, you can use the u marker again, if you like. It’s optional and has no effect, because all Python 3 strings are Unicode, but it sure helps with cross-version code compatibility. (You can also use the six cross-version compatibility library, as the tests do.)
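To make the three output styles concrete, here is a rough Python 3 sketch of the same conversions built only on the standard library. It is not this module’s code: the names to_unicode and to_entities are made up for illustration, and the handling of unknown entities (left untouched) is an assumption. Like the module, it leaves <, >, and & alone.

```python
import re
from html.entities import codepoint2name, name2codepoint

# Matches &name;, &#1234; and &#x1f2e; style references.
_ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

# Code points for <, > and &, which are left alone so HTML escaping survives.
_SYNTAX = {ord("<"), ord(">"), ord("&")}


def _decode(match):
    """Return the character for one entity; keep the entity text if it is
    unknown, out of range, or stands for <, > or &."""
    body = match.group(1)
    if body[:2] in ("#x", "#X"):
        cp = int(body[2:], 16)
    elif body[0] == "#":
        cp = int(body[1:])
    else:
        cp = name2codepoint.get(body)
    if cp is None or cp in _SYNTAX or cp > 0x10FFFF:
        return match.group(0)
    return chr(cp)


def to_unicode(text):
    """Hypothetical helper: resolve entities to Unicode characters."""
    return _ENTITY.sub(_decode, text)


def to_entities(text, kind="named"):
    """Hypothetical helper: spell every non-ASCII character as an entity.

    kind is 'named', 'numeric' or 'hex'. Characters without a known
    name fall back to the decimal form when kind is 'named'.
    """
    out = []
    for ch in to_unicode(text):  # normalize mixed input first
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif kind == "named" and cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        elif kind == "hex":
            out.append("&#x%x;" % cp)
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


if __name__ == "__main__":
    sample = "both em\u2014and&#x2013;dashes&hellip;"
    for kind in ("named", "numeric", "hex"):
        print(kind, repr(to_entities(sample, kind)))
    print("unicode", repr(to_unicode(sample)))
```

Normalizing to Unicode first and then re-encoding keeps the encoder to a single loop, whatever mixture of entity styles the input uses.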

One good use for unicode_entities is to create cross-platform, cross-Python-version strings that conceptually contain Unicode characters, but spelled out as named (or numeric) HTML entities. For example:

unicode_entities('This &rsquo;thing&rdquo; is great!')

This has the advantage of using only ASCII characters and common string encoding mechanisms, yet rendering full Unicode strings upon reconstitution. You can use the other functions, say named_entities(), to go from Unicode characters to named entities.
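As a quick check of that idea without the package installed, the snippet below uses html.unescape from the Python 3 standard library as a stand-in for unicode_entities. That substitution is an assumption made for illustration, and html.unescape also decodes &lt;, &gt;, and &amp;, which may differ from this module’s behavior.

```python
import html

# The source text is pure ASCII, so it survives any encoding or editor.
ascii_source = 'This &rsquo;thing&rdquo; is great!'
assert all(ord(ch) < 128 for ch in ascii_source)

# Reconstitute the real Unicode string at the point of use.
text = html.unescape(ascii_source)

# The curly quotes are now actual characters (U+2019 and U+201D).
print([hex(ord(ch)) for ch in text if ord(ch) > 127])
```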

Alternate API

A new function entities(text, kind) takes text and the kind of entities you’d like returned. kind can be 'named' (the default), 'numeric', 'hex', 'unicode', or 'none'.
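One way to picture a kind-driven function like this is a small dispatch table. The sketch below is only an illustration of that design in Python 3, not the module’s implementation: entities_sketch and its helpers are hypothetical names, it encodes raw non-ASCII characters only (existing entities are not rewritten), and it omits 'none' because this page does not describe what that kind returns.

```python
import html
from html.entities import codepoint2name


def _encode(text, fmt):
    # Replace each non-ASCII character with fmt(code_point); ASCII passes through.
    return "".join(ch if ord(ch) < 128 else fmt(ord(ch)) for ch in text)


def _named(cp):
    # Use the entity name when one exists, otherwise fall back to decimal.
    name = codepoint2name.get(cp)
    return "&%s;" % name if name else "&#%d;" % cp


# Map each kind to a converter. html.unescape stands in for the
# 'unicode' kind (an assumption; it also decodes &lt;, &gt; and &amp;).
_KINDS = {
    "named": lambda t: _encode(t, _named),
    "numeric": lambda t: _encode(t, lambda cp: "&#%d;" % cp),
    "hex": lambda t: _encode(t, lambda cp: "&#x%x;" % cp),
    "unicode": html.unescape,
}


def entities_sketch(text, kind="named"):
    """Hypothetical stand-in for entities(text, kind)."""
    try:
        convert = _KINDS[kind]
    except KeyError:
        raise ValueError("unsupported kind: %r" % (kind,))
    return convert(text)


if __name__ == "__main__":
    for kind in ("named", "numeric", "hex"):
        print(kind, entities_sketch("caf\u00e9", kind))
    print("unicode", entities_sketch("caf&eacute;", "unicode"))
```

A table keeps the default ('named') and the set of accepted kinds in one place, so adding a kind means adding one entry rather than another branch.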

Recent Changes

  • As of 1.6, entities() API added. A slightly different import mechanism is used.

  • The numeric_entities(text) and hex_entities(text) APIs have been added, shifting the module’s mission from “named entities” to “general purpose entity transformation.” Live and learn!

  • The unescape(text) API changes all entities into Unicode characters. While long present, it is now available for easy external consumption. It has an alias, unicode_entities(text), for parallelism with the other APIs.

  • Repackaged first as a Python package, rather than independent modules. Then, given my growing confidence in managing cross-version packages, previously separate backend implementations for Python 2 and Python 3 have been merged into a single backend.

  • Now successfully packaged for, and tested against, Python 2.6, 2.7, 3.2, 3.3, and 3.4, as well as PyPy 2.0.2 (based on 2.7.3). Automated multi-version testing is managed with the wonderful pytest and tox.

  • Should also work under Python 2.5 and PyPy 1.9, but those are not “officially supported” because they aren’t supported in my testing environment.

Notes

  • Doesn’t attempt to encode <, >, or & (or their numerical equivalents) to avoid interfering with HTML escaping.

  • This module started as basically a packaging of Ian Beck’s work. While it’s moved slightly forward since then, it’s still mostly Ian under the covers. Thank you, Ian!

Installation

pip install -U namedentities

To easy_install under a specific Python version (3.3 in this example):

python3.3 -m easy_install --upgrade namedentities

(You may need to prefix these with “sudo ” to authorize installation.)

Download files

Source Distributions

namedentities-1.6.2.zip (12.1 kB, source)

namedentities-1.6.2.tar.gz (5.8 kB, source)

File details

Details for the file namedentities-1.6.2.zip.

File metadata

  • Download URL: namedentities-1.6.2.zip
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for namedentities-1.6.2.zip
Algorithm Hash digest
SHA256 a8d9335879e1971c07d646b8ec6be37c9477b3afe6919eab695dd88284360d4b
MD5 1395e913f193885551d4ef9a2372567a
BLAKE2b-256 45fe075ce2f9c46ae9a88b6c70205b69dd39a09ba5b8aaad212415f7b305850f

File details

Details for the file namedentities-1.6.2.tar.gz.

File hashes

Hashes for namedentities-1.6.2.tar.gz
Algorithm Hash digest
SHA256 61828dd49a2e7aa71585459bcef8e54f17053d8f5be79f40435fd3462f20cbe4
MD5 715ce0a9f9eac51b96aa60848e9bda3c
BLAKE2b-256 4fe5c978c2f17b4bf4461d3b7b4f1480b37de193955999e70413dc5814a20dce
