A tool that converts arbitrary text (like user input or file names) into valid Python identifiers while preserving as much of the original meaning as possible.

Example Usage

from __future__ import print_function
from unicode_string_to_identifier import unicode_string_to_identifier

print(unicode_string_to_identifier(u""))
# u"_"
print(unicode_string_to_identifier(u" \r\n\t"))
# u"_"
print(unicode_string_to_identifier(u"123abc"))
# u"_123abc"
print(unicode_string_to_identifier(u"&abc 123"))
# u"_abc_123"
print(unicode_string_to_identifier(u"  hello  world  $"))
# u"_hello_world__"
print(unicode_string_to_identifier(u"测试@unicode"))
# u"测试_unicode"

How It Works

First, get_character_type() categorizes each Unicode character into one of four types:

  • LETTER_OR_UNDERSCORE (any Unicode category starting with 'L', or the underscore character _)
  • DECIMAL_DIGIT (Unicode category 'Nd')
  • SPACE_OR_CONTROL (any Unicode category starting with 'Z', or the control category 'Cc')
  • OTHER (all other characters)
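
The classification above can be sketched with the standard unicodedata module. This is a minimal illustration, not the package's actual source: the names CharacterType and classify_character are hypothetical, and the real get_character_type() may differ in signature and return values.

```python
import unicodedata
from enum import Enum


class CharacterType(Enum):
    """Hypothetical enum mirroring the four types listed above."""
    LETTER_OR_UNDERSCORE = 1
    DECIMAL_DIGIT = 2
    SPACE_OR_CONTROL = 3
    OTHER = 4


def classify_character(char):
    """Classify a single character by its Unicode general category."""
    # The underscore has category 'Pc', so it needs an explicit check.
    if char == "_":
        return CharacterType.LETTER_OR_UNDERSCORE
    category = unicodedata.category(char)
    if category.startswith("L"):   # Lu, Ll, Lt, Lm, Lo
        return CharacterType.LETTER_OR_UNDERSCORE
    if category == "Nd":           # decimal digits in any script
        return CharacterType.DECIMAL_DIGIT
    if category.startswith("Z") or category == "Cc":  # separators and control characters
        return CharacterType.SPACE_OR_CONTROL
    return CharacterType.OTHER
```

Note that the underscore is checked before the category lookup because its own category ('Pc', connector punctuation) would otherwise fall into OTHER.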

Then, unicode_string_to_identifier() applies the following conversion rules using a state machine:

  • The first character must be a letter or underscore. If the first character is a digit, _ is prepended to make the result valid.
  • Subsequent valid characters (letters, underscores, and digits) are kept as-is.
  • Each other character is replaced with an underscore, except that a run of consecutive whitespace/control characters is collapsed into a single underscore.
  • The output is never empty: if no characters were produced, the result is _.
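
One way to realize these rules is sketched below. It is an assumption-laden illustration rather than the package's implementation: the helper _kind and the function to_identifier_sketch are hypothetical names, and the actual state machine may be structured differently. The only state tracked here is whether the loop is inside a whitespace/control run and whether any output has been produced yet.

```python
import unicodedata


def _kind(char):
    """Hypothetical helper: classify one character into the four types."""
    category = unicodedata.category(char)
    if char == "_" or category.startswith("L"):
        return "letter"
    if category == "Nd":
        return "digit"
    if category.startswith("Z") or category == "Cc":
        return "space"
    return "other"


def to_identifier_sketch(text):
    """Apply the conversion rules described above to a string."""
    pieces = []
    in_space_run = False  # True while inside a whitespace/control run
    for char in text:
        kind = _kind(char)
        if kind == "space":
            # Emit one underscore for the whole run, then skip the rest.
            if not in_space_run:
                pieces.append("_")
            in_space_run = True
            continue
        in_space_run = False
        if kind == "letter":
            pieces.append(char)
        elif kind == "digit":
            # A digit cannot start an identifier, so prepend an underscore.
            if not pieces:
                pieces.append("_")
            pieces.append(char)
        else:
            # Every other character maps to its own underscore.
            pieces.append("_")
    # Guarantee a non-empty result.
    return "".join(pieces) or "_"
```

On the six inputs from Example Usage, this sketch yields the same results listed there. It makes no attempt to handle Python keywords such as class, which pass through unchanged.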

Contributing

Contributions are welcome! Please submit pull requests or open issues on the GitHub repository.

License

This project is licensed under the MIT License.
