Provides pure ASCII transliterations of Unicode strings.

Python Fast Unidecode

This repo is a fork of the rust-unidecode repository that ports the original Rust implementation for use from Python. It also changes the source code to speed up the translation of ASCII characters, which puts this implementation on par with the Python unidecode implementation for that character set.

As a result, this package should produce the same output as the Python unidecode implementation. On average, based on the benchmark/speed_benchmark.py benchmark, it is much faster on non-ASCII characters (more than roughly 3x) and comparable to slightly slower on ASCII characters (by a few percent). Results depend on caching and other factors; for non-ASCII characters the speedup sometimes exceeds 10x. The benchmarks were run on Python 3.13.
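A comparison of this kind can be approximated with the standard `timeit` module. The following is a rough sketch, not the project's benchmark/speed_benchmark.py: `best_time` and `speedup` are hypothetical helper names, and the commented lines assume that both `fast_unidecode` and the Python `unidecode` package are installed.

```python
import timeit
from typing import Callable


def best_time(func: Callable[[str], str], text: str,
              number: int = 1000, repeat: int = 5) -> float:
    """Return the best observed time per call, in seconds, for func(text)."""
    runs = timeit.repeat(lambda: func(text), number=number, repeat=repeat)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(runs) / number


def speedup(baseline: Callable[[str], str], candidate: Callable[[str], str],
            text: str, number: int = 1000, repeat: int = 5) -> float:
    """Return how many times faster candidate is than baseline (>1 is faster)."""
    return (best_time(baseline, text, number, repeat)
            / best_time(candidate, text, number, repeat))


# With both packages installed (an assumption of this sketch):
#   from unidecode import unidecode as reference
#   from fast_unidecode import unidecode as fast
#   print(speedup(reference, fast, "北亰 げんまい茶 " * 100))
```

Numbers from a sketch like this vary with input length, character mix, and machine load, so they are only a sanity check against the figures quoted above.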

License

This project is licensed under the MIT License.

Important Note: Unlike the original Python unidecode package, which is distributed under the restrictive GNU General Public License (GPL), fast-unidecode is released under the permissive MIT license. This makes it suitable for use in a wider range of projects, including commercial and closed-source applications. For SaaS (Software as a Service) companies, using a GPL-licensed library can create an obligation to release your own source code, a requirement that the MIT license does not have.

Benchmark code is not a part of the distributed package.

Installation

pip install fast_unidecode

Installation from source

First, build the package with maturin, then install the resulting fast_unidecode wheel with pip.

maturin build --release
pip install target/wheels/fast_unidecode...

Usage

>>> from fast_unidecode import unidecode

>>> print(unidecode("Æneid"))
AEneid

>>> print(unidecode("北亰"))
Bei Jing
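
One common use of the call shown above is accent-insensitive matching. The following is a minimal sketch and not part of the package: `fold` and `find_matches` are hypothetical helper names, and the transliteration function is passed in as a parameter so the helpers work with `fast_unidecode.unidecode` or any function of the same shape.

```python
from typing import Callable, Iterable, List


def fold(text: str, transliterate: Callable[[str], str]) -> str:
    """Return a lowercase ASCII key suitable for loose comparison."""
    # Strip because some transliterations end with a trailing space.
    return transliterate(text).strip().lower()


def find_matches(query: str, candidates: Iterable[str],
                 transliterate: Callable[[str], str]) -> List[str]:
    """Return the candidates whose folded form contains the folded query."""
    key = fold(query, transliterate)
    return [c for c in candidates if key in fold(c, transliterate)]


# With the package installed:
#   from fast_unidecode import unidecode
#   find_matches("etude", ["Étude No. 3", "Prelude"], unidecode)
```

Passing the function in keeps the helpers independent of which unidecode implementation is installed.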

rust-unidecode (Original README.md)

Documentation

The rust-unidecode library is a Rust port of Sean M. Burke's famous Text::Unidecode module for Perl. It transliterates Unicode strings such as "Æneid" into pure ASCII ones such as "AEneid". For a detailed explanation of the rationale behind using such a library, you can refer to both the documentation of the original module and this article written by Burke in 2001.

The data set used to translate the Unicode was ported directly from the Text::Unidecode module using a Perl script, so rust-unidecode should produce identical output.

Examples

extern crate unidecode;
use unidecode::unidecode;

fn main() {
    // Each call returns an owned, ASCII-only String.
    assert_eq!(unidecode("Æneid"), "AEneid");
    assert_eq!(unidecode("étude"), "etude");
    assert_eq!(unidecode("北亰"), "Bei Jing");
    assert_eq!(unidecode("ᔕᓇᓇ"), "shanana");
    // Note the trailing space in this result.
    assert_eq!(unidecode("げんまい茶"), "genmaiCha ");
}

Guarantees and Warnings

Here are some guarantees you have when calling unidecode():

  • The String returned will be valid ASCII; the decimal representation of every char in the string will be between 0 and 127, inclusive.
  • Every ASCII character (0x0000 - 0x007F) is mapped to itself.
  • All Unicode characters will translate to a string containing newlines ("\n") or ASCII characters in the range 0x0020 - 0x007E. So for example, no Unicode character will translate to \u{01}. The exception is if the ASCII character itself is passed in, in which case it will be mapped to itself. (So '\u{01}' will be mapped to "\u{01}".)
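
The guarantees listed above can be spot-checked from Python. This is a small sketch under stated assumptions: `violations` and `ALLOWED` are hypothetical names, the transliteration function is passed in as a parameter, and the check covers only the samples you supply, so it is not a proof.

```python
from typing import Callable, Iterable, List

# Newline plus the printable ASCII range 0x20 - 0x7E.
ALLOWED = {"\n"} | {chr(c) for c in range(0x20, 0x7F)}


def violations(transliterate: Callable[[str], str],
               samples: Iterable[str]) -> List[str]:
    """Return the samples for which one of the listed guarantees fails."""
    failed = []
    for sample in samples:
        out = transliterate(sample)
        # Guarantee 1: every character of the output is in 0..127.
        ascii_out = all(ord(ch) < 128 for ch in out)
        # Guarantee 2: pure-ASCII input comes back unchanged.
        identity_ok = out == sample if sample.isascii() else True
        # Guarantee 3: each non-ASCII character on its own maps to
        # newlines or printable ASCII only.
        printable_ok = all(
            set(transliterate(ch)) <= ALLOWED
            for ch in sample if ord(ch) > 127
        )
        if not (ascii_out and identity_ok and printable_ok):
            failed.append(sample)
    return failed


# With the package installed:
#   from fast_unidecode import unidecode
#   violations(unidecode, ["Æneid", "étude", "\x01", "plain ascii"])
```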

There are, however, some things you should keep in mind:

  • As stated, some transliterations do produce \n characters.
  • Some Unicode characters transliterate to an empty string, either on purpose or because rust-unidecode does not know about the character.
  • Some Unicode characters are unknown and transliterate to "[?]".
  • Many Unicode characters transliterate to multi-character strings. For example, 北 is transliterated as "Bei ".

This information was paraphrased from the original Text::Unidecode documentation.
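
Because output may contain newlines, trailing spaces, and "[?]" markers, callers that need a single clean line often post-process the result. The sketch below is one way to do that; `tidy` and `PLACEHOLDER` are hypothetical names and are not provided by rust-unidecode or fast_unidecode.

```python
import re

PLACEHOLDER = "[?]"  # marker described above for unknown characters


def tidy(raw: str, unknown: str = "") -> str:
    """Clean raw transliteration output for single-line use."""
    # Swap the unknown-character marker for a caller-chosen replacement.
    text = raw.replace(PLACEHOLDER, unknown)
    # Collapse newlines and runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Drop leading and trailing spaces such as the one after "Bei ".
    return text.strip()


print(tidy("Bei Jing "))        # prints: Bei Jing
print(tidy("a[?]b\nc  d "))     # prints: ab c d
```

Since ASCII input maps to itself, a literal "[?]" in the original text is indistinguishable from the marker and will also be replaced, so apply this only where that is acceptable.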
