Any Ascii
Unicode to ASCII transliteration
Examples
| Script | Input | Output | Actual |
|---|---|---|---|
| | résumé | resume | |
| | © ㎧ Æ № | (C) m/s AE No | |
| Mandarin Chinese | 深圳 | ShenZhen | Shenzhen |
| Cantonese Chinese | 深水埗 | ShenShuiBu | Sham Shui Po |
| Russian Cyrillic | Борис Николаевич Ельцин | Boris Nikolaevich El'tsin | Boris Nikolayevich Yeltsin |
| Korean Hangul | 반기문 | bangimun | Ban Ki-Moon |
| Japanese Hiragana | さいたま | saitama | Saitama |
| Japanese Kanji | 埼玉県 | QiYuXian | Saitama-ken |
| Ancient Greek | Φειδιππίδης | Feidippidis | Pheidippides |
| Modern Greek | Δημήτρης Φωτόπουλος | Dimitris Fotopoylos | Dimitris Fotopoulos |
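The Output column suggests that each character is mapped on its own, without regard to the language of the surrounding text, which would explain why 深圳 becomes ShenZhen rather than Shenzhen. The following is a minimal sketch of that idea, assuming a tiny hand-written table copied from the rows above. `TOY_TABLE` and `toy_transliterate` are hypothetical names, and this is not the library's actual data or implementation.

```python
import unicodedata

# Hypothetical toy table: every entry is copied from the example rows above.
TOY_TABLE = {
    '\u00e9': 'e',   # é (precomposed form)
    '©': '(C)',
    '㎧': 'm/s',
    'Æ': 'AE',
    '№': 'No',
    '深': 'Shen',
    '圳': 'Zhen',
}


def toy_transliterate(text):
    """Map each character independently through TOY_TABLE."""
    # Compose combining sequences first so 'e' + U+0301 matches the é entry.
    text = unicodedata.normalize('NFC', text)
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # already ASCII, keep unchanged
        else:
            # Characters missing from the toy table are dropped; that is a
            # simplification chosen for this sketch only.
            out.append(TOY_TABLE.get(ch, ''))
    return ''.join(out)


print(toy_transliterate('r\u00e9sum\u00e9'))  # resume
print(toy_transliterate('© ㎧ Æ №'))           # (C) m/s AE No
print(toy_transliterate('深圳'))               # ShenZhen
```

Because the lookup never sees more than one character at a time, it cannot pick a language-specific reading, which matches the differences between the Output and Actual columns.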
Implementations
Java
String s = AnyAscii.transliterate("άνθρωποι");
// anthropoi
Java 6+ compatible
Available through JitPack
Maven
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.hunterwb</groupId>
    <artifactId>any-ascii</artifactId>
    <version>0.1.0</version>
</dependency>
Gradle
repositories {
    maven { url 'https://jitpack.io' }
}
dependencies {
    implementation 'com.hunterwb:any-ascii:0.1.0'
}
Python
from anyascii import anyascii
s = anyascii('άνθρωποι')
# anthropoi
Python 3.3+ compatible
Install latest release: pip install anyascii
Install from master: pip install https://github.com/hunterwb/any-ascii/archive/master.zip#subdirectory=python
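Since `anyascii` takes a string and returns a string, it can be combined with ordinary string handling. As one possible use, the sketch below builds a lowercase ASCII slug. `ascii_slug` and `fake_transliterate` are hypothetical helpers and not part of the package; the transliterator is passed in as a parameter so the example runs without the package installed.

```python
import re


def ascii_slug(text, transliterate):
    """Return a lowercase, hyphen-separated ASCII slug.

    `transliterate` is any str -> str callable, for example
    `anyascii.anyascii` once the package is installed.
    """
    ascii_text = transliterate(text)
    # Keep only runs of ASCII letters and digits, then join them.
    words = re.findall(r'[A-Za-z0-9]+', ascii_text)
    return '-'.join(word.lower() for word in words)


def fake_transliterate(text):
    # Stand-in for illustration only: handles a single character.
    return text.replace('\u00e9', 'e')


print(ascii_slug('Mon r\u00e9sum\u00e9 2020', fake_transliterate))
# mon-resume-2020
```

In real use, `from anyascii import anyascii` followed by `ascii_slug(title, anyascii)` would replace the stand-in.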
Node.js
const anyAscii = require('any-ascii');
const s = anyAscii('άνθρωποι');
// anthropoi
Node.js 4+ compatible
Install latest release: npm install any-ascii
Install from master: npm install hunterwb/any-ascii
Glossary
- Unicode: The universal character set, a global standard to support all the world's languages. Consists of 100,000+ characters used by 150 writing systems. Typically encoded into bytes using UTF-8.
- ASCII: The most compatible character set. A subset of Unicode/UTF-8 consisting of 128 characters using 7 bits in the range 0x00-0x7F. The printable characters are English letters, digits, and punctuation in the range 0x20-0x7E, with the remaining being control characters.
- Transliteration: A mapping from one writing system into another, typically done one character at a time using predictable rules. Transliteration generally preserves the spelling of words, while translation preserves the meaning, and transcription preserves the sound. Transliteration into the Latin script used by English is known as romanization.
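The ASCII ranges given in the glossary can be checked with plain code point comparisons. The sketch below uses only the standard library; the helper names `is_ascii` and `is_printable_ascii` are made up for illustration.

```python
def is_ascii(text):
    """True if every character is in the 7-bit range 0x00-0x7F."""
    return all(ord(ch) <= 0x7F for ch in text)


def is_printable_ascii(text):
    """True if every character is in the printable range 0x20-0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


print(is_ascii('resume'))             # True
print(is_ascii('r\u00e9sum\u00e9'))   # False
print(is_printable_ascii('a\tb'))     # False, tab is a control character

# Count printable characters among the 128 ASCII code points.
printable = sum(1 for cp in range(0x80) if 0x20 <= cp <= 0x7E)
print(printable, 0x80 - printable)    # 95 33

# ASCII text has the same bytes in UTF-8; other characters need more bytes.
print('A'.encode('utf-8') == 'A'.encode('ascii'))  # True
print(len('\u00e9'.encode('utf-8')))               # 2
```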
See Also
ALA-LC Romanization
BGN/PCGN Romanization
Compart: Unicode Charts
ICAO 9303: Machine Readable Passports
ISO 9: Cyrillic Romanization
Sean M. Burke: Unidecode
Sean M. Burke: Unidecode, Perl Journal
UNGEGN Romanization
Unicode CLDR: Transliteration Guidelines
Unicode Unihan Database
Wikipedia: Romanization of Greek