Any Ascii

Unicode to ASCII transliteration

Examples

| Script | Input | Output | Actual |
| --- | --- | --- | --- |
| | résumé | resume | |
| | © ㎧ Æ № | (C) m/s AE No | |
| Mandarin Chinese | 深圳 | ShenZhen | Shenzhen |
| Cantonese Chinese | 深水埗 | ShenShuiBu | Sham Shui Po |
| Russian Cyrillic | Борис Николаевич Ельцин | Boris Nikolaevich El'tsin | Boris Nikolayevich Yeltsin |
| Korean Hangul | 반기문 | bangimun | Ban Ki-Moon |
| Japanese Hiragana | さいたま | saitama | Saitama |
| Japanese Kanji | 埼玉県 | QiYuXian | Saitama-ken |
| Ancient Greek | Φειδιππίδης | Feidippidis | Pheidippides |
| Modern Greek | Δημήτρης Φωτόπουλος | Dimitris Fotopoylos | Dimitris Fotopoulos |
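
Every value in the Output column is plain ASCII, while every Input is not. The sketch below checks that property for a few rows copied from the examples, using only the standard library. The names `ROWS` and `is_ascii` are illustrative and are not part of the package.

```python
# A few (input, output) pairs copied from the examples above.
ROWS = [
    ("résumé", "resume"),
    ("深圳", "ShenZhen"),
    ("반기문", "bangimun"),
    ("さいたま", "saitama"),
    ("Φειδιππίδης", "Feidippidis"),
]


def is_ascii(text):
    """Return True if every character is in the 7-bit range 0x00-0x7F."""
    return all(ord(ch) <= 0x7F for ch in text)


# One (input_is_ascii, output_is_ascii) pair per row.
checks = [(is_ascii(source), is_ascii(result)) for source, result in ROWS]
print(checks)
```

Each pair should come out as `(False, True)`: the input contains at least one non-ASCII character and the output contains none.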

Implementations

Java

String s = AnyAscii.transliterate("άνθρωποι");
// anthropoi

Java 6+ compatible

Available through JitPack

Maven
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
<dependency>
    <groupId>com.hunterwb</groupId>
    <artifactId>any-ascii</artifactId>
    <version>0.1.0</version>
</dependency>
Gradle
repositories {
    maven { url 'https://jitpack.io' }
}
dependencies {
    implementation 'com.hunterwb:any-ascii:0.1.0'
}

Python

from anyascii import anyascii

s = anyascii('άνθρωποι')
# anthropoi

Python 3.3+ compatible

Install latest release: pip install anyascii

Install from master: pip install https://github.com/hunterwb/any-ascii/archive/master.zip#subdirectory=python

Node.js

const anyAscii = require('any-ascii');

const s = anyAscii('άνθρωποι');
// anthropoi

Node.js 4+ compatible

Install latest release: npm install any-ascii

Install from master: npm install hunterwb/any-ascii

Glossary

  • Unicode: The universal character set, a global standard to support all the world's languages. Consists of 100,000+ characters used by 150 writing systems. Typically encoded into bytes using UTF-8.
  • ASCII: The most widely compatible character set. A subset of Unicode/UTF-8 consisting of 128 characters, each encoded in 7 bits, in the range 0x00 - 0x7F. The printable characters are English letters, digits, and punctuation in the range 0x20 - 0x7E; the rest are control characters.
  • Transliteration: A mapping from one writing system into another, typically done one character at a time using predictable rules. Transliteration generally preserves the spelling of words, while translation preserves the meaning, and transcription preserves the sound. Transliteration into the Latin script used by English is known as romanization.
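
To make the "one character at a time" idea concrete, here is a minimal sketch of table-driven transliteration. It is not the library's implementation: `TOY_TABLE` and `toy_transliterate` are hypothetical names, the table covers only the letters of the sample word used in the implementation examples, and replacing unknown characters with `?` is an assumption made for the sketch.

```python
# Hypothetical toy table covering only the letters of the sample word.
# A real transliteration library needs a far larger data set.
TOY_TABLE = {
    "ά": "a",
    "ν": "n",
    "θ": "th",  # one input character may map to several ASCII characters
    "ρ": "r",
    "ω": "o",
    "π": "p",
    "ο": "o",
    "ι": "i",
}


def toy_transliterate(text, table=TOY_TABLE):
    """Map each character independently; ASCII passes through unchanged."""
    pieces = []
    for ch in text:
        if ord(ch) <= 0x7F:
            pieces.append(ch)
        else:
            # Assumption: characters missing from the table become "?".
            pieces.append(table.get(ch, "?"))
    return "".join(pieces)


result = toy_transliterate("άνθρωποι")
print(result)  # anthropoi
```

Because each character is looked up on its own, the mapping is predictable but ignores context, which is why a table-driven result can differ from the conventional romanization of a name.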

See Also

ALA-LC Romanization
BGN/PCGN Romanization
Compart: Unicode Charts
ICAO 9303: Machine Readable Passports
ISO 9: Cyrillic Romanization
Sean M. Burke: Unidecode
Sean M. Burke: Unidecode, Perl Journal
UNGEGN Romanization
Unicode CLDR: Transliteration Guidelines
Unicode Unihan Database
Wikipedia: Romanization of Greek
