- support for Modified Hepburn, Kunreisiki, Nihonsiki systems
- custom overrides for individual mappings
- custom overrides for specific words
- built-in exceptions list (Tokyo, Osaka, etc.)
- uses foreign spelling when available in UniDic
- proper nouns are capitalized
- slug mode for URL generation
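To make the difference between the three systems and the idea of a per-mapping override concrete, here is a toy sketch. It is not cutlet's implementation and does not use cutlet's tables: the names `build_table` and `romanize_kana` are hypothetical, and the tables cover only three kana.

```python
# Toy illustration only: three kana, not cutlet's real mapping tables.
HEPBURN = {"つ": "tsu", "づ": "zu", "く": "ku"}
KUNREI_DIFF = {"つ": "tu"}   # Kunreisiki writes つ as "tu"
NIHON_DIFF = {"づ": "du"}    # Nihonsiki additionally keeps づ distinct as "du"


def build_table(system="hepburn", overrides=None):
    """Return a kana-to-romaji table for the given system (hypothetical helper)."""
    table = dict(HEPBURN)
    if system in ("kunrei", "nihon"):
        table.update(KUNREI_DIFF)
    if system == "nihon":
        table.update(NIHON_DIFF)
    if system not in ("hepburn", "kunrei", "nihon"):
        raise ValueError(f"unknown system: {system}")
    # A custom override simply replaces individual entries last.
    if overrides:
        table.update(overrides)
    return table


def romanize_kana(text, table):
    """Map each kana through the table, leaving unknown characters untouched."""
    return "".join(table.get(ch, ch) for ch in text)


for system in ("hepburn", "kunrei", "nihon"):
    print(system, romanize_kana("つづく", build_table(system)))
```

The point of the sketch is only that an override is applied after the system's own table, so a single entry can be changed without touching the rest.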
Things not supported:
- traditional Hepburn n-to-m: Shimbashi
- macrons or circumflexes: Tōkyō, Tôkyô
- passport Hepburn: Satoh (but you can use an exception)
- hyphenating words
- Traditional Hepburn in general is not supported
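If you need the traditional Hepburn n-to-m spelling anyway, one option is to post-process the output yourself. The following is a naive sketch, not part of cutlet; `traditional_n_to_m` is a hypothetical helper that rewrites "n" to "m" whenever it is directly followed by b, m, or p.

```python
import re


def traditional_n_to_m(romaji):
    """Rewrite n as m before b, m, or p, preserving the letter's case."""

    def swap(match):
        return "M" if match.group(0) == "N" else "m"

    # The lookahead leaves the following consonant in place.
    return re.sub(r"[nN](?=[bmpBMP])", swap, romaji)


print(traditional_n_to_m("Shinbashi"))
```

Because this works on plain strings, it will also alter foreign spellings that happen to contain the same letter sequence (for example "input"), so it is best applied only to text where foreign spelling is disabled or checked by hand.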
Internally, cutlet uses fugashi, so you can use the same dictionary you use for normal tokenization.
Cutlet can be installed through pip as usual.
pip install cutlet
Note that if you don't have a MeCab dictionary installed you'll also have to install one. If you're just getting started unidic-lite is a good choice.
pip install unidic-lite
A command-line script is included for quick testing. Just run cutlet and each line of stdin will be treated as a sentence. You can specify the system to use (for example kunrei or nihon) as the first argument.
    $ cutlet
    ローマ字変換プログラム作ってみた。
    Roma ji henkan program tsukutte mita.
    import cutlet
    katsu = cutlet.Cutlet()
    katsu.romaji("カツカレーは美味しい")
    # => 'Cutlet curry wa oishii'

    # you can print a slug suitable for urls
    katsu.slug("カツカレーは美味しい")
    # => 'cutlet-curry-wa-oishii'

    # You can disable using foreign spelling too
    katsu.use_foreign_spelling = False
    katsu.romaji("カツカレーは美味しい")
    # => 'Katsu karee wa oishii'

    # kunreisiki, nihonsiki work too
    katu = cutlet.Cutlet('kunrei')
    katu.romaji("富士山")
    # => 'Huzi yama'

    # comparison
    nkatu = cutlet.Cutlet('nihon')

    sent = "彼女は王への手紙を読み上げた。"
    katsu.romaji(sent)
    # => 'Kanojo wa ou e no tegami wo yomiageta.'
    katu.romaji(sent)
    # => 'Kanozyo wa ou e no tegami o yomiageta.'
    nkatu.romaji(sent)
    # => 'Kanozyo ha ou he no tegami wo yomiageta.'
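To get the command-line behaviour (one sentence per line) inside a program, a small wrapper is enough. This is a sketch under assumptions: `romanize_lines` and the `Romanizer` protocol are hypothetical names, the helper only relies on the `romaji` method shown above, and skipping blank lines is a choice made here rather than something the script is documented to do.

```python
from typing import Iterable, Iterator, Protocol


class Romanizer(Protocol):
    """Anything with a romaji(text) method, such as a cutlet.Cutlet instance."""

    def romaji(self, text: str) -> str: ...


def romanize_lines(converter: Romanizer, lines: Iterable[str]) -> Iterator[str]:
    """Treat each non-empty line as one sentence and yield its romanization."""
    for line in lines:
        sentence = line.strip()
        if sentence:  # assumption: blank lines are skipped
            yield converter.romaji(sentence)
```

With cutlet installed, passing `cutlet.Cutlet()` as the converter and `sys.stdin` as the lines, then printing each result, gives roughly the same loop as the command-line script.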