Convert the EDICT dictionary format into CSV.

Project description

edict-to-csv is a set of small command-line utilities for converting EDICT dictionaries into delimited text (CSV). As with many Unix commands, these programs simply read from standard input and write to standard output. Two programs are provided:

  • cedict-to-csv(1)

  • edict1-to-csv(1)

edict1-to-csv converts dictionary entries from the original EDICT1 format used by the JMdict/EDICT project. It does not handle the EDICT2 format or subsequent XML-based formats.
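
To make the input format concrete, here is a minimal sketch of parsing a single EDICT1 line. It assumes the commonly documented layout "KANJI [KANA] /gloss/gloss/", with the bracketed reading omitted for kana-only entries. The name parse_edict1_line is hypothetical and this is not the package's own implementation.

```python
import re

# Assumed EDICT1 layout: KANJI [KANA] /gloss/gloss/  or  KANA /gloss/gloss/
EDICT1_LINE = re.compile(
    r"^(?P<form>\S+)\s+(?:\[(?P<reading>[^\]]*)\]\s+)?/(?P<glosses>.*)/\s*$"
)


def parse_edict1_line(line):
    """Return (form, reading, glosses), or None if the line does not match.

    Hypothetical helper for illustration; not part of edict-to-csv.
    """
    match = EDICT1_LINE.match(line)
    if match is None:
        return None
    # Glosses are separated by slashes; drop empty pieces.
    glosses = [gloss for gloss in match.group("glosses").split("/") if gloss]
    # Kana-only entries have no bracketed reading, so fall back to "".
    return match.group("form"), match.group("reading") or "", glosses
```

A line that lacks the slash-delimited gloss section returns None, so callers can skip anything that is not an entry.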

cedict-to-csv converts dictionary entries from the CEDICT project, as used by CC-CEDICT. To use this program, you must have the “pinyin-dec” software installed; it is used to reformat Pinyin entries with proper diacritics.
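
For comparison, a CC-CEDICT line can be sketched the same way. This assumes the usual layout "TRADITIONAL SIMPLIFIED [pin1 yin1] /gloss/gloss/" with "#" marking comment lines. The name parse_cedict_line is hypothetical, and the Pinyin is left in its numbered form because the diacritic conversion is the job of pinyin-dec, whose API is not shown here.

```python
import re

# Assumed CC-CEDICT layout: TRADITIONAL SIMPLIFIED [pin1 yin1] /gloss/gloss/
CEDICT_LINE = re.compile(
    r"^(?P<trad>\S+)\s+(?P<simp>\S+)\s+\[(?P<pinyin>[^\]]*)\]\s+/(?P<glosses>.*)/\s*$"
)


def parse_cedict_line(line):
    """Return (traditional, simplified, pinyin, glosses), or None.

    Hypothetical helper for illustration; not part of edict-to-csv.
    """
    if line.startswith("#"):
        return None  # comment or header line
    match = CEDICT_LINE.match(line)
    if match is None:
        return None
    # Glosses are separated by slashes; drop empty pieces.
    glosses = [gloss for gloss in match.group("glosses").split("/") if gloss]
    return match.group("trad"), match.group("simp"), match.group("pinyin"), glosses
```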

CSV entries are delimited with a vertical bar (|) and take the following format:

FORM1|FORM2|TRANSLITERATION|DEFINITION

In the case of EDICT, the second field is always empty. For CEDICT, the second field contains the simplified Chinese form.
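
The four-field layout above can be read back with Python's standard csv module. This is a minimal sketch: the field names and the read_entries helper are hypothetical, and it assumes the output follows the csv module's default quoting rules, which is not confirmed here.

```python
import csv
import io

# Hypothetical names for FORM1|FORM2|TRANSLITERATION|DEFINITION.
FIELDS = ("form1", "form2", "transliteration", "definition")


def read_entries(stream):
    """Yield one dict per well-formed, pipe-delimited row in a text stream."""
    for row in csv.reader(stream, delimiter="|"):
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed rows
        yield dict(zip(FIELDS, row))


# Small in-memory sample; with a real file, pass open("edict.csv", encoding="utf-8", newline="").
sample = io.StringIO("日本語||にほんご|(n) Japanese language\n")
for entry in read_entries(sample):
    print(entry["form1"], entry["transliteration"], entry["definition"])
```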

The programs included are written as Unix-style command-line utilities. The program modules are also importable from Python, so their functions can be called by other programs. The programs are written in Python 3 and are made available under the MIT License.
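
The module and function names for direct import are not listed in this description, so the sketch below makes no assumptions about them and instead drives the command-line program from Python through subprocess. The run_converter helper is hypothetical; it only assumes that the named command reads UTF-8 text on standard input and writes to standard output, as described above.

```python
import shutil
import subprocess


def run_converter(text, command=("edict1-to-csv",)):
    """Pipe text through a converter command and return its standard output.

    Hypothetical helper for illustration; not part of edict-to-csv.
    """
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(f"{command[0]} is not on PATH")
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing command=("cedict-to-csv",) would drive the other program in the same way.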

Example Usage

You can convert the Japanese EDICT dictionary like this:

$ cat edict.utf8 | edict1-to-csv > edict.csv

If it is compressed and in EUC-JP encoding, you may have to convert it:

$ zcat edict.gz | iconv -f EUC-JP -t UTF-8 | edict1-to-csv > edict.csv
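
The decompress-and-transcode step of that pipeline can also be sketched in Python with the standard gzip module and the built-in euc_jp codec. The read_edict_gz helper is hypothetical; it only yields decoded lines and does not perform the CSV conversion itself.

```python
import gzip
import io


def read_edict_gz(source, encoding="euc_jp"):
    """Yield decoded text lines from gzip-compressed dictionary data.

    source may be a file path or a binary file object.
    Hypothetical helper for illustration; not part of edict-to-csv.
    """
    with gzip.open(source, mode="rt", encoding=encoding, newline="") as handle:
        for line in handle:
            yield line


# In-memory demonstration with a single EUC-JP encoded entry.
raw = gzip.compress("日本語 [にほんご] /(n) Japanese language/\n".encode("euc_jp"))
for line in read_edict_gz(io.BytesIO(raw)):
    print(line, end="")
```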

You can convert CC-CEDICT like this:

$ cat cedict.txt | cedict-to-csv > cedict.csv

If you try to use this program without pinyin-dec installed, you will see:

$ cat cedict.txt | cedict-to-csv > cedict.csv
cedict-to-csv: pinyin_dec not available!
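
To check for the dependency before running a conversion, you can ask Python whether the module is importable. This sketch assumes the importable module is named pinyin_dec, as the error message suggests; the module_available helper is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


if not module_available("pinyin_dec"):
    print("pinyin_dec is not installed; cedict-to-csv will not run")
```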

Installation

You can install this software the old way with setup.py:

# python3 setup.py install

Or, if you have pip installed, use pip instead; that is the preferred way.

Documentation

This software includes Unix manual pages, which are installed with the program files. By typing “man cedict-to-csv” or “man edict1-to-csv”, you can review the documentation for each program included here.

Download files

Source Distribution

edict-to-csv-1.0.0.tar.gz (5.6 kB view details)

Hashes for edict-to-csv-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b9c42b06ab11d469dbd861cbd617f2d5f140623a90135e949fc991a974bb10db
MD5 2400e7df47a46f7b0dfea7f0201677de
BLAKE2b-256 070cfa205b003cbdec04932b5c4601c8542918e46a54c33eb001e38df78f2599
