Console CJK dictionary program.
cjk-defn is a command-line dictionary program for looking up definitions of terms and phrases from the CJK languages (Chinese, Japanese, and Korean). Unlike most other dictionaries, cjk-defn allows you to input entire lines of text, and shows relevant definitions for all terms and phrases found.
cjk-defn is implemented in Python 3 and released under the MIT License.
- Get definitions for longest matching terms
- Get definitions for each character
- Add as many dictionaries as you like
- Use as many dictionaries as you like
- Dictionary data in a SQLite database
- stdin-stdout I/O for flexibility
- Cross-platform console application
- Dictionary module can be imported
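The "longest matching terms" behaviour in the list above can be pictured as a greedy left-to-right scan. The following is a minimal sketch of that general technique, not cjk-defn's actual implementation. The function name `longest_matches`, the `max_len` limit, and the sample term set are all hypothetical.

```python
def longest_matches(text, terms, max_len=8):
    """Greedily split text into the longest known terms, left to right.

    `terms` is any container supporting `in` (a set here); `max_len`
    is an assumed upper bound on term length, not a cjk-defn setting.
    """
    results = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking until a term is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in terms:
                match = candidate
                break
        if match is None:
            i += 1  # no known term starts here; skip one character
        else:
            results.append(match)
            i += len(match)
    return results


# Hypothetical sample data, for illustration only.
SAMPLE_TERMS = {"中国", "中国人", "人", "你好"}
print(longest_matches("你好中国人", SAMPLE_TERMS))
```

In this sketch "中国人" wins over the shorter "中国" because longer candidates are tried first. A real lookup would query the database rather than an in-memory set.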
History / Why?
Around 2008-2009, I wanted a program like this, but none existed at that time. All tools with similar functionality were Web-based, or proprietary, or were limited to traditional dictionary lookups based on a single term. For someone who preferred console programs, there were no applications for this. Therefore, I had no choice but to make one myself.
After playing around with different formats, I wrote a program similar to this one, also using a SQLite back-end. However, it was limited to only a few pre-defined dictionaries. Because the application was ugly Python 2 and tied to specific dictionaries, the program stayed private for years.
Coming back to the project recently, I wanted to make it general enough to be useful to other people, and capable of using any number of dictionaries. After kicking around a few designs, I went with a very simple database with just two tables and a handful of fields. Now that it has been generalized, it may be useful to others who want such a dictionary program and prefer the power and flexibility of Unix tools.
By default, the dictionary program includes no dictionaries or definitions. These are added by creating the SQLite database and inserting data into it. The DICTIONARIES table contains basic dictionary metadata. The DEFINITIONS table is for all dictionary definitions, and includes only a few standard fields.
For definitions, several free dictionaries are available, including those from JMdict/EDICT (Japanese-English), and CC-CEDICT (Chinese-English). Tools from the “edict-to-csv” software package may be helpful in converting dictionary data for imports.
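As an illustration of what such a conversion involves, here is a minimal sketch that parses one CC-CEDICT line into a tuple shaped like a DEFINITIONS row. It is not part of cjk-defn or edict-to-csv. The function name `parse_cedict_line`, the choice of putting the simplified form first and the traditional form second, and the "; " separator between senses are all assumptions made for the example.

```python
import re

# CC-CEDICT line layout: Traditional Simplified [pin1 yin1] /sense 1/sense 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def parse_cedict_line(line, dict_id="cedict"):
    """Return (dict_id, form1, form2, transliteration, definition) or None.

    Comment lines (starting with '#') and lines that do not match the
    expected layout yield None.
    """
    if line.startswith("#"):
        return None
    m = CEDICT_LINE.match(line)
    if m is None:
        return None
    traditional, simplified, pinyin, senses = m.groups()
    # Assumption: simplified goes in the first form field, traditional
    # in the second, and senses are joined into one definition string.
    return (dict_id, simplified, traditional, pinyin, senses.replace("/", "; "))


row = parse_cedict_line("中國 中国 [Zhong1 guo2] /China/Middle Kingdom/")
print(row)
```

Each returned tuple could then be written out as CSV or inserted directly into the database.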
To run this program, Python 3.x is required. Installation on a Unix-like platform is advised, but Windows is possible too. If you must use Windows, then Cygwin is the best environment.
To install the program, you can use the older setup.py method:
# python3 setup.py install
Alternatively, you can install with pip, which is the newer and preferred method.
After installing the program, the database should be initialized. The directory for the database is located under the installation path in the following location:
In this directory, you can run the script “make-database” to create the database. The database contains only two tables:
DICTIONARIES:          <= Each entry is a dictionary
    DI_DICT            <= Dictionary ID (letters and numbers, no spaces)
    DI_SIGIL_S         <= 2-char ID for standard definitions (e.g. =Z)
    DI_SIGIL_C         <= 2-char ID for character definitions (e.g. -Z)
    DI_SHORT_DESCR     <= Very short description of the dictionary
    DI_LONG_DESCR      <= Long description, as much as you like

DEFINITIONS:           <= Each entry is a dictionary definition
    DF_DICT            <= Dictionary ID
    DF_FORM1           <= Standard form of the term
    DF_FORM2           <= Other form of the term
    DF_ALT             <= Transliteration
    DF_DEFN            <= Full definition
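To make the two-table layout concrete, the sketch below creates tables with the column names listed above and inserts one dictionary and one definition using Python's standard sqlite3 module. It is not the bundled “make-database” script: the column types, the primary key, the in-memory database, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Table and column names come from the README; the types and the
# primary key are assumptions, not the project's real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS DICTIONARIES (
    DI_DICT        TEXT PRIMARY KEY,
    DI_SIGIL_S     TEXT,
    DI_SIGIL_C     TEXT,
    DI_SHORT_DESCR TEXT,
    DI_LONG_DESCR  TEXT
);
CREATE TABLE IF NOT EXISTS DEFINITIONS (
    DF_DICT  TEXT NOT NULL,
    DF_FORM1 TEXT,
    DF_FORM2 TEXT,
    DF_ALT   TEXT,
    DF_DEFN  TEXT
);
"""


def create_database(path=":memory:"):
    """Open (or create) a database at `path` and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
# Hypothetical sample rows.
conn.execute(
    "INSERT INTO DICTIONARIES VALUES (?, ?, ?, ?, ?)",
    ("cedict", "=Z", "-Z", "CC-CEDICT", "Chinese-English dictionary"),
)
conn.execute(
    "INSERT INTO DEFINITIONS VALUES (?, ?, ?, ?, ?)",
    ("cedict", "中国", "中國", "Zhong1 guo2", "China"),
)
conn.commit()

rows = conn.execute(
    "SELECT DF_FORM1, DF_DEFN FROM DEFINITIONS WHERE DF_DICT = ?",
    ("cedict",),
).fetchall()
print(rows)
```

Parameterized queries (the `?` placeholders) keep definition text containing quotes or slashes from breaking the SQL.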
By invoking the program with the “-h” or “--help” flag, you can see usage information, and also see which dictionaries are available in the database:
$ cjk-defn -h
If we see that the dictionary “cedict” is available, then we can use it in the following way:
$ cjk-defn cedict
If we want to include character definitions, then we should add the suffix “/c” to the end of the dictionary:
$ cjk-defn cedict cedict/c
You can use any combination of dictionaries, in any order you like. If you want to view definitions for an entire text, you could do something like the following:
$ cat mytext.txt | cjk-defn cedict cedict/c | less
If you want to define a default set of dictionaries, you can set an environment variable, and then it is no longer necessary to specify the dictionaries when invoking cjk-defn:
$ export CJK_DEFN_DICTS='cedict cedict/c'
$ cat mytext.txt | cjk-defn | less
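One way to picture how dictionary arguments and the environment variable fit together is the sketch below. It is an assumption about the general approach, not cjk-defn's real code: the function name `parse_dict_specs` is hypothetical, and so is the rule that command-line arguments take precedence over CJK_DEFN_DICTS.

```python
import os


def parse_dict_specs(argv, environ=None):
    """Turn dictionary specs into (dictionary_id, wants_characters) pairs.

    Falls back to the whitespace-separated CJK_DEFN_DICTS value when no
    specs are given on the command line (assumed precedence).
    """
    environ = os.environ if environ is None else environ
    specs = argv or environ.get("CJK_DEFN_DICTS", "").split()
    parsed = []
    for spec in specs:
        if spec.endswith("/c"):
            # The "/c" suffix requests per-character definitions.
            parsed.append((spec[:-2], True))
        else:
            parsed.append((spec, False))
    return parsed


# A plain dict stands in for the real environment in this example.
print(parse_dict_specs([], {"CJK_DEFN_DICTS": "cedict cedict/c"}))
```

Passing the environment in as a parameter keeps the function easy to test without modifying the real process environment.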
A standard manual page, cjk-defn(1), is included with the software and covers basic usage information.