Download and export UNIHAN to Python, CSV, JSON and YAML
unihan-tabular - a tool to build UNIHAN into tabular-friendly formats such as Python, JSON, CSV and YAML. Part of the cihai project.
Unihan’s data is dispersed across multiple files in the format of:
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
unihan_tabular/process.py will download Unihan.zip and build all files into a single tabular-friendly format, with one record per character.
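The merge step described above can be sketched roughly as follows. This is a minimal illustration, not the code in unihan_tabular/process.py: the function names rows_from_unihan and ucn_to_char are hypothetical, and the sketch assumes each raw line holds a code point, a field name and a value separated by whitespace (the distributed Unihan files use tabs).

```python
def ucn_to_char(ucn):
    """Convert a code point label such as 'U+3400' to the character itself."""
    return chr(int(ucn[2:], 16))


def rows_from_unihan(lines, fields):
    """Collapse 'UCN field value' lines into one dict per character.

    Hypothetical helper for illustration only; fields a character does
    not have are left as None, matching the JSON/YAML samples.
    """
    rows = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first two whitespace runs; the value keeps its spaces.
        ucn, field, value = line.split(None, 2)
        if field not in fields:
            continue
        if ucn not in rows:
            row = {"char": ucn_to_char(ucn), "ucn": ucn}
            row.update((name, None) for name in fields)
            rows[ucn] = row
        rows[ucn][field] = value
    return list(rows.values())


SAMPLE = """\
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
"""
FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]

unihan_rows = rows_from_unihan(SAMPLE.splitlines(), FIELDS)
for row in unihan_rows:
    print(row)
```

Each resulting dict has the same keys, which is what makes the data straightforward to write out as CSV, JSON or YAML.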
CSV (default output: ./data/unihan.csv):
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
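As a rough sketch of consuming the CSV export, the standard library's csv.DictReader handles the quoted kDefinition value that contains commas. The sample text is inlined here so the snippet runs on its own; reading the real file with open("data/unihan.csv", newline="", encoding="utf-8") is an assumption about the output location and encoding.

```python
import csv
import io

# Inline copy of the sample rows shown above.
CSV_TEXT = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn\n'
)

csv_records = list(csv.DictReader(io.StringIO(CSV_TEXT)))

for record in csv_records:
    # A missing field arrives as an empty string in CSV, not as None.
    pinyin = record["kHanyuPinyin"] or "(none)"
    print(record["char"], record["kDefinition"], pinyin, sep=" | ")
```

Note that CSV has no null value, so the empty kHanyuPinyin cell for 㐀 is read back as an empty string, whereas the JSON and YAML exports use null.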
JSON (default output: ./data/unihan.json):
[
  {
    "char": "㐀",
    "ucn": "U+3400",
    "kCantonese": "jau1",
    "kDefinition": "(same as U+4E18 丘) hillock or mound",
    "kHanyuPinyin": null,
    "kMandarin": "qiū"
  },
  {
    "char": "㐁",
    "ucn": "U+3401",
    "kCantonese": "tim2",
    "kDefinition": "to lick; to taste, a mat, bamboo bark",
    "kHanyuPinyin": "10019.020:tiàn",
    "kMandarin": "tiàn"
  }
]
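One way to use the JSON export, sketched under the assumption that the file is a flat list of objects as shown above, is to index the records by character for direct lookup. The sample is inlined so the snippet is self-contained; the name by_char is only illustrative.

```python
import json

# Inline copy of the sample JSON shown above.
JSON_TEXT = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kDefinition": "(same as U+4E18 丘) hillock or mound",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kDefinition": "to lick; to taste, a mat, bamboo bark",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""

entries = json.loads(JSON_TEXT)

# Build a lookup table keyed by the character itself.
by_char = {entry["char"]: entry for entry in entries}

print(by_char["㐁"]["kMandarin"])
print(by_char["㐀"]["kHanyuPinyin"])  # JSON null is loaded as None
```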
YAML (default output: ./data/unihan.yaml):
- char: 㐀
  kCantonese: jau1
  kDefinition: (same as U+4E18 丘) hillock or mound
  kHanyuPinyin: null
  kMandarin: qiū
  ucn: U+3400
- char: 㐁
  kCantonese: tim2
  kDefinition: to lick; to taste, a mat, bamboo bark
  kHanyuPinyin: 10019.020:tiàn
  kMandarin: tiàn
  ucn: U+3401
process.py supports command-line arguments. See unihan_tabular/process.py CLI arguments for information on how to specify custom columns, files, download URLs and output destinations.
Usage
To download UNIHAN and build your own export in the default format:
$ pip install unihan-tabular
$ unihan-tabular
Creates data/unihan.json.
To output CSV:
$ unihan-tabular -F csv
To output YAML:
$ pip install pyyaml
$ unihan-tabular -F yaml
To output only the kDefinition field as CSV:
$ unihan-tabular -F csv -f kDefinition
See unihan_tabular/process.py CLI arguments for advanced usage examples.
Structure
# output (JSON)
data/unihan.json
# output (CSV)
data/unihan.csv
# script to download + build a SDF csv of unihan.
unihan_tabular/process.py
# unit tests to verify behavior / consistency of builder
tests/*
# python 2/3 compatibility modules
unihan_tabular/_compat.py
unihan_tabular/unicodecsv.py
# utility / helper functions
unihan_tabular/util.py
data/unihan.csv - CSV export file.
unihan_tabular/process.py - creates data/unihan.csv.