
Export UNIHAN to Python, DataPackage, CSV, JSON and YAML


*unihan-tabular* is a tool that builds `UNIHAN`_ into tabular-friendly formats
such as Python, JSON, CSV and YAML. Part of the `cihai`_ project.

|pypi| |docs| |build-status| |coverage| |license|

Unihan's data is dispersed across multiple files in the format of::

    U+3400 kCantonese jau1
    U+3400 kDefinition (same as U+4E18 丘) hillock or mound
    U+3400 kMandarin qiū
    U+3401 kCantonese tim2
    U+3401 kDefinition to lick; to taste, a mat, bamboo bark
    U+3401 kHanyuPinyin 10019.020:tiàn
    U+3401 kMandarin tiàn

``$ unihan-tabular`` will download Unihan.zip and build all files into a
single tabular-friendly format.
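
As a rough illustration of that reshaping (not the project's actual
implementation), the sketch below pivots raw Unihan lines into one record per
character. The ``pivot`` helper and the sample data are hypothetical, and the
sketch assumes tab-separated fields, as in the Unihan source files.

```python
# Sample lines in the raw Unihan layout: code point, field name, value.
# The Unihan source files separate these three parts with tabs.
RAW_LINES = [
    "U+3400\tkCantonese\tjau1",
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound",
    "U+3400\tkMandarin\tqiū",
    "U+3401\tkCantonese\ttim2",
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark",
    "U+3401\tkHanyuPinyin\t10019.020:tiàn",
    "U+3401\tkMandarin\ttiàn",
]


def pivot(lines):
    """Hypothetical helper: turn (code point, field, value) lines into rows."""
    values = {}  # code point -> {field: value}, in first-seen order
    fields = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        ucn, field, value = line.split("\t", 2)
        values.setdefault(ucn, {})[field] = value
        fields.add(field)

    columns = sorted(fields)
    rows = []
    for ucn, found in values.items():
        # "U+3400" -> the character at code point 0x3400
        row = {"char": chr(int(ucn[2:], 16)), "ucn": ucn}
        # Fields a character lacks become None so every row has the same keys.
        row.update({name: found.get(name) for name in columns})
        rows.append(row)
    return rows


rows = pivot(RAW_LINES)
```

Each resulting row carries the same keys, which is what makes the data easy to
write out as CSV, JSON or YAML.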

CSV (default), ``$ unihan-tabular``::

    char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
    㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
    㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
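
A minimal sketch of consuming that CSV from Python with the standard library's
``csv`` module. The inline ``CSV_TEXT`` stands in for an exported file so the
example is self-contained; nothing here is part of unihan-tabular itself.

```python
import csv
import io

# The two sample rows shown above, inlined so the example runs on its own.
CSV_TEXT = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn\n'
)

# DictReader uses the header row as keys and handles the quoted field
# that contains a comma.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Look up definitions by character.
definitions = {row["char"]: row["kDefinition"] for row in rows}
```

Note that a missing value (``kHanyuPinyin`` for 㐀) is read back as an empty
string. For a real export, ``open(path, newline="", encoding="utf-8")`` would
replace the ``io.StringIO`` wrapper.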

JSON, ``$ unihan-tabular -F json``:

.. code-block:: json

    [
      {
        "char": "㐀",
        "ucn": "U+3400",
        "kCantonese": "jau1",
        "kDefinition": "(same as U+4E18 丘) hillock or mound",
        "kHanyuPinyin": null,
        "kMandarin": "qiū"
      },
      {
        "char": "㐁",
        "ucn": "U+3401",
        "kCantonese": "tim2",
        "kDefinition": "to lick; to taste, a mat, bamboo bark",
        "kHanyuPinyin": "10019.020:tiàn",
        "kMandarin": "tiàn"
      }
    ]
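
The JSON export can be loaded with the standard library's ``json`` module. The
sketch below inlines the sample output so it runs on its own; the ``by_char``
index is only an illustration of one way to use the records.

```python
import json

# The sample JSON output shown above, inlined for a self-contained example.
JSON_TEXT = """
[
  {"char": "㐀", "ucn": "U+3400", "kCantonese": "jau1",
   "kDefinition": "(same as U+4E18 丘) hillock or mound",
   "kHanyuPinyin": null, "kMandarin": "qiū"},
  {"char": "㐁", "ucn": "U+3401", "kCantonese": "tim2",
   "kDefinition": "to lick; to taste, a mat, bamboo bark",
   "kHanyuPinyin": "10019.020:tiàn", "kMandarin": "tiàn"}
]
"""

records = json.loads(JSON_TEXT)

# Index the records by character for quick lookups.
by_char = {record["char"]: record for record in records}

# JSON null becomes Python None, so missing fields are easy to test for.
has_pinyin = [r["char"] for r in records if r["kHanyuPinyin"] is not None]
```

Unlike the CSV export, where an absent value is an empty string, the JSON
export distinguishes absent values with ``null``.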

YAML, ``$ unihan-tabular -F yaml``:

.. code-block:: yaml

    - char: 㐀
      kCantonese: jau1
      kDefinition: (same as U+4E18 丘) hillock or mound
      kHanyuPinyin: null
      kMandarin: qiū
      ucn: U+3400
    - char: 㐁
      kCantonese: tim2
      kDefinition: to lick; to taste, a mat, bamboo bark
      kHanyuPinyin: 10019.020:tiàn
      kMandarin: tiàn
      ucn: U+3401

``unihan-tabular`` supports command line arguments. See `unihan-tabular CLI
arguments`_ for information on how you can specify custom columns, files,
download URLs and output destinations.

.. _cihai: https://cihai.git-pull.com
.. _cihai-handbook: https://github.com/cihai/cihai-handbook
.. _cihai team: https://github.com/cihai?tab=members
.. _cihai-python: https://github.com/cihai/cihai-python
.. _unihan-tabular on github: https://github.com/cihai/unihan-tabular

Usage
-----

To download and build your own UNIHAN export, first install the package:

.. code-block:: bash

    $ pip install unihan-tabular

To output CSV, the default format:

.. code-block:: bash

    $ unihan-tabular

To output JSON::

    $ unihan-tabular -F json

To output YAML (requires pyyaml)::

    $ pip install pyyaml
    $ unihan-tabular -F yaml

To output only the kDefinition field in a CSV::

    $ unihan-tabular -f kDefinition

To output multiple fields, separate them with spaces::

    $ unihan-tabular -f kCantonese kDefinition

To output to a custom file::

    $ unihan-tabular --destination ./exported.csv

To output to a custom file with a templated file extension::

    $ unihan-tabular --destination ./exported.{ext}
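
One plausible reading of the ``{ext}`` placeholder is that it is filled in with
the chosen output format. The sketch below shows that idea with Python's
``str.format``; ``resolve_destination`` is a hypothetical helper written for
this example, and the tool's real behavior may differ.

```python
def resolve_destination(template, output_format):
    """Hypothetical helper: substitute the output format for ``{ext}``.

    This is an assumption about how a templated destination could be
    resolved, not unihan-tabular's own code.
    """
    return template.format(ext=output_format)


json_path = resolve_destination("./exported.{ext}", "json")
yaml_path = resolve_destination("./exported.{ext}", "yaml")

# A destination without the placeholder is returned unchanged.
fixed_path = resolve_destination("./exported.csv", "json")
```

Under this assumption, the same ``--destination`` value can be reused across
formats without editing the file name each time.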

See `unihan-tabular CLI arguments`_ for advanced usage examples.

.. _unihan-tabular CLI arguments: http://unihan-tabular.readthedocs.org/en/latest/cli.html

Structure
---------

.. code-block:: bash

    # output with JSON
    {XDG data dir}/unihan_tabular/unihan.json

    # output with CSV
    {XDG data dir}/unihan_tabular/unihan.csv

    # output with YAML (requires pyyaml)
    {XDG data dir}/unihan_tabular/unihan.yaml

    # script to download + build an SDF CSV of UNIHAN
    unihan_tabular/process.py

    # unit tests to verify behavior / consistency of builder
    tests/*

    # python 2/3 compatibility module
    unihan_tabular/_compat.py

    # utility / helper functions
    unihan_tabular/util.py
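
To make the ``{XDG data dir}`` placeholder concrete, here is a sketch of how
such a path could be resolved on a Linux-style system, following the XDG Base
Directory convention (``$XDG_DATA_HOME``, falling back to ``~/.local/share``).
The ``default_export_path`` helper is hypothetical; unihan-tabular may resolve
the directory differently, particularly on other platforms.

```python
import os
from pathlib import Path


def default_export_path(output_format, environ=None):
    """Hypothetical helper: build the default export path for a format.

    Assumes the XDG convention: use $XDG_DATA_HOME when it is set and
    non-empty, otherwise fall back to ~/.local/share.
    """
    environ = os.environ if environ is None else environ
    base = environ.get("XDG_DATA_HOME") or os.path.join(
        os.path.expanduser("~"), ".local", "share"
    )
    return Path(base) / "unihan_tabular" / f"unihan.{output_format}"


# Passing an explicit mapping keeps the example independent of the real
# environment.
json_export = default_export_path("json", {"XDG_DATA_HOME": "/tmp/data"})
```

With ``XDG_DATA_HOME`` set to ``/tmp/data``, the JSON export in this sketch
resolves to ``/tmp/data/unihan_tabular/unihan.json``.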

.. _MIT: http://opensource.org/licenses/MIT
.. _API: http://cihai.readthedocs.org/en/latest/api.html
.. _UNIHAN: http://www.unicode.org/charts/unihan.html

.. |pypi| image:: https://img.shields.io/pypi/v/unihan-tabular.svg
    :alt: Python Package
    :target: http://badge.fury.io/py/unihan-tabular

.. |build-status| image:: https://img.shields.io/travis/cihai/unihan-tabular.svg
    :alt: Build Status
    :target: https://travis-ci.org/cihai/unihan-tabular

.. |coverage| image:: https://codecov.io/gh/cihai/unihan-tabular/branch/master/graph/badge.svg
    :alt: Code Coverage
    :target: https://codecov.io/gh/cihai/unihan-tabular

.. |license| image:: https://img.shields.io/github/license/cihai/unihan-tabular.svg
    :alt: License

.. |docs| image:: https://readthedocs.org/projects/unihan-tabular/badge/?version=latest
    :alt: Documentation Status
    :scale: 100%
    :target: https://readthedocs.org/projects/unihan-tabular/
