Tool to build UNIHAN dataset into datapackage / simple data format.

Project description

cihaidata-unihan is a tool that builds the Unihan database into a single CSV in simple data format. It is part of the cihai project.

Unihan’s data is dispersed across multiple files, each in the format:

U+3400      kCantonese      jau1
U+3400      kDefinition     (same as U+4E18 丘) hillock or mound
U+3400      kMandarin       qiū
U+3401      kCantonese      tim2
U+3401      kDefinition     to lick; to taste, a mat, bamboo bark
U+3401      kHanyuPinyin    10019.020:tiàn
U+3401      kMandarin       tiàn

cihaidata_unihan/process.py will download Unihan.zip and build all files into a single tabular CSV (default output: ./data/unihan.csv):

char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
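
The reshaping described above, from one field per line to one row per character, can be sketched in a few lines of Python 3. This is a minimal illustration and not the code in process.py: the names pivot_unihan and to_csv are hypothetical, and the sketch assumes the raw Unihan text files are tab-separated with comment lines starting with "#".

```python
import csv
import io

# Columns to keep, in output order (taken from the example above).
FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]

# Sample input; assumed to be tab-separated as in the raw Unihan files.
SAMPLE = (
    "U+3400\tkCantonese\tjau1\n"
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound\n"
    "U+3400\tkMandarin\tqiū\n"
    "U+3401\tkCantonese\ttim2\n"
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark\n"
    "U+3401\tkHanyuPinyin\t10019.020:tiàn\n"
    "U+3401\tkMandarin\ttiàn\n"
)


def pivot_unihan(lines, fields):
    """Group 'UCN<TAB>field<TAB>value' lines into one dict per character."""
    rows = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ucn, field, value = line.split("\t", 2)
        if field not in fields:
            continue  # ignore columns that were not requested
        # "U+3400" -> code point 0x3400 -> the character itself
        row = rows.setdefault(ucn, {"char": chr(int(ucn[2:], 16)), "ucn": ucn})
        row[field] = value
    return list(rows.values())


def to_csv(rows, fields):
    """Serialise the pivoted rows; missing fields become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["char", "ucn"] + list(fields),
        restval="", lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = pivot_unihan(SAMPLE.splitlines(), FIELDS)
print(to_csv(rows, FIELDS), end="")
```

The csv module quotes only the cells that contain a comma, which is why the second definition is wrapped in double quotes while the first is not.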

process.py supports command line arguments. See cihaidata_unihan/process.py CLI arguments for information on how to specify custom columns, files, download URLs and output destinations.

The project is being built against unit tests. See the Travis Builds and Revision History.

Usage

To download and build your own unihan.csv:

$ ./cihaidata_unihan/process.py

Creates data/unihan.csv.

See cihaidata_unihan/process.py CLI arguments for advanced usage examples.
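
Once data/unihan.csv exists, it can be read with Python's standard csv module. The sketch below is an illustration under assumptions: index_by_char is a hypothetical helper rather than part of this package, and an in-memory sample modelled on the example output shown earlier stands in for the real file.

```python
import csv
import io


def index_by_char(csv_file):
    """Map each character to its row (a dict of column name -> value)."""
    return {row["char"]: row for row in csv.DictReader(csv_file)}


# Stand-in for data/unihan.csv. With the real file, use instead:
#   with open("data/unihan.csv", encoding="utf-8", newline="") as f:
#       table = index_by_char(f)
SAMPLE_CSV = (
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn\n'
)

table = index_by_char(io.StringIO(SAMPLE_CSV))
print(table["㐁"]["kDefinition"])
```

Fields that a character lacks come back as empty strings, so callers can test them with a plain truthiness check.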

Structure

# dataset metadata, schema information.
datapackage.json

# (future) when this package is stable, unihan.csv will be provided
data/unihan.csv

# stores downloaded Unihan.zip and its txt file contents (.gitignore'd)
data/build_files/

# script to download + build an SDF CSV of unihan.
cihaidata_unihan/process.py

# unit tests to verify behavior / consistency of builder
tests/*

# python 2/3 compatibility modules
cihaidata_unihan/_compat.py
cihaidata_unihan/unicodecsv.py

# python module, public-facing python API.
__init__.py
cihaidata_unihan/__init__.py

# utility / helper functions
cihaidata_unihan/util.py

Cihai is not required for:

  • data/unihan.csv - simple data format compatible csv file.
  • cihaidata_unihan/process.py - create a data/unihan.csv.

When this module is stable, data/unihan.csv will be available as a prepared release, so that using cihaidata_unihan/process.py is not required. process.py will not require external libraries.

Examples

Related links:

Python support: Python 2.7, >= 3.3, pypy/pypy3
Source: https://github.com/cihai/cihaidata-unihan
Docs: https://cihaidata-unihan.git-pull.com
Changelog: https://cihaidata-unihan.git-pull.com/en/latest/history.html
API: https://cihaidata-unihan.git-pull.com/en/latest/api.html
Issues: https://github.com/cihai/cihaidata-unihan/issues
Travis: https://travis-ci.org/cihai/cihaidata-unihan
Test coverage: https://codecov.io/gh/cihai/cihaidata-unihan
pypi: https://pypi.python.org/pypi/cihaidata-unihan
OpenHub: https://www.openhub.net/p/cihaidata-unihan
License: MIT

git repo:
$ git clone https://github.com/cihai/cihaidata-unihan.git

install dev:
$ git clone https://github.com/cihai/cihaidata-unihan.git cihai
$ cd ./cihai
$ virtualenv .env
$ source .env/bin/activate
$ pip install -e .

tests:
$ python setup.py test

Download files

cihaidata-unihan-0.4.2.tar.gz (11.7 kB), source distribution, uploaded May 8, 2017
