Library for CJK (Chinese, Japanese, Korean) language data.
cihai - Python library for CJK (Chinese, Japanese, Korean) data
This project is under active development. Follow our progress and check back for updates!
Usage
API / Library (this repository)
$ pip install --user cihai
from cihai.core import Cihai

# Assumption: an empty dict leaves the bootstrap settings at their defaults
unihan_options = {}

c = Cihai()
if not c.unihan.is_bootstrapped:  # download and install Unihan to db
    c.unihan.bootstrap(unihan_options)

query = c.unihan.lookup_char('好')
glyph = query.first()
print("lookup for 好: %s" % glyph.kDefinition)
# lookup for 好: good, excellent, fine; well

query = c.unihan.reverse_char('good')
print('matches for "good": %s ' % ', '.join([glph.char for glph in query]))
# matches for "good": 㑘, 㑤, 㓛, 㘬, 㙉, 㚃, 㚒, 㚥, 㛦, 㜴, 㜺, 㝖, 㤛, 㦝, ...
See API documentation and /examples.
CLI (cihai-cli)
$ pip install --user cihai[cli]
# character lookup
$ cihai info 好
char: 好
kCantonese: hou2 hou3
kDefinition: good, excellent, fine; well
kHangul: 호
kJapaneseOn: KOU
kKorean: HO
kMandarin: hǎo
kTang: '*xɑ̀u *xɑ̌u'
kTotalStrokes: '6'
kVietnamese: háo
ucn: U+597D
# reverse lookup
$ cihai reverse library
char: 圕
kCangjie: WLGA
kCantonese: syu1
kCihaiT: '308.302'
kDefinition: library
kMandarin: tú
kTotalStrokes: '13'
ucn: U+5715
--------
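The ucn field in the output above is the character's Unicode code point in U+XXXX notation. As a minimal sketch using only Python built-ins (the helper names to_ucn and from_ucn are hypothetical and not part of cihai or cihai-cli), the conversion in both directions could look like this:

```python
def to_ucn(char):
    """Return the U+XXXX notation for a single character."""
    return "U+%04X" % ord(char)


def from_ucn(ucn):
    """Return the character named by a U+XXXX string."""
    return chr(int(ucn[2:], 16))


print(to_ucn("好"))        # U+597D
print(from_ucn("U+5715"))  # 圕
```

This can help when matching characters against records that are keyed by ucn rather than by the character itself.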
UNIHAN data
All datasets that cihai uses have stand-alone tools to export their data. No library required.
- unihan-etl - UNIHAN data exports for csv, yaml and json.
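Because the exports are plain csv, yaml or json, they can be read with the standard library alone. The following is a rough sketch, assuming a csv export whose header includes char, ucn and kDefinition columns. The sample data, the column layout and the helper names load_rows and find_by_definition are illustrative assumptions, not the documented unihan-etl output:

```python
import csv
import io

# Stand-in for an exported csv file; the column names are an assumption.
SAMPLE_CSV = """char,ucn,kDefinition
好,U+597D,"good, excellent, fine; well"
圕,U+5715,library
"""


def load_rows(fileobj):
    """Read exported rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(fileobj))


def find_by_definition(rows, term):
    """Return the characters whose kDefinition contains the given term."""
    return [row["char"] for row in rows if term in row["kDefinition"]]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(find_by_definition(rows, "library"))  # ['圕']
```

For a real export, pass an open file object (opened with encoding="utf-8" and newline="") to load_rows instead of the in-memory sample.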
Developing
poetry is required for development.
git clone https://github.com/cihai/cihai.git
cd cihai
poetry install -E "docs test coverage lint format"
Makefile commands prefixed with watch_ will watch files and rerun.
Tests
poetry run py.test
Helpers: make test
Rerun tests on file change: make watch_test (requires entr(1))
Documentation
Default preview server: http://localhost:8035
cd docs/ and make html to build. make serve to start http server.
Helpers: make build_docs, make serve_docs
Rebuild docs on file change: make watch_docs (requires entr(1))
Rebuild docs and run server via one terminal: make dev_docs (requires above, and a make(1) with -j support, e.g. GNU Make)
Formatting / Linting
The project uses black and isort (one after the other) and runs flake8 via CI. See the configuration in pyproject.toml and setup.cfg:
make black isort: Run black first, then isort to handle import nuances
make flake8, to watch (requires entr(1)): make watch_flake8
Releasing
As of 0.10, poetry handles virtualenv creation, package requirements, versioning, building, and publishing, so there is no setup.py and there are no requirements files.
Update __version__ in __about__.py and the version in pyproject.toml:
git commit -m 'build(cihai): Tag v0.1.1'
git tag v0.1.1
git push
git push --tags
poetry build
poetry publish
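Since the version is recorded in two files, a quick check before tagging can catch a mismatch. The sketch below is not part of the project's tooling: the sample file contents, the regular expressions and the function names are assumptions made for illustration, and a real check would read the two files from disk.

```python
import re

# Sample contents standing in for __about__.py and pyproject.toml (assumed layout).
ABOUT_SAMPLE = "__title__ = 'cihai'\n__version__ = '0.1.1'\n"
PYPROJECT_SAMPLE = '[tool.poetry]\nname = "cihai"\nversion = "0.1.1"\n'


def about_version(text):
    """Extract the value assigned to __version__, or None if absent."""
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    return match.group(1) if match else None


def pyproject_version(text):
    """Extract the value of the first top-of-line version key, or None if absent."""
    match = re.search(r"^version\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    return match.group(1) if match else None


def versions_match(about_text, pyproject_text):
    """Return True only when both versions are present and equal."""
    found = about_version(about_text)
    return found is not None and found == pyproject_version(pyproject_text)


print(versions_match(ABOUT_SAMPLE, PYPROJECT_SAMPLE))  # True
```

The regular expression for pyproject.toml takes the first line that starts with version, which is enough for a simple file but would need a proper TOML parser for anything more involved.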
Quick links
- Usage
- Datasets: a full list of current and future data sets
- Python API
- Roadmap
- Python support: >= 3.6, pypy
- Source: https://github.com/cihai/cihai
- Docs: https://cihai.git-pull.com
- Changelog: https://cihai.git-pull.com/history.html
- API: https://cihai.git-pull.com/api.html
- Issues: https://github.com/cihai/cihai/issues
- Test coverage: https://codecov.io/gh/cihai/cihai
- pypi: https://pypi.python.org/pypi/cihai
- OpenHub: https://www.openhub.net/p/cihai
- License: MIT