the Old Chinese language for spaCy
spacy-och
the Old Chinese (och) language for the spaCy NLP library.
installation
requires spaCy v3.
$ pip install spacy-och
usage
this package currently doesn't include trained models and is intended for basic NLP usage only, via spacy.blank("och"). it tokenizes texts by character and supports the Token.like_num and Token.is_stop attributes.
>>> import spacy
>>> nlp = spacy.blank("och")
>>> from spacy_och.examples import sentences
>>> doc = nlp(sentences[0])
>>> doc.text
子曰:「上下无常,非為邪也。進退无恆,非離群也。君子進德脩業、欲及時也,故无咎。」
>>> [t for t in doc if t.is_stop] # all stop words
[曰, :, 非, 也, 。, 非, 也, 。, 、, 欲, 及, 也, 故, 。]
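as a rough mental model of what "tokenizes by character" and Token.is_stop mean, here is a pure-Python sketch. it is not the package's code: char_tokenize and the tiny STOP_WORDS set are hypothetical names made up for illustration, and the assumption that whitespace is simply dropped may not match the real tokenizer.

```python
# Hypothetical stop-word subset, for illustration only (not the package's list).
STOP_WORDS = {"曰", "也", "非", "故"}


def char_tokenize(text):
    """Split text into one token per non-whitespace character (assumed behaviour)."""
    return [ch for ch in text if not ch.isspace()]


tokens = char_tokenize("子曰 故无咎")
# Keep only the tokens found in the illustrative stop-word set.
stops = [t for t in tokens if t in STOP_WORDS]
print(tokens)
print(stops)
```

with the real package, the same filtering is done through the Token.is_stop attribute, as in the session above.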
more functionality is coming soon!
developing
after cloning the repository:
$ pip install -e ".[dev]"
$ pre-commit install
building
build a source archive and a built distribution (wheel) for a release:
$ rm -rf dist/*
$ python -m build
publish the release on TestPyPI (useful for making sure everything worked):
$ python -m twine upload --repository testpypi dist/*
if everything looks ok, upload to the real PyPI:
$ python -m twine upload dist/*
license
code is licensed under the MIT license. some lookups data is derived from files licensed under the Unicode Data Files and Software License.
Hashes for spacy_och-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ba114f0e35f6b88a3cf77a3f5167fd05b300a9d29cfb275dcdad033c6b2e3291
MD5 | 57ce556e11f8d99e49de3b27d366eee8
BLAKE2b-256 | 5c588da555ff8d821b5855fc4401b0a972969d5b6d47d9d6ca46331f76459f4b
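a downloaded wheel can be checked against the digests listed above with the standard library. this is a minimal sketch: file_digests and matches are hypothetical helper names, and the wheel path in the commented usage assumes the file sits in the current directory.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return SHA256, MD5 and BLAKE2b-256 hex digests of the file at path."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def matches(path, algorithm, expected_hex):
    """Compare one computed digest with an expected hex string."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()


# Usage (assumes the wheel has been downloaded to the current directory):
# matches(
#     "spacy_och-0.1.2-py3-none-any.whl",
#     "SHA256",
#     "ba114f0e35f6b88a3cf77a3f5167fd05b300a9d29cfb275dcdad033c6b2e3291",
# )
```

SHA256 is the digest worth checking in practice; MD5 is included only because it appears in the table.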