
the Old Chinese language for spaCy




the Old Chinese (och) language for the spaCy NLP library.


requires spaCy v3.

$ pip install spacy-och


this package currently doesn't include trained models and is intended for basic NLP usage only, via spacy.blank(). it tokenizes texts by character and supports the Token.like_num and Token.is_stop attributes.

>>> import spacy
>>> nlp = spacy.blank("och")
>>> from spacy_och.examples import sentences
>>> doc = nlp(sentences[0])
>>> doc.text
>>> [t for t in doc if t.is_stop] # all stop words
[, , , , , , , , , , , , , ]
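
to illustrate what character-level tokenization and a like_num-style check mean, here is a minimal standalone sketch in plain Python. it does not use spacy-och and is not the package's implementation: the helper names char_tokenize and like_num are hypothetical, skipping whitespace is an assumption, and using the standard library's unicodedata.numeric() as the numeral test is an assumption made only for this example.

```python
from __future__ import annotations

import unicodedata


def char_tokenize(text: str) -> list[str]:
    """Return one token per non-whitespace character (hypothetical helper)."""
    return [ch for ch in text if not ch.isspace()]


def like_num(token: str) -> bool:
    """Return True if every character has a Unicode numeric value.

    Assumption for illustration only: spacy-och may use different data.
    """
    return bool(token) and all(
        unicodedata.numeric(ch, None) is not None for ch in token
    )


tokens = char_tokenize("學而時習之")
print(tokens)  # ['學', '而', '時', '習', '之']

# check each single-character token against the numeral test
print([(t, like_num(t)) for t in char_tokenize("三人行")])
```

in spaCy itself, the equivalent information is read from the Token.like_num and Token.is_stop attributes of each token in the doc, as in the session above.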

more functionality is coming soon!


after cloning the repository:

$ pip install -e ".[dev]"
$ pre-commit install


build a source archive and a wheel for a release:

$ rm -rf dist/*
$ python -m build

publish the release on test PyPI (useful for making sure everything worked):

$ python -m twine upload --repository testpypi dist/*

if everything looks ok, upload to the real PyPI:

$ python -m twine upload dist/*


code is licensed under the MIT license. some lookups data is derived from files licensed under the Unicode Data Files and Software License.


