the Old Chinese language for spaCy

spacy-och

the Old Chinese (och) language for the spaCy NLP library.

installation

requires spaCy v3.

$ pip install spacy-och

usage

this package doesn't include trained models yet and is intended for basic NLP usage only, via spacy.blank(). it tokenizes text character by character and supports the Token.like_num and Token.is_stop attributes.

>>> import spacy
>>> nlp = spacy.blank("och")
>>> from spacy_och.examples import sentences
>>> doc = nlp(sentences[0])
>>> doc.text
子曰:「上下无常非為邪也進退无恆非離群也君子進德脩業欲及時也故无咎。」
>>> [t for t in doc if t.is_stop] # all stop words
[, , , , , , , , , , , , , ]
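
to make the behaviour described above concrete, here is a minimal standalone sketch of character-level tokenization with like_num and is_stop style lookups. it does not use spaCy and is not the package's implementation: the NUMERALS and STOP_WORDS sets and the tokenize, like_num and is_stop helpers are hypothetical, chosen only for illustration.

```python
# Illustrative sets only; these are NOT the lists shipped with spacy-och.
NUMERALS = set("一二三四五六七八九十百千萬")
STOP_WORDS = set("也之而")


def tokenize(text):
    """Return one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def like_num(token):
    """True if the single-character token is in the illustrative numeral set."""
    return token in NUMERALS


def is_stop(token):
    """True if the single-character token is in the illustrative stop word set."""
    return token in STOP_WORDS


tokens = tokenize("君子進德脩業 欲及時也")
print(tokens)                                  # one entry per character
print([t for t in tokens if is_stop(t)])       # tokens found in STOP_WORDS
print([t for t in tokenize("三人行") if like_num(t)])  # tokens found in NUMERALS
```

with spaCy, the same checks are read from each token as Token.is_stop and Token.like_num, as in the session above.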

more functionality is coming soon!

developing

after cloning the repository:

$ pip install -e ".[dev]"
$ pre-commit install

building

build a source archive and a built distribution (wheel) for a release:

$ rm -rf dist/*
$ python -m build

publish the release on test PyPI (useful for making sure everything worked):

$ python -m twine upload --repository testpypi dist/*

if everything looks ok, upload to the real PyPI:

$ python -m twine upload dist/*

license

code is licensed under the MIT license. some lookups data is derived from files licensed under the Unicode Data Files and Software License.
