
Cantonese Linguistics and NLP in Python

Project description


Full Documentation: https://pycantonese.org



PyCantonese is a Python library for Cantonese linguistics and natural language processing (NLP). Currently implemented features:

  • Accessing and searching corpus data

  • Parsing and conversion tools for Jyutping romanization

  • Parsing Cantonese text

  • Stop words

  • Word segmentation

  • Part-of-speech tagging

The design of PyCantonese prioritizes ease of use and linguistic knowledge. It has been successfully used by both academic and commercial organizations, including major US tech companies.
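As a rough illustration of two of the features listed above (word segmentation and part-of-speech tagging), here is a minimal sketch. It assumes the top-level functions `pycantonese.segment` and `pycantonese.pos_tag` as documented for earlier releases; their exact behavior in the 4.x series should be checked against the full documentation. The wrapper `segment_and_tag` is a hypothetical helper, not part of the library.

```python
import importlib.util


def segment_and_tag(text):
    """Segment Cantonese text into words, then tag each word.

    Returns None when pycantonese is not installed, so this sketch can be
    run in any environment without raising ImportError.
    """
    if importlib.util.find_spec("pycantonese") is None:
        return None

    import pycantonese

    # Assumption: segment() takes an unsegmented string and returns a list of words.
    words = pycantonese.segment(text)
    # Assumption: pos_tag() takes that word list and returns (word, tag) pairs.
    return pycantonese.pos_tag(words)


if __name__ == "__main__":
    print(segment_and_tag("廣東話好難學"))
```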

Since v4.0.0 (March 2026), PyCantonese depends on Rustling, a library for efficient CHAT data handling, word segmentation, and part-of-speech tagging.

Download and Install

Using pip:

pip install --upgrade pycantonese

Using conda:

conda install -c conda-forge pycantonese
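After installing, one way to confirm which release is active in the current environment is to query the package metadata. The sketch below uses only the Python standard library; `installed_version` is a hypothetical helper name, not something PyCantonese provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pycantonese")
    if found is None:
        print("pycantonese is not installed in this environment")
    else:
        print("pycantonese", found)
```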

PyCantonese also works in JavaScript.

Ready for more? Check out Quickstart.

How to Cite

Lee, Jackson L., Litong Chen, Charles Lam, Chaak Ming Lau, and Tsz-Him Tsui. 2022. PyCantonese: Cantonese Linguistics and NLP in Python. Proceedings of the 13th Language Resources and Evaluation Conference.

@inproceedings{lee-etal-2022-pycantonese,
   title = "PyCantonese: Cantonese Linguistics and NLP in Python",
   author = "Lee, Jackson L.  and
      Chen, Litong  and
      Lam, Charles  and
      Lau, Chaak Ming  and
      Tsui, Tsz-Him",
   booktitle = "Proceedings of The 13th Language Resources and Evaluation Conference",
   month = jun,
   year = "2022",
   publisher = "European Language Resources Association",
}

License

MIT License.

Please note that PyCantonese includes data from the following sources, each released under its own open license:

  • Hong Kong Cantonese Corpus (CC BY)

  • CantoMap (GPL-3.0)

  • rime-cantonese (CC BY 4.0)

  • Common Voice Cantonese (Mozilla Public License 2.0)

  • Cantonese-Traditional Chinese Parallel Corpus (CC0 1.0 Universal)

For details about these datasets, please see their documentation.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • pycantonese-4.3.0.tar.gz (39.6 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • pycantonese-4.3.0-cp310-abi3-win_amd64.whl (42.1 MB): CPython 3.10+, Windows x86-64
  • pycantonese-4.3.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.7 MB): CPython 3.10+, manylinux (glibc 2.17+), x86-64
  • pycantonese-4.3.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (42.6 MB): CPython 3.10+, manylinux (glibc 2.17+), ARM64
  • pycantonese-4.3.0-cp310-abi3-macosx_11_0_arm64.whl (42.3 MB): CPython 3.10+, macOS 11.0+, ARM64
  • pycantonese-4.3.0-cp310-abi3-macosx_10_12_x86_64.whl (42.5 MB): CPython 3.10+, macOS 10.12+, x86-64
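The wheel file names above follow the standard wheel naming convention: distribution, version, an optional build tag, then Python, ABI, and platform tags, separated by hyphens, with dots joining alternative tags. The following is a minimal sketch of splitting such a name into its parts. `WheelName` and `parse_wheel_filename` are hypothetical names introduced here for illustration, and the sketch does not validate the individual fields.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its hyphen-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    # A dot inside a tag field separates alternative tags (a compressed tag set).
    return WheelName(
        distribution,
        version,
        build,
        tuple(python.split(".")),
        tuple(abi.split(".")),
        tuple(platform.split(".")),
    )


if __name__ == "__main__":
    print(parse_wheel_filename("pycantonese-4.3.0-cp310-abi3-win_amd64.whl"))
```

Read this way, `cp310-abi3` indicates a wheel built against the CPython stable ABI starting at 3.10, which matches the "CPython 3.10+" label in the listing.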

File details

Details for the file pycantonese-4.3.0.tar.gz.

File metadata

  • Download URL: pycantonese-4.3.0.tar.gz
  • Size: 39.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pycantonese-4.3.0.tar.gz
Algorithm Hash digest
SHA256 f3d01128ef90fb1dcd31b336cbf56f815e46f8bde6f047b1251f1925e4728946
MD5 588ae78d75c35e5ac553b2d35ad35c4c
BLAKE2b-256 23baf861f12b3cfdfdd3ad4c2e0845eb9deb8122ea088ded71b156e4ea90de0a

See more details on using hashes here.
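A downloaded file can be checked against a published SHA256 digest such as the one listed above. The sketch below uses only the Python standard library and reads the file in chunks so that a roughly 40 MB archive is never held in memory at once. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local path in the usage section is an assumption about where the file was saved.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)


if __name__ == "__main__":
    # Assumed local path; the digest is the SHA256 value listed for the sdist.
    sdist = Path("pycantonese-4.3.0.tar.gz")
    published = "f3d01128ef90fb1dcd31b336cbf56f815e46f8bde6f047b1251f1925e4728946"
    if sdist.exists():
        print(matches_published_hash(sdist, published))
```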

Provenance

The following attestation bundles were made for pycantonese-4.3.0.tar.gz:

Publisher: release.yml on jacksonllee/pycantonese

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pycantonese-4.3.0-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: pycantonese-4.3.0-cp310-abi3-win_amd64.whl
  • Size: 42.1 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pycantonese-4.3.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 4c43514c8787d63c936c85b041271116f53139c4f74f0489ec733cb7875da21e
MD5 2af705e79a13655fb6ed4494ccc5d01d
BLAKE2b-256 4ecdca4d70ee2562581f88bea785128d42eba3c769fb035f5719a5679534b3b0


Provenance

The following attestation bundles were made for pycantonese-4.3.0-cp310-abi3-win_amd64.whl:

Publisher: release.yml on jacksonllee/pycantonese


File details

Details for the file pycantonese-4.3.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for pycantonese-4.3.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2858854ff4dfe2d22af1f607c31ab8da7260be3e55091d2fba270b5db46e91c0
MD5 942005ed10a4689514b612cace292444
BLAKE2b-256 e42c4693397e5854a64726857325509fb5cb3e0033cc257cde044c519c309045


Provenance

The following attestation bundles were made for pycantonese-4.3.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on jacksonllee/pycantonese


File details

Details for the file pycantonese-4.3.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for pycantonese-4.3.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c8f97e14cde9e7bc50f4a52906692ba6c3ec3aef6d3bcea93b46b964078d0c9b
MD5 6ee16f6fbf2113e46ae5f19f985b8405
BLAKE2b-256 d2da5bcb927bebfab06a14a506cdf0dbd3f4575b7adaf5d10f21b27856db63b7


Provenance

The following attestation bundles were made for pycantonese-4.3.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: release.yml on jacksonllee/pycantonese


File details

Details for the file pycantonese-4.3.0-cp310-abi3-macosx_11_0_arm64.whl.


File hashes

Hashes for pycantonese-4.3.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4d2d78b2555809017b11d53d3a37cc5a25e5b687a84be17106a520e178ddd455
MD5 6baee743a0457a33633586128e6ba186
BLAKE2b-256 736c84cc48221a13d01f0d3902ed515ddd646f1aa5feafaedb8ef5c387895b20


Provenance

The following attestation bundles were made for pycantonese-4.3.0-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on jacksonllee/pycantonese


File details

Details for the file pycantonese-4.3.0-cp310-abi3-macosx_10_12_x86_64.whl.


File hashes

Hashes for pycantonese-4.3.0-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 61efabd3a9bdec9abd1625c08c2e8c2398c75fcf5c7b4afd7a1f8abe8538d825
MD5 fa89f310a5f978fc73fcbc8e4d7e76d7
BLAKE2b-256 c4d3435cd113615846f022226e19ab65d0e1fb8420e5582a6265b8e559ce8902


Provenance

The following attestation bundles were made for pycantonese-4.3.0-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on jacksonllee/pycantonese

