
Cantonese Linguistics and NLP in Python

Project description


Full Documentation: https://pycantonese.org



PyCantonese is a Python library for Cantonese linguistics and natural language processing (NLP). Currently implemented features:

  • Accessing and searching corpus data

  • Parsing and conversion tools for Jyutping romanization

  • Parsing Cantonese text

  • Stop words

  • Word segmentation

  • Part-of-speech tagging
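A Jyutping syllable breaks down into an optional onset, a vowel nucleus, an optional coda, and a tone number from 1 to 6. The sketch below shows roughly what "parsing Jyutping" involves by splitting a romanized string into syllables with regular expressions. `parse_jyutping_sketch` and `Syllable` are hypothetical names used only for illustration and are not PyCantonese's API. The sketch handles standard syllables and the syllabic nasals `m` and `ng`, but it does not check which onset/rime combinations actually occur. For real work, use the parser documented at https://pycantonese.org.

```python
import re
from typing import NamedTuple


# Hypothetical container for illustration; not PyCantonese's own Jyutping class.
class Syllable(NamedTuple):
    onset: str
    nucleus: str
    coda: str
    tone: str


# Longer alternatives come first so that "gw" beats "g", "aa" beats "a", etc.
_SYLLABLE = re.compile(
    r"(gw|kw|ng|[bpmfdtnlgkhwzcsj])?"  # optional onset
    r"(aa|oe|eo|yu|a|e|i|o|u)"  # vowel nucleus
    r"(ng|[ptkmniu])?"  # optional coda
    r"([1-6])"  # tone number
)
# Syllabic nasals such as "m4" and "ng5" have no vowel.
_SYLLABIC_NASAL = re.compile(r"(m|ng)([1-6])")


def parse_jyutping_sketch(text: str) -> list[Syllable]:
    """Split a Jyutping string into syllables and parse each one."""
    normalized = text.lower().replace(" ", "")
    chunks = re.findall(r"[a-z]+[1-6]", normalized)
    # Reject input with leftover characters, e.g. a missing or invalid tone.
    if "".join(chunks) != normalized:
        raise ValueError(f"not well-formed Jyutping: {text!r}")
    syllables = []
    for chunk in chunks:
        m = _SYLLABLE.fullmatch(chunk)
        if m:
            onset, nucleus, coda, tone = m.groups()
            syllables.append(Syllable(onset or "", nucleus, coda or "", tone))
            continue
        m = _SYLLABIC_NASAL.fullmatch(chunk)
        if m:
            syllables.append(Syllable("", m.group(1), "", m.group(2)))
            continue
        raise ValueError(f"cannot parse syllable: {chunk!r}")
    return syllables
```

For example, `parse_jyutping_sketch("gwong2dung1waa2")` gives three syllables with onsets `gw`, `d`, and `w`. The regex orders its alternations from longest to shortest so that it prefers the longest onset and nucleus, and backtracking handles cases such as `ngo5`.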

The design of PyCantonese prioritizes ease of use and linguistic knowledge. It has been successfully used by both academic and commercial organizations, including major US tech companies.

Since v4.0.0 (March 2026), PyCantonese has depended on Rustling, a library for efficient handling of CHAT data (the transcription format used by CHILDES), word segmentation, and part-of-speech tagging.
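Cantonese text is written without spaces between words, so segmentation means choosing word boundaries. One classic dictionary-based technique is greedy longest-string matching: scan from left to right and, at each position, take the longest word in the lexicon, falling back to a single character when nothing matches. The sketch below illustrates that technique with a tiny hypothetical lexicon. It does not show how Rustling or PyCantonese segment text; their models are built from corpus data, and their output may differ.

```python
def longest_match_segment(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right longest-string matching against a lexicon.

    max_len is a hypothetical cap on word length; lexicon words longer
    than this are never matched.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i : i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon for illustration only.
LEXICON = {"廣東話", "容易", "學"}

if __name__ == "__main__":
    print(longest_match_segment("廣東話容唔容易學", LEXICON))
```

Greedy matching is fast and easy to reason about. Its weakness is that it never reconsiders an early choice, which is one reason practical segmenters rely on trained models.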

Download and Install

Using pip:

pip install --upgrade pycantonese

Using conda:

conda install -c conda-forge pycantonese
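Both commands install the same package. To check from Python which version is installed in the current environment, you can use the standard library's `importlib.metadata`. This is a minimal sketch, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None if pycantonese is not installed.
    print(installed_version("pycantonese"))
```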

PyCantonese also works in JavaScript.

Ready for more? Check out the Quickstart in the full documentation.

How to Cite

Lee, Jackson L., Litong Chen, Charles Lam, Chaak Ming Lau, and Tsz-Him Tsui. 2022. PyCantonese: Cantonese Linguistics and NLP in Python. Proceedings of the 13th Language Resources and Evaluation Conference.

@inproceedings{lee-etal-2022-pycantonese,
   title = "PyCantonese: Cantonese Linguistics and NLP in Python",
   author = "Lee, Jackson L.  and
      Chen, Litong  and
      Lam, Charles  and
      Lau, Chaak Ming  and
      Tsui, Tsz-Him",
   booktitle = "Proceedings of The 13th Language Resources and Evaluation Conference",
   month = jun,
   year = "2022",
   publisher = "European Language Resources Association",
}

License

MIT License.

Please note that PyCantonese includes data from the following sources, all of which are openly licensed:

  • Hong Kong Cantonese Corpus (CC BY)

  • CantoMap (GPL-3.0)

  • rime-cantonese (CC BY 4.0)

  • Common Voice Cantonese (Mozilla Public License 2.0)

  • Cantonese-Traditional Chinese Parallel Corpus (CC0 1.0 Universal)

For details about these datasets, please see their documentation.

Download files

Download the file for your platform. If you install with pip, it selects a compatible file automatically.

Source distribution

  • pycantonese-5.0.0.tar.gz (39.6 MB)

Built distributions (wheels for CPython 3.10+)

  • pycantonese-5.0.0-cp310-abi3-win_amd64.whl (42.1 MB): Windows x86-64

  • pycantonese-5.0.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.7 MB): manylinux, glibc 2.17+, x86-64

  • pycantonese-5.0.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (42.6 MB): manylinux, glibc 2.17+, ARM64

  • pycantonese-5.0.0-cp310-abi3-macosx_11_0_arm64.whl (42.3 MB): macOS 11.0+, ARM64

  • pycantonese-5.0.0-cp310-abi3-macosx_10_12_x86_64.whl (42.5 MB): macOS 10.12+, x86-64
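Each wheel file name encodes compatibility tags following the wheel naming convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. A tag can list several values joined by dots. For instance, `cp310-abi3` marks a build against CPython 3.10's stable ABI, which is why these wheels work on CPython 3.10 and later. The sketch below decodes such names using only the standard library. `parse_wheel_filename_sketch` and `WheelName` are hypothetical names. For production use, the `packaging` library provides `packaging.utils.parse_wheel_filename`, which also validates and normalizes names.

```python
from itertools import product
from typing import NamedTuple


# Hypothetical record for illustration.
class WheelName(NamedTuple):
    distribution: str
    version: str
    build: str  # empty string when the optional build tag is absent
    python_tags: tuple[str, ...]
    abi_tags: tuple[str, ...]
    platform_tags: tuple[str, ...]


def parse_wheel_filename_sketch(filename: str) -> WheelName:
    """Split a wheel file name into its fields; dotted tags become tuples."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = ""
    elif len(parts) == 6:
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields: {filename!r}")
    return WheelName(
        dist, ver, build, tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split("."))
    )


def expand_tags(wheel: WheelName) -> list[str]:
    """Expand compressed tag sets into every python-abi-platform combination."""
    return [
        f"{py}-{abi}-{plat}"
        for py, abi, plat in product(wheel.python_tags, wheel.abi_tags, wheel.platform_tags)
    ]
```

A wheel is a candidate for installation when at least one of its expanded tags appears among the tags your interpreter supports.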

File hashes

pycantonese-5.0.0.tar.gz

  • SHA256: 8655ba3cd891b528d6b070245abed55b20a08d673f1056fcf9afad0adbc47c1d
  • MD5: 2146b8f0b01b340ed9be12eb6275b187
  • BLAKE2b-256: 090932e8000e39eb9ff034e1d55a86536a14abab90f2f4df176129fd90c43c29

pycantonese-5.0.0-cp310-abi3-win_amd64.whl

  • SHA256: 8f7bdc1e9a37b5e164cd57f22aecae8532d4d0b74ff5b6f5e28480a657fa53f9
  • MD5: 011e536bb9832b1f114eb67e62909244
  • BLAKE2b-256: f3fdc13baa1362c377552644f3558759fc4b87557a9d9e3f7a030b25b897bfe3

pycantonese-5.0.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

  • SHA256: 4a010973683fb8d14313ef44cdf7a4ded1adbe1a1bc64c93d0f75e65f141dd1a
  • MD5: 26f9b79f6618725f3da899f88ff6d0f6
  • BLAKE2b-256: eefeee179a7336935d71d1db34183008a5362457d657e59c7a4c63e20c0b7852

pycantonese-5.0.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

  • SHA256: 609d14a588cb0b39fe1644347609677dbda77e25978dfd34f1f9de455f88637b
  • MD5: 6cc80d7565f96d822408846cd8503576
  • BLAKE2b-256: 5670702c2edf2bd1dac56aef6236e30d977b83efe313e4a6b1ef9fa23004341a

pycantonese-5.0.0-cp310-abi3-macosx_11_0_arm64.whl

  • SHA256: b33d794ea43c8bb7e18699e561fa51a0f48dfb5a033d569b99b733b6ffc30917
  • MD5: 117cececc5966340057d9448e02dd992
  • BLAKE2b-256: d8a9cbc19623761136d7c8f7230d633bc4735acf685f12da78662f4c24206bbf

pycantonese-5.0.0-cp310-abi3-macosx_10_12_x86_64.whl

  • SHA256: 587387ffe61969d95a5ee5f3f47eb74712c50e76dbc00d578dfb59b45934e3f5
  • MD5: 251976e81d7767b9faea4c95fa6d11f3
  • BLAKE2b-256: 1e809b7163e41a460c7455026b7ff000ac8c75469ce8992a1f711bccf5759fbe

Provenance

Attestation bundles for all six files were made by the publisher release.yml on jacksonllee/pycantonese. Attestation values reflect the state when the release was signed and may no longer be current. The source distribution and the Windows wheel were uploaded using Trusted Publishing via twine/6.1.0 (CPython/3.13.12).
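To confirm that a downloaded file matches its published SHA256 digest, hash the file locally and compare the result. The sketch below uses only `hashlib` and reads the file in chunks, so a 40 MB archive never has to be held in memory all at once. `sha256_of` and `matches_published_hash` are hypothetical helper names, and the sketch assumes the source distribution was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare a local file with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 digest published for pycantonese-5.0.0.tar.gz.
SDIST_SHA256 = "8655ba3cd891b528d6b070245abed55b20a08d673f1056fcf9afad0adbc47c1d"

if __name__ == "__main__":
    sdist = Path("pycantonese-5.0.0.tar.gz")  # assumed download location
    if sdist.exists():
        print("OK" if matches_published_hash(sdist, SDIST_SHA256) else "MISMATCH")
```

pip can also check hashes itself. In hash-checking mode (`pip install --require-hashes -r requirements.txt`), each requirement line pins a digest, for example `pycantonese==5.0.0 --hash=sha256:<digest>`, and pip refuses any file whose hash does not match.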
