py-cc-dicts

A library for creating JSONs and accessing the data for the CC-Canto and CC-CEDICT open source Chinese dictionaries.

A Python library to download, update, create and access keyed JSONs for the dictionaries CC-CEDICT and CC-Canto.

Modules

The core Python library consists of three files: parser.py, which parses the raw text files sourced from the CC-CEDICT and CC-Canto websites and creates the JSONs; update.py, which fetches the data from those websites and calls functions from parser to generate the JSONs in the right place; and CC_Dict.py, which provides the class CC_Dict for easier programmatic access to the paths of the JSONs or to the data in the JSONs.

The two modules you'll most likely work with are update.py and CC_Dict.py.
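To make "keyed JSON" concrete, here is a minimal sketch of how entries could be grouped under a chosen key, so that a key with one entry maps to a single dict and a key with several entries maps to a list (the behaviour the CC_Dict lookups in this README display). The function name `key_entries` and the sample entries are hypothetical illustrations, not part of py_cc_dicts, and this is not necessarily how parser.py builds its files.

```python
from collections import defaultdict


def key_entries(entries, key):
    """Group entry dicts by entry[key].

    A key with one entry maps to that entry; a key with several
    entries maps to a list of them. (Hypothetical helper.)
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry[key]].append(entry)
    return {k: (v[0] if len(v) == 1 else v) for k, v in grouped.items()}


# Illustrative entries only; real entries carry more fields.
entries = [
    {"traditional": "出發", "simplified": "出发", "jyutping": "ceot1 faat3"},
    {"traditional": "貓", "simplified": "猫", "jyutping": "maau1"},
    {"traditional": "貓", "simplified": "猫", "jyutping": "miu4"},
]

keyed = key_entries(entries, "traditional")
print(type(keyed["出發"]).__name__)  # a single entry
print(len(keyed["貓"]))              # several entries under one key
```

Keeping single matches as bare dicts keeps the common case compact, at the cost of callers having to check which shape they received.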

CC_Dict.py

Core Class

from py_cc_dicts.CC_Dict import *

c = CC_Dict("CANTO") # Creates a CC_Dict object that can access the JSONs and dictionary data for CC-Canto. 
m = CC_Dict("CEDICT") # Creates a CC_Dict object that can access the JSONs and dictionary data for CC-CEDICT.
r = CC_Dict("READINGS") # Creates a CC_Dict object that can access the JSONs and readings data for the jyutping readings of CC-CEDICT as provided on the CC-Canto website.

# On creation, the data is downloaded from the dictionary website into the current directory if it is not already there.

dicts = [CC_Dict("canto"), CC_Dict("cedict"), CC_Dict("readings")]
# The dictionary code is not case-sensitive, so the above works as well.

c.get_data(key = None) 
m.get_data(key = None)
# Returns the dictionary data, keyed by the input *key*, as a dict.

c = CC_Dict("CANTO", data_dir = "some dir") # Creates a CC_Dict and stores the data loaded from the website in *data_dir* if it does not already exist there.

c = CC_Dict("CANTO", update = True)
m = CC_Dict("CEDICT", data_dir = "some dir", update = True)
# Forces an update by downloading the data from the website and regenerating the JSONs, even if they already exist in *data_dir* (or in the current directory if no *data_dir* is given).

c2 = CC_Dict("CANTO", key = "traditional")
# Loads the dictionary data keyed by the given *key* into the CC_Dict's internal dict on creation.

c2.dict # Produces the dict keyed by traditional

# You can also search with dict syntax.
c2["出發"]
# Produces:
{'traditional': '出發', 'simplified': '出发', 'pinyin': 'chu1 fa1', 'jyutping': 'ceot1 faat3', 'definitions': ['to depart']}

c2["貓"]
# Produces (since there are multiple entries for the same key, they're provided as a list):
[{'traditional': '貓', 'simplified': '猫', 'pinyin': 'mao1', 'jyutping': 'maau1', 'definitions': ['cat M: 只zhī [只]', '(dialect) to hide oneself', '(coll.) modem', "to arch one's back", 'to be drunk', 'to be high on drugs']}, 
{'traditional': '貓', 'simplified': '猫', 'pinyin': 'mao1', 'jyutping': 'maau4', 'definitions': ['cat M: 只zhī [只]', '(dialect) to hide oneself', '(coll.) modem', "to arch one's back", 'to be drunk', 'to be high on drugs']}, 
{'traditional': '貓', 'simplified': '猫', 'pinyin': 'mao1', 'jyutping': 'miu4', 'definitions': ['cat M: 只zhī [只]', '(dialect) to hide oneself', '(coll.) modem', "to arch one's back", 'to be drunk', 'to be high on drugs']}]

c2.keys()
c2.values()
c2.items()
# As CC_Dict is an extension of dict, common dict functions also work, although some might have unintended behaviour if key = "definitions" (see below)

c3 = CC_Dict("CANTO", key = "definitions")
# If the key given is "definitions", dict syntax searches across all definitions.

c3["some string"]
# This searches for and returns all definitions that contain the exact substring "some string" (as definitions are stored as strings).
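Because a lookup can produce either one entry or a list of entries, calling code often wants a uniform shape. The sketch below shows one way to normalise a result, plus a plain-Python illustration of substring search over definitions. Both `as_entries` and `search_definitions` are hypothetical helpers written for this example; they operate on ordinary dicts shaped like the output above and do not claim to reproduce what CC_Dict does internally when key = "definitions".

```python
def as_entries(result):
    """Return a lookup result as a list, whether it was one entry or several."""
    return result if isinstance(result, list) else [result]


def search_definitions(entries, needle):
    """Return the entries having at least one definition that contains needle."""
    return [e for e in entries if any(needle in d for d in e["definitions"])]


# Entries copied from the examples above, with the definition lists shortened.
single = {"traditional": "出發", "simplified": "出发", "pinyin": "chu1 fa1",
          "jyutping": "ceot1 faat3", "definitions": ["to depart"]}
multi = [
    {"traditional": "貓", "simplified": "猫", "pinyin": "mao1",
     "jyutping": "maau1", "definitions": ["(coll.) modem", "to be drunk"]},
    {"traditional": "貓", "simplified": "猫", "pinyin": "mao1",
     "jyutping": "miu4", "definitions": ["(coll.) modem", "to be drunk"]},
]

# The same loop now works for both result shapes.
for result in (single, multi):
    for entry in as_entries(result):
        print(entry["traditional"], entry["jyutping"])

matches = search_definitions(as_entries(multi) + [single], "depart")
print([m["traditional"] for m in matches])
```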

update.py

Core Functions

from py_cc_dicts.update import *

load_latest_data() # Load to current working directory
load_latest_data("*insert path here*") # Load to provided path

# Loads the raw zip files, the plain-text files and the JSONs for both CC-CEDICT and CC-Canto into the input directory if one is provided, otherwise into the current working directory.

fetch_raw() 

# Loads the zip files from the CC-CEDICT and CC-Canto websites into the *current working directory*

generate_jsons("path to zip directory")

# Takes the path to the directory where the raw data is stored and outputs the parsed JSONs for each key type to the *current working directory*

get_jsons(dir = "", dict_type = "")
get_raws(dir = "", dict_type = "")

# Searches *dir* for JSONs or raw zip files of the input dict_type (CEDICT, CANTO), or of both if no dict_type is provided, and returns a list of strings containing the paths to those files.

jsons_exists(dir = "")
raws_exists(dir = "")

# Check if the jsons or raw zip files exist in directory *dir*, or the current working directory if none provided.

clean_raws(dir = "")
clean_jsons(dir = "")

# Delete the raw zip files or JSONs from directory *dir*, or the current working directory if none provided.
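A common pattern with functions like these is "download only when nothing is cached". The sketch below expresses that pattern with the check and the loader passed in as parameters, so it runs without network access. `ensure_data` and the stand-in functions are hypothetical; nothing here is part of py_cc_dicts.

```python
from typing import Callable


def ensure_data(data_dir: str,
                exists: Callable[[str], bool],
                load: Callable[[str], object]) -> bool:
    """Call load(data_dir) only when exists(data_dir) is falsy.

    Returns True if a load was triggered, False if the data was already there.
    """
    if exists(data_dir):
        return False
    load(data_dir)
    return True


# Stand-ins so the sketch is runnable offline (assumptions, not library code).
loaded = []


def fake_exists(data_dir):
    return data_dir == "cached"


print(ensure_data("cached", fake_exists, loaded.append))  # nothing to do
print(ensure_data("fresh", fake_exists, loaded.append))   # triggers a load
print(loaded)
```

With the package installed, one would presumably pass `jsons_exists` and `load_latest_data` as the two callables, on the assumption that `jsons_exists` returns a truthy value when the JSONs are present; the README does not state its return type.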

parser.py

Constants

from py_cc_dicts.parser import *

DICT_TYPES = ["CEDICT", "CANTO"] # Valid Dictionary Codes, used throughout the program.
VALID_KEYS = {DICT_TYPES[0]: ["traditional", "simplified", "pinyin", "definitions", None],
               DICT_TYPES[1]: ["traditional", "simplified", "pinyin", "jyutping", "definitions", None]}  # Valid keys for CC_Dict, used for creation of JSONs
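As an illustration of how these constants can be used, the sketch below checks a dictionary code and key before any work is done. The constants are copied from the listing above so the example runs on its own; `check_key` is a hypothetical helper, not a function exported by the library. Note that "READINGS" does not appear in DICT_TYPES as listed, so this sketch rejects it.

```python
DICT_TYPES = ["CEDICT", "CANTO"]
VALID_KEYS = {
    DICT_TYPES[0]: ["traditional", "simplified", "pinyin", "definitions", None],
    DICT_TYPES[1]: ["traditional", "simplified", "pinyin", "jyutping",
                    "definitions", None],
}


def check_key(dict_type, key=None):
    """Return (normalised dictionary code, key) or raise ValueError.

    Hypothetical helper: upper-cases the code so that "canto" and "CANTO"
    are treated alike, then looks the key up in VALID_KEYS.
    """
    code = dict_type.upper()
    if code not in VALID_KEYS:
        raise ValueError(f"unknown dictionary type: {dict_type!r}")
    if key not in VALID_KEYS[code]:
        raise ValueError(f"{key!r} is not a valid key for {code}")
    return code, key


print(check_key("canto", "jyutping"))
print(check_key("CEDICT"))
```

Since "jyutping" is only listed for CANTO, `check_key("CEDICT", "jyutping")` raises ValueError in this sketch.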

Core Functions

parse_cc_canto(filepath, key = "traditional")
parse_cc_cedict(filepath, key = "traditional", surnames = True)

# Parses the respective raw text file at *filepath* to produce a JSON with the given *key*. The *surnames* parameter is currently unused.
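For readers unfamiliar with the raw files, the sketch below parses a single line in the CC-CEDICT layout (traditional, simplified, pinyin in square brackets, definitions between slashes) into a dict shaped like the entries shown earlier. It also accepts an optional Jyutping field in curly braces, which is an assumption about how CC-Canto lines are laid out. `parse_line` and `LINE_RE` are hypothetical names for this example only; this is not the implementation of `parse_cc_cedict` or `parse_cc_canto`.

```python
import re

# traditional simplified [pinyin] {jyutping, optional} /def 1/def 2/
LINE_RE = re.compile(
    r"^(?P<traditional>\S+)\s+(?P<simplified>\S+)\s+"
    r"\[(?P<pinyin>[^\]]*)\]\s+"
    r"(?:\{(?P<jyutping>[^}]*)\}\s+)?"
    r"/(?P<definitions>.*)/\s*$"
)


def parse_line(line):
    """Parse one dictionary line into a dict, or return None if it is not an entry."""
    if not line.strip() or line.startswith("#"):
        return None  # blank lines and '#' comment lines carry no entry
    match = LINE_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    if entry["jyutping"] is None:
        del entry["jyutping"]  # plain CC-CEDICT lines have no Jyutping field
    entry["definitions"] = entry["definitions"].split("/")
    return entry


print(parse_line("出發 出发 [chu1 fa1] /to depart/"))
print(parse_line("出發 出发 [chu1 fa1] {ceot1 faat3} /to depart/"))
print(parse_line("# a comment line"))
```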

Changelog

V 1.1

Can now access the Jyutping readings data for CC-CEDICT as provided on the CC-Canto website.

r = CC_Dict("READINGS", key = "traditional")
r["試驗"]

# Returns:
{'traditional': '試驗', 'simplified': '试验', 'pinyin': 'shi4 yan4'}

Release history

1.1.0 (this version): Jun 4, 2026
1.0.0: Jun 3, 2026

Download files

Source Distribution

py_cc_dicts-1.1.0.tar.gz (63.4 MB), uploaded Jun 4, 2026, Source

Built Distribution

py_cc_dicts-1.1.0-py3-none-any.whl (12.6 kB), uploaded Jun 4, 2026, Python 3

File details

Details for the file py_cc_dicts-1.1.0.tar.gz.

File metadata

Download URL: py_cc_dicts-1.1.0.tar.gz
Upload date: Jun 4, 2026
Size: 63.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for py_cc_dicts-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`efd405203b91c85f674875f5b6898277307608e038d3a165ffd6b1b0554225d4`
MD5	`02482348bbaf13ff528d216a6cd56743`
BLAKE2b-256	`acf0fad10f2ee57fa4180a354daae3fc33e85ba79dd1a7e077d0f68f29e08af6`

File details

Details for the file py_cc_dicts-1.1.0-py3-none-any.whl.

File metadata

Download URL: py_cc_dicts-1.1.0-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 12.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for py_cc_dicts-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dadb37ea6d31893c5ba48c875bfefa9e4c279394ae390ec1556fcc1187aa18ea`
MD5	`3383ab5e69c2c2f81e23165f260d6bb1`
BLAKE2b-256	`6cc026ab0fb1e6f3fba3cd6f380ade855bbe4c8ebbbc8306b982a1bd78a33a72`
