A series of Chinese processing tools
Project description
# A series of Chinese processing tools
There are some features in the package. If you need to use it can be installed directly
extract_chinese(self, text, symbols= “”, remove_letter=False)
Extract chinese from raw text. You can choose symbols to replace non-Chinese characters in the corpus
filter_words_in_corups(self, sensitive_list, raw_text_list)
Use AC automata to filter sensitive words
is_chinese(self, uchar)
Determine if a character is Chinese
get_json_data(self, path)
Read json data from a file, where each line is a Dict
save_data_to_npy(self, data, path)
read_data_from_npy(self, path)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
chptools-0.0.1.tar.gz
(2.4 kB
view hashes)