Extract curated Chinese and English function words from texts.
functionwords
The functionwords package provides curated Chinese and English function words.
It supports five function word lists, as listed below.
Chinese function words are only available in simplified form.
Function_words_list | # of function words | Description |
---|---|---|
chinese_simplified_modern | 819 | compiled from the [dictionary][1] |
chinese_classical_naive | 32 | harvested from the [platforms][2] |
chinese_classical_comprehensive | 466 | compiled from the [dictionary][3] |
chinese_comprehensive | 1,122 | a combination of chinese_simplified_modern, chinese_classical_naive, and chinese_classical_comprehensive |
english | 512 | found in software |
The FunctionWords class does the heavy lifting.
Instantiate it with the desired function_words_list.
The instance has two methods (transform() and get_feature_names()) and three attributes (function_words_list, function_words, and description).
For more details, see the description attribute of a FunctionWords instance.
Installation
pip install -U functionwords
Getting started
from functionwords import FunctionWords
raw = "The present King of Singapore is bald."
# to instantiate a FunctionWords instance
# `function_words_list` can be one of 'chinese_classical_comprehensive',
# 'chinese_classical_naive', 'chinese_comprehensive',
# 'chinese_simplified_modern', or 'english'
fw = FunctionWords(function_words_list='english')
# to count function words accordingly
# returns a list of counts
fw.transform(raw)
# to list all function words given `function_words_list`
# returns a list
fw.get_feature_names()
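The two lists returned above are easier to read when paired up. The following is a minimal sketch, assuming that transform() returns counts in the same order as the names from get_feature_names(). The helper nonzero_counts and the example values are hypothetical and not part of the package. Short made-up lists stand in for the real return values so the snippet runs without the package installed.

```python
from typing import Dict, Sequence


def nonzero_counts(names: Sequence[str], counts: Sequence[int]) -> Dict[str, int]:
    """Map each function word to its count, keeping only words that occur."""
    if len(names) != len(counts):
        raise ValueError("names and counts must have the same length")
    return {name: count for name, count in zip(names, counts) if count > 0}


# Hypothetical stand-ins for fw.get_feature_names() and fw.transform(raw);
# the real lists are much longer and the real counts may differ.
example_names = ["the", "of", "is", "and"]
example_counts = [1, 1, 1, 0]

print(nonzero_counts(example_names, example_counts))
```

With a real instance, the call would be nonzero_counts(fw.get_feature_names(), fw.transform(raw)), again under the ordering assumption stated above.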
Requirements
Only Python 3.8+ is required.
Important links
- Source code: https://github.com/Wang-Haining/functionwords
- Issue tracker: https://github.com/Wang-Haining/functionwords/issues
Version
- Created on March 17, 2021. v.0.5, launch.
- Modified on Nov. 19, 2021. v.0.6, fix bugs in extracting Chinese ngram features.
- Modified on Jan. 03, 2022. v.0.7, add chinese_comprehensive feature set.
- Modified on Jan. 23, 2022. v.0.8, count Chinese ngram features finely.
Licence
This package is licensed under CC-BY-SA 4.0.
References
[1]: Ziqiang, W. (1998). Modern Chinese Dictionary of Function Words. Shanghai Dictionary Press.
[2]: https://baike.baidu.com/item/%E6%96%87%E8%A8%80%E8%99%9A%E8%AF%8D and https://zh.m.wikibooks.org/zh-hans/%E6%96%87%E8%A8%80/%E8%99%9B%E8%A9%9E
[3]: Hai, W., Changhai, Z., Shan, H., Keying, W. (1996). Classical Chinese Dictionary of Function Words. Peking University Press.