
Extract curated Chinese and English function words from texts.


functionwords

License: CC BY-NC-SA 4.0

The functionwords package provides curated Chinese and English function words. It supports five function word lists, as listed below. Chinese function words are only available in simplified form.

| Function_words_list | # of function words | Description |
|---|---|---|
| chinese_simplified_modern | 819 | compiled from the [dictionary][1] |
| chinese_classical_naive | 32 | harvested from the [platforms][2] |
| chinese_classical_comprehensive | 466 | compiled from the [dictionary][3] |
| chinese_comprehensive | 1,122 | a combination of chinese_simplified_modern, chinese_classical_naive, and chinese_classical_comprehensive |
| english | 512 | found in software |
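
Note that chinese_comprehensive has 1,122 entries, fewer than the 819 + 32 + 466 = 1,317 you would get by simply concatenating its three source lists. That is consistent with overlapping entries being kept only once. The sketch below illustrates such a merge with toy lists. The helper `merge_word_lists` and the sample words are hypothetical and chosen only for illustration; this is not the package's actual merge code, and the sample words are not claimed to be members of the real lists.

```python
def merge_word_lists(*lists):
    """Concatenate word lists, keeping only the first occurrence of each word."""
    seen = set()
    merged = []
    for words in lists:
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


# Toy stand-ins for the three source lists (hypothetical contents).
modern = ["的", "了", "而", "之"]
classical_naive = ["之", "乎", "者"]
classical_comprehensive = ["而", "者", "也", "矣"]

combined = merge_word_lists(modern, classical_naive, classical_comprehensive)

# The naive sum counts shared words more than once; the merged list does not.
naive_total = len(modern) + len(classical_naive) + len(classical_comprehensive)
print(naive_total, len(combined))
```

With these toy lists the naive total is 11 while the merged list has 8 entries, mirroring the gap between 1,317 and 1,122 in the table.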

The FunctionWords class does the heavy lifting. Instantiate it with the desired function_words_list. The instance has two methods (transform() and get_feature_names()) and three attributes (function_words_list, function_words, and description).

For more details, see the description attribute of a FunctionWords instance.

Installation

pip install -U functionwords

Getting started

from functionwords import FunctionWords

raw = "The present King of Singapore is bald."

# to instantiate a FunctionWords instance
# `function_words_list` can be one of 'chinese_simplified_modern',
# 'chinese_classical_naive', 'chinese_classical_comprehensive',
# 'chinese_comprehensive', or 'english'
fw = FunctionWords(function_words_list='english')

# to count function words accordingly
# returns a list of counts
fw.transform(raw)

# to list all function words given `function_words_list`
# returns a list
fw.get_feature_names()
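
To make the shape of the output concrete, here is a minimal sketch in plain Python of what "a list of counts" aligned with a word list looks like. It does not use or reproduce the package: `TOY_FUNCTION_WORDS`, `count_function_words`, and the regex tokenizer are assumptions made for illustration, and the real english list has 512 entries rather than four.

```python
import re
from collections import Counter

# Hypothetical mini word list, for illustration only.
TOY_FUNCTION_WORDS = ["the", "of", "is", "and"]


def count_function_words(text, words=TOY_FUNCTION_WORDS):
    """Return one count per entry in `words`, in the same order as `words`."""
    # Lowercase, then keep runs of letters and apostrophes as tokens.
    # This tokenizer is an assumption, not the package's own.
    tokens = re.findall(r"[a-z']+", text.lower())
    freq = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return [freq[word] for word in words]


raw = "The present King of Singapore is bald."
counts = count_function_words(raw)

# Pair each word with its count, as one would pair
# get_feature_names() with transform() output.
print(dict(zip(TOY_FUNCTION_WORDS, counts)))
```

The point of the design is that the counts are positional: the i-th count belongs to the i-th word in the list, so the two lists can be zipped together or used directly as a feature vector.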

Requirements

Only Python 3.8+ is required.

Important links

Version

  • Created on March 17, 2021. v.0.5, launch.
  • Modified on Nov. 19, 2021. v.0.6, fix bugs in extracting Chinese ngram features.
  • Modified on Jan. 03, 2022. v.0.7, add chinese_comprehensive feature set.
  • Modified on Jan. 23, 2022. v.0.8, count Chinese ngram features finely.

Licence

This package is licensed under CC-BY-SA 4.0.

References

[1]: Ziqiang, W. (1998). Modern Chinese Dictionary of Function Words. Shanghai Dictionary Press.

[2]: https://baike.baidu.com/item/%E6%96%87%E8%A8%80%E8%99%9A%E8%AF%8D and https://zh.m.wikibooks.org/zh-hans/%E6%96%87%E8%A8%80/%E8%99%9B%E8%A9%9E

[3]: Hai, W., Changhai, Z., Shan, H., Keying, W. (1996). Classical Chinese Dictionary of Function Words. Peking University Press.
