An implementation of C-value and NC-value methods
ncnc
This is an implementation of the C-value and NC-value methods proposed in the following paper:
- Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method
Installation
$ pip install ncnc
Usage
C-value
First, prepare a DataFrame object that holds the total frequency of each n-gram in a corpus. The column and the index must be named f(a) and ngram, respectively. The following code shows an example.
import pandas as pd

# Total corpus frequency f(a) of each candidate n-gram
# (named freqs to avoid shadowing the built-in dict)
freqs = {
    "adenoid cystic basal cell carcinoma": 5,
    "cystic basal cell carcinoma": 11,
    "ulcerated basal cell carcinoma": 7,
    "recurrent basal cell carcinoma": 5,
    "circumscribed basal cell carcinoma": 3,
    "basal cell carcinoma": 984,
}
df = pd.DataFrame.from_dict(freqs, orient="index", columns=["f(a)"])
df.index.name = "ngram"
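The f(a) column is just a count of how often each n-gram occurs in the corpus. If your corpus is already tokenized, a minimal sketch like the following can build a DataFrame in the expected shape (column f(a), index ngram). The helper name ngram_frequencies and its min_n/max_n bounds are hypothetical choices for illustration, not part of ncnc, and the sketch applies no linguistic filter, so every contiguous word sequence in the length range becomes a candidate.

```python
from collections import Counter

import pandas as pd


def ngram_frequencies(sentences, min_n=2, max_n=5):
    """Hypothetical helper: count contiguous n-grams of min_n..max_n words.

    sentences is an iterable of token lists; the result has the layout
    used above (a single "f(a)" column and an index named "ngram").
    """
    counts = Counter()
    for tokens in sentences:
        for n in range(min_n, max_n + 1):
            # Slide a window of n tokens over the sentence
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    df = pd.DataFrame.from_dict(counts, orient="index", columns=["f(a)"])
    df.index.name = "ngram"
    return df


# Usage with a toy, pre-tokenized corpus
sentences = [
    ["recurrent", "basal", "cell", "carcinoma", "was", "excised"],
    ["basal", "cell", "carcinoma", "is", "common"],
]
df = ngram_frequencies(sentences, min_n=3, max_n=5)
print(df.sort_values(by="f(a)", ascending=False))
```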
Then, pass the DataFrame object to calc_c_value().
from ncnc.c_value import calc_c_value
df = calc_c_value(df)
Now you can view the C-value of each n-gram, sorted in descending order:
df = df.sort_values(by="c-value", ascending=False)
print(df.loc[:, ["f(a)", "c-value"]])
The results are as follows:
                                     f(a)      c-value
ngram
basal cell carcinoma                  984  1551.361296
ulcerated basal cell carcinoma          7    14.000000
cystic basal cell carcinoma            11    12.000000
adenoid cystic basal cell carcinoma     5    11.609640
recurrent basal cell carcinoma          5    10.000000
circumscribed basal cell carcinoma      3     6.000000
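To make the numbers above easier to check by hand, here is a minimal from-scratch sketch of the C-value computation, written for illustration only: it is not ncnc's source, and the function name c_values is hypothetical. It follows the paper's incremental formulation as we read it: C-value(a) = log2|a| * f(a) if a is not contained in any longer candidate, and log2|a| * (f(a) - t(a)/c(a)) otherwise, where |a| is the number of words in a, c(a) is the number of longer candidates containing a, and t(a) accumulates f(b) - t(b) over those longer candidates b. Candidates are processed from longest to shortest so that t(a) and c(a) are complete before a is scored.

```python
import math
from collections import defaultdict


def c_values(freq):
    """Sketch: map each n-gram in freq (n-gram -> f(a)) to its C-value."""
    t = defaultdict(float)  # t(a): revised frequency of a inside longer candidates
    c = defaultdict(int)    # c(a): number of longer candidates containing a
    scores = {}
    # Longest candidates first, so t(a) and c(a) are final when a is scored
    for a in sorted(freq, key=lambda s: len(s.split()), reverse=True):
        words = a.split()
        n = len(words)
        if c[a] == 0:
            scores[a] = math.log2(n) * freq[a]
        else:
            scores[a] = math.log2(n) * (freq[a] - t[a] / c[a])
        # All proper contiguous sub-n-grams of a (a set, so each counts once)
        nested = {
            " ".join(words[i:j])
            for i in range(n)
            for j in range(i + 1, n + 1)
            if j - i < n
        }
        # Pass a's revised frequency down to the shorter candidates it contains
        for b in nested.intersection(freq):
            t[b] += freq[a] - t[a]
            c[b] += 1
    return scores


freqs = {
    "adenoid cystic basal cell carcinoma": 5,
    "cystic basal cell carcinoma": 11,
    "ulcerated basal cell carcinoma": 7,
    "recurrent basal cell carcinoma": 5,
    "circumscribed basal cell carcinoma": 3,
    "basal cell carcinoma": 984,
}
scores = c_values(freqs)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:<36}{score:12.6f}")
```

For "basal cell carcinoma", the five longer candidates contribute t(a) = 5 + (11 - 5) + 7 + 5 + 3 = 26 and c(a) = 5, so its C-value is log2(3) * (984 - 26/5) ≈ 1551.36, which agrees with the table above.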
NC-value
You can also calculate the NC-value of each n-gram:
from ncnc.nc_value import calc_nc_value
df = calc_nc_value(df)
df = df.sort_values(by="nc-value", ascending=False)
print(df.loc[:, ["f(a)", "c-value", "nc-value"]])
Note that calc_nc_value() takes the output of calc_c_value() as its input, so the C-values must be calculated before the NC-values.
Also note that we use all part-of-speech elements as context words, whereas the original paper used only nouns, adjectives, and verbs.
The results are as follows:
                                     f(a)      c-value     nc-value
ngram
basal cell carcinoma                  984  1551.361296  1242.122370
ulcerated basal cell carcinoma          7    14.000000    11.200000
cystic basal cell carcinoma            11    12.000000     9.766667
adenoid cystic basal cell carcinoma     5    11.609640     9.287712
recurrent basal cell carcinoma          5    10.000000     8.000000
circumscribed basal cell carcinoma      3     6.000000     4.800000
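The paper defines NC-value(a) = 0.8 * C-value(a) + 0.2 * sum over b in C_a of f_a(b) * weight(b), where C_a is the set of context words of a, f_a(b) is how often b occurs as a context word of a, and weight(b) = t(b)/n, with t(b) the number of terms b appears with and n the total number of terms considered. The sketch below implements that formula for illustration only; nc_values is a hypothetical name and this is not ncnc's code. Because the example input holds only n-gram frequencies, the sketch has to assume where context words come from: it treats the word directly before or after a candidate inside a longer candidate in the table as a context word, counted with that longer candidate's frequency, and takes n to be the number of candidates. That assumption is ours and is not documented by ncnc; the library may derive contexts differently, although on this example the sketch yields the same NC-values as the table above.

```python
from collections import defaultdict


def nc_values(freq, c_value, c_weight=0.8, context_weight=0.2):
    """Sketch of NC-value(a) = 0.8 * C-value(a) + 0.2 * sum_b f_a(b) * t(b) / n.

    ASSUMPTION (not ncnc's documented behaviour): the context words of a are
    the words directly before or after a inside longer candidates in freq.
    """
    # context[a][b] = f_a(b): frequency of word b as a context word of a
    context = {a: defaultdict(int) for a in freq}
    for a in freq:
        a_words = a.split()
        k = len(a_words)
        for longer, f in freq.items():
            words = longer.split()
            if len(words) <= k:
                continue  # only strictly longer candidates supply context
            for start in range(len(words) - k + 1):
                if words[start:start + k] == a_words:
                    if start > 0:
                        context[a][words[start - 1]] += f
                    if start + k < len(words):
                        context[a][words[start + k]] += f
    # t(b): number of candidates that word b appears with as a context word
    t = defaultdict(int)
    for ctx in context.values():
        for b in ctx:
            t[b] += 1
    n = len(freq)  # n: total number of candidates considered
    return {
        a: c_weight * c_value[a]
        + context_weight * sum(f_ab * t[b] / n for b, f_ab in context[a].items())
        for a in freq
    }


freqs = {
    "adenoid cystic basal cell carcinoma": 5,
    "cystic basal cell carcinoma": 11,
    "ulcerated basal cell carcinoma": 7,
    "recurrent basal cell carcinoma": 5,
    "circumscribed basal cell carcinoma": 3,
    "basal cell carcinoma": 984,
}
# C-values copied from the calc_c_value() output shown earlier
cvals = {
    "basal cell carcinoma": 1551.361296,
    "ulcerated basal cell carcinoma": 14.0,
    "cystic basal cell carcinoma": 12.0,
    "adenoid cystic basal cell carcinoma": 11.609640,
    "recurrent basal cell carcinoma": 10.0,
    "circumscribed basal cell carcinoma": 6.0,
}
nc = nc_values(freqs, cvals)
for term, score in sorted(nc.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:<36}{score:12.6f}")
```

For "cystic basal cell carcinoma", the only context word under this assumption is "adenoid" (f_a = 5, weight = 1/6), giving 0.8 * 12 + 0.2 * 5/6 ≈ 9.766667.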
Download files
Source Distribution: ncnc-1.0.0.tar.gz
Built Distribution: ncnc-1.0.0-py3-none-any.whl
File details
Details for the file ncnc-1.0.0.tar.gz.
File metadata
- Download URL: ncnc-1.0.0.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.11.0 Linux/5.15.0-1036-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 30644b0c158d4fe51fc188305159d5c20f7168fb644312ac5c9c191066c89645
MD5 | 43989f1fe2b8cc7e54380392754330fc
BLAKE2b-256 | fa546d56730172ec337c59ca140f7d68bab2884425940756630dcedec62b0f47
File details
Details for the file ncnc-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ncnc-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.2 CPython/3.11.0 Linux/5.15.0-1036-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6a2436318c6d0bd48eadf4dd539fe9a71d82eccfba8fc00578ef30f01fcee051
MD5 | f3189954107dbc246a1d827e5ef540bc
BLAKE2b-256 | 16a08a218660db46c6a2aaf7349304689fbe7f4937a16294f3097f3d227e8e50