
An implementation of C-value and NC-value methods


ncnc


This is an implementation of C-value and NC-value methods proposed in the following paper:

  • Automatic Recognition of Multi-Word Terms: the C-value/NC-value Method

Installation

$ pip install ncnc

Usage

C-value

First, prepare a pandas DataFrame that holds the total frequency of each n-gram in the corpus. The column must be named f(a) and the index must be named ngram. The following code shows an example.

import pandas as pd


# Total corpus frequency of each candidate n-gram.
freq = {
    "adenoid cystic basal cell carcinoma": 5,
    "cystic basal cell carcinoma": 11,
    "ulcerated basal cell carcinoma": 7,
    "recurrent basal cell carcinoma": 5,
    "circumscribed basal cell carcinoma": 3,
    "basal cell carcinoma": 984,
}
# One row per n-gram, with a single frequency column named "f(a)".
df = pd.DataFrame.from_dict(freq, orient="index", columns=["f(a)"])
df.index.name = "ngram"
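
If you start from raw text rather than ready-made counts, the frequencies can be collected with the standard library. The following is only a minimal sketch: the helper name count_ngrams and its min_n and max_n parameters are hypothetical and not part of ncnc, and it counts every contiguous word sequence, whereas the original paper first applies a linguistic filter and a stop list to select candidate terms.

```python
from collections import Counter


def count_ngrams(sentences, min_n=2, max_n=5):
    """Count every contiguous n-gram (min_n <= n <= max_n) in tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(min_n, max_n + 1):
            # Slide a window of n tokens over the sentence.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Two toy sentences, already split into words.
sentences = [
    "recurrent basal cell carcinoma".split(),
    "basal cell carcinoma".split(),
]
freq = count_ngrams(sentences, min_n=3, max_n=4)
for ngram, count in freq.most_common():
    print(ngram, count)
```

Nested occurrences are counted too, so "basal cell carcinoma" is counted once for each sentence here. Converting the result with dict(freq) gives a dictionary of the same shape as the one in the example above.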

Then, give the DataFrame object to calc_c_value().

from ncnc.c_value import calc_c_value


df = calc_c_value(df)

Now, you can see a C-value for each n-gram like this:

df = df.sort_values(by="c-value", ascending=False)
print(df.loc[:, ["f(a)", "c-value"]])

The results are as follows:

                                     f(a)      c-value
ngram
basal cell carcinoma                  984  1551.361296
ulcerated basal cell carcinoma          7    14.000000
cystic basal cell carcinoma            11    12.000000
adenoid cystic basal cell carcinoma     5    11.609640
recurrent basal cell carcinoma          5    10.000000
circumscribed basal cell carcinoma      3     6.000000
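
To see where these numbers come from, the C-value formula from the paper can be worked through by hand. An n-gram a of length |a| that is not nested in any longer candidate gets log2(|a|) * f(a). A nested n-gram gets log2(|a|) * (f(a) - t(a) / c(a)), where c(a) is the number of longer candidates containing a and t(a) is their total frequency, adjusted so that occurrences already attributed to an even longer candidate are not counted twice. The following is a minimal sketch of that formula in plain Python. It is not the internal implementation of ncnc, and the function name c_values is hypothetical.

```python
import math

# Frequencies from the example above.
freq = {
    "adenoid cystic basal cell carcinoma": 5,
    "cystic basal cell carcinoma": 11,
    "ulcerated basal cell carcinoma": 7,
    "recurrent basal cell carcinoma": 5,
    "circumscribed basal cell carcinoma": 3,
    "basal cell carcinoma": 984,
}


def c_values(freq):
    """Compute C-values following the formula in the paper (illustrative sketch)."""
    words = {ngram: tuple(ngram.split()) for ngram in freq}
    t = dict.fromkeys(freq, 0.0)  # adjusted frequency of a as a nested string
    c = dict.fromkeys(freq, 0)    # number of longer candidates containing a
    result = {}
    # Process the longest candidates first so t[a] and c[a] are final when a is scored.
    for a in sorted(freq, key=lambda s: len(words[s]), reverse=True):
        n = len(words[a])
        nested_part = t[a] / c[a] if c[a] else 0.0
        result[a] = math.log2(n) * (freq[a] - nested_part)
        # Credit every shorter candidate that occurs inside a.
        for b in freq:
            m = len(words[b])
            if m < n and any(words[a][i:i + m] == words[b] for i in range(n - m + 1)):
                c[b] += 1
                t[b] += freq[a] - t[a]
    return result


values = c_values(freq)
for ngram, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ngram:36s} {value:12.6f}")
```

For "basal cell carcinoma" this gives log2(3) * (984 - (5 + 6 + 7 + 5 + 3) / 5), which matches the first row of the table. The 6 is the frequency of "cystic basal cell carcinoma" (11) minus the 5 occurrences already attributed to "adenoid cystic basal cell carcinoma".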

NC-value

You can also calculate an NC-value for each n-gram like this:

from ncnc.nc_value import calc_nc_value


df = calc_nc_value(df)
df = df.sort_values(by="nc-value", ascending=False)
print(df.loc[:, ["f(a)", "c-value", "nc-value"]])

Note that calc_nc_value() takes the output of calc_c_value() as its input, so the C-values must be calculated before the NC-values.

Also note that we use all part-of-speech elements as context words, whereas the original paper used only nouns, adjectives, and verbs.

The results are as follows:

                                     f(a)      c-value     nc-value
ngram
basal cell carcinoma                  984  1551.361296  1242.122370
ulcerated basal cell carcinoma          7    14.000000    11.200000
cystic basal cell carcinoma            11    12.000000     9.766667
adenoid cystic basal cell carcinoma     5    11.609640     9.287712
recurrent basal cell carcinoma          5    10.000000     8.000000
circumscribed basal cell carcinoma      3     6.000000     4.800000
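
In the paper, the NC-value combines the C-value with a context term: NC-value(a) = 0.8 * C-value(a) + 0.2 * (sum over context words b of f_a(b) * weight(b)). The following sketch takes the numbers from the table above and solves for the context term of each row. It assumes that ncnc uses the 0.8 and 0.2 weights from the paper, which the table is consistent with but which this example does not verify against the library's code.

```python
# (c-value, nc-value) pairs copied from the table above.
rows = {
    "basal cell carcinoma": (1551.361296, 1242.122370),
    "ulcerated basal cell carcinoma": (14.000000, 11.200000),
    "cystic basal cell carcinoma": (12.000000, 9.766667),
    "adenoid cystic basal cell carcinoma": (11.609640, 9.287712),
    "recurrent basal cell carcinoma": (10.000000, 8.000000),
    "circumscribed basal cell carcinoma": (6.000000, 4.800000),
}

# Rearranging NC = 0.8 * C + 0.2 * context gives context = (NC - 0.8 * C) / 0.2.
context_term = {
    ngram: (nc - 0.8 * c) / 0.2 for ngram, (c, nc) in rows.items()
}

for ngram, value in context_term.items():
    print(f"{ngram:36s} {value:9.4f}")
```

Rows whose context term is zero (up to rounding) have an NC-value of exactly 0.8 times their C-value. In this example only "basal cell carcinoma" and "cystic basal cell carcinoma" receive a positive context contribution.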
