Keyword Recognition and Pre-processing

Project description

In named entity recognition (NER) projects, the tokens (words and punctuation marks) in each sentence must be converted into numeric features before they can be used for training. This package represents each token with two features: (1) its vocabulary id and (2) its capitalization type. For example, the first record in the raw data is

(['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.'], ['ORG', 'O', 'MISC', 'O', 'O', 'O', 'MISC', 'O', 'O'])

The first list in this tuple is the original sentence, and the second list is the named-entity label for each token.

With this package, the record becomes

([[1, 12], [2, 11], [3, 13], [4, 11], [5, 11], [6, 11], [7, 13], [8, 11], [9, 14]], [1, 4, 3, 4, 4, 4, 3, 4, 4])

Here the first list holds the numeric features for each token as [vocabulary id, capitalization type] pairs, and the second list holds the corresponding numeric labels for the named-entity types.
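The mapping in this example can be sketched in a few lines of Python. This is only an illustration, not the package's own code: the function names `cap_type` and `encode` are hypothetical, the capitalization codes (11 lowercase, 12 all uppercase, 13 initial capital, 14 anything else) and the label ids (ORG = 1, MISC = 3, O = 4) are inferred from the single record shown above, and the case-sensitive, first-seen vocabulary numbering is an assumption.

```python
from typing import Dict, List, Tuple

# Codes inferred from the example record only; the package may define others.
CAP_LOWER, CAP_ALL_UPPER, CAP_INITIAL_UPPER, CAP_OTHER = 11, 12, 13, 14

# Only the labels visible in the example record; other ids are unknown here.
LABEL_IDS: Dict[str, int] = {"ORG": 1, "MISC": 3, "O": 4}


def cap_type(token: str) -> int:
    """Return a capitalization code for one token (hypothetical helper)."""
    if token.islower():
        return CAP_LOWER
    if token.isupper():
        return CAP_ALL_UPPER
    if token[:1].isupper():
        return CAP_INITIAL_UPPER
    return CAP_OTHER  # e.g. punctuation, which has no cased characters


def encode(
    tokens: List[str], labels: List[str], vocab: Dict[str, int]
) -> Tuple[List[List[int]], List[int]]:
    """Turn one (tokens, labels) record into numeric features and labels.

    New tokens are added to `vocab` with the next free id, starting at 1.
    """
    features = []
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 1
        features.append([vocab[token], cap_type(token)])
    return features, [LABEL_IDS[label] for label in labels]


vocab: Dict[str, int] = {}
record = (
    ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."],
    ["ORG", "O", "MISC", "O", "O", "O", "MISC", "O", "O"],
)
features, label_ids = encode(record[0], record[1], vocab)
print((features, label_ids))
```

Under these assumptions the sketch reproduces the numeric record shown above for the first sentence.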

You can also retrieve the vocabulary dictionary built from your sentence data.
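As a rough picture of what such a vocabulary dictionary might look like, the sketch below builds a token-to-id mapping from a list of sentences and inverts it for lookups. The function name `build_vocab` is hypothetical, the second sentence is made up for illustration, and the numbering scheme (ids assigned from 1 in order of first appearance) is an assumption based on the example record; the package's actual dictionary may differ.

```python
from typing import Dict, Iterable, List


def build_vocab(sentences: Iterable[List[str]]) -> Dict[str, int]:
    """Map each distinct token to an id, numbered from 1 by first appearance."""
    vocab: Dict[str, int] = {}
    for sentence in sentences:
        for token in sentence:
            # setdefault leaves the id of an already-seen token unchanged.
            vocab.setdefault(token, len(vocab) + 1)
    return vocab


sentences = [
    ["EU", "rejects", "German", "call", "to", "boycott", "British", "lamb", "."],
    ["German", "lamb", "exports", "."],  # made-up sentence for illustration
]
vocab = build_vocab(sentences)

# Inverse mapping, useful for turning ids back into tokens.
id_to_token = {token_id: token for token, token_id in vocab.items()}
print(vocab["German"], id_to_token[10])
```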

To do this, run the following command in a terminal:

python -m plnlp './plnlp/tiny.conll' './plnlp/tiny.conll'

The first and second arguments are the paths to the training dataset and the development dataset, respectively.
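The layout of the `.conll` input files is not described here. A common convention, assumed in the sketch below, is one token per line with whitespace-separated columns (token first, entity tag last) and a blank line between sentences. The function name `read_conll` is hypothetical and the sample lines are made up; this is not the package's own reader.

```python
from typing import Iterable, List, Tuple

Sentence = Tuple[List[str], List[str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into (tokens, tags) records, one per sentence."""
    sentences: List[Sentence] = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in lines:
        line = line.strip()
        # A blank line or a document marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])   # assumed: token is the first column
        tags.append(columns[-1])    # assumed: entity tag is the last column
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Made-up sample in the assumed layout.
sample = ["EU ORG", "rejects O", "", "German MISC", "call O"]
records = read_conll(sample)
print(records)
```

To read a real file under the same assumptions, pass an open file handle instead of the sample list, for example `with open(path) as f: records = read_conll(f)`.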

The resulting datasets are saved as a CSV file in your current working directory.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

plnlp-1.0.1.tar.gz (4.0 kB)

Uploaded Source

Built Distribution

plnlp-1.0.1-py3-none-any.whl (4.8 kB)

Uploaded Python 3
