Converts character-span labels to token-based labels for Japanese text

noyaki

Converts label information given as character spans into label information aligned with tokenized text.

Installation

$ pip install noyaki

Usage

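Pass the tokenized text and the label information as arguments to the convert function. Each label is a [start, end, label] list of character offsets into the original text; in the example below, [3, 5, 'PERSON'] covers the two characters 田中, so the end offset is exclusive.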

import noyaki

label_list = noyaki.convert(
        ['明日', 'は', '田中', 'さん', 'に', '会う'],
        [[3, 5, 'PERSON']]
    )

print(label_list)
# ['O', 'O', 'U-PERSON', 'O', 'O', 'O'] 
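
To see why 田中 receives a single U-PERSON tag, it can help to work out each token's character offsets by hand. The following is a minimal sketch, not noyaki's implementation: the helper name `token_offsets` is hypothetical, and it assumes the tokens concatenate to the original text with nothing between them.

```python
def token_offsets(tokens):
    """Return (start, end) character offsets for each token, end exclusive.

    Assumption: joining the tokens reproduces the original text, with no
    whitespace between tokens.
    """
    offsets = []
    position = 0
    for token in tokens:
        offsets.append((position, position + len(token)))
        position += len(token)
    return offsets


tokens = ['明日', 'は', '田中', 'さん', 'に', '会う']
offsets = token_offsets(tokens)
print(offsets)
# [(0, 2), (2, 3), (3, 5), (5, 7), (7, 8), (8, 10)]
```

The span [3, 5, 'PERSON'] coincides with exactly one token, 田中, which is why the BILOU output marks it as a unit-length entity.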

If the tokens contain a subword symbol (e.g. ##) that should be removed, specify it with the subword argument.

import noyaki

label_list = noyaki.convert(
        ['明日', 'は', '田', '##中', 'さん', 'に', '会う'],
        [[3, 5, 'PERSON']],
        subword="##"
    )

print(label_list)
# ['O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O']
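
The subword symbol matters because it is not part of the original text, so it would shift every later character offset if it were counted. The sketch below illustrates the idea under stated assumptions: `bilou_labels` is a hypothetical helper, not noyaki's code, and it only labels tokens that lie entirely inside a span.

```python
def bilou_labels(tokens, spans, subword=None):
    """Illustrative sketch: assign BILOU tags to tokens inside each span."""
    # Strip the subword symbol so token lengths match the original text.
    surfaces = [
        t[len(subword):] if subword and t.startswith(subword) else t
        for t in tokens
    ]
    labels = ['O'] * len(tokens)
    for start, end, tag in spans:
        position = 0
        covered = []  # indices of tokens lying inside [start, end)
        for i, surface in enumerate(surfaces):
            token_end = position + len(surface)
            if start <= position and token_end <= end:
                covered.append(i)
            position = token_end
        if len(covered) == 1:
            labels[covered[0]] = 'U-' + tag
        elif covered:
            labels[covered[0]] = 'B-' + tag
            labels[covered[-1]] = 'L-' + tag
            for i in covered[1:-1]:
                labels[i] = 'I-' + tag
    return labels


print(bilou_labels(
    ['明日', 'は', '田', '##中', 'さん', 'に', '会う'],
    [[3, 5, 'PERSON']],
    subword='##',
))
# ['O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O']
```

With the symbol stripped, 田 occupies offsets 3 to 4 and 中 occupies 4 to 5, so the span covers both tokens and they become the beginning and last tokens of the entity.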

The outputs above use the BILOU tag format. If you want to use the IOB2 tag format instead, specify the scheme argument.

import noyaki

label_list = noyaki.convert(
        ['明日', 'は', '田', '##中', 'さん', 'に', '会う'],
        [[3, 5, 'PERSON']],
        subword="##",  # the tokens contain the ## subword symbol
        scheme="IOB2"
    )

print(label_list)
# ['O', 'O', 'B-PERSON', 'I-PERSON', 'O', 'O', 'O']
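
The two formats differ only in how they mark the last token of an entity and single-token entities: BILOU uses L- and U-, while IOB2 uses I- and B-. As a rough illustration (a hypothetical helper, not part of noyaki), a BILOU sequence can be mapped to IOB2 like this:

```python
def bilou_to_iob2(labels):
    """Map BILOU tags to IOB2: U- becomes B-, L- becomes I-."""
    converted = []
    for label in labels:
        if label.startswith('U-'):
            converted.append('B-' + label[2:])
        elif label.startswith('L-'):
            converted.append('I-' + label[2:])
        else:
            # 'O', 'B-' and 'I-' tags are the same in both formats.
            converted.append(label)
    return converted


print(bilou_to_iob2(['O', 'O', 'B-PERSON', 'L-PERSON', 'O', 'O', 'O']))
# ['O', 'O', 'B-PERSON', 'I-PERSON', 'O', 'O', 'O']
```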

Note

Only Japanese is supported.
The supported tag formats are as follows:

  • BILOU
  • IOB2

Download files

Source Distribution

noyaki-0.2.0.tar.gz (3.0 kB)

Uploaded Source

Built Distribution

noyaki-0.2.0-py2.py3-none-any.whl (3.3 kB)

Uploaded Python 2, Python 3

File details

Details for the file noyaki-0.2.0.tar.gz.

File metadata

  • Download URL: noyaki-0.2.0.tar.gz
  • Upload date:
  • Size: 3.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.6

File hashes

Hashes for noyaki-0.2.0.tar.gz
Algorithm Hash digest
SHA256 6162be127afd6b77741d0c9d9ba0e30a9e708ce2b1240201231082aed597cc7f
MD5 e42e5d428681b0325f84a24b571d58b1
BLAKE2b-256 af54ececbd1140dab823c58c5183f0b539daa13c4d07f95ad90674f999a6e1f6

File details

Details for the file noyaki-0.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: noyaki-0.2.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.6

File hashes

Hashes for noyaki-0.2.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f0791462f39346d501687535fc01ee7ed8bde359b7978d31ea542a43f8550b11
MD5 ac23d282eff87e632e3070af984fa625
BLAKE2b-256 e05424873254f71f183a580c76927928e6b6ad41649ee1bc3ffaf197278d2fcd
