kodoc-tokenizer

Tokenizer for kodoc

Requirements

  • transformers>=4.0

Installation

pip3 install kodoc-tokenizer

How to Use

Version

import kodoc_tokenizer

kodoc_tokenizer.__version__  # 0.1.0rc1
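
The version can also be read from the installed distribution's metadata without importing the package. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later is assumed); the helper name `installed_version` is hypothetical and is not part of kodoc-tokenizer.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is an illustrative helper, not a kodoc-tokenizer API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if kodoc-tokenizer is installed, otherwise None.
print(installed_version("kodoc-tokenizer"))
```

Returning `None` instead of raising keeps the check usable in setup scripts that need to decide whether to run `pip3 install kodoc-tokenizer`.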

clean_text

from kodoc_tokenizer import clean_text

text = "Today a::: : \t\t \x00I \x00a  朝 三暮四 [MASK] m \na fool \n\nbecause I am a fool. \n [SEP][CLS]  "
assert clean_text(text) == "Today a::: : I a 朝 三暮四 [MASK] m a fool because I am a fool. [SEP][CLS]"
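
Judging from the example above, `clean_text` appears to drop null characters, turn every whitespace run (tabs, newlines, repeated spaces) into a single space, and trim both ends. The sketch below reproduces that observed behavior with the standard library only. It is an assumption inferred from the single example, not the package's actual implementation, and the name `clean_text_sketch` is hypothetical.

```python
import unicodedata


def clean_text_sketch(text: str) -> str:
    """Approximate the behavior shown in the clean_text example.

    Assumption: control characters and U+FFFD are dropped, and all
    whitespace is normalized to single spaces. The real clean_text
    may apply different or additional rules.
    """
    kept = []
    for ch in text:
        if ch.isspace():
            # Normalize tabs, newlines, etc. to a plain space.
            kept.append(" ")
        elif ch == "\ufffd" or unicodedata.category(ch).startswith("C"):
            # Drop the replacement character and control/format characters
            # such as "\x00".
            continue
        else:
            kept.append(ch)
    # split() with no arguments collapses whitespace runs and trims the ends.
    return " ".join("".join(kept).split())


sample = "Today a::: : \t\t \x00I \x00a  朝 三暮四 [MASK] m \na fool \n\nbecause I am a fool. \n [SEP][CLS]  "
print(clean_text_sketch(sample))
```

Special tokens such as `[MASK]`, `[SEP]`, and `[CLS]` pass through unchanged because the sketch only touches whitespace and control characters.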

Basic Usage

from kodoc_tokenizer import KodocTokenizer

tokenizer = KodocTokenizer.from_pretrained("kodoc/kodoc-bert-base")
tokens = tokenizer.tokenize("다이어트마침표_1부 2013.7.25 02:24 PM 페이지1 제1부 다이어트 핵심 바이블 A`2`Z 다이어트에 실패하는 원인 중 하나는 잘못된 상식도 크게 한몫을 한다.")
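
Model input usually needs integer ids rather than token strings. The sketch below assumes that `KodocTokenizer` follows the Hugging Face `transformers` tokenizer interface and therefore exposes `convert_tokens_to_ids` alongside `tokenize`; the README does not confirm this. The helper `tokenize_to_ids` is hypothetical and works with any object that provides those two methods.

```python
from typing import Any, List, Tuple


def tokenize_to_ids(tokenizer: Any, text: str) -> Tuple[List[str], List[int]]:
    """Return (tokens, ids) for `text`.

    Assumes `tokenizer` provides `tokenize` and `convert_tokens_to_ids`,
    as transformers tokenizers do. Whether KodocTokenizer does is an
    assumption, not something the README states.
    """
    tokens = tokenizer.tokenize(text)
    ids = tokenizer.convert_tokens_to_ids(tokens)
    return tokens, ids


# Intended usage (requires the package and a model download, so left commented):
# from kodoc_tokenizer import KodocTokenizer
# tokenizer = KodocTokenizer.from_pretrained("kodoc/kodoc-bert-base")
# tokens, ids = tokenize_to_ids(tokenizer, "다이어트 핵심 바이블")
```

Keeping the tokens next to their ids makes it easier to inspect how a given document was split when debugging.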
