Fast tokenizer
Project description
A fast tokenizer supporting multiple languages.
Build and upload the package
python setup.py bdist_wheel
twine upload dist/*
Use Locally
from soco_tokenizer import EncoderLoader  # assumed import path, matching the package name

# Sample sentence: "Liu Qiangdong is a famous entrepreneur. He founded JD.com."
x1 = '<a>刘强东是一个著名企业家。</a> 他创建了京东。'
t = EncoderLoader.load_tokenizer('bert-base-chinese-zh_v4-10K')
print(t.tokenize(x1, mode='char'))
print(t.tokenize(x1, mode='word'))
print(t.tokenize(x1, mode='all'))
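As an illustration of what the `char` and `word` modes mean for Chinese text, here is a minimal sketch. This is not the soco-tokenizer implementation: the word segmentation below is hand-written for the sample sentence, whereas a real tokenizer derives it from a vocabulary or model.

```python
# Illustrative sketch only, not soco-tokenizer's actual output.
text = "刘强东是一个著名企业家。"

# "char" mode: every CJK character becomes its own token.
char_tokens = list(text)

# "word" mode: multi-character words stay together; this segmentation is
# hand-written for illustration purposes.
word_tokens = ["刘强东", "是", "一个", "著名", "企业家", "。"]

print(char_tokens)
print(word_tokens)
```

Character mode always yields more, shorter tokens; word mode preserves units such as the name 刘强东 as single tokens.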
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
soco-tokenizer-1.0.tar.gz (7.5 kB)
File details
Details for the file soco-tokenizer-1.0.tar.gz.
File metadata
- Download URL: soco-tokenizer-1.0.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/54.1.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82dfe7be617ff0b8d904b70a72f405b79d5981e6e12c6a88a1a4f84c7be75ba1
MD5 | 5ceb87786969b3ed24dd2ef7468b3c57
BLAKE2b-256 | 2b1ded9979c300e05794196e394a204ff5048e05e93d085cba268c4c457975f1