Fast and Efficient Sentence Tokenization
Fast Sentence Tokenizer (fast-sentence-tokenize)
A fast tokenizer that splits a sentence into word and punctuation tokens.
Usage
Import
from fast_sentence_tokenize import fast_sentence_tokenize
Call Tokenizer
results = fast_sentence_tokenize("isn't a test great!!?")
Results
[
"isn't",
"a",
"test",
"great",
"!",
"!",
"?"
]
Note that whitespace is not preserved in the output by default.
This generally results in a more accurate parse from downstream components, but may make the reassembly of the original sentence more challenging.
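To illustrate why reassembly is harder without whitespace, the sketch below joins the default-mode tokens with single spaces. It does not call the library; the token list is copied from the Results listing above, and `naive_text` is a name introduced here for illustration only.

```python
# Tokens copied from the default-mode Results listing above
# (whitespace already eliminated).
tokens = ["isn't", "a", "test", "great", "!", "!", "?"]
original = "isn't a test great!!?"

# Naive reassembly: put one space between every pair of tokens.
naive_text = " ".join(tokens)

# The punctuation tokens end up separated by spaces that were never
# in the input, so the result differs from the original sentence.
print(naive_text)              # isn't a test great ! ! ?
print(naive_text == original)  # False
```

Recovering the exact input from these tokens would need extra rules about which tokens attach to their neighbours, which is the difficulty the note above describes.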
Preserve Whitespace
results = fast_sentence_tokenize("isn't a test great!!?", eliminate_whitespace=False)
Results
[
"isn't ",
"a ",
"test ",
"great",
"!",
"!",
"?"
]
This option preserves whitespace by keeping each trailing space attached to the token that precedes it.
This is useful if you want to reassemble the tokens using the original spacing:
assert ''.join(results) == "isn't a test great!!?"
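As a minimal sketch of how the two output modes relate, the snippet below works on the token lists shown in the two Results listings rather than calling the library. The helpers `reassemble` and `strip_whitespace` are hypothetical names introduced here and are not part of `fast_sentence_tokenize`.

```python
# Token lists copied from the two Results listings above.
default_tokens = ["isn't", "a", "test", "great", "!", "!", "?"]
preserved_tokens = ["isn't ", "a ", "test ", "great", "!", "!", "?"]
original = "isn't a test great!!?"


def reassemble(tokens):
    """Concatenate whitespace-preserving tokens into a single string."""
    return "".join(tokens)


def strip_whitespace(tokens):
    """Drop surrounding whitespace from each token and skip empty ones."""
    stripped = (token.strip() for token in tokens)
    return [token for token in stripped if token]


# Whitespace-preserving tokens round-trip to the input text.
assert reassemble(preserved_tokens) == original

# Stripping them yields the same tokens as the default mode in this example.
assert strip_whitespace(preserved_tokens) == default_tokens
```

One possible design, under these assumptions, is to tokenize once with `eliminate_whitespace=False`, keep that list for reassembly, and pass a stripped copy to downstream components.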
Hashes for fast_sentence_tokenize-0.1.15.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 0f5d8f5691f8dc41e321eac720ddaf1cb59fd33259e5482f78992e26162ac294
MD5 | c3ab532f89691946b53b66991e91a87b
BLAKE2b-256 | 36591c68d48388ab9d7e6e77a2d6029d94317159bd1d6dadb6a533facd99cdf1
Hashes for fast_sentence_tokenize-0.1.15-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 85eed0ba762a6f919c7628b8c6951c5a09abf8f0544bfcf5add033c0e59e0b8d
MD5 | 6f255453224b8296ff8dab0677c56b88
BLAKE2b-256 | 0ebc4f5de44e36700aff3303c1def32e7d155146cd9070d7f92ca8904e9983c2