Fast and Efficient Sentence Tokenization
Fast Sentence Tokenizer (fast-sentence-tokenize)
A best-in-class tokenizer.
Usage
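The package is published on PyPI under the name shown above, so a standard pip install should work:

```shell
pip install fast-sentence-tokenize
```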
Import
from fast_sentence_tokenize import fast_sentence_tokenize
Call Tokenizer
results = fast_sentence_tokenize("isn't a test great!!?")
Results
[
"isn't",
"a",
"test",
"great",
"!",
"!",
"?"
]
Note that whitespace is not preserved in the output by default.
This generally results in a more accurate parse from downstream components, but may make the reassembly of the original sentence more challenging.
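The default behavior can be sketched with a minimal regex-based tokenizer. This is a hypothetical illustration, not the library's actual implementation: it splits the input into word tokens (keeping internal apostrophes) and individual punctuation marks, discarding whitespace.

```python
import re

def tokenize(text):
    # Hypothetical approximation of the default behavior: match runs of
    # word characters (plus apostrophes) or single punctuation marks;
    # whitespace is never captured, so it is dropped from the output.
    return re.findall(r"[\w']+|[^\w\s]", text)

print(tokenize("isn't a test great!!?"))
# ["isn't", 'a', 'test', 'great', '!', '!', '?']
```

Because the whitespace is discarded, joining these tokens back together cannot recover the original spacing.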
Preserve Whitespace
results = fast_sentence_tokenize("isn't a test great!!?", eliminate_whitespace=False)
Results
[
"isn't ",
"a ",
"test ",
"great",
"!",
"!",
"?"
]
This option preserves each token's trailing whitespace, which is useful if you want to reassemble the original sentence from the tokens using the pre-existing spacing:
assert ''.join(results) == "isn't a test great!!?"
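The round-trip property above can be sketched with a minimal whitespace-preserving tokenizer. Again this is a hypothetical illustration rather than the library's implementation: each token keeps the whitespace that follows it, so concatenating the tokens restores the input exactly.

```python
import re

def tokenize_keep_whitespace(text):
    # Hypothetical sketch: capture each token together with its trailing
    # whitespace so that ''.join(tokens) reproduces the input string.
    return [m.group(0) for m in re.finditer(r"(?:[\w']+|[^\w\s])\s*", text)]

text = "isn't a test great!!?"
tokens = tokenize_keep_whitespace(text)
print(tokens)  # ["isn't ", 'a ', 'test ', 'great', '!', '!', '?']
assert ''.join(tokens) == text
```

Note that this sketch only round-trips inputs that begin with a token; leading whitespace would be dropped.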
Hashes for fast-sentence-tokenize-0.1.13.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 1bdd32608a87b1f481416e69820a075497814676c9a1a332a58bb1c67c360aed
MD5 | 8ee93305134b0987a3f5f242c254a6a2
BLAKE2b-256 | 8550ab14bc78b412c8c8f91db4a7281e3ba8fa2c7ec2e9e7da4f70b8db71513b
Hashes for fast_sentence_tokenize-0.1.13-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a5e3d218b417aab77ee29375d9d6f82b54efc54c16212f9bc682b8f6689b6db7
MD5 | 60d0600eb7ba256ba55c990353cf1283
BLAKE2b-256 | deb5f5fdb1f9b9535b1102013e11e3dcb49cce60c10a64b5035e9b451921ec26