Fast and Efficient Sentence Tokenization
Fast Sentence Tokenizer (fast-sentence-tokenize)
A best-in-class sentence tokenizer.
Usage
Import
```python
from fast_sentence_tokenize import fast_sentence_tokenize
```
Call Tokenizer
```python
results = fast_sentence_tokenize("isn't a test great!!?")
```
Results
```
[
    "isn't",
    "a",
    "test",
    "great",
    "!",
    "!",
    "?"
]
```
Note that whitespace is not preserved in the output by default.
This generally yields a more accurate parse from downstream components, but can make reassembling the original sentence more challenging.
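To see why reassembly is harder without whitespace, here is a minimal sketch (using the token list from the example above, so the library itself is not required): naively rejoining the tokens with spaces inserts spaces before the punctuation, so the result no longer matches the input.

```python
# Default output: whitespace eliminated (from the example above)
tokens = ["isn't", "a", "test", "great", "!", "!", "?"]

# Joining with single spaces puts a space before each punctuation mark,
# so the result differs from the original "isn't a test great!!?".
rejoined = ' '.join(tokens)
assert rejoined == "isn't a test great ! ! ?"
assert rejoined != "isn't a test great!!?"
```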
Preserve Whitespace
```python
results = fast_sentence_tokenize("isn't a test great!!?", eliminate_whitespace=False)
```
Results
```
[
    "isn't ",
    "a ",
    "test ",
    "great",
    "!",
    "!",
    "?"
]
```
This option preserves the trailing whitespace on each token, which is useful if you want to reassemble the original sentence using the pre-existing spacing:
```python
assert ''.join(tokens) == input_text
```
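As a concrete check, the round trip can be verified with the token list shown above, without the library installed:

```python
# Tokens as returned with eliminate_whitespace=False (from the example above)
tokens = ["isn't ", "a ", "test ", "great", "!", "!", "?"]
input_text = "isn't a test great!!?"

# Each token keeps its trailing whitespace, so plain concatenation
# reconstructs the original sentence exactly.
assert ''.join(tokens) == input_text
```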
Hashes for fast-sentence-tokenize-0.1.9.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 82e5d8fce40844ab923acd62b50afcd54a39688257d85927c16e05e906be6571
MD5 | 692f6969b4f8086fecfa00db858bc491
BLAKE2b-256 | 6922c75d73b2d2d74270661f088c7b9bea1542147b23e3bf1740f6fce63cd0c7
Hashes for fast_sentence_tokenize-0.1.9-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5bd96aad35df972e48c6b1ea8b22e9972354d5a7f7359fb9256fe96ff3f8874a
MD5 | 0deaf0b547ad0d1f37a3d377bd4a456c
BLAKE2b-256 | ade895791f6fdea5f7595ddba3a20b1e5a60b624e22e39565a7e58f75a788631