cutters
A rule-based sentence segmentation library.
Python bindings for the cutters library written in Rust.
🚧 This library is experimental. 🚧
Features
- Full UTF-8 support.
- Robust parsing.
- Language-specific rules (each defined by its own PEG).
- Fast and memory-efficient parsing via the pest library.
- Sentences can contain quotes, which can themselves contain subsentences.
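To illustrate what a language-specific rule does, here is a minimal sketch. It is not how cutters works: the library encodes its rules as PEG grammars compiled with pest, whereas this sketch swaps in a far simpler technique, a regular expression plus a hypothetical English abbreviation list (ABBREVIATIONS and split_sentences are illustrative names only). The single rule it applies is "a terminal that closes a known abbreviation does not end a sentence".

```python
from __future__ import annotations

import re

# Hypothetical abbreviation list for English; cutters defines its real rules in
# per-language PEG grammars, which this sketch does not reproduce.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "st.", "prof.", "etc."}

# Candidate boundary: a terminal (., ! or ?) that is followed by whitespace.
BOUNDARY = re.compile(r"[.!?](?=\s)")


def split_sentences(text: str) -> list[str]:
    """Split text at candidate boundaries, except after known abbreviations."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        # Last whitespace-separated token before the boundary, lowercased.
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # rule: an abbreviation does not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    # Whatever follows the last boundary (e.g. a final "." at end of text).
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived. He sat down."))
```

A fixed abbreviation list quickly runs out of road (an abbreviation such as "etc." can also end a sentence), which is the kind of context-dependent decision a grammar-based approach is meant to handle.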
Supported languages
- Croatian (standard)
- English (standard)
There is also an additional Baseline "language" that simply splits the text on sentence terminals as defined by Unicode. It is intended for benchmarking.
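A context-free baseline of this kind can be sketched in a few lines. This is an assumption-laden illustration, not the library's Baseline implementation: TERMINALS is a small, hand-picked subset of the characters carrying the Unicode Sentence_Terminal property, and baseline_split is a hypothetical name. It cuts after every run of terminals and ignores context entirely.

```python
from __future__ import annotations

# Small, hand-picked subset of Unicode Sentence_Terminal characters:
# full stop, exclamation mark, question mark, ideographic full stop,
# double exclamation mark and interrobang. The full set is much larger.
TERMINALS = {".", "!", "?", "\u3002", "\u203c", "\u203d"}


def baseline_split(text: str) -> list[str]:
    """Cut after every run of terminal characters, ignoring all context."""
    sentences = []
    current = []
    for i, ch in enumerate(text):
        current.append(ch)
        next_is_terminal = i + 1 < len(text) and text[i + 1] in TERMINALS
        # Close the sentence only at the end of a run such as "?!".
        if ch in TERMINALS and not next_is_terminal:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    rest = "".join(current).strip()
    if rest:
        sentences.append(rest)
    return sentences


print(baseline_split("To je prof.dr.sc. Ivan Horvat."))
```

On the Croatian example from this page, the baseline breaks "prof.dr.sc." into separate pieces, which is exactly the behaviour the language-specific rules are there to avoid.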
Example
After installing the cutters package with pip, usage is simple (note that the language is specified by its ISO 639-1 two-letter code).
import cutters
text = """
Petar Krešimir IV. je vladao od 1058. do 1074. St. Louis 9LX je događaj u svijetu šaha. To je prof.dr.sc. Ivan Horvat. Volim rock, punk, funk, pop itd. Tolstoj je napisao: "Sve sretne obitelji nalik su jedna na drugu. Svaka nesretna obitelj nesretna je na svoj način."
"""
sentences = cutters.cut(text, "hr")
print(sentences)
This results in the following output (note that the str fields of the Sentence and Quote structs are of type &str, i.e. borrowed string slices).
[Sentence {
    str: "Petar Krešimir IV. je vladao od 1058. do 1074. ",
    quotes: [],
}, Sentence {
    str: "St. Louis 9LX je događaj u svijetu šaha.",
    quotes: [],
}, Sentence {
    str: "To je prof.dr.sc. Ivan Horvat.",
    quotes: [],
}, Sentence {
    str: "Volim rock, punk, funk, pop itd.",
    quotes: [],
}, Sentence {
    str: "Tolstoj je napisao: \"Sve sretne obitelji nalik su jedna na drugu. Svaka nesretna obitelj nesretna je na svoj način.\"",
    quotes: [
        Quote {
            str: "Sve sretne obitelji nalik su jedna na drugu. Svaka nesretna obitelj nesretna je na svoj način.",
            sentences: [
                "Sve sretne obitelji nalik su jedna na drugu.",
                "Svaka nesretna obitelj nesretna je na svoj način.",
            ],
        },
    ],
}]
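The output is a two-level structure: each Sentence carries a list of Quote values, and each Quote carries its own list of subsentences. The sketch below mirrors that shape with plain dataclasses so it can be walked without the library installed. It is a hypothetical model, not the bindings' Python API: the printed str field is renamed to text to avoid shadowing the builtin, and all_sentences is an illustrative helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical mirror of the printed Sentence/Quote structure; field "str"
# is renamed to "text". This is not the cutters Python API.
@dataclass
class Quote:
    text: str
    sentences: list[str] = field(default_factory=list)


@dataclass
class Sentence:
    text: str
    quotes: list[Quote] = field(default_factory=list)


def all_sentences(sentences: list[Sentence]) -> list[str]:
    """Flatten each top-level sentence followed by its quotes' subsentences."""
    flat = []
    for sentence in sentences:
        flat.append(sentence.text)
        for quote in sentence.quotes:
            flat.extend(quote.sentences)
    return flat


# The last sentence of the example above, rebuilt by hand.
tolstoy = Sentence(
    text='Tolstoj je napisao: "Sve sretne obitelji nalik su jedna na drugu. '
    'Svaka nesretna obitelj nesretna je na svoj način."',
    quotes=[
        Quote(
            text="Sve sretne obitelji nalik su jedna na drugu. "
            "Svaka nesretna obitelj nesretna je na svoj način.",
            sentences=[
                "Sve sretne obitelji nalik su jedna na drugu.",
                "Svaka nesretna obitelj nesretna je na svoj način.",
            ],
        )
    ],
)
print(all_sentences([tolstoy]))
```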
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file cutters-0.1.4.tar.gz.
File metadata
- Download URL: cutters-0.1.4.tar.gz
- Upload date:
- Size: 6.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 4c8172c8363413bf3b7138fb8a3198e81ccae966043f6a707769d4da454a72bc |
MD5 | 16dc58f6e0cd6f3903ada35aa906bd23 |
BLAKE2b-256 | e099104b7765859a314cb383a4f5211e3ec63d0faf6025e1fd63851015014454 |
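A downloaded file can be checked against these digests with Python's standard hashlib module. The sketch below (file_digests and verify_sha256 are hypothetical helper names) computes all three listed algorithms in one pass; BLAKE2b-256 here means BLAKE2b with a 32-byte digest.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digests(path: str | Path, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Return the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256: BLAKE2b truncated to a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: str | Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a value copied from the table."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For an intact download of the source distribution, verify_sha256("cutters-0.1.4.tar.gz", "4c8172c8363413bf3b7138fb8a3198e81ccae966043f6a707769d4da454a72bc") should return True. SHA256 is the digest worth checking; MD5 is listed for completeness but is not collision-resistant.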
File details
Details for the file cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.8 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 1bed35f8823161fb8afd536e523bb33685b45efb009286ababf1e472442e4f54 |
MD5 | 2cfde05fa0e32ffd136945c0e6ded2ea |
BLAKE2b-256 | 426d6b57684336146831789e126650da6459e5429d441020b33c79d6ad23b55c |
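Each wheel filename encodes its compatibility tags following the wheel naming convention {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl, where a dot-separated tag is a compressed set (so "manylinux_2_17_x86_64.manylinux2014_x86_64" names two platform tags). The parser below is a simplified sketch of that convention (WheelName and parse_wheel_filename are hypothetical names); it does no name normalization or validation beyond the field count.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WheelName:
    name: str
    version: str
    build: str | None
    python_tags: list[str]
    abi_tags: list[str]
    platform_tags: list[str]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its name, version and tag sets."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields: {filename}")
    # Dot-separated tags are compressed tag sets.
    return WheelName(name, version, build, py.split("."), abi.split("."), plat.split("."))


wheel = parse_wheel_filename(
    "cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(wheel.python_tags, wheel.abi_tags, wheel.platform_tags)
```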
File details
Details for the file cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 274.0 kB
- Tags: PyPy, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | c60ebceaa23317aedce23ea88b0193b9f58899e62eeff5e95a3d5166dfc1622b |
MD5 | 2e98ed42793d2a484ded943fd3835acd |
BLAKE2b-256 | 1ee072bc78485180310142e14b4207cea5525ed6e6fb590d2817865afc4ca3bd |
File details
Details for the file cutters-0.1.4-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.6 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 4c8443b0e1e302bda90dff217d3d8d6fb6c1ceac597ce11934410d3ddb2d2d09 |
MD5 | 7b8ba7d3cffb69d20cd6c88291d17d24 |
BLAKE2b-256 | 4271a4bfedf11a9841dcbf70cef162e80eae6196b623f88e3f25a781b3dfe672 |
File details
Details for the file cutters-0.1.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 273.9 kB
- Tags: PyPy, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9a85fc83779316fcd86cf134cfe1547ec26a02a4aafca92cf8aaa3b9587d2ac9 |
MD5 | 74471865958a95231beb8f642dcba5e3 |
BLAKE2b-256 | 3595553e319ffe4fac569bf94e9467305d9bb234f10d8fccf0891ab50e2424f2 |
File details
Details for the file cutters-0.1.4-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 294.7 kB
- Tags: PyPy, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 44f797ad8569ccb2ea1f1a068c1bb51f2cf62a7e0147634ad71ec911ec59edd5 |
MD5 | cffc79d03ac9e1a074ff472f2906b9d0 |
BLAKE2b-256 | 02485ff909f56a64e74c6435bd36eacd22df5738ddde3d7a45f7b5e9598aa19e |
File details
Details for the file cutters-0.1.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 274.8 kB
- Tags: PyPy, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0b12ccf815a7ebc35acef20264c566a5895aae0ff6658a222a7b8d5fffcce226 |
MD5 | 1ea8225a562576adb21d50986e5bbcb2 |
BLAKE2b-256 | 08aa487fe0d836fb9feea531334c0d8fe1973dae2b9ae2c7f0b84ec8e25875c1 |
File details
Details for the file cutters-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.4 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 446e56edb2c18574f003c65ecf9dd81a9e301b7704354cec50d48ecdb02c2112 |
MD5 | 4e7321521a2ada18808a4ecef625c16d |
BLAKE2b-256 | 3ff7b90c113f5690868b4094c630c9d501567b6f78398550c367497bce2e29f0 |
File details
Details for the file cutters-0.1.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 273.6 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 9bb31c78114d62b70c833850587f81de7ce45b3353d39f95d41cd25ade9a60a4 |
MD5 | f2df7c49391a9991d4c47554bb64e94e |
BLAKE2b-256 | 682f4c1eda3ea782ddbaa37a24a18cf526993a62798b4e1a2fa019799bf770b5 |
File details
Details for the file cutters-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.3 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 50a279c5ec2f77bc2bc1e28bf5b1a0ec4935778a5603d60cabfdb3dfffe4c4c6 |
MD5 | 58b5311eaa35304e92bddf0be2baa088 |
BLAKE2b-256 | dbd0b38f6ef9dec81fcacecbdf16a34f2468e05a4659212fbf99549d920e346c |
File details
Details for the file cutters-0.1.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 273.7 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | f0980b8a025236b422706bfafee1f5e9ba000673a595dba06ab162a4295c5c71 |
MD5 | bba2d2086b1450fa4151c8f5378a2d83 |
BLAKE2b-256 | 80c031e4c1f47c7f7a5dd3066d1c33050399f70be6928fe460defd205a011fb0 |
File details
Details for the file cutters-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.3 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0a0b1de4f8b97cf72cbbb75e7fcb1bc0d93bec05229c5dd114aa4e5afc1bf676 |
MD5 | 0edfc1eea8148081560a78d50af29109 |
BLAKE2b-256 | 0785c2e5638dee4d5f6433ab72695c5c88193941ec1a5485523af120c87c7109 |
File details
Details for the file cutters-0.1.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 273.7 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0a4142ee1ee46f4dc53a68a7bbec01169ceab69f86aaf336c4a5e611beb24611 |
MD5 | e45ea313c2dffe263cadc2aa40afe5e0 |
BLAKE2b-256 | 7a62e1123b8343f59ed13522542831012abf3abe911d997d4f4775afe72c6a94 |
File details
Details for the file cutters-0.1.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.7 kB
- Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 763a90d7d36bfaad79de1e4ebdf4998184c4e331788753aa1d40395652c7e8c1 |
MD5 | a691e705a76c18ba68d5e098ed281d5d |
BLAKE2b-256 | 83167b35f5bd72bf108da4774bca60d36753816057cab5f6370a0c50bd5950aa |
File details
Details for the file cutters-0.1.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 274.0 kB
- Tags: CPython 3.9, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 37339e192aa917641403fd5fd5ec0464e37ad7a1b808ef71c14afc94603947d8 |
MD5 | c6f3138f639bf49e7b1fb82f08ffd969 |
BLAKE2b-256 | c4461441a1de3a8e2eeea2c5d4944eca3f7929a823c36097cef8460933882b5b |
File details
Details for the file cutters-0.1.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.5 kB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b9d597d673c296f9d6baa6c30c733596ac05170a933bb90c515995e0e51edefb |
MD5 | ee438402f10934527cbf91a8eb6888f9 |
BLAKE2b-256 | 20e3cf6999c9efd612a9b14ce154ab851ae28a8e2e88a7c4b0ca2b948b889b91 |
File details
Details for the file cutters-0.1.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 273.9 kB
- Tags: CPython 3.8, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 00cb138b4b17a59d96135ddf2fc0fd41db1809f3622fc3d1790b158aedf925f1 |
MD5 | cb7656d32144869c75eedc8560f74594 |
BLAKE2b-256 | 7d247d8a89ecc795d00680e304b7ca2c8c47adee8583ea43eff87befb6deac29 |
File details
Details for the file cutters-0.1.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: cutters-0.1.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 292.5 kB
- Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 90ce98e6d349dc6b045430750947bff3a23978561f5ec41fdf8f888d91c94a08 |
MD5 | 43a00b7c1567c59f62b271b8bbee8ada |
BLAKE2b-256 | b0ad2b57391a4b2bb8ac206990c4a3d3c8b0c89558dcdc4273fecf8efee77b1a |
File details
Details for the file cutters-0.1.4-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: cutters-0.1.4-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 273.9 kB
- Tags: CPython 3.7m, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.1.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 5d09e7165d06a78b70a438548fb2cacba650f26d9c6dcd11c845b46eaf72a3c8 |
MD5 | c0726efaff5e623aee63896bd93c7971 |
BLAKE2b-256 | 9e1f3f269fea8e032735ece559c70d88119061b97572b63ea2c7e1b125d8aae4 |
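To check quickly whether the running interpreter's Python tag appears among the wheels listed for this release, a simplified sketch follows. RELEASE_PYTHON_TAGS is copied by hand from the filenames above, and interpreter_tag and has_prebuilt_wheel are hypothetical helpers. The check deliberately ignores ABI and platform tags (every wheel listed here targets manylinux on x86-64 or ARM64), so it is only a first filter; the packaging library's packaging.tags.sys_tags() produces the full ordered tag list that pip actually matches against.

```python
from __future__ import annotations

import sys

# Python tags copied by hand from the cutters 0.1.4 wheel filenames above;
# illustrative only, not queried from PyPI.
RELEASE_PYTHON_TAGS = {
    "cp37", "cp38", "cp39", "cp310", "cp311", "cp312",
    "pp37", "pp38", "pp39",
}


def interpreter_tag() -> str:
    """Return a simplified Python tag such as 'cp312' or 'pp39'."""
    prefixes = {"cpython": "cp", "pypy": "pp"}
    prefix = prefixes.get(sys.implementation.name)
    if prefix is None:
        raise RuntimeError(f"unsupported implementation: {sys.implementation.name}")
    # On PyPy, sys.version_info reports the Python language version (e.g. 3.9).
    major, minor = sys.version_info[:2]
    return f"{prefix}{major}{minor}"


def has_prebuilt_wheel() -> bool:
    """True if this interpreter's tag appears among the release's wheels."""
    return interpreter_tag() in RELEASE_PYTHON_TAGS


print(interpreter_tag(), has_prebuilt_wheel())
```

When no wheel matches, pip would typically fall back to the source distribution, which (being built with maturin) needs a Rust toolchain to compile.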