cutters

A rule-based sentence segmentation library.
These are the Python bindings for the cutters library, which is written in Rust.

🚧 This library is experimental. 🚧

Features

  • Full UTF-8 support.
  • Robust parsing.
  • Language-specific rules (each defined by its own PEG).
  • Fast and memory efficient parsing via the pest library.
  • Sentences can contain quotes which can contain subsentences.

Supported languages

  • Croatian (standard)
  • English (standard)

There is also an additional Baseline "language" that simply splits the text on sentence terminals as defined by Unicode. It is intended for benchmarking.
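
To illustrate what splitting purely on sentence terminals means, and why language-specific rules are needed, here is a minimal sketch in plain Python. It is not the library's Baseline implementation: the function name baseline_split and the small terminal set (".", "!", "?", "…") are assumptions made for this example, and the real Baseline may recognise a different set.

```python
import re
from typing import List

# Assumption: a small hand-picked terminal set, for illustration only.
# A run of terminals ("?!", "...") closes a single sentence.
_CHUNK = re.compile(r"[^.!?…]*[.!?…]+|[^.!?…]+$")


def baseline_split(text: str) -> List[str]:
    """Cut after every run of terminal characters, ignoring all context."""
    chunks = (m.group().strip() for m in _CHUNK.finditer(text))
    return [c for c in chunks if c]


# Abbreviations are cut apart because no language rules are applied:
print(baseline_split("To je prof.dr.sc. Ivan Horvat."))
# → ['To je prof.', 'dr.', 'sc.', 'Ivan Horvat.']
```

A splitter of this kind treats the periods in "prof.dr.sc." as sentence ends, which is the kind of case the Croatian and English rule sets are meant to handle.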

Example

After installing the cutters package with pip, usage is simple: pass the text and a language code to cutters.cut. The language is given as an ISO 639-1 two-letter code.

import cutters

text = """
Petar Krešimir IV. je vladao od 1058. do 1074. St. Louis 9LX je događaj u svijetu šaha. To je prof.dr.sc. Ivan Horvat. Volim rock, punk, funk, pop itd. Tolstoj je napisao: "Sve sretne obitelji nalik su jedna na drugu. Svaka nesretna obitelj nesretna je na svoj način."
"""

# "hr" is the ISO 639-1 code for Croatian.
sentences = cutters.cut(text, "hr")

print(sentences)

This results in the following output (note that in the underlying Rust library the str struct fields are &str).

[Sentence {
    str: "Petar Krešimir IV. je vladao od 1058. do 1074. ",
    quotes: [],
}, Sentence {
    str: "St. Louis 9LX je događaj u svijetu šaha.",
    quotes: [],
}, Sentence {
    str: "To je prof.dr.sc. Ivan Horvat.",
    quotes: [],
}, Sentence {
    str: "Volim rock, punk, funk, pop itd.",
    quotes: [],
}, Sentence {
    str: "Tolstoj je napisao: \"Sve sretne obitelji nalik su jedna na drugu. Svaka nesretna obitelj nesretna je na svoj način.\"",
    quotes: [
        Quote {
            str: "Sve sretne obitelji nalik su jedna na drugu. Svaka nesretna obitelj nesretna je na svoj način.",
            sentences: [
                "Sve sretne obitelji nalik su jedna na drugu.",
                "Svaka nesretna obitelj nesretna je na svoj način.",
            ],
        },
    ],
}]
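
The nesting in the output above (sentences contain quotes, and quotes contain subsentences) can be modelled and traversed as in the sketch below. The Sentence and Quote dataclasses here are hypothetical stand-ins written for this example; the text does not confirm which attribute names the Python bindings expose, so the field text is used in place of the printed str field, and flatten is an illustrative helper, not part of cutters.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Quote:
    text: str  # stand-in for the printed `str` field
    sentences: List[str] = field(default_factory=list)


@dataclass
class Sentence:
    text: str  # stand-in for the printed `str` field
    quotes: List[Quote] = field(default_factory=list)


def flatten(sentences: List[Sentence]) -> Iterator[Tuple[int, str]]:
    """Yield (depth, text): depth 0 for sentences, 1 for quoted subsentences."""
    for sentence in sentences:
        yield 0, sentence.text
        for quote in sentence.quotes:
            for sub in quote.sentences:
                yield 1, sub


# Hand-built data mirroring the last two entries of the output above.
sample = [
    Sentence("Volim rock, punk, funk, pop itd."),
    Sentence(
        'Tolstoj je napisao: "Sve sretne obitelji nalik su jedna na drugu. '
        'Svaka nesretna obitelj nesretna je na svoj način."',
        quotes=[
            Quote(
                "Sve sretne obitelji nalik su jedna na drugu. "
                "Svaka nesretna obitelj nesretna je na svoj način.",
                sentences=[
                    "Sve sretne obitelji nalik su jedna na drugu.",
                    "Svaka nesretna obitelj nesretna je na svoj način.",
                ],
            )
        ],
    ),
]

for depth, text in flatten(sample):
    print("    " * depth + text)
```

Keeping the quoted subsentences nested rather than flat lets a caller choose between treating a quotation as part of its enclosing sentence or as separate sentences.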

Download files

Source Distribution

  • cutters-0.1.4.tar.gz (6.1 kB): Source

Built Distributions

  • cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.8 kB): PyPy, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (274.0 kB): PyPy, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.6 kB): PyPy, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-pp38-pypy38_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (273.9 kB): PyPy, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (294.7 kB): PyPy, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-pp37-pypy37_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (274.8 kB): PyPy, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.4 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (273.6 kB): CPython 3.12, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.3 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (273.7 kB): CPython 3.11, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.3 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (273.7 kB): CPython 3.10, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.7 kB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (274.0 kB): CPython 3.9, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.5 kB): CPython 3.8, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (273.9 kB): CPython 3.8, manylinux (glibc 2.17+), ARM64
  • cutters-0.1.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (292.5 kB): CPython 3.7m, manylinux (glibc 2.17+), x86-64
  • cutters-0.1.4-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (273.9 kB): CPython 3.7m, manylinux (glibc 2.17+), ARM64
