Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder

Project description

This module splits paragraphs of text into sentences. It is based on scripts developed by Philipp Koehn and Josh Schroeder for processing the Europarl corpus.

The module is a port of the Lingua::Sentence Perl module with a few additions: improved non-breaking prefix lists for some languages, and added support for Danish, Finnish, Lithuanian, Norwegian (Bokmål), Romanian, and Turkish.
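To illustrate what a non-breaking prefix list is for, here is a minimal sketch of the general idea, not the module's actual implementation. The function name `split_sentences` and the tiny `NON_BREAKING_PREFIXES` set are hypothetical and exist only for this example; the real per-language lists are much larger, and the real rules are more involved.

```python
# Illustrative only: a toy prefix list, assumed for this example.
NON_BREAKING_PREFIXES = {"Mr", "Mrs", "Dr", "Prof", "e.g", "i.e", "etc"}


def split_sentences(text, prefixes=NON_BREAKING_PREFIXES):
    """Split text into sentences with a simple prefix-based heuristic.

    A boundary is placed after a word ending in '.', '!' or '?' when the
    next word looks like a sentence start, unless the word is a known
    non-breaking prefix such as "Dr." or "e.g.".
    Note: runs of whitespace are collapsed to single spaces.
    """
    words = text.split()
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i + 1 == len(words):
            continue  # last word: flushed after the loop
        nxt = words[i + 1]
        ends_sentence = word[-1] in ".!?"
        starts_sentence = nxt[0].isupper() or nxt[0].isdigit() or nxt[0] in "\"'("
        # Strip the trailing period and any opening quote/bracket before lookup.
        is_prefix = word.endswith(".") and word[:-1].lstrip("\"'(") in prefixes
        if ends_sentence and starts_sentence and not is_prefix:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences(
        "Dr. Smith is here. He brought e.g. apples! Did Mr. Jones come?"
    ):
        print(sentence)
```

Without the prefix check, the period after "Dr" or "Mr" would be treated as a sentence end, which is the kind of error the prefix lists are meant to prevent.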

Download files

Source Distribution

sentence_splitter-1.2.tar.gz (30.7 kB)

Uploaded Source

File details

Details for the file sentence_splitter-1.2.tar.gz.

File hashes

Hashes for sentence_splitter-1.2.tar.gz
Algorithm    Hash digest
SHA256       7078cf4f30057f912e03f2bad65e54c1cee0a81f29e744632904c62bd7585b81
MD5          b19fecf3575c05dfa2826b5dc3f71e29
BLAKE2b-256  6390b65253e9df530d575f707ef50247a106b2bf4970f154766d8177a7836da2
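
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the file path in the usage section is assumed to be wherever the archive was saved.

```python
import hashlib

# SHA256 digest published above for sentence_splitter-1.2.tar.gz.
EXPECTED_SHA256 = "7078cf4f30057f912e03f2bad65e54c1cee0a81f29e744632904c62bd7585b81"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python verify.py sentence_splitter-1.2.tar.gz
    if len(sys.argv) == 2:
        print("OK" if matches_published_hash(sys.argv[1]) else "MISMATCH")
```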
