A sentence-splitting tool that supports long-sentence segmentation and short-sentence merging.

sentence-spliter

Introduction

sentence-spliter splits a long text into a list of sentences. It supports natural splitting at sentence-ending punctuation, splitting of overly long sentences, and merging of overly short sentences.

Features

### Chinese

1. Natural splitting: split at periods, exclamation marks, question marks, semicolons, and ellipses. Text inside double quotes or parentheses is not split.

2. Long-sentence splitting: when a sentence exceeds the maximum length, it is first split at punctuation marks. If a piece still exceeds the maximum length after that, it is forcibly truncated.

3. Short-sentence merging: if a sentence is shorter than the minimum length, it is merged with neighboring sentences.
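The long-sentence and short-sentence steps above can be sketched as follows. This is a minimal illustration, not sentence-spliter's actual implementation: the helper names `split_long` and `merge_short`, the set of secondary punctuation marks, truncation at `hard_max_length`, and the choice to merge a short sentence into the one that follows it are all assumptions made for the example.

```python
import re

# Hypothetical helpers for illustration only; not part of sentence-spliter's API.
# Assumed secondary punctuation: ASCII and full-width comma, enumeration comma, colons.
CLAUSE_PATTERN = re.compile(r"[^,,、::]*[,,、::]|[^,,、::]+")


def split_long(sentence, max_length, hard_max_length):
    """Split an overlong sentence at secondary punctuation, then force-truncate."""
    if len(sentence) <= max_length:
        return [sentence]
    # Each clause keeps its trailing punctuation mark.
    clauses = CLAUSE_PATTERN.findall(sentence)
    pieces, current = [], ""
    for clause in clauses:
        # Start a new piece when adding this clause would exceed max_length.
        if current and len(current) + len(clause) > max_length:
            pieces.append(current)
            current = clause
        else:
            current += clause
    if current:
        pieces.append(current)
    # Forced truncation (assumed to happen at hard_max_length).
    result = []
    for piece in pieces:
        for i in range(0, len(piece), hard_max_length):
            result.append(piece[i:i + hard_max_length])
    return result


def merge_short(sentences, min_length):
    """Merge each sentence shorter than min_length into the sentence after it."""
    merged = []
    for sentence in sentences:
        if merged and len(merged[-1]) < min_length:
            merged[-1] += sentence
        else:
            merged.append(sentence)
    # A short sentence at the very end has no successor, so attach it backwards.
    if len(merged) > 1 and len(merged[-1]) < min_length:
        last = merged.pop()
        merged[-1] += last
    return merged


print(split_long("aaaa,bbbb,cccc", max_length=6, hard_max_length=10))
print(merge_short(["Hi.", "How are you today?", "Ok."], min_length=5))
```

Note that in this sketch merging can push a sentence back over the maximum length. How the library resolves that conflict is not stated here.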

### English

1. Natural splitting: split at periods, exclamation marks, question marks, semicolons, and ellipses. Text inside double quotes or parentheses is not split.
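The natural-splitting rule can be sketched as a single pass over the text that tracks open quotes and parentheses. This is only an illustration under stated assumptions, not the library's code: `natural_split`, `TERMINATORS`, and `OPENERS` are hypothetical names, and a run of terminators (such as `...`) is assumed to count as one boundary.

```python
# Hypothetical sketch; names and character sets are assumptions.
TERMINATORS = set(".!?;。!?;…")
OPENERS = {"(": ")", "(": ")", "“": "”"}  # opening mark -> expected closing mark


def natural_split(text):
    """Split at sentence-ending punctuation, except inside quotes or parentheses."""
    sentences = []
    closers = []               # stack of closing marks we are waiting for
    in_straight_quote = False  # ASCII double quotes open and close with the same char
    start = 0
    for i, ch in enumerate(text):
        if ch == '"':
            in_straight_quote = not in_straight_quote
        elif closers and ch == closers[-1]:
            closers.pop()
        elif ch in OPENERS:
            closers.append(OPENERS[ch])
        elif ch in TERMINATORS and not closers and not in_straight_quote:
            next_ch = text[i + 1] if i + 1 < len(text) else ""
            if next_ch in TERMINATORS:
                continue  # still inside a run such as "..." or "?!"
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(natural_split('He said "Stop! Now." and left (quickly; quietly). Really... Yes!'))
```

Like the rule it illustrates, this sketch treats every period outside quotes and parentheses as a boundary, so abbreviations and decimal numbers would be split incorrectly.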

TODO:

Improve the English splitter. For example, a period that is part of an English name should not trigger a split.
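One possible direction for this TODO, sketched under assumptions: before splitting at a period, check whether the word in front of it is a title abbreviation or a single capital initial. The function `is_name_period` and the abbreviation list are hypothetical and deliberately incomplete; nothing here reflects how the library will actually solve it.

```python
import re

# Assumed, non-exhaustive list of title abbreviations (lowercase, without the period).
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "jr", "sr", "st"}


def is_name_period(text, index):
    """Return True if the period at text[index] likely belongs to a name."""
    if text[index] != ".":
        return False
    # Find the run of letters immediately before the period.
    match = re.search(r"[A-Za-z]+$", text[:index])
    if not match:
        return False
    word = match.group(0)
    if word.lower() in ABBREVIATIONS:
        return True
    # A single capital letter is treated as an initial, e.g. the "J." in "J. K. Rowling".
    return len(word) == 1 and word.isupper()


text = "Dr. J. K. Rowling arrived."
print([i for i, ch in enumerate(text) if ch == "." and not is_name_period(text, i)])
```

This heuristic has known gaps: a sentence that ends in a single capital letter (for example "It was I.") would be misread as an initial.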

Installation

1. pip

pip install sentence-spliter

2. git clone

git clone https://gitee.com/li_li_la/sentence-spliter.git

Usage

Case 1: Use default parameters

from sentence_spliter import split
sentence = '锄禾日当午,汗滴禾下土.谁知盘中餐,粒粒皆辛苦.'
out = split(sentence)

# outputs
['锄禾日当午,汗滴禾下土.','谁知盘中餐,粒粒皆辛苦.']

Case 2: Pass your own parameters

from sentence_spliter import Spliter

options = {
    'language': 'zh',                # 'zh' for Chinese, 'en' for English
    'long_short_sent_handle': True,  # False: natural splitting only; True: also split long and merge short sentences
    'max_length': 15,                # maximum sentence length (default 150)
    'min_length': 4,                 # minimum sentence length (default 15)
    'hard_max_length': 20,           # hard upper limit on sentence length
    'remove_blank': True,            # whether to remove spaces within sentences
}
spliter = Spliter(options)
paragraph = "“你真漂亮呢!哈哈哈”。“谢谢你啊”。今天很开心!"
cut_sentences = spliter.cut_to_sentences(paragraph)
print(cut_sentences)

# outputs
['“你真漂亮呢!哈哈哈”。','“谢谢你啊”。','今天很开心!']

Options

options = {
    'language': 'zh',                # 'zh' for Chinese, 'en' for English
    'long_short_sent_handle': True,  # False: natural splitting only; True: also split long and merge short sentences
    'max_length': 150,               # maximum sentence length (default 150)
    'min_length': 15,                # minimum sentence length (default 15)
    'hard_max_length': 300,          # hard upper limit on sentence length
    'remove_blank': True,            # whether to remove spaces within sentences (Chinese)
}
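When only a few values differ from the defaults, a small helper can fill in the rest and catch typos in option names. The sketch below is not part of sentence-spliter: `build_options` and `DEFAULT_OPTIONS` are hypothetical names, and the check that `min_length <= max_length <= hard_max_length` is an assumption about sensible values, not a documented requirement.

```python
# Defaults copied from the options listed above.
DEFAULT_OPTIONS = {
    'language': 'zh',
    'long_short_sent_handle': True,
    'max_length': 150,
    'min_length': 15,
    'hard_max_length': 300,
    'remove_blank': True,
}


def build_options(**overrides):
    """Return a full options dict: the defaults with the given overrides applied."""
    unknown = set(overrides) - set(DEFAULT_OPTIONS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULT_OPTIONS, **overrides}
    if options['language'] not in ('zh', 'en'):
        raise ValueError("language must be 'zh' or 'en'")
    # Assumed sanity check; the library may not enforce this ordering.
    if not options['min_length'] <= options['max_length'] <= options['hard_max_length']:
        raise ValueError("expected min_length <= max_length <= hard_max_length")
    return options


options = build_options(max_length=15, min_length=4, hard_max_length=20)
print(options)
```

The resulting dict has the same shape as the one passed to `Spliter(options)` in Case 2.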

Deployment

Docker deployment

pm2 deployment (requires pm2; install it with npm install -g pm2)

pm2 start ./bin/spliter-service.sh

Web API

GET

POST

Download files

Source distribution: sentence_spliter-0.1.9.tar.gz (10.2 kB)

Built distribution: sentence_spliter-0.1.9-py3-none-any.whl (7.2 kB, Python 3)
