
This is a sentence-cutting tool that supports segmenting long sentences and merging short ones.

Project description

sentence-spliter

Introduction

sentence-spliter splits a long text into a list of sentences. It supports natural segmentation, segmentation of over-long sentences, and merging of over-short sentences.

Features

### Chinese spliter

1. natural spliter: splits at periods, exclamation marks, question marks, semicolons, and ellipses. Text inside double quotes or parentheses is never split.

2. long sentence spliter: when a sentence exceeds the maximum length, it is first split at punctuation marks; if a piece still exceeds the maximum length after that, it is forcibly truncated.

3. short sentence combination: if a sentence is shorter than the minimum length, it is merged with a neighbouring sentence.
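The three stages above can be sketched in plain Python. This is an illustrative sketch of the described behaviour, not the library's implementation: the punctuation sets are assumptions, and the quote/parenthesis protection is omitted for brevity.

```python
import re

# Illustrative sketch of the three-stage pipeline -- NOT sentence-spliter's
# own code. Punctuation sets are assumptions.

END_PUNCT = "。！？；…!?;"    # sentence-ending marks
PAUSE_PUNCT = "，,、"         # secondary marks used to split over-long sentences

def natural_split(text):
    """Stage 1: split at sentence-ending punctuation, keeping the mark."""
    return [p for p in re.split(f"(?<=[{END_PUNCT}])", text) if p]

def split_long(sentence, max_length, hard_max_length):
    """Stage 2: split an over-long sentence at pause marks; truncate as a last resort."""
    if len(sentence) <= max_length:
        return [sentence]
    out = []
    for piece in re.split(f"(?<=[{PAUSE_PUNCT}])", sentence):
        while len(piece) > hard_max_length:  # forced truncation
            out.append(piece[:hard_max_length])
            piece = piece[hard_max_length:]
        if piece:
            out.append(piece)
    return out

def merge_short(sentences, min_length):
    """Stage 3: merge a too-short sentence into the one that follows it."""
    out = []
    for s in sentences:
        if out and len(out[-1]) < min_length:
            out[-1] += s
        else:
            out.append(s)
    return out
```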

### English spliter

1. natural spliter: splits at periods, exclamation marks, question marks, semicolons, and ellipses. Text inside double quotes or parentheses is never split.

TODO:

Optimize the English spliter; for example, the period in an English name or abbreviation should not trigger a split.
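One common way to handle this is an abbreviation exception list. The sketch below is a hedged illustration of that idea, not part of sentence-spliter; the abbreviation set and function name are assumptions.

```python
import re

# Hypothetical sketch of abbreviation-aware English splitting -- an assumption,
# not sentence-spliter's implementation.

ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "st"}

def english_split(text):
    """Split at . ! ? followed by whitespace, unless the period ends a known abbreviation."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    merged = []
    for part in parts:
        if merged and merged[-1].endswith("."):
            words = merged[-1].rstrip(".").rsplit(maxsplit=1)
            last_word = words[-1].lower() if words else ""
            if last_word in ABBREVIATIONS:
                # Rejoin: the previous "sentence" ended in an abbreviation.
                merged[-1] += " " + part
                continue
        merged.append(part)
    return merged
```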

Installation

1. pip

pip install sentence-spliter

2. git clone

git clone https://gitee.com/li_li_la/sentence-spliter.git

Usage

case 1: Use the default parameters

from sentence_spliter import split
sentence = '锄禾日当午,汗滴禾下土.谁知盘中餐,粒粒皆辛苦.'
out = split(sentence)

# outputs
['锄禾日当午,汗滴禾下土.','谁知盘中餐,粒粒皆辛苦.']

case 2: Pass your own parameters

from sentence_spliter import Spliter
options = {'language': 'zh',  # 'zh' Chinese, 'en' English
           'long_short_sent_handle': True,  # False: split naturally only; True: also handle long and short sentences
           'max_length': 15,  # longest sentence length (default 150)
           'min_length': 4,  # shortest sentence length (default 15)
           'hard_max_length': 20,  # hard upper bound on sentence length
           'remove_blank': True}  # whether to remove spaces in the sentence
spliter = Spliter(options)
paragraph = "“你真漂亮呢!哈哈哈”。“谢谢你啊”。今天很开心!"
cut_sentences =  spliter.cut_to_sentences(paragraph)
print(cut_sentences)

# outputs
['“你真漂亮呢!哈哈哈”。','“谢谢你啊”。','今天很开心!']

Options

options = {
  'language': 'zh',                # 'zh' Chinese, 'en' English
  'long_short_sent_handle': True,  # False: split naturally only; True: also handle long and short sentences
  'max_length': 150,               # longest sentence length (default 150)
  'min_length': 15,                # shortest sentence length (default 15)
  'hard_max_length': 300,          # hard upper bound on sentence length
  'remove_blank': True,            # whether to remove spaces in the sentence (Chinese only)
}
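As a rough mental model, the three length thresholds should satisfy min_length ≤ max_length ≤ hard_max_length. The sketch below (an assumption, not sentence-spliter's own code) merges user options into the documented defaults and sanity-checks that ordering:

```python
# Hypothetical option handling -- illustrates the documented defaults and the
# assumed relationship between the length thresholds. Not the library's code.

DEFAULTS = {
    'language': 'zh',
    'long_short_sent_handle': True,
    'max_length': 150,
    'min_length': 15,
    'hard_max_length': 300,
    'remove_blank': True,
}

def validate(options):
    """Merge user options into the defaults and check the threshold ordering."""
    opts = {**DEFAULTS, **options}
    if not (opts['min_length'] <= opts['max_length'] <= opts['hard_max_length']):
        raise ValueError("expected min_length <= max_length <= hard_max_length")
    return opts
```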

Deployment

Docker deployment

pm2 deployment (requires pm2: `npm install -g pm2`)

pm2 start ./bin/spliter-service.sh

Web API

GET

POST



Download files

Download the file for your platform.

Source Distribution

sentence_spliter-0.1.9.tar.gz (10.2 kB)

Uploaded Source

Built Distribution


sentence_spliter-0.1.9-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file sentence_spliter-0.1.9.tar.gz.

File metadata

  • Download URL: sentence_spliter-0.1.9.tar.gz
  • Upload date:
  • Size: 10.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for sentence_spliter-0.1.9.tar.gz
Algorithm Hash digest
SHA256 8b4aac5b11426069d3720553ed501fe66bfe9e7df71a66fd2e046ba099258b23
MD5 729d3cafaa3cccdf500ed26101bd9d9b
BLAKE2b-256 281881b73a507178ad1ea72a666a3bf361fe9866782169cf072712aa3aa157cb


File details

Details for the file sentence_spliter-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: sentence_spliter-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.1.post20200604 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7

File hashes

Hashes for sentence_spliter-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 1cefb1c567e3cd41a34a2665b535a953332551f1b4c0a83f08831f55b9920459
MD5 976420d19d1bafc271be28df1e24e8c4
BLAKE2b-256 907eff87c66a5e7257964df0e9d6ef293b266933c3b5dffb2688bf9c4726b3af

