Fast and Efficient Sentence Segmentation

Fast Sentence Segmentation

Fast and efficient sentence segmentation using spaCy. Handles complex edge cases like abbreviations (Dr., Mr., etc.), quoted text, and multi-paragraph documents.

Features

Paragraph-aware segmentation: Returns sentences grouped by paragraph
Abbreviation handling: Correctly handles "Dr.", "Mr.", "etc." without false splits
Cached processing: LRU cache for repeated text processing
Flexible output: Nested lists (by paragraph) or flattened list of sentences
Bullet point & numbered list normalization: Cleans common list formats

Installation

pip install fast-sentence-segment

After installation, download the spaCy model:

python -m spacy download en_core_web_sm
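
Before running the examples, it can help to confirm that the package, spaCy, and the model are all importable. The following is a minimal sketch using only the standard library. The helper name `check_install` is hypothetical and is not part of fast-sentence-segment.

```python
from importlib.util import find_spec

# Import names taken from this README: the package itself, spaCy, and the model.
REQUIRED = ("fast_sentence_segment", "spacy", "en_core_web_sm")


def check_install(names=REQUIRED):
    """Return a dict mapping each import name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_install().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

A `MISSING` entry for `en_core_web_sm` usually means the download step above has not been run in the active environment.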

Quick Start

from fast_sentence_segment import segment_text

text = "Here is a Dr. who says something. And then again, what else? I don't know. Do you?"

results = segment_text(text)
# Returns: [['Here is a Dr. who says something.', 'And then again, what else?', "I don't know.", 'Do you?']]

Usage

Basic Segmentation

The segment_text function returns a list of lists, where each inner list represents a paragraph containing its sentences:

from fast_sentence_segment import segment_text

text = """First paragraph here. It has two sentences.

Second paragraph starts here. This one also has multiple sentences. And a third."""

results = segment_text(text)
# Returns:
# [
#     ['First paragraph here.', 'It has two sentences.'],
#     ['Second paragraph starts here.', 'This one also has multiple sentences.', 'And a third.']
# ]

Flattened Output

If you don't need paragraph boundaries, use the flatten parameter:

results = segment_text(text, flatten=True)
# Returns: ['First paragraph here.', 'It has two sentences.', 'Second paragraph starts here.', ...]
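
Conceptually, the flattened form is the nested form with the paragraph boundaries removed. The sketch below illustrates that relationship using the nested output from the example above as literal data rather than calling the library. How `flatten=True` is implemented internally is an assumption here and may differ.

```python
from itertools import chain

# Nested result copied from the Basic Segmentation example.
nested = [
    ["First paragraph here.", "It has two sentences."],
    ["Second paragraph starts here.", "This one also has multiple sentences.", "And a third."],
]

# Concatenate the per-paragraph lists into a single list of sentences.
flat = list(chain.from_iterable(nested))
print(flat)
```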

Direct Segmenter Access

For more control, use the Segmenter class directly:

from fast_sentence_segment import Segmenter

segmenter = Segmenter()
# input_text() returns list[list[str]]: one inner list of sentences per paragraph
results = segmenter.input_text("Your text here.")

API Reference

| Function | Parameters | Returns | Description |
| --- | --- | --- | --- |
| `segment_text()` | `input_text: str`, `flatten: bool = False` | `list` (nested by paragraph, or flat when `flatten=True`) | Main entry point for segmentation |
| `Segmenter.input_text()` | `input_text: str` | `list[list[str]]` | Cached paragraph-aware segmentation |
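
The feature list mentions an LRU cache for repeated text. How the library wires this up internally is not documented here, so the sketch below only illustrates the general technique with `functools.lru_cache`. The function `naive_segment` is a hypothetical stand-in that splits on blank lines and sentence-ending punctuation. It is not the library's spaCy-based algorithm and does not handle abbreviations.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1024)
def naive_segment(text: str) -> tuple:
    """Hypothetical stand-in segmenter; NOT the library's algorithm."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Split each paragraph after '.', '!' or '?' followed by whitespace.
    return tuple(
        tuple(s for s in re.split(r"(?<=[.!?])\s+", p.strip()) if s)
        for p in paragraphs
    )


text = "One sentence. Another one.\n\nA new paragraph."
first = naive_segment(text)   # computed on the first call
second = naive_segment(text)  # served from the cache on the second call
info = naive_segment.cache_info()
print(info.hits, info.misses)
```

The stand-in returns tuples rather than lists so that cached results cannot be mutated by one caller and then handed, modified, to the next.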

Why Nested Lists?

The segmentation process preserves document structure by splitting the text into paragraphs and then into sentences. The outer list represents the document, and each inner list contains one paragraph's sentences. This is useful for:

Document structure analysis
Paragraph-level processing
Maintaining original text organization

Use flatten=True when you only need sentences without paragraph context.
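
As an illustration of the paragraph-level use cases listed above, the sketch below works on a nested result given as literal data (copied from the Basic Segmentation example) rather than on a live call to the library. The variable names are illustrative only.

```python
# Nested result copied from the Basic Segmentation example.
nested = [
    ["First paragraph here.", "It has two sentences."],
    ["Second paragraph starts here.", "This one also has multiple sentences.", "And a third."],
]

# Document structure analysis: sentence count per paragraph index.
counts = {i: len(sentences) for i, sentences in enumerate(nested)}

# Maintaining organization: rebuild each paragraph as one string.
paragraphs = [" ".join(sentences) for sentences in nested]

print(counts)
print(paragraphs[0])
```

With a flattened list, neither the per-paragraph counts nor the paragraph reconstruction would be recoverable.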

Requirements

Python 3.8.5+
spaCy 3.5.3
en_core_web_sm spaCy model

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests (`make test`)
4. Commit your changes
5. Push to the branch
6. Open a Pull Request


fast-sentence-segment 1.1.8
