
Sentence Splitter

A Python package for sentence splitting using a pre-trained transformer model.

Description

Sentence Splitter is a Python package that provides accurate sentence segmentation using a transformer-based token classification model. The model is automatically downloaded from Hugging Face Hub on first use and cached locally for future use. It's designed to handle long texts efficiently and supports GPU acceleration if available.

Features

  • Transformer-Based Model: Leverages a pre-trained transformer model for high-accuracy sentence splitting.
  • Automatic Model Download: The model is automatically downloaded from Hugging Face Hub on first use and cached locally.
  • Easy to Use: Simple API for quick integration into your projects.
  • Handles Long Texts: Efficiently processes long texts by splitting them into manageable chunks.
  • GPU Acceleration: Automatically utilizes CUDA if available for faster processing.

Installation

Install the package via pip:

pip install iges-sentence-splitter

Requirements

  • Python 3.6 or higher
  • torch
  • transformers

Note: These dependencies will be installed automatically when you install the package via pip.

First Use

On first use, the model (~1GB) is automatically downloaded from Hugging Face Hub and cached locally in ~/.cache/huggingface/. Subsequent uses load the model from the local cache without re-downloading.
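
If you need the model cached somewhere other than the default directory, Hugging Face libraries honor the HF_HOME environment variable. A minimal sketch, assuming a hypothetical custom path; it must be set before the model is first loaded:

import os

# Redirect the Hugging Face cache root (hypothetical path) before any model loads.
os.environ["HF_HOME"] = "/data/hf-cache"

from sentence_splitter.splitter import SentenceSplitter

splitter = SentenceSplitter()  # first use downloads into /data/hf-cache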

Usage

Basic Example

from sentence_splitter.splitter import SentenceSplitter

# Initialize the splitter
splitter = SentenceSplitter()

# Input text
text = "This is a test. Here is another sentence. And yet another one!"

# Get sentences
sentences = splitter.split(text)

print(sentences)

Output:

['This is a test.', 'Here is another sentence.', 'And yet another one!']

Processing Long Texts

The split method handles long texts by processing them in overlapping chunks. You can adjust the parameters as needed (a sketch of the chunking scheme follows the example):

sentences = splitter.split(
    text,
    max_seq_len=512,   # Maximum sequence length for each chunk
    stride=100,        # Overlap between chunks to preserve context
    batch_size=4       # Number of chunks to process at once
)
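
For intuition, here is a rough sketch of how overlapping chunks can be produced. This is illustrative only, not the package's internal code, with max_seq_len and stride playing the roles described above:

def make_windows(token_ids, max_seq_len=512, stride=100):
    # Each window starts (max_seq_len - stride) tokens after the previous one,
    # so consecutive windows overlap by `stride` tokens of shared context.
    step = max_seq_len - stride
    for start in range(0, max(len(token_ids) - stride, 1), step):
        yield token_ids[start:start + max_seq_len]

# 1000 tokens with the defaults -> spans (0, 512), (412, 924), (824, 1000)
windows = list(make_windows(list(range(1000))))
print([(w[0], w[-1] + 1) for w in windows])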

API Reference

SentenceSplitter

A class for splitting text into sentences using a pre-trained transformer model.

Initialization

splitter = SentenceSplitter(device=None, efficient_mode=False)
  • Parameters:
    • device (str, optional): The device to run the model on ('cuda' or 'cpu'). Defaults to 'cuda' if available, otherwise 'cpu'. See the example below.
    • efficient_mode (bool, optional): Whether to run the model in 8-bit precision for faster computation. Defaults to False.
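
For example, to select the device explicitly using the documented parameters:

import torch
from sentence_splitter.splitter import SentenceSplitter

# Mirror the default device choice, but make it explicit.
device = "cuda" if torch.cuda.is_available() else "cpu"
splitter = SentenceSplitter(device=device, efficient_mode=False)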

Methods

  • split(text, max_seq_len=512, stride=100, batch_size=4)

    Splits the input text into sentences.

    • Parameters:
      • text (str): The text to split.
      • max_seq_len (int, optional): Maximum sequence length for the model. Defaults to 512.
      • stride (int, optional): Number of tokens to overlap between chunks. Defaults to 100.
      • batch_size (int, optional): Number of chunks to process simultaneously. Defaults to 4.
    • Returns:
      • List[str]: A list of sentences.

How It Works

The package uses a token classification model that labels each token as:

  • B: Beginning of a sentence.
  • E: End of a sentence.
  • I: Inside a sentence.

By processing the tokens and their predicted labels, the splitter reconstructs the sentences accurately, even in complex texts.
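
As a simplified illustration of that reconstruction step (not the package's actual internals, and joining tokens with spaces rather than true detokenization):

def labels_to_sentences(tokens, labels):
    """Rebuild sentences from per-token B/E/I labels (illustrative only)."""
    sentences, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B" and current:
            # A new sentence begins: flush any unfinished one first.
            sentences.append(" ".join(current))
            current = []
        current.append(token)
        if label == "E":
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences

tokens = ["This", "is", "a", "test", ".", "Here", "is", "another", "."]
labels = ["B", "I", "I", "I", "E", "B", "I", "I", "E"]
print(labels_to_sentences(tokens, labels))
# ['This is a test .', 'Here is another .']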

Example: Splitting Complex Text

text = """
Despite the rain, the match continued. Players were determined; fans were cheering. 
"Unbelievable!" shouted the commentator. It's a night to remember.
"""

sentences = splitter.split(text)

for i, sentence in enumerate(sentences, 1):
    print(f"Sentence {i}: {sentence}")

Output:

Sentence 1: Despite the rain, the match continued.
Sentence 2: Players were determined; fans were cheering.
Sentence 3: "Unbelievable!" shouted the commentator.
Sentence 4: It's a night to remember.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For any questions or suggestions, feel free to reach out via email.


Download files

Download the file for your platform.

Source Distribution

iges_sentence_splitter-0.2.1.tar.gz (5.6 kB)

Uploaded Source

Built Distribution

iges_sentence_splitter-0.2.1-py3-none-any.whl (6.4 kB)

Uploaded Python 3

File details

Details for the file iges_sentence_splitter-0.2.1.tar.gz.

File metadata

  • Download URL: iges_sentence_splitter-0.2.1.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.8

File hashes

Hashes for iges_sentence_splitter-0.2.1.tar.gz
  • SHA256: 11d7bffcc1de1ff1e73aaa5af4f5cf9d7e6a1b942478c58f9b95fc21b4fba314
  • MD5: 2cef37e06b41d18be86759523595c22a
  • BLAKE2b-256: 760dcce1030b76bd98f60610611086f0acbbc9e07bea0cbedb9ba878ffca98e2

File details

Details for the file iges_sentence_splitter-0.2.1-py3-none-any.whl.

File hashes

Hashes for iges_sentence_splitter-0.2.1-py3-none-any.whl
  • SHA256: a41df1e7ff02b4c1d70cc39f0ca732634fcf7b472376c53cbc18dcd37f751756
  • MD5: e33e30293c0fe5eaa5055e0186e9f179
  • BLAKE2b-256: 0ce6904d95eea0cd787526def2ba6639ce08129410a6fab7267c99f04eee006e
