pysbd (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box across many languages.
pySBD: Python Sentence Boundary Disambiguation (SBD)
pySBD (Python Sentence Boundary Disambiguation) is a rule-based sentence boundary detection module that works out of the box.
This project is a direct port of the Ruby gem Pragmatic Segmenter, which provides rule-based sentence boundary detection.
pip install pysbd
- Currently, pySBD supports only the English language. Support for more languages will be released soon.
    import pysbd

    text = "My name is Jonas E. Smith. Please turn to p. 55."
    seg = pysbd.Segmenter(language="en", clean=False)
    print(seg.segment(text))
    # ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
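To illustrate why the example sentence above needs rule-based handling, here is a toy sketch that uses only the standard library. It is not pySBD's implementation: `ABBREVIATIONS`, `naive_split`, and `toy_rule_based_split` are hypothetical names made up for this illustration, and the abbreviation list is a tiny assumed sample. A naive split breaks after "E." and "p.", while a single abbreviation rule is already enough to recover the two expected sentences.

```python
import re

# Hypothetical, tiny abbreviation list for illustration only.
# pySBD's real rule set is far larger and is not reproduced here.
ABBREVIATIONS = {"p", "mr", "mrs", "dr", "e.g", "i.e"}


def naive_split(text):
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]


def toy_rule_based_split(text):
    """Like naive_split, but do not break after a known abbreviation
    or after a single-letter initial such as 'E.'."""
    sentences = []
    buffer = ""
    for piece in naive_split(text):
        buffer = f"{buffer} {piece}".strip()
        last_token = buffer.split()[-1].rstrip(".").lower()
        is_initial = len(last_token) == 1 and last_token.isalpha()
        if buffer.endswith(".") and (last_token in ABBREVIATIONS or is_initial):
            continue  # probably not a real boundary; keep accumulating
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


text = "My name is Jonas E. Smith. Please turn to p. 55."
print(naive_split(text))
# ['My name is Jonas E.', 'Smith.', 'Please turn to p.', '55.']
print(toy_rule_based_split(text))
# ['My name is Jonas E. Smith.', 'Please turn to p. 55.']
```

The toy rule fails on many real inputs (for example, a sentence that genuinely ends in a single letter), which is the kind of case a fuller rule set has to cover.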
- Use pysbd as a spaCy pipeline component (recommended).
  Please refer to the example pysbd_as_spacy_component.py.
- Use pysbd through entrypoints
    import spacy
    from pysbd.utils import PySBDFactory

    nlp = spacy.blank('en')

    # explicitly adding component to pipeline
    # (recommended - makes it more readable to tell what's going on)
    nlp.add_pipe(PySBDFactory(nlp))

    # or you can use it implicitly with keyword
    # pysbd = nlp.create_pipe('pysbd')
    # nlp.add_pipe(pysbd)

    doc = nlp('My name is Jonas E. Smith. Please turn to p. 55.')
    print(list(doc.sents))
    # [My name is Jonas E. Smith., Please turn to p. 55.]
If you find a text that is incorrectly segmented using pySBD, please submit an issue.
- Fork it ( https://github.com/nipunsadvilkar/pySBD/fork )
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
This project wouldn't be possible without the great work done by the Pragmatic Segmenter team.