Fast, robust sentence splitting with bindings for Python, Rust and JavaScript.
Reason this release was yanked: unsupported legacy version.
NNSplit Python Bindings
Fast, robust sentence splitting with bindings for Python, Rust and JavaScript, and pretrained models for English and German.
Installation
NNSplit has PyTorch as the only dependency.
Install it with pip: pip install nnsplit
Usage
>>> from nnsplit import NNSplit
>>> splitter = NNSplit("en")
>>> # NNSplit does not depend on proper punctuation and casing to split sentences
>>> splitter.split(["This is a test This is another test."])
[[[Token(text='This', whitespace=' '),
   Token(text='is', whitespace=' '),
   Token(text='a', whitespace=' '),
   Token(text='test', whitespace=' ')],
  [Token(text='This', whitespace=' '),
   Token(text='is', whitespace=' '),
   Token(text='another', whitespace=' '),
   Token(text='test', whitespace=''),
   Token(text='.', whitespace='')]]]
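The result is nested three levels deep: one list per input text, one list per sentence within it, and one Token per word or punctuation mark, where each token carries its text and the whitespace that followed it. Below is a minimal sketch for turning that structure back into plain sentence strings. It assumes tokens expose the text and whitespace fields shown in the repr above. The Token namedtuple is a hypothetical stand-in so the snippet runs without loading a model, and sentences_to_strings is a hypothetical helper, not part of NNSplit.

```python
from collections import namedtuple
from typing import List

# Hypothetical stand-in for NNSplit's Token, mirroring the fields in the repr above.
Token = namedtuple("Token", ["text", "whitespace"])


def sentences_to_strings(split_result: List[List[List[Token]]]) -> List[List[str]]:
    """Join each sentence's tokens into a string, keeping one inner list per input text."""
    return [
        ["".join(tok.text + tok.whitespace for tok in sentence) for sentence in text]
        for text in split_result
    ]


# The output of the usage example above, rebuilt by hand.
result = [[
    [Token("This", " "), Token("is", " "), Token("a", " "), Token("test", " ")],
    [Token("This", " "), Token("is", " "), Token("another", " "),
     Token("test", ""), Token(".", "")],
]]

print(sentences_to_strings(result))
# [['This is a test ', 'This is another test.']]
```

Because each token keeps its trailing whitespace, concatenating the sentences of one text reproduces the original input exactly ("".join(sentences_to_strings(result)[0]) gives "This is a test This is another test."). Call .rstrip() on each sentence if the trailing space is unwanted.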
Models for German (NNSplit("de")) and English (NNSplit("en")) come prepackaged with NNSplit. Alternatively, you can load your own model:
import torch
from nnsplit import NNSplit

model = torch.jit.load("/path/to/your/model.pt")  # a regular nn.Module works too
splitter = NNSplit(model)
See train.ipynb for the code used to train the pretrained models.
Development
NNSplit uses Poetry for dependency management. I made a small Makefile to automate some steps. Take a look at the Makefile and run make install, make build and make test to install, build and test the library, respectively.