A tool to separate truncated text.
Cut your unspaced (or 'too spaced') long texts.
    import wordslicer

    # Train a model on your own text
    model = wordslicer.train('train_file')

    with open('input_file', 'r') as f:
        text = f.read()

    text = wordslicer.separate(model, text)  # or wordslicer.join(model, text)

    with open('output_file', 'w') as f:
        f.write(text)
Benchmark for an input of:
- 161029 words to train
- 1000 lines to separate
- Text with 36889 words
- Time: real 0m1.368s (about 1.4 seconds)
    >>> wordslicer.separate(model, "Boromirhesitatedforasecond.'Yes,andno,'heansweredslowly.'Yes:Ifoundhimsomewayupthehill,andIspoketohim.IurgedhimtocometoMinasTirithandnottogoeast.Igrewangryandheleftme.Hevanished.Ihaveneverseensuchathinghappenbefore.thoughIhaveheardofitintales.HemusthaveputtheRingon.Icouldnotfindhimagain.Ithoughthewouldreturntoyou.'")
    Boromir hesitated for a second. 'Yes, and no,' he answered slowly. 'Yes: I found him some way up the hill, and I spoke to him. I urged him to come to Minas Tirith and not to go east. I grew angry and he left me. He vanished. I have never seen such a thing happen before. though I have heard of it in tales. He must have put the Ring on. I could not find him again. I though the would return to you.'
How to Install
pip3 install wordslicer
Train your model: because the model is trained on text you supply, this package can be used with any language.
Evaluate your model: check whether your training text is good enough for your input text.
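One way to run such a check yourself is a round-trip test: take a correctly spaced sample that resembles your input, strip its spaces, separate it again, and count how many of the original words come back with the same boundaries. The sketch below is an assumption about how this could be done, not the package's own evaluation function. The helper names `word_spans` and `word_recall` are hypothetical, and `separate` is any callable that takes unspaced text and returns spaced text.

```python
def word_spans(words):
    """Return the set of (start, end) character offsets of each word,
    measured in the text with all spaces removed."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def word_recall(reference, separate):
    """Fraction of words in `reference` that `separate` recovers with
    exactly the same boundaries after the spaces are stripped."""
    expected = reference.split()
    if not expected:
        return 1.0
    predicted = separate("".join(expected)).split()
    hits = word_spans(expected) & word_spans(predicted)
    return len(hits) / len(expected)


# Hypothetical usage with a trained model:
# score = word_recall(sample_text, lambda s: wordslicer.separate(model, s))
```

A score close to 1.0 suggests the training text covers the vocabulary of the sample well; a low score suggests training on text closer to your input.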
This project was inspired by Generic Human's answer at http://stackoverflow.com/a/11642687/2449774. Thank you!
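A common way to implement this kind of splitting is dynamic programming over word costs, where frequent words in the training text are cheap and unknown strings are expensive. The sketch below illustrates that general idea only. It is not wordslicer's actual code: the names `train_costs` and `split_words` and the `unknown_cost` penalty are assumptions made for the example, and punctuation and capitalization are ignored.

```python
import math
import re
from collections import Counter


def train_costs(text):
    """Build a word -> cost table from spaced training text.
    The cost is the negative log frequency, so common words are cheap."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: -math.log(n / total) for word, n in counts.items()}


def split_words(costs, text, unknown_cost=20.0):
    """Split unspaced text into the word sequence with the lowest total cost."""
    text = text.lower()
    max_len = max(map(len, costs), default=1)
    # best[i] holds (cost, start) for the cheapest split of text[:i],
    # where `start` is the offset at which the last word begins.
    best = [(0.0, 0)]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            # Unknown pieces pay a fixed penalty per character (an assumption).
            cost = costs.get(piece, unknown_cost * len(piece))
            candidates.append((best[j][0] + cost, j))
        best.append(min(candidates))
    # Walk back from the end to recover the chosen words.
    words, i = [], len(text)
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return " ".join(reversed(words))


costs = train_costs("the cat sat on the mat the cat ran")
print(split_words(costs, "thecatsatonthemat"))  # the cat sat on the mat
```

Because the cost table comes entirely from the training text, the same code works for any language whose words can be counted this way.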