Sentence segmentation with sequence tagging
DeepSegment: A sentence segmenter that actually works!
Note: For the original implementation please use the "master" branch of this repo.
A demo of deepsegment (en) + deeppunct is available at http://bpraneeth.com/projects/deeppunct
Code documentation is available at http://bpraneeth.com/docs
Installation:
pip install --upgrade deepsegment
Supported languages:
en - English (trained on data from various sources)
fr - French (Tatoeba data only)
it - Italian (Tatoeba data only)
Usage:
from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en')
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
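To make the "sequence tagging" framing concrete, here is a minimal sketch of how per-token tags can be turned back into sentences. It does not use deepsegment itself. The helper name tags_to_sentences and the "B"/"O" tag scheme are assumptions chosen for illustration and may not match the labels the library's models use internally.

```python
from typing import List


def tags_to_sentences(tokens: List[str], tags: List[str]) -> List[str]:
    """Group tokens into sentences using one tag per token.

    Assumed (hypothetical) tag scheme: "B" marks the first token of a
    sentence and "O" marks every other token.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    sentences: List[List[str]] = []
    for token, tag in zip(tokens, tags):
        # Start a new sentence on a "B" tag, or if nothing has been started yet.
        if tag == "B" or not sentences:
            sentences.append([token])
        else:
            sentences[-1].append(token)
    return [" ".join(words) for words in sentences]


tokens = "I am Batman i live in gotham".split()
tags = ["B", "O", "O", "B", "O", "O", "O"]  # hand-written tags, not model output
print(tags_to_sentences(tokens, tags))
# ['I am Batman', 'i live in gotham']
```

In this framing the model's job is only to predict the tag sequence; the grouping step is deterministic.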
Using with the TF Serving Docker image
docker pull bedapudi6788/deepsegment_en:v2
docker run -d -p 8500:8500 bedapudi6788/deepsegment_en:v2
from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en', tf_serving=True)
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
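Before creating a segmenter with tf_serving=True, it can help to confirm that something is listening on the published port. The sketch below uses only the Python standard library. The helper name port_is_open is hypothetical, and it assumes the container is reachable on localhost:8500 as in the docker run command above. It checks TCP reachability only, not that the model has finished loading.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8500, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    if port_is_open():
        print("Port 8500 is accepting connections.")
    else:
        print("Nothing is listening on port 8500; check `docker ps`.")
```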
Finetuning DeepSegment
Since one size will never fit all, finetuning deepsegment's default models on your own data is encouraged.
from deepsegment import finetune, generate_data
x, y = generate_data(['my name', 'is batman', 'who are', 'you'], n_examples=10000)
vx, vy = generate_data(['my name', 'is batman'])
# NOTE: name, epochs, batch_size and lr are optional arguments.
# number_of_epochs, batch_size and learning_rate below are placeholders for your own values.
finetune('en', x, y, vx, vy, name='finetuned_model_name', epochs=number_of_epochs, batch_size=batch_size, lr=learning_rate)
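With a real corpus you may want the validation sentences kept separate from the training sentences before calling generate_data on each. The following is a minimal sketch of such a split using only the standard library. The helper split_sentences and its val_fraction and seed parameters are hypothetical and not part of deepsegment.

```python
import random
from typing import List, Tuple


def split_sentences(
    sentences: List[str], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle sentences and split them into disjoint (train, validation) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to split")

    shuffled = list(sentences)  # copy so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)

    # Keep at least one sentence on each side of the split.
    n_val = max(1, int(len(shuffled) * val_fraction))
    n_val = min(n_val, len(shuffled) - 1)
    return shuffled[n_val:], shuffled[:n_val]


train_sents, val_sents = split_sentences(
    ["my name", "is batman", "who are", "you"], val_fraction=0.25
)
print(len(train_sents), len(val_sents))
# 3 1
```

The two lists can then be passed to generate_data in place of the hand-written lists in the example above.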
Using with a finetuned checkpoint
from deepsegment import DeepSegment
segmenter = DeepSegment('en', checkpoint_name='finetuned_model_name')
Training deepsegment on custom data: https://colab.research.google.com/drive/1CjYbdbDHX1UmIyvn7nDW2ClQPnnNeA_m
Similar Projects:
https://github.com/bminixhofer/nnsplit (with bindings for Python, Rust and JavaScript)