Sentence Segmentation with sequence tagging
Project description
DeepSegment: A sentence segmenter that actually works!
Note: For the original implementation please use the "master" branch of this repo.
A demo of deepsegment (en) + deeppunct is available at http://bpraneeth.com/projects/deeppunct
Code documentation is available at http://bpraneeth.com/docs
Installation:
pip install --upgrade deepsegment
Supported languages:
en - English (trained on data from various sources)
fr - French (Tatoeba data only)
it - Italian (Tatoeba data only)
Usage:
from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en')
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
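When several texts need segmenting, a small wrapper can normalise whitespace and skip blank inputs before calling the model. The sketch below is not part of deepsegment: `segment_many` is a hypothetical helper that accepts any callable with the same shape as `segmenter.segment` (a string in, a list of strings out), so it can be tried with a stand-in function first.

```python
from typing import Callable, Iterable, List


def segment_many(
    texts: Iterable[str],
    segment_fn: Callable[[str], List[str]],
) -> List[List[str]]:
    """Return one list of sentences per input text, in input order."""
    results = []
    for text in texts:
        # Collapse runs of whitespace so the segmenter sees clean tokens.
        cleaned = " ".join(text.split())
        # Blank inputs yield an empty list instead of a model call.
        results.append(segment_fn(cleaned) if cleaned else [])
    return results


# With the library installed, segment_fn would be segmenter.segment:
#   segmenter = DeepSegment('en')
#   segment_many(['I am Batman i live in gotham'], segmenter.segment)
```

Passing the callable in, rather than constructing the segmenter inside the helper, keeps model loading in one place and makes the helper easy to test.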
Using with tf serving docker image
docker pull bedapudi6788/deepsegment_en:v2
docker run -d -p 8500:8500 bedapudi6788/deepsegment_en:v2
from deepsegment import DeepSegment
# The default language is 'en'
segmenter = DeepSegment('en', tf_serving=True)
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']
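Before creating a segmenter with `tf_serving=True`, it can help to confirm that the container is actually accepting connections. The following is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, and the host `localhost` is an assumption based on the `-p 8500:8500` mapping in the `docker run` command above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8500,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Example use (assumes the container from the docker run command is up):
#   if is_port_open():
#       segmenter = DeepSegment('en', tf_serving=True)
```

A successful TCP connection only shows that something is listening on the port; it does not verify that the model has finished loading.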
Finetuning DeepSegment
Since one size will never fit all, finetuning deepsegment's default models on your own data is encouraged.
from deepsegment import finetune, generate_data
x, y = generate_data(['my name', 'is batman', 'who are', 'you'], n_examples=10000)
vx, vy = generate_data(['my name', 'is batman'])
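# NOTE: name, epochs, batch_size, and lr are optional arguments.
# number_of_epochs, batch_size, and learning_rate are placeholders; set them to your own values first.
finetune('en', x, y, vx, vy, name='finetuned_model_name', epochs=number_of_epochs, batch_size=batch_size, lr=learning_rate)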
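To illustrate what "sequence tagging" means for sentence segmentation, the sketch below builds one training example by joining sentences into a single unpunctuated word stream and marking the first word of each sentence, then recovers the sentences from the tags. This is an illustration of the general idea only, not the library's `generate_data` implementation. The label names `B-sent` and `O` and the helpers `make_example` and `decode` are assumptions made for the example.

```python
from typing import List, Tuple

# Illustrative label names; not confirmed to match the library's own constants.
BEGIN, OTHER = "B-sent", "O"


def make_example(sentences: List[str]) -> Tuple[List[str], List[str]]:
    """Join sentences into one word stream with a tag per word."""
    words, tags = [], []
    for sentence in sentences:
        for i, word in enumerate(sentence.split()):
            words.append(word)
            # Only the first word of each sentence gets the boundary tag.
            tags.append(BEGIN if i == 0 else OTHER)
    return words, tags


def decode(words: List[str], tags: List[str]) -> List[str]:
    """Rebuild sentences by starting a new one at every boundary tag."""
    sentences: List[List[str]] = []
    for word, tag in zip(words, tags):
        if tag == BEGIN or not sentences:
            sentences.append([word])
        else:
            sentences[-1].append(word)
    return [" ".join(s) for s in sentences]


words, tags = make_example(["my name is batman", "who are you"])
print(list(zip(words, tags)))
print(decode(words, tags))
```

A tagging model trained on such pairs predicts one label per word, and segmentation reduces to splitting the stream at each predicted boundary.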
Using with a finetuned checkpoint
from deepsegment import DeepSegment
segmenter = DeepSegment('en', checkpoint_name='finetuned_model_name')
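One way to check whether finetuning changed anything is to run the default and finetuned segmenters over the same texts and list the inputs where they disagree. The sketch below is a hypothetical helper, not a deepsegment API. It takes two callables shaped like `segmenter.segment`, so it can be exercised with stand-ins before any model is loaded.

```python
from typing import Callable, Iterable, List, Tuple

Segmenter = Callable[[str], List[str]]


def find_disagreements(
    texts: Iterable[str],
    base_fn: Segmenter,
    tuned_fn: Segmenter,
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (text, base_output, tuned_output) for every differing result."""
    diffs = []
    for text in texts:
        base_out, tuned_out = base_fn(text), tuned_fn(text)
        if base_out != tuned_out:
            diffs.append((text, base_out, tuned_out))
    return diffs


# With the library installed, the two callables would be, for example:
#   base = DeepSegment('en').segment
#   tuned = DeepSegment('en', checkpoint_name='finetuned_model_name').segment
#   find_disagreements(my_texts, base, tuned)
```

Reviewing the disagreements by hand gives a quick sense of whether the finetuned checkpoint helps on your own data.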
Training deepsegment on custom data: https://colab.research.google.com/drive/1CjYbdbDHX1UmIyvn7nDW2ClQPnnNeA_m
Similar Projects:
https://github.com/bminixhofer/nnsplit (with bindings for Python, Rust, and JavaScript)
Download files
File details
Details for the file deepsegment-2.3.1.tar.gz.
File metadata
- Download URL: deepsegment-2.3.1.tar.gz
- Upload date:
- Size: 8.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | b93341d0a8ae82fc69cf53647562d575113d0da421ebed097faddca2040a70e8
MD5 | d17d0ffe617c2f82c2c48403e290237e
BLAKE2b-256 | 65c9f7e03bf5aec372951d9086ddd538e51a372f8e10b7f1b7816fcd7acb2207
File details
Details for the file deepsegment-2.3.1-py2.py3-none-any.whl.
File metadata
- Download URL: deepsegment-2.3.1-py2.py3-none-any.whl
- Upload date:
- Size: 20.3 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 04943b1f908d4e482ca5388f5f6d595772da86bda82a9cbbecd08d8a8ae3e039
MD5 | 3d7283d5c432ab769cada48b60bc0c5c
BLAKE2b-256 | c7a4dccb2a9356db844d7380d97fbaaa865f5ab8a929c7ffd2d77216367d30b4