A small seq2seq punctuator tool based on DistilBERT
Distilbert-punctuator
Introduction
Distilbert-punctuator is a Python package that provides a BERT-based punctuator (a fine-tuned version of the pretrained Hugging Face DistilBertForTokenClassification model) with the following three components:
- data process: functions for processing users' own data to prepare it for training, for users who prefer to fine-tune the model with their own data.
- training: a training pipeline with which users can fine-tune their own punctuator.
- inference: an easy-to-use interface for running a trained punctuator. Users who don't want to train a punctuator themselves can use the pre-fine-tuned model Qishuai/distilbert_punctuator_en from the Hugging Face model hub when launching the inference.
Data Process
Component for pre-processing the training data. To use this component, install the package with pip install distilbert-punctuator[data_process]
The package provides a simple pipeline for generating NER-format training data.
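To make "NER format" concrete, here is a minimal sketch of how punctuated text can be turned into (token, label) pairs, where each word is labelled with the punctuation mark that follows it. This is an illustration only and not the package's pipeline: the function name to_ner_pairs and the label names (COMMA, PERIOD, QUESTION, EXCLAMATION, O) are assumptions made for this example.

```python
# Hypothetical label names, chosen for illustration only.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAMATION"}


def to_ner_pairs(text):
    """Turn punctuated text into (lower-cased word, label) pairs.

    Each word gets the label of the punctuation mark that directly
    follows it, or "O" if no punctuation follows.
    """
    pairs = []
    for raw in text.split():
        word = raw.rstrip(",.?!")
        if not word:
            continue  # skip stand-alone punctuation
        # The first stripped character decides the label.
        label = PUNCT_LABELS[raw[len(word)]] if len(raw) > len(word) else "O"
        pairs.append((word.lower(), label))
    return pairs


if __name__ == "__main__":
    for token, label in to_ner_pairs("Hello, how are you? I am fine."):
        print(f"{token}\t{label}")
```

A token classification model can then be trained to predict the label column from the unpunctuated, lower-cased tokens.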
Example
examples/data_sample.py
Train
Component providing a training pipeline for fine-tuning a pretrained DistilBertForTokenClassification model from Hugging Face.
Example
examples/train_sample.py
Inference
Component providing an inference interface for using the punctuator.
Architecture
+----------------------+                (child process)
|   user application   |             +-------------------+
+                      + <---------->| punctuator server |
|   +inference object  |             +-------------------+
+----------------------+
The punctuator is deployed in a child process that communicates with the main process through a pipe connection.
The user can therefore initialize an inference object and call its punctuation function when needed. The punctuator never blocks the main process except while it is performing punctuation.
The punctuator shuts down gracefully, so the user doesn't need to handle the shutdown.
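The architecture described above can be sketched with the standard library alone. The following is a minimal illustration under stated assumptions, not the package's implementation: the class name ToyInference, the helper _toy_punctuate (a stand-in for the real model), and the use of None as a shutdown sentinel are all hypothetical. It shows a child process serving requests over a pipe and exiting cleanly when asked.

```python
import multiprocessing as mp


def _toy_punctuate(text):
    """Stand-in for the real model: capitalise and end with a full stop."""
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".?!" else text + "."


def _server_loop(conn):
    """Runs in the child process: answer requests until told to stop."""
    try:
        while True:
            try:
                request = conn.recv()
            except EOFError:  # the other end of the pipe was closed
                break
            if request is None:  # hypothetical shutdown sentinel
                break
            conn.send([_toy_punctuate(t) for t in request])
    finally:
        conn.close()


class ToyInference:
    """Owns the child process and the parent's end of the pipe."""

    def __init__(self, start_method=None):
        ctx = mp.get_context(start_method)  # None selects the platform default
        self._conn, child_conn = ctx.Pipe()
        self._process = ctx.Process(
            target=_server_loop, args=(child_conn,), daemon=True
        )
        self._process.start()
        child_conn.close()  # the parent only needs its own end

    def punctuation(self, texts):
        """Send texts to the child and block until the result comes back."""
        self._conn.send(list(texts))
        return self._conn.recv()

    def terminate(self):
        """Graceful shutdown: send the sentinel, then wait for the child."""
        if self._process.is_alive():
            self._conn.send(None)
            self._process.join(timeout=5)
        self._conn.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.terminate()


if __name__ == "__main__":
    with ToyInference() as punctuator:
        print(punctuator.punctuation(["hello world", "how are you?"]))
```

In this sketch the main process blocks only inside punctuation(), while it waits for the child's reply, and the context manager guarantees that the child is stopped when the block exits.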
Example
examples/inference_sample.py
Hashes for distilbert-punctuator-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | deca6fcd22306b8b1d4800e6e32698eb0a8eea57353b97a07a7227d4bc02455b
MD5 | ab257ede9e19968a21e7bac362bcea9e
BLAKE2b-256 | 5f8aea90fca4e8eba912d8db2dc71ab11d0970cbf4ab1c52a10dc7bbca2678ee
Hashes for distilbert_punctuator-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f8e049684fb147e7604e97a82dde2632e3e7f2a4326bd0ce62fa674150e631ac
MD5 | 887ed8989039fc5f9a754c93abfbdebf
BLAKE2b-256 | 22625bef242f7ed4123f5ede1af8cc64778be98c5f4d3732ee6630a1eea2bbeb