Punctuation restoration library
Punctuation restoration
Adds punctuation and capitalization to a given text that has no punctuation.
Works with Danish, German and English.
Models are hosted on Hugging Face! ❤️ 🤗
Installation
pip install punctfix
Usage
It's quite simple to use!
>>> from punctfix import PunctFixer
>>> fixer = PunctFixer(language="da")
>>> example_text = "mit navn det er rasmus og jeg kommer fra firmaet alvenir det er mig som har trænet denne lækre model"
>>> print(fixer.punctuate(example_text))
Mit navn det er Rasmus og jeg kommer fra firmaet Alvenir. Det er mig som har trænet denne lækre model.
>>> example_text = "en dag bliver vi sku glade for, at vi nu kan sætte punktummer og kommaer i en sætning det fungerer da meget godt ikke"
>>> print(fixer.punctuate(example_text))
En dag bliver vi sku glade for, at vi nu kan sætte punktummer og kommaer i en sætning. Det fungerer da meget godt, ikke?
Parameters for PunctFixer
- Pass device="cuda" or device="cpu" to indicate where to run inference. Default is device="cpu".
- To handle long sequences, the text is split into chunks with an overlap between consecutive chunks. Both can be modified. For higher speed but lower accuracy, use a chunk size of 150-200 and very little overlap, i.e. 5-10. The default values are word_chunk_size=100 and word_overlap=70, which makes it run a bit slow. The default parameters will be updated once we have results on different settings.
- Supported languages are "en" for English, "da" for Danish and "de" for German. Default is language="da".
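To make the chunk size and overlap trade-off concrete, here is a minimal sketch in plain Python. It is not punctfix's internal code: the helpers chunk_words and stitch are hypothetical, sizes are assumed to be counted in words (as the names word_chunk_size and word_overlap suggest), and the overlap is resolved with one common strategy, keeping each word's label from the chunk where it has the most context on both sides.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers illustrating word-level chunking with overlap.
# They are NOT part of the punctfix API; punctfix may do this differently.

Chunk = Tuple[int, List[str]]  # (index of first word, words in the window)


def chunk_words(words: Sequence[str], chunk_size: int = 100, overlap: int = 70) -> List[Chunk]:
    """Split `words` into windows of at most `chunk_size` words.

    Consecutive windows share `overlap` words, so each new window
    starts `chunk_size - overlap` words after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not words:
        return []
    stride = chunk_size - overlap
    chunks: List[Chunk] = []
    start = 0
    while True:
        chunks.append((start, list(words[start:start + chunk_size])))
        if start + chunk_size >= len(words):  # this window reaches the end
            break
        start += stride
    return chunks


def stitch(chunk_labels: Sequence[Tuple[int, Sequence[str]]], n_words: int) -> List[Optional[str]]:
    """Merge per-word labels from overlapping chunks.

    For each word, keep the label from the chunk in which that word is
    farthest from a chunk edge, i.e. where it has the most context.
    """
    best: List[Optional[str]] = [None] * n_words
    best_margin = [-1] * n_words
    for start, labels in chunk_labels:
        for offset, label in enumerate(labels):
            margin = min(offset, len(labels) - 1 - offset)  # distance to nearest edge
            pos = start + offset
            if margin > best_margin[pos]:
                best_margin[pos] = margin
                best[pos] = label
    return best


if __name__ == "__main__":
    words = [f"w{i}" for i in range(1000)]
    default = chunk_words(words, chunk_size=100, overlap=70)
    fast = chunk_words(words, chunk_size=200, overlap=10)
    print(len(default), sum(len(c) for _, c in default))  # 31 3100
    print(len(fast), sum(len(c) for _, c in fast))        # 6 1050
    # Using each window's own words as "labels" should reproduce the input.
    assert stitch(default, len(words)) == words
```

With a 1,000-word text, the defaults (100/70) give 31 chunks covering 3,100 words in total, while 200/10 gives 6 chunks covering 1,050 words, which is roughly why the larger, less overlapping setting is faster. The flip side is that fewer words get a second, better-centered prediction.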
Contribute
If you encounter problems, feel free to open an issue in the repo and we will fix it. Even better, create an issue and then a PR that fixes it! ;-)
Happy punctuating!
Download files
Source Distribution
punctfix-0.10.0.tar.gz (12.9 kB)
Built Distribution
punctfix-0.10.0-py3-none-any.whl (13.9 kB)
File details
Details for the file punctfix-0.10.0.tar.gz.
File metadata
- Download URL: punctfix-0.10.0.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d9cd2161002a4cb2cf609ca70c9bf9a9eb053c3b138cc7751e7e2027e2e7e30
MD5 | 04b01fbceea80f84bea37574e741f768
BLAKE2b-256 | 048571ba504256c328d49d17542acd91404903306aeb16ce84921e159866d4e6
File details
Details for the file punctfix-0.10.0-py3-none-any.whl.
File metadata
- Download URL: punctfix-0.10.0-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8b1df61dbe38cb3886ab3a23bc046df26a013ed5689de9a9de62ab92b24c6cd5
MD5 | 80b47e3ff2d50b1891bb235a0a30e29e
BLAKE2b-256 | 26388f096b729bb4071b79bf9b9a825339af4235329abce535e8f7222bb6a981