Russian Paraphrasers (based on ru-gpt, mt5)
Russian Paraphrasers
A library for Russian paraphrase generation. Paraphrase generation is an increasingly popular NLP task with many applications:
- style transfer:
  - translation from rude to polite
  - translation from professional to simple language
- data augmentation: increasing the number of examples for training ML models
- increasing the stability of ML models: training models on a wide variety of examples that differ in style and sentiment but keep the same meaning / user intent
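The data augmentation use case can be sketched as follows. This is a minimal illustration, not part of the library: `augment` and `fake_paraphrase` are hypothetical names, and the stub stands in for a real paraphraser so that the example runs without downloading a model.

```python
def augment(examples, paraphrase_fn, per_example=2):
    """Add up to `per_example` distinct paraphrases of each (text, label) pair."""
    augmented = []
    for text, label in examples:
        augmented.append((text, label))
        seen = {text}  # the original text counts as already seen
        for candidate in paraphrase_fn(text):
            if len(seen) > per_example:
                break  # enough new paraphrases for this example
            if candidate not in seen:
                seen.add(candidate)
                # A paraphrase keeps the label of the sentence it came from.
                augmented.append((candidate, label))
    return augmented


def fake_paraphrase(text):
    # Hypothetical stand-in for a real paraphraser; returns canned variants.
    return [text, text.lower(), text + "!"]


data = [("Hello there", "greeting")]
for text, label in augment(data, fake_paraphrase):
    print(label, "|", text)
```

In real use, `paraphrase_fn` would wrap a call to one of the paraphrasers described under Usage and return its generated sentences.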
Install
pip install --upgrade pip
pip install -r requirements.txt
pip install russian_paraphrasers
Requirements.txt:
sentence-transformers==0.4.0
transformers>=4.0.1
git+https://github.com/Maluuba/nlg-eval.git@master
Usage
- First, import one of the models and set general parameters for your paraphraser:
# Option 1: GPT-2 based paraphraser
from russian_paraphrasers import GPTParaphraser
paraphraser = GPTParaphraser(model_name="gpt2", range_cand=False, make_eval=False)

# Option 2: mT5 based paraphraser
from russian_paraphrasers import Mt5Paraphraser
paraphraser = Mt5Paraphraser(model_name="mt5-base", range_cand=False, make_eval=False)
You can choose (1) whether to filter the candidates (range_cand) and (2) whether to compute evaluation metrics (make_eval), either for the best candidates or for all n samples.
Arguments:
- model_name: mt5-small, mt5-base, mt5-large, gpt2
- range_cand: True/False
- make_eval: True/False
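A small guard can catch a mistyped model name before any model is loaded. The sketch below is an assumption-laden illustration: `make_paraphraser_kwargs` and `SUPPORTED_MODELS` are hypothetical names, not part of russian_paraphrasers, and the set of names is simply copied from the argument list above.

```python
# Model names listed in this README (copied from the argument list).
SUPPORTED_MODELS = {"mt5-small", "mt5-base", "mt5-large", "gpt2"}


def make_paraphraser_kwargs(model_name, range_cand=False, make_eval=False):
    """Validate the three documented arguments and return them as a dict.

    Hypothetical helper, not part of russian_paraphrasers.
    """
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(
            f"unknown model_name {model_name!r}; "
            f"expected one of {sorted(SUPPORTED_MODELS)}"
        )
    if not isinstance(range_cand, bool) or not isinstance(make_eval, bool):
        raise TypeError("range_cand and make_eval must be True or False")
    return {
        "model_name": model_name,
        "range_cand": range_cand,
        "make_eval": make_eval,
    }


kwargs = make_paraphraser_kwargs("mt5-base", range_cand=True)
print(kwargs)

# With the package installed, the dict can be unpacked into the constructor:
#   from russian_paraphrasers import Mt5Paraphraser
#   paraphraser = Mt5Paraphraser(**kwargs)
```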
- Pass the sentence (required) and the generation parameters to the generate function:
sentence = "Мама мыла раму."
results = paraphraser.generate(
    sentence, n=10, temperature=1,
    top_k=10, top_p=0.9,
    max_length=100, repetition_penalty=1.5
)
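For readers unfamiliar with the sampling parameters, the sketch below shows what temperature, top_k and top_p conventionally mean in text generation. It is a simplified pure-Python illustration on made-up scores, not the library's code; `filter_top_k_top_p` is a hypothetical name, and the assumption that these arguments are forwarded to the underlying transformers model is not confirmed by this README.

```python
import math


def filter_top_k_top_p(logits, temperature=1.0, top_k=10, top_p=0.9):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature rescales the logits before the softmax:
    # values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # top_k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    mass_k = sum(probs[i] for i in ranked)

    # top_p: within those, keep the smallest prefix whose cumulative
    # (renormalised) probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # Renormalise so that the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # made-up scores, 5-token vocabulary
print(filter_top_k_top_p(toy_logits, temperature=1.0, top_k=3, top_p=0.9))
```

The next token is then sampled from the surviving distribution. Lower top_k or top_p values give more conservative paraphrases, higher values more varied ones. The sketch does not cover n, max_length or repetition_penalty.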
Results for one sentence look like this:
{'average_metrics': {'Bleu_1': 0.06666666665333353,
                     'Bleu_2': 2.3570227263379004e-09,
                     'Bleu_3': 8.514692649183842e-12,
                     'Bleu_4': 5.665278056606597e-13,
                     'ROUGE_L': 0.07558859975216851},
 'best_candidats': ['В чём цель существования человека?',
                    'Для чего нужна жизнь?',
                    'Что такое жизнь в смысле смысла ее существования, и зачем '
                    'она нужна человеку.'],
 'predictions': ['В чём счастье людей, проживающих в мире сегодня',
                 'В чём счастье человека?)',
                 'Для чего нужна жизнь и какова цель ее существования?',
                 'Что означает фраза в том чтобы жить жизнью?',
                 'В чём ценность человеческой Жизни?',
                 'В чём счастье людей в мире? и т. д.',
                 'Зачем нужна жизнь и что в ней главное докуменция дл',
                 'В чём цель существования человека?',
                 'Что такое жизнь в смысле смысла ее существования, и зачем '
                 'она нужна человеку.',
                 'Для чего нужна жизнь?']
}
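A result dictionary of this shape can be post-processed with a few lines of Python. The sketch below is a hypothetical helper (`pick_paraphrases` is not part of the library). It assumes that the 'best_candidats' key, spelled as in the output above, may be absent or empty when candidate filtering is turned off, and falls back to 'predictions' in that case.

```python
def pick_paraphrases(results, limit=3):
    """Return up to `limit` unique paraphrases, preferring filtered candidates."""
    # 'best_candidats' is spelled this way in the sample output above.
    candidates = results.get("best_candidats") or results.get("predictions", [])
    unique = []
    for text in candidates:
        cleaned = text.strip()
        if cleaned and cleaned not in unique:  # drop empty strings and duplicates
            unique.append(cleaned)
    return unique[:limit]


# Abbreviated copy of the sample output shown above.
sample = {
    "average_metrics": {
        "Bleu_1": 0.06666666665333353,
        "ROUGE_L": 0.07558859975216851,
    },
    "best_candidats": [
        "В чём цель существования человека?",
        "Для чего нужна жизнь?",
    ],
    "predictions": [
        "В чём счастье человека?)",
        "В чём цель существования человека?",
        "Для чего нужна жизнь?",
    ],
}

for paraphrase in pick_paraphrases(sample):
    print(paraphrase)
```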
Models
All models were fine-tuned on the same dataset (see below) and uploaded to Hugging Face. Available models:
- rugpt2-large trained by Sberbank team https://github.com/sberbank-ai/ru-gpts
- mt5-small
- mt5-base
- mt5-large
To be continued... =)
Dataset
All models were fine-tuned on a dataset that consists of two parts:
- part of the ParaPhraser data, about 200k filtered examples
- filtered questions to chatbots and filtered subtitles from here
The dataset will be made available soon, together with an article describing all the details.