Russian Paraphrasers (based on ru-gpt, mt5)

Russian Paraphrasers

A library for Russian paraphrase generation. Paraphrase generation is an increasingly popular NLP task with applications in many areas:

  • style transfer:
    • translation from rude to polite
    • translation from professional to simple language
  • data augmentation: increasing the number of examples for training ML models
  • increasing the robustness of ML models: training on a wide variety of examples that differ in style and sentiment but share the same meaning or user intent

Install

pip install --upgrade pip
pip install -r requirements.txt
pip install russian_paraphrasers

Requirements.txt:

sentence-transformers==0.4.0
transformers>=4.0.1
git+https://github.com/Maluuba/nlg-eval.git@master
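
Before running the paraphrasers, it can help to confirm which of these dependencies are actually installed. The following is a minimal sketch using only the standard library (Python 3.8+). The helper name `installed_versions` is hypothetical and not part of russian_paraphrasers, and the distribution name used for nlg-eval is an assumption, since that package is installed from git.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # "nlg-eval" is an assumed distribution name for the git dependency.
    report = installed_versions(["sentence-transformers", "transformers", "nlg-eval"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

The requirements list pins sentence-transformers to 0.4.0, so a different reported version is worth checking first if imports fail.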

Usage

  1. First, import one of the models and set general parameters for your paraphraser:
from russian_paraphrasers import GPTParaphraser

paraphraser = GPTParaphraser(model_name="gpt2", range_cand=False, make_eval=False)

or

from russian_paraphrasers import Mt5Paraphraser

paraphraser = Mt5Paraphraser(model_name="mt5-base", range_cand=False, make_eval=False)

You can choose (1) whether to filter the candidates (range_cand) and (2) whether to compute evaluation metrics for the best candidates or for all n samples (make_eval).

Arguments:

  • model_name: mt5-small, mt5-base, mt5-large, gpt2
  • range_cand: True/False
  • make_eval: True/False
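
The choice of class follows from the model name, so it can be wrapped in a small helper. This is a sketch under assumptions: the names `pick_class_name` and `build_paraphraser` are hypothetical, and the mapping (gpt2 to GPTParaphraser, mt5-* to Mt5Paraphraser) is inferred from the usage examples above rather than confirmed by the library.

```python
# Model names taken from the argument list above.
GPT_MODELS = {"gpt2"}
MT5_MODELS = {"mt5-small", "mt5-base", "mt5-large"}


def pick_class_name(model_name):
    """Return the name of the paraphraser class assumed to serve model_name."""
    if model_name in GPT_MODELS:
        return "GPTParaphraser"
    if model_name in MT5_MODELS:
        return "Mt5Paraphraser"
    allowed = sorted(GPT_MODELS | MT5_MODELS)
    raise ValueError(f"Unknown model_name {model_name!r}; expected one of {allowed}")


def build_paraphraser(model_name, range_cand=False, make_eval=False):
    """Instantiate the matching paraphraser (requires russian_paraphrasers)."""
    # Imported lazily so the validation above works without the package installed.
    import russian_paraphrasers

    cls = getattr(russian_paraphrasers, pick_class_name(model_name))
    return cls(model_name=model_name, range_cand=range_cand, make_eval=make_eval)
```

With the package installed, `build_paraphraser("mt5-base")` would be equivalent to the Mt5Paraphraser call shown in step 1.
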
  2. Pass the sentence (required) and the generation parameters to the generate function to see the results.
sentence = "Мама мыла раму."
results = paraphraser.generate(
    sentence, n=10, temperature=1, 
    top_k=10, top_p=0.9, 
    max_length=100, repetition_penalty=1.5
)
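
The parameters temperature, top_k and top_p are presumably the usual sampling controls known from transformers text generation; the source does not show how the library uses them internally. As an illustration of what such controls generally do, here is a simplified, pure-Python sketch of temperature scaling followed by top-k and top-p (nucleus) filtering over a toy list of logits. The function name `filter_top_k_top_p` is hypothetical, and this is not the library's implementation.

```python
import math


def filter_top_k_top_p(logits, temperature=1.0, top_k=10, top_p=0.9):
    """Return {token_index: probability} for the tokens that survive filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens and renormalise over them.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass_k = sum(probs[i] for i in ranked)
    probs_k = {i: probs[i] / mass_k for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs_k[i]
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens; sampling would draw from these.
    mass = sum(probs_k[i] for i in kept)
    return {i: probs_k[i] / mass for i in kept}
```

Lower top_k or top_p values shrink the set of tokens that can be sampled, which tends to give more conservative paraphrases; higher temperature flattens the distribution and gives more varied ones.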

Results for one sentence look like this:

{'average_metrics': {'Bleu_1': 0.06666666665333353,
                     'Bleu_2': 2.3570227263379004e-09,
                     'Bleu_3': 8.514692649183842e-12,
                     'Bleu_4': 5.665278056606597e-13,
                     'ROUGE_L': 0.07558859975216851},
 'best_candidats': ['В чём цель существования человека?',
                    'Для чего нужна жизнь?',
                    'Что такое жизнь в смысле смысла ее существования, и зачем '
                    'она нужна человеку.'],
 'predictions': ['В чём счастье людей, проживающих в мире сегодня',
                 'В чём счастье человека?)',
                 'Для чего нужна жизнь и какова цель ее существования?',
                 'Что означает фраза в том чтобы жить жизнью?',
                 'В чём ценность человеческой Жизни?',
                 'В чём счастье людей в мире? и т. д.',
                 'Зачем нужна жизнь и что в ней главное докуменция дл',
                 'В чём цель существования человека?',
                 'Что такое жизнь в смысле смысла ее существования, и зачем '
                 'она нужна человеку.',
                 'Для чего нужна жизнь?']
}
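
For the data augmentation use case, a result like the one above can be turned into labelled training pairs. The sketch below assumes the result is a dict with the keys shown ('best_candidats' and 'predictions'); which keys are present for which range_cand and make_eval settings is an assumption, so the code falls back from one to the other. The helper name `augmentation_pairs` and the label value are hypothetical.

```python
def augmentation_pairs(source, label, result):
    """Build (text, label) pairs from a paraphraser result dict."""
    # Prefer the filtered candidates; fall back to all generated samples.
    candidates = result.get("best_candidats") or result.get("predictions", [])

    # Skip paraphrases identical to the source or to each other (case-insensitive).
    seen = {source.strip().lower()}
    pairs = []
    for text in candidates:
        key = text.strip().lower()
        if key and key not in seen:
            seen.add(key)
            pairs.append((text, label))
    return pairs


if __name__ == "__main__":
    # A shortened copy of the sample result shown above.
    sample = {
        "best_candidats": ["В чём цель существования человека?", "Для чего нужна жизнь?"],
        "predictions": ["В чём счастье человека?)", "Для чего нужна жизнь?"],
    }
    for text, label in augmentation_pairs("В чём смысл жизни?", "meaning_of_life", sample):
        print(label, text)
```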

Models

All models were fine-tuned on the same dataset (see below) and uploaded to Hugging Face. Available models:

To be continued... =)

Dataset

All models were fine-tuned on a dataset made up of two parts:

  1. part of the ParaPhraser data, about 200k filtered examples
  2. filtered questions to chatbots and filtered subtitles from here

The dataset will be available soon, along with an article describing all the details.
