A sentence paraphraser based on dependency syntax and word embeddings

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

dependency-paraphraser

A sentence paraphraser based on dependency parsing and word embedding similarity.

How the paraphraser works:

Create a random projection of the dependency tree
Replace several words with similar ones

The basic usage (for Russian language) is based on Natasha library:

pip install dependency-paraphraser natasha

import dependency_paraphraser.natasha
import random
random.seed(42)
text = 'каждый охотник желает знать где сидит фазан'
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, tree_temperature=2))
# желает знать сидит фазан где каждый охотник
# каждый охотник желает знать где фазан сидит
# знать где фазан сидит каждый охотник желает

You can provide your own w2v model to replace words with similar ones:

import compress_fasttext
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'https://github.com/avidale/compress-fasttext/releases/download/v0.0.1/ft_freqprune_100K_20K_pq_100.bin'
)
random.seed(42)
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, w2v=small_model, p_rep=0.8, min_sim=0.55))
# стремится каждый охотник знать рябчик где усаживается
# каждый охотник хочет узнать фазан где просиживает
# каждый охотник хочет узнать фазан где восседает

Alternatively, you can expand and use the w2v model from Natasha (aka navec):

navec_model = dependency_paraphraser.natasha.emb.as_gensim
random.seed(42)
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, w2v=navec_model, p_rep=0.5, min_sim=0.55))
# желает каждый охотник помнить фазан где лежит
# каждый охотник желает знать фазан где сидит
# каждый охотник оставляет понять где фазан лежит

For other languages, one way to use this paraphraser is with the UDPipe library

pip install dependency-paraphraser ufal.udpipe pyconll

import dependency_paraphraser.udpipe
path = 'english-ewt-ud-2.5-191206.udpipe'
pipe = dependency_paraphraser.udpipe.Model(path)
projector = dependency_paraphraser.udpipe.en_udpipe_projector

text = 'in April 2012 they released the videoclip for a new single entitled Giorgio Mastrota'
for i in range(3):
    print(dependency_paraphraser.udpipe.paraphrase(text, pipe, projector=projector, tree_temperature=1))
# they released the videoclip in April 2012 for a new entitled Mastrota single Giorgio
# they released in April 2012 the videoclip for a entitled single new Giorgio Mastrota
# they released the videoclip in April 2012 for a new single Giorgio Mastrota entitled

Projectors (models for projecting dependency trees into a flat sentence) can be trained for any language, if you have a corpus of unlabeled sentences and a syntax parser to label them:

import dependency_paraphraser.udpipe
import dependency_paraphraser.train_projector
parser = dependency_paraphraser.udpipe.Model(path_to_your_model)

sents = dependency_paraphraser.train_projector.label_udpipe_sentences(
    texts=your_corpus,
    model=parser,
)
projector = dependency_paraphraser.train_projector.train_projector(sents)

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

0.0.5

Sep 21, 2021

0.0.4

Jun 23, 2021

0.0.3

May 16, 2020

0.0.2

May 16, 2020

0.0.1

May 16, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dependency-paraphraser-0.0.5.tar.gz (58.8 kB view details)

Uploaded Sep 21, 2021 Source

File details

Details for the file dependency-paraphraser-0.0.5.tar.gz.

File metadata

Download URL: dependency-paraphraser-0.0.5.tar.gz
Upload date: Sep 21, 2021
Size: 58.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/3.10.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.9.6

File hashes

Hashes for dependency-paraphraser-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`3697cdbc600ca5f77edbe5b5698ea1748bb01d9463d890d9041083868f9d253a`
MD5	`f6e6e01f8ac1d4526c4034bdaab4609c`
BLAKE2b-256	`fec6bc1b4e7faf1196dfe278c3bd778df7991ee8222230de803b65e29af7f8cb`

See more details on using hashes here.

dependency-paraphraser 0.0.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

dependency-paraphraser

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes