Text Generation Model
textgen
textgen, Text Generation models. Implementations of text generation models, including non-core word replacement, seq2seq, ernie-gen, bert, xlnet, and gpt2, ready to use out of the box.
Features
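Non-core word replacement
Based on the UDA (Unsupervised Data Augmentation) algorithm proposed by Google: a certain proportion of the unimportant words in a text are replaced with synonyms to produce new text.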
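The idea of non-core word replacement can be illustrated with a minimal sketch. This is not textgen's implementation: it assumes the input is already tokenized, ranks words by a smoothed TF-IDF score, and deterministically replaces the lowest-scoring ones (UDA itself samples replacements at random). The names `tfidf_scores` and `replace_non_core`, the toy corpus, and the synonym table are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(tokens, corpus):
    """Score each distinct token in `tokens` by smoothed TF-IDF against `corpus`."""
    tf = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # Number of corpus documents that contain the word
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(tokens)) * idf
    return scores


def replace_non_core(tokens, corpus, synonyms, ratio=0.3):
    """Replace up to `ratio` of the tokens, least informative first (hypothetical helper)."""
    scores = tfidf_scores(tokens, corpus)
    # Only positions whose word has a known synonym can be replaced
    candidates = [i for i, w in enumerate(tokens) if w in synonyms]
    # Lowest TF-IDF first; the sort is stable, so ties keep sentence order
    candidates.sort(key=lambda i: scores[tokens[i]])
    budget = int(ratio * len(tokens))
    out = list(tokens)
    for i in candidates[:budget]:
        out[i] = synonyms[tokens[i]]
    return out


# Toy data, assumed for illustration only
corpus = [["晚上", "好", "冷"], ["我", "想", "回家"], ["附近", "人", "好", "多"]]
synonyms = {"孤单": "寂寞", "陪陪": "陪伴"}
tokens = ["晚上", "一个人", "好", "孤单", ",", "想", "找", "附近", "人", "陪陪", "我"]
print("".join(replace_non_core(tokens, corpus, synonyms)))
# prints: 晚上一个人好寂寞,想找附近人陪伴我
```

A real pipeline would need a Chinese word segmenter and a synonym source (for example, nearest neighbours in a word-vector space), neither of which is shown here.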
Seq2Seq
Based on an encoder-decoder architecture; generates new text sequence-to-sequence.
Install
pip3 install textgen
or
git clone https://github.com/shibing624/text-generation.git
cd text-generation
python3 setup.py install
Usage
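- download a pretrained word-vector file
Choose any one of the following word vectors:
Lightweight Tencent word vectors, binary, 111 MB; place at ~/.text2vec/datasets/light_Tencent_AILab_ChineseEmbedding.bin
Tencent word vectors, 6.78 GB; place at ~/.text2vec/datasets/Tencent_AILab_ChineseEmbedding.txt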
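Before running the examples, it can help to confirm that one of these files is where it is expected. The sketch below uses only the Python standard library; `find_word_vectors` is a hypothetical helper, not part of textgen, and it assumes the file names and directory listed above.

```python
from pathlib import Path

# Directory and file names taken from the list above
DATASET_DIR = Path.home() / ".text2vec" / "datasets"
CANDIDATES = [
    "light_Tencent_AILab_ChineseEmbedding.bin",  # lightweight binary version
    "Tencent_AILab_ChineseEmbedding.txt",        # full text version
]


def find_word_vectors(dataset_dir=DATASET_DIR):
    """Return the first candidate file that exists, or None (hypothetical helper)."""
    for name in CANDIDATES:
        path = Path(dataset_dir) / name
        if path.is_file():
            return path
    return None


found = find_word_vectors()
print(found if found else "No word-vector file found in " + str(DATASET_DIR))
```

The lightweight file is checked first because it is the smaller download; swap the order in `CANDIDATES` to prefer the full vectors.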
- download a pretrained language model file
BERT model
- rule-based text generation
import textgen
a = '晚上一个人好孤单,想找附近人陪陪我'
b = textgen.rule(a)
print(b)
output:
晚上一个人好寂寞,想找附近人陪伴我
- seq2seq-based text generation
import textgen
a = '你这么早就睡呀,'
b = textgen.seq2seq(a)
print(b)
output:
你这么早就睡呀,我还没写完作业呢,你陪我看看这个题怎么写吧。
- ernie-gen-based text generation
import textgen
a = '你这么早就睡呀,'
b = textgen.erniegen(a)
print(b)
output:
你这么早就睡呀,我还没写完作业呢,你陪我看看这个题怎么写吧。求求你了!
TODO
- seq2seq
- bert
- ernie-gen
- xlnet
License
Apache License 2.0