Project description
Implementations of keyphrase generation algorithms.
Description
Implemented Paper
CopyRNN
ToDo List
CopyCNN
CopyTransformer
Usage
Required files (4 files in total)
vocab_file: one word per line (do not include an index column), for example:
this
paper
proposes
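A minimal sketch of how such a vocabulary file could be produced from tokenized documents. The function names, the `max_size` default and the frequency ordering are illustrative assumptions, not part of this package; whether special tokens (padding, unknown, and so on) must also appear in the file is not stated here, so the sketch leaves them out.

```python
from collections import Counter
from typing import Iterable, List


def build_vocab_lines(token_lists: Iterable[List[str]], max_size: int = 50000) -> List[str]:
    """Return vocabulary words ordered by descending frequency (hypothetical helper)."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    # most_common sorts by count; words with equal counts keep first-seen order
    return [word for word, _ in counts.most_common(max_size)]


def write_vocab(path: str, words: List[str]) -> None:
    """Write one bare word per line, with no index or count column."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")


if __name__ == "__main__":
    docs = [["this", "paper", "proposes"], ["this", "paper"], ["this"]]
    # Print the lines that write_vocab would put into the file
    print("\n".join(build_vocab_lines(docs)))
```

Calling `write_vocab("vocab.txt", build_vocab_lines(docs))` would then produce a file in the layout shown above.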
training, validation and test files
Data format for the training, validation and test files
JSON Lines format: every line is a dict with a 'tokens' list and a 'keyphrases' list of token lists:
{'tokens': ['this', 'paper', 'proposes', 'using', 'virtual', 'reality', 'to', 'enhance', 'the', 'perception', 'of', 'actions', 'by', 'distant', 'users', 'on', 'a', 'shared', 'application', '.', 'here', ',', 'distance', 'may', 'refer', 'either', 'to', 'space', '(', 'e.g.', 'in', 'a', 'remote', 'synchronous', 'collaboration', ')', 'or', 'time', '(', 'e.g.', 'during', 'playback', 'of', 'recorded', 'actions', ')', '.', 'our', 'approach', 'consists', 'in', 'immersing', 'the', 'application', 'in', 'a', 'virtual', 'inhabited', '3d', 'space', 'and', 'mimicking', 'user', 'actions', 'by', 'animating', 'avatars', '.', 'we', 'illustrate', 'this', 'approach', 'with', 'two', 'applications', ',', 'the', 'one', 'for', 'remote', 'collaboration', 'on', 'a', 'shared', 'application', 'and', 'the', 'other', 'to', 'playback', 'recorded', 'sequences', 'of', 'user', 'actions', '.', 'we', 'suggest', 'this', 'could', 'be', 'a', 'low', 'cost', 'enhancement', 'for', 'telepresence', '.'] , 'keyphrases': [['telepresence'], ['animation'], ['avatars'], ['application', 'sharing'], ['collaborative', 'virtual', 'environments']]}
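The record layout above can be checked before training with a small script. This is a sketch under assumptions: the helper names are hypothetical and it relies only on the standard `json` module, not on this package's own loader. Note that `json.dumps` writes double-quoted strings, which is what standard JSON requires; a line written with single quotes, as in the sample above, would be rejected by `json.loads`.

```python
import json
from typing import Dict, List


def validate_record(record: Dict) -> None:
    """Raise ValueError if a record lacks the 'tokens' / 'keyphrases' layout."""
    tokens = record.get("tokens")
    keyphrases = record.get("keyphrases")
    if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
        raise ValueError("'tokens' must be a list of strings")
    if not isinstance(keyphrases, list):
        raise ValueError("'keyphrases' must be a list of token lists")
    for phrase in keyphrases:
        if not isinstance(phrase, list) or not all(isinstance(t, str) for t in phrase):
            raise ValueError("each keyphrase must be a list of strings")


def dump_jsonl(records: List[Dict]) -> str:
    """Serialize records as JSON Lines text: one JSON object per line."""
    for record in records:
        validate_record(record)
    return "\n".join(json.dumps(record) for record in records) + "\n"


def load_jsonl(text: str) -> List[Dict]:
    """Parse JSON Lines text, validating each non-empty line."""
    records = []
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            validate_record(record)
            records.append(record)
    return records


if __name__ == "__main__":
    sample = {
        "tokens": ["we", "suggest", "this", "for", "telepresence", "."],
        "keyphrases": [["telepresence"], ["application", "sharing"]],
    }
    print(dump_jsonl([sample]), end="")
```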
Training
Download the kp20k dataset, then run the following commands:
mkdir data
mkdir data/raw
mkdir data/raw/kp20k_new
# !! unzip the kp20k data and put the files into the folder above manually
python -m nltk.downloader punkt
bash scripts/prepare_kp20k.sh
bash scripts/train_copyrnn_kp20k.sh
# enter the experiment result dir; the suffix is the time the experiment started
cd data/kp20k/copyrnn_kp20k_basic-20191212-080000
# start the tensorboard service
tensorboard --bind_all --logdir logs --port 6006
Notes
- Compared with the original seq2seq-keyphrase-pytorch:
  - fixes implementation errors:
    - the copy mechanism
    - training and inference did not match (training had no input feeding, while inference did)
  - easier data preparation
  - TensorBoard support
  - faster beam search (6x faster on CPU and more than 10x faster on GPU)
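For readers unfamiliar with the copy mechanism mentioned in the notes, the following is a minimal, framework-free sketch of one common formulation: generate scores over the vocabulary and copy scores over source positions are normalized together, and the probability of copying a word is summed over every position where it occurs. It is an illustration only, not this package's implementation; the function name and the toy scores are assumptions.

```python
import math
from typing import Dict, List


def copy_distribution(
    gen_scores: Dict[str, float],
    copy_scores: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Joint softmax over generate scores and copy scores (illustrative only).

    gen_scores maps each vocabulary word to an unnormalized score;
    copy_scores holds one unnormalized score per source position.
    """
    if len(copy_scores) != len(source_tokens):
        raise ValueError("need exactly one copy score per source token")
    all_scores = list(gen_scores.values()) + list(copy_scores)
    max_score = max(all_scores)  # subtract the max for numerical stability
    z = sum(math.exp(s - max_score) for s in all_scores)
    # Generation part: one probability per vocabulary word
    probs = {w: math.exp(s - max_score) / z for w, s in gen_scores.items()}
    # Copy part: add the mass of every source position holding the same word,
    # which also lets out-of-vocabulary source words receive probability
    for token, score in zip(source_tokens, copy_scores):
        probs[token] = probs.get(token, 0.0) + math.exp(score - max_score) / z
    return probs


if __name__ == "__main__":
    dist = copy_distribution(
        gen_scores={"the": 0.0, "<unk>": 0.0},
        copy_scores=[0.0, 0.0, 0.0],
        source_tokens=["telepresence", "the", "telepresence"],
    )
    for word, p in sorted(dist.items()):
        print(f"{word}\t{p:.2f}")
```

In the toy call, "telepresence" is not in the vocabulary but still receives probability because it appears twice in the source.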
Download files
Source distribution: deep-keyphrase-0.0.6.tar.gz (36.7 kB)
File metadata
- Size: 36.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 85eb50b481b3e74a3c952a22f4eadfa3874c74d4937a672a02c15aca6083b5bc
MD5 | 9fc3f1aba34c28eee12df5fae3f99502
BLAKE2b-256 | e4657903e109c6372dc09b4188045280ccbacffb4baf4b5ab59ffce374eb07ae