Word2Vec implementation with TensorFlow Estimators and Datasets
Word2Vec
This is a re-implementation of Word2Vec relying on TensorFlow Estimators and Datasets.
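For background, skip-gram training (selected below via --train-mode skipgram) pairs each target word with the context words inside a sliding window. Here is a minimal pure-Python sketch of that pairing step; the function name and shape are illustrative, not this package's API:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (target, context) pairs for every token, pairing each
    target with the tokens at most `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is never its own context
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "quick", "brown", "fox"], window=1)
# With window=1, each interior word yields two pairs and each
# boundary word yields one, so a 4-token sentence gives 6 pairs.
```

In the actual pipeline these pairs are produced from a tf.data.Dataset stream and fed to the Estimator in batches, with the --window flag controlling the context size.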
Install
Via pip
pip3 install tf-word2vec
After a git clone
python3 setup.py install
Get data
You can download a sample of the English Wikipedia here:
wget http://129.194.21.122/~kabbach/enwiki.20190120.sample10.0.balanced.txt.7z
Train Word2Vec
w2v train \
--data /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt \
--outputdir /absolute/path/to/word2vec/models \
--alpha 0.025 \
--neg 5 \
--window 2 \
--epochs 5 \
--size 300 \
--min-count 50 \
--sample 1e-5 \
--train-mode skipgram \
--t-num-threads 20 \
--p-num-threads 25 \
--keep-checkpoint-max 3 \
--batch 1 \
--shuffling-buffer-size 10000 \
--save-summary-steps 10000 \
--save-checkpoints-steps 100000 \
--log-step-count-steps 10000
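The --sample flag sets the subsampling threshold for frequent words. In the original Word2Vec paper, a word with relative frequency f is kept with probability sqrt(t/f), where t is the threshold; a quick sketch of that heuristic (assuming tf-word2vec follows the paper's formula, which is not verified here):

```python
import math

def keep_probability(word_count, total_count, sample=1e-5):
    """Probability of keeping an occurrence of a word under
    Mikolov et al.'s subsampling heuristic: sqrt(t / f),
    capped at 1.0, where f is the word's relative frequency."""
    f = word_count / total_count
    return min(1.0, math.sqrt(sample / f))

# A word making up 1% of a corpus is kept only ~3% of the time,
# while genuinely rare words are always kept:
p_frequent = keep_probability(10_000, 1_000_000, sample=1e-5)
p_rare = keep_probability(5, 1_000_000, sample=1e-5)
```

Lowering --sample discards frequent words (e.g. "the") more aggressively, which both speeds up training and tends to improve the representations of rarer words.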