Word2Vec implementation with TensorFlow Estimators and Datasets

Word2Vec

MIT License

This is a re-implementation of Word2Vec that relies on TensorFlow Estimators and Datasets.

Works with Python >= 3.6 and TensorFlow v2.0.

Install

via pip:

pip3 install tf-word2vec

or, after a git clone:

python3 setup.py install

Get data

You can download a sample of the English Wikipedia here:

wget http://129.194.21.122/~kabbach/enwiki.20190120.sample10.0.balanced.txt.7z
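
The download is a 7z archive, whereas the training command reads a plain .txt file, so the archive presumably has to be extracted first. A minimal sketch, assuming the `7z` tool (from p7zip) is installed and that the archive unpacks to a file with the same name minus the `.7z` suffix; neither assumption is stated in the source:

```shell
# Archive name taken from the wget command above.
archive="enwiki.20190120.sample10.0.balanced.txt.7z"
# Assumed name of the extracted corpus: the archive name without ".7z".
corpus="${archive%.7z}"

# Extract only when the archive is present and the 7z tool is available.
if [ -f "$archive" ] && command -v 7z >/dev/null 2>&1; then
  7z x "$archive"
fi

# Print the absolute path that can be passed to --data.
echo "$(pwd)/$corpus"
```

The printed path can then be used in place of /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt when training.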

Train Word2Vec

w2v train \
  --data /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt \
  --outputdir /absolute/path/to/word2vec/models \
  --alpha 0.025 \
  --neg 5 \
  --window 2 \
  --epochs 5 \
  --size 300 \
  --min-count 50 \
  --sample 1e-5 \
  --train-mode skipgram \
  --t-num-threads 20 \
  --p-num-threads 25 \
  --keep-checkpoint-max 3 \
  --batch 1 \
  --shuffling-buffer-size 10000 \
  --save-summary-steps 10000 \
  --save-checkpoints-steps 100000 \
  --log-step-count-steps 10000
