Word Vectors

A fast, lightweight library for loading word vectors.

File Types

GloVe

A simple plain-text format. Each line holds a word followed by its vector, with the word and each vector component separated by a space.

This format is both slow to parse and space-inefficient.
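A minimal Python sketch of reading this format, assuming UTF-8 text and words that contain no spaces. It is illustrative only and not this library's API; `parse_glove_lines` and `load_glove` are hypothetical names. Every component has to be converted from decimal text, which is where much of the loading time goes.

```python
from typing import Dict, Iterable, List


def parse_glove_lines(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse lines of the form 'word c1 c2 ... cN' (space-separated)."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if not parts[0]:
            continue  # skip blank lines
        # Every component is converted from decimal text, which is what makes loading slow.
        vectors[parts[0]] = [float(c) for c in parts[1:]]
    return vectors


def load_glove(path: str) -> Dict[str, List[float]]:
    # Hypothetical helper: stream the file line by line instead of reading it whole.
    with open(path, encoding="utf-8") as f:
        return parse_glove_lines(f)


print(parse_glove_lines(["the 0.1 0.2 0.3\n", "cat -0.5 1.0 0.25\n"])["cat"])  # [-0.5, 1.0, 0.25]
```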

Word2Vec

A simple binary format. The first line gives the number of words in the vocabulary and the vector size. Each following entry is a word, a space, and then the vector stored as a binary string.

This format is compact but slow, because the reader has to read one byte at a time to find the end of each word.
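The byte-at-a-time scan is visible in a minimal Python sketch like the one below. It assumes the header is an ASCII line such as `2 3`, that vector components are little-endian 32-bit floats, and that a newline may follow each vector; `read_word2vec_binary` and `_read_word` are hypothetical helpers, not part of this library.

```python
import io
import struct
from typing import BinaryIO, Dict, Tuple


def _read_word(f: BinaryIO) -> bytes:
    """Read one byte at a time until a space; this scan is what makes the format slow."""
    chars = bytearray()
    while True:
        ch = f.read(1)
        if ch == b"":
            raise EOFError("unexpected end of file while reading a word")
        if ch == b" ":
            return bytes(chars)
        if ch != b"\n":  # skip the newline some writers put after each vector
            chars += ch


def read_word2vec_binary(f: BinaryIO) -> Dict[str, Tuple[float, ...]]:
    # Header line (ASCII): "<vocab size> <vector size>\n"
    vocab_size, dim = (int(x) for x in f.readline().split())
    vec = struct.Struct(f"<{dim}f")  # assumption: little-endian float32 components
    vectors = {}
    for _ in range(vocab_size):
        word = _read_word(f).decode("utf-8")
        vectors[word] = vec.unpack(f.read(vec.size))
    return vectors


# Usage: build a tiny two-word file in memory and read it back.
buf = io.BytesIO(
    b"2 3\n"
    + b"cat " + struct.pack("<3f", 0.5, 1.0, -2.0) + b"\n"
    + b"dog " + struct.pack("<3f", 0.25, 0.0, 4.0) + b"\n"
)
print(read_word2vec_binary(buf)["dog"])  # (0.25, 0.0, 4.0)
```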

Dense

This is the new format. It is a binary file whose first 12 bytes hold the vocabulary size, the vector size, and the maximum word length as unsigned little-endian integers (4 bytes each). The entries follow, each consisting of a word padded to the maximum length and then its vector.
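A minimal Python sketch of this layout as a write/read round trip. The header matches the description above (`<3I`: three unsigned little-endian 32-bit integers); the rest is assumed: words are encoded as UTF-8 with the maximum length counted in bytes, padding uses NUL bytes, and vector components are little-endian 32-bit floats. `write_dense` and `read_dense` are hypothetical names, not this library's API.

```python
import struct
from typing import Dict, List, Sequence

HEADER = struct.Struct("<3I")  # vocab size, vector size, max word length: 12 bytes


def write_dense(words: Sequence[str], vectors: Sequence[Sequence[float]]) -> bytes:
    encoded = [w.encode("utf-8") for w in words]
    dim = len(vectors[0]) if vectors else 0
    max_len = max((len(w) for w in encoded), default=0)
    vec = struct.Struct(f"<{dim}f")  # assumption: float32 components
    out = bytearray(HEADER.pack(len(encoded), dim, max_len))
    for word, values in zip(encoded, vectors):
        out += word.ljust(max_len, b"\0")  # assumption: NUL padding
        out += vec.pack(*values)
    return bytes(out)


def read_dense(data: bytes) -> Dict[str, List[float]]:
    vocab, dim, max_len = HEADER.unpack_from(data, 0)
    vec = struct.Struct(f"<{dim}f")
    offset = HEADER.size
    result: Dict[str, List[float]] = {}
    for _ in range(vocab):
        word = data[offset:offset + max_len].rstrip(b"\0").decode("utf-8")
        result[word] = list(vec.unpack_from(data, offset + max_len))
        offset += max_len + vec.size  # every entry has the same size
    return result


blob = write_dense(["cat", "zebra"], [[0.5, 1.0], [-2.0, 0.25]])
print(len(blob))  # 38: 12-byte header + 2 entries of (5 + 2 * 4) bytes
```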

This format is slightly larger than the word2vec format, but it is faster to read: every entry has the same size, so the location of any item can be computed directly instead of scanned for. Fixed offsets also make multithreaded reading possible. The format is still smaller than the plain-text GloVe format.
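Under the float32 assumption each entry occupies `max_len + 4 * dim` bytes, so entry `i` starts at byte `12 + i * (max_len + 4 * dim)`. The hedged Python sketch below shows random access and a threaded read built on that offset; NUL padding and float32 components are assumptions, and `entry_offset`, `read_entry`, and `read_all_parallel` are hypothetical names.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

HEADER = struct.Struct("<3I")  # vocab size, vector size, max word length
FLOAT_SIZE = 4  # assumption: float32 components


def entry_offset(index: int, dim: int, max_len: int) -> int:
    """Byte offset of entry `index`: header size + index * fixed entry size."""
    return HEADER.size + index * (max_len + dim * FLOAT_SIZE)


def read_entry(data: bytes, index: int) -> Tuple[str, Tuple[float, ...]]:
    vocab, dim, max_len = HEADER.unpack_from(data, 0)
    if not 0 <= index < vocab:
        raise IndexError(index)
    start = entry_offset(index, dim, max_len)
    word = data[start:start + max_len].rstrip(b"\0").decode("utf-8")
    vector = struct.unpack_from(f"<{dim}f", data, start + max_len)
    return word, vector


def read_all_parallel(data: bytes, workers: int = 4) -> List[Tuple[str, Tuple[float, ...]]]:
    vocab = HEADER.unpack_from(data, 0)[0]
    # Each index maps to a known offset, so entries can be decoded independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: read_entry(data, i), range(vocab)))
```

In CPython the global interpreter lock limits how much pure-Python decoding gains from threads; independent offsets pay off more when workers read from a file or memory map, or when decoding runs in native code.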

Download files

Source Distribution

word-vectors-1.0.0.tar.gz (4.4 kB)

File details

Details for the file word-vectors-1.0.0.tar.gz.

File metadata

  • Download URL: word-vectors-1.0.0.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for word-vectors-1.0.0.tar.gz
Algorithm Hash digest
SHA256 74b0c44589c9104c45779e9973495788b8c20ac5b1698bb8191dcdb4f468003a
MD5 0b8e36b192d374a31654ec4010b9a9ab
BLAKE2b-256 f0c5df30929b35360070be6109966ecae9ca7c3cdc3b3978a6bb7160635be183
