Pretrained word embeddings in Python.

# embeddings

This Python package contains utilities to download pretrained word embeddings and make them available for lookup.

Embeddings are stored in the `$EMBEDDINGS_ROOT` directory (defaults to `~/.embeddings`) in a SQLite 3 database for minimal load time and fast retrieval.

Because lookups query the database instead of loading a large embedding file into memory, `embeddings` is fast to instantiate and to query:

```python
In [1]: %timeit GloveEmbedding('common_crawl_840', d_emb=300)
100 loops, best of 3: 12.7 ms per loop

In [2]: %timeit GloveEmbedding('common_crawl_840', d_emb=300).emb('canada')
100 loops, best of 3: 12.9 ms per loop

In [3]: g = GloveEmbedding('common_crawl_840', d_emb=300)

In [4]: %timeit -n1 g.emb('canada')
1 loop, best of 3: 38.2 µs per loop
```
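To illustrate why a database-backed lookup is cheap, here is a minimal sketch of a word-to-vector store on top of SQLite. It is not the package's actual implementation: the `TinyStore` class, the `embeddings` table layout, and the float32 blob encoding are all assumptions made for this example. Only the `$EMBEDDINGS_ROOT` variable and its `~/.embeddings` default come from the description above.

```python
import array
import os
import sqlite3
import tempfile


def embeddings_root():
    """Return $EMBEDDINGS_ROOT if it is set, otherwise ~/.embeddings."""
    return os.environ.get('EMBEDDINGS_ROOT', os.path.expanduser('~/.embeddings'))


class TinyStore:
    """Hypothetical word -> vector store (illustration only)."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        # The primary key gives SQLite an index on `word`, so a lookup
        # does not have to scan the whole table.
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS embeddings '
            '(word TEXT PRIMARY KEY, emb BLOB)')

    def insert(self, word, vector):
        # Pack the vector as float32 bytes (an assumed encoding).
        blob = array.array('f', vector).tobytes()
        self.db.execute(
            'INSERT OR REPLACE INTO embeddings VALUES (?, ?)', (word, blob))
        self.db.commit()

    def emb(self, word):
        row = self.db.execute(
            'SELECT emb FROM embeddings WHERE word = ?', (word,)).fetchone()
        if row is None:
            return None  # word is not in the vocabulary
        vec = array.array('f')
        vec.frombytes(row[0])
        return vec.tolist()


# Demo in a temporary directory so the real embeddings root is untouched.
with tempfile.TemporaryDirectory() as tmp:
    store = TinyStore(os.path.join(tmp, 'demo.db'))
    store.insert('canada', [0.5, -1.0, 2.0])
    found = store.emb('canada')
    missing = store.emb('atlantis')
    store.db.close()
```

Opening the connection is the only setup cost, and each query reads a single indexed row, which matches the pattern in the timings above: a small fixed cost to instantiate, then much faster repeated lookups.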

## Installation

```bash
pip install embeddings  # from PyPI
pip install git+https://github.com/vzhong/embeddings.git  # from GitHub
```


## Usage

On first use, the embeddings are downloaded. This may take a long time for large embeddings such as GloVe.

```python
from embeddings import GloveEmbedding, FastTextEmbedding, KazumaCharEmbedding

g = GloveEmbedding('common_crawl_840', d_emb=300, show_progress=True)
f = FastTextEmbedding()
k = KazumaCharEmbedding()
for w in ['canada', 'vancouver', 'toronto']:
    print('embedding {}'.format(w))
    print(g.emb(w))
    print(f.emb(w))
    print(k.emb(w))
```
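A common next step is to compare two words by the cosine similarity of their vectors. The sketch below assumes that each vector is a flat sequence of numbers. The `cosine` helper and the `toy` vectors are hypothetical stand-ins, so the example runs without downloading anything.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Stand-in 2-dimensional vectors (made up); real embeddings have many
# more dimensions, e.g. d_emb=300 in the GloVe example above.
toy = {
    'canada': [1.0, 0.0],
    'toronto': [1.0, 1.0],
    'banana': [0.0, -1.0],
}

sim_close = cosine(toy['canada'], toy['toronto'])
sim_far = cosine(toy['canada'], toy['banana'])
```

With the package installed, the same helper could be called as `cosine(g.emb('canada'), g.emb('toronto'))`, provided both words are in the vocabulary and `emb` returns plain numeric sequences for them.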

## Contribution

Pull requests welcome!
