
Pretrained word embeddings in Python.


# embeddings

This Python package provides utilities to download pretrained word embeddings and make them available for lookup.

Embeddings are stored in the `$EMBEDDINGS_ROOT` directory (defaults to `~/.embeddings`) in a SQLite 3 database for minimal load time and fast retrieval.
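The storage design can be sketched with the standard library alone. The code below is a minimal, hypothetical illustration, not the package's actual code. The `embeddings_root` helper mirrors the documented default (`$EMBEDDINGS_ROOT`, else `~/.embeddings`). The `SqliteVectorStore` class, its table layout, and the float32 blob encoding are assumptions, chosen to show why a keyed SQLite lookup is cheap compared with scanning a large text file.

```python
import array
import os
import sqlite3
from typing import Iterable, List, Optional, Sequence, Tuple


def embeddings_root() -> str:
    # Documented behaviour: use $EMBEDDINGS_ROOT if set, else ~/.embeddings.
    return os.environ.get('EMBEDDINGS_ROOT', os.path.expanduser('~/.embeddings'))


class SqliteVectorStore:
    """Hypothetical word -> vector store backed by SQLite (not the package's schema)."""

    def __init__(self, path: str, d_emb: int) -> None:
        self.d_emb = d_emb
        self.conn = sqlite3.connect(path)
        # PRIMARY KEY on word creates an index, so lookups avoid a full table scan.
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS embeddings (word TEXT PRIMARY KEY, emb BLOB)'
        )

    def insert_batch(self, pairs: Iterable[Tuple[str, Sequence[float]]]) -> None:
        pairs = list(pairs)
        for word, vec in pairs:
            if len(vec) != self.d_emb:
                raise ValueError('{!r} has dimension {}, expected {}'.format(word, len(vec), self.d_emb))
        # Assumption: vectors are serialized as packed float32 bytes.
        rows = [(word, array.array('f', vec).tobytes()) for word, vec in pairs]
        with self.conn:  # commits the transaction on success
            self.conn.executemany('INSERT OR REPLACE INTO embeddings VALUES (?, ?)', rows)

    def lookup(self, word: str) -> Optional[List[float]]:
        row = self.conn.execute(
            'SELECT emb FROM embeddings WHERE word = ?', (word,)
        ).fetchone()
        if row is None:
            return None  # word not stored
        vec = array.array('f')
        vec.frombytes(row[0])
        return vec.tolist()


# In-memory database so the sketch runs without touching embeddings_root().
store = SqliteVectorStore(':memory:', d_emb=3)
store.insert_batch([('canada', [0.1, 0.2, 0.3]), ('toronto', [0.4, 0.5, 0.6])])
print(store.lookup('toronto'))
print(store.lookup('atlantis'))  # None: word not stored
```

Because `word` is the primary key, each lookup goes through an index instead of reading the whole vocabulary, which is consistent with the fast lookups timed below.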

Instead of loading a large file into memory before you can query it, `embeddings` reads vectors from the database on demand, so both construction and lookups are fast:

```python
In [1]: %timeit GloveEmbedding('common_crawl_840', d_emb=300)
100 loops, best of 3: 12.7 ms per loop

In [2]: %timeit GloveEmbedding('common_crawl_840', d_emb=300).emb('canada')
100 loops, best of 3: 12.9 ms per loop

In [3]: g = GloveEmbedding('common_crawl_840', d_emb=300)

In [4]: %timeit -n1 g.emb('canada')
1 loop, best of 3: 38.2 µs per loop
```

## Installation

```bash
pip install embeddings # from PyPI
pip install git+https://github.com/vzhong/embeddings.git # from GitHub
```


## Usage

On first use, the embeddings are downloaded. This can take a long time for large embeddings such as GloVe.

```python
from embeddings import GloveEmbedding, FastTextEmbedding, KazumaCharEmbedding

# Each embedding is downloaded on first use and stored under $EMBEDDINGS_ROOT.
g = GloveEmbedding('common_crawl_840', d_emb=300, show_progress=True)
f = FastTextEmbedding()
k = KazumaCharEmbedding()
for w in ['canada', 'vancouver', 'toronto']:
    # Look up the same word in each model.
    print('embedding {}'.format(w))
    print(g.emb(w))
    print(f.emb(w))
    print(k.emb(w))
```
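Vectors returned by `emb` can be compared directly. Below is a plain-Python sketch of cosine similarity. It assumes `emb` returns a sequence of floats of length `d_emb`, and that an unknown word may come back with `None` entries; both are assumptions to check against the installed version. The helper name `cosine_similarity` is illustrative and not part of the package.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(
    u: Sequence[Optional[float]], v: Sequence[Optional[float]]
) -> Optional[float]:
    """Cosine similarity of two equal-length vectors.

    Returns None if either vector has missing (None) entries or zero norm.
    """
    if len(u) != len(v):
        raise ValueError('vectors must have the same dimension')
    # Assumption: an out-of-vocabulary lookup may be filled with None entries.
    if any(x is None for x in u) or any(x is None for x in v):
        return None
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None  # direction undefined for a zero vector
    return dot / (norm_u * norm_v)


# Toy vectors stand in for g.emb(...) results so the sketch runs offline.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

With the models loaded, a call would look like `cosine_similarity(g.emb('vancouver'), g.emb('toronto'))`. Compare vectors from the same model only: GloVe, fastText, and character embeddings live in different vector spaces, so a similarity across models is not meaningful.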

## Contribution

Pull requests welcome!


## Download files

Source distribution: `embeddings-0.0.2.tar.gz` (6.8 kB), not uploaded using Trusted Publishing.

Hashes for `embeddings-0.0.2.tar.gz`:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `3f206103e7cab4791f68fe4a519d7763eaff8671da2cd6b4a17fb6b08089cdc1` |
| MD5 | `77993029367f7e5c0e0e7425cf3c659a` |
| BLAKE2b-256 | `ef67751c22d7cfc7010a58a2abb7a64309585c39d0e8467be76c90711f24d3ba` |
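A downloaded archive can be checked against the published SHA256 digest with the standard library's `hashlib`. This is a minimal sketch: `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, and the path is wherever you saved the download.

```python
import hashlib

# Published SHA256 digest for embeddings-0.0.2.tar.gz.
EXPECTED_SHA256 = '3f206103e7cab4791f68fe4a519d7763eaff8671da2cd6b4a17fb6b08089cdc1'


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check itself in hash-checking mode: list `embeddings==0.0.2 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.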
