Python package for computing embeddings from co-occurence matrices
Project description
# Glove
Cython general implementation of the Glove multi-threaded training.
GloVe is an unsupervised learning algorithm for generating vector representations for words.
Training is done using a co-occcurence matrix from a corpus. The resulting representations contain structure useful for many other tasks.
The paper describing the model is [here](http://nlp.stanford.edu/projects/glove/glove.pdf).
The original implementation for this Machine Learning model can be [found here](http://nlp.stanford.edu/projects/glove/).
@author Jonathan Raiman
## Example
To use this package you need a sparse co-occurence matrix.
This matrix is represented by nested dictionaries that use ints as keys
with a 0-index.
For instance below we have a corpus of 3 indices. Below 0 co-occurs with 2, 3.5 times:
```python
import glove
cooccur = {
0: {
0: 1.0,
2: 3.5
},
1: {
2: 0.5
},
2: {
0: 3.5,
1: 0.5,
2: 1.2
}
}
model = glove.Glove(cooccur, vocab_size=3, d=50, alpha=0.75, x_max=100.0)
for epoch in range(25):
err = model.train(batch_size=200, workers=9, batch_size=50)
print("epoch %d, error %.3f" % (epoch, err), flush=True)
```
The trained embeddings are now present under `model.W`.
## Usage
The model is controlled by setting several hyperpameters.
### Glove.__init__()
* `cooccurence` dict<int, dict<int, float>> : the co-occurence matrix
* `alpha` float : (default 0.75) hyperparameter for controlling the exponent for normalized co-occurence counts.
* `x_max` float : (default 100.0) hyperparameter for controlling smoothing for common items in co-occurence matrix.
* `d` int : (default 50) how many embedding dimensions for learnt vectors
* `seed` int : (default 1234) the random seed
### Glove.train
* `step_size` float : the learning rate for the model
* `workers` int : number of worker threads used for training
* `batch_size` int : how many examples should each thread receive (controls the size of the job queue)
Cython general implementation of the Glove multi-threaded training.
GloVe is an unsupervised learning algorithm for generating vector representations for words.
Training is done using a co-occcurence matrix from a corpus. The resulting representations contain structure useful for many other tasks.
The paper describing the model is [here](http://nlp.stanford.edu/projects/glove/glove.pdf).
The original implementation for this Machine Learning model can be [found here](http://nlp.stanford.edu/projects/glove/).
@author Jonathan Raiman
## Example
To use this package you need a sparse co-occurence matrix.
This matrix is represented by nested dictionaries that use ints as keys
with a 0-index.
For instance below we have a corpus of 3 indices. Below 0 co-occurs with 2, 3.5 times:
```python
import glove
cooccur = {
0: {
0: 1.0,
2: 3.5
},
1: {
2: 0.5
},
2: {
0: 3.5,
1: 0.5,
2: 1.2
}
}
model = glove.Glove(cooccur, vocab_size=3, d=50, alpha=0.75, x_max=100.0)
for epoch in range(25):
err = model.train(batch_size=200, workers=9, batch_size=50)
print("epoch %d, error %.3f" % (epoch, err), flush=True)
```
The trained embeddings are now present under `model.W`.
## Usage
The model is controlled by setting several hyperpameters.
### Glove.__init__()
* `cooccurence` dict<int, dict<int, float>> : the co-occurence matrix
* `alpha` float : (default 0.75) hyperparameter for controlling the exponent for normalized co-occurence counts.
* `x_max` float : (default 100.0) hyperparameter for controlling smoothing for common items in co-occurence matrix.
* `d` int : (default 50) how many embedding dimensions for learnt vectors
* `seed` int : (default 1234) the random seed
### Glove.train
* `step_size` float : the learning rate for the model
* `workers` int : number of worker threads used for training
* `batch_size` int : how many examples should each thread receive (controls the size of the job queue)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
glove-1.0.2.tar.gz
(44.9 kB
view details)
File details
Details for the file glove-1.0.2.tar.gz
.
File metadata
- Download URL: glove-1.0.2.tar.gz
- Upload date:
- Size: 44.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b2e00cdcc3fa77a72f4e6ab89f73236da34feb0e38908b5aea8110cdb3b747c6 |
|
MD5 | 7890930e2d401b63f0079ee812979ea2 |
|
BLAKE2b-256 | 8ac917c400d0c29746162bd47fc719bf3212b2b031949d41d712e9bdef11ae03 |