Tool to generate and visualize embeddings from bigcode
# bigcode-embeddings
NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.
bigcode-embeddings generates and visualizes embeddings for AST nodes.
## Install
This project should be used with Python 3.
To install the package either run
` pip install bigcode-embeddings `
or clone the repository and run
` cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install `
NOTE: tensorflow needs to be installed separately.
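Since tensorflow is not installed with the package, it can help to check the environment before training. The following is a minimal sketch using only the standard library; the helper names `is_installed` and `missing_requirements` are hypothetical and not part of bigcode-embeddings.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_requirements() -> list:
    """Collect human-readable problems with the current environment."""
    problems = []
    # The README states that the project should be used with Python 3.
    if sys.version_info[0] < 3:
        problems.append("Python 3 is required")
    # tensorflow has to be installed separately from the package.
    if not is_installed("tensorflow"):
        problems.append("tensorflow is not installed")
    return problems
```

An empty list from `missing_requirements()` suggests the environment is ready; otherwise each entry names something to fix.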
## Usage
### Training embeddings
Training data can be generated using [bigcode-ast-tools][1].
Given a data.txt.gz generated from a vocabulary of size 30000, 100D embeddings can be trained using
` ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz `
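The same command can be assembled from Python, which may be convenient when training is launched from a script. This is only a sketch: it uses exactly the flags shown above, and the function names `build_train_command` and `run_training` are hypothetical, not part of bigcode-embeddings.

```python
import subprocess


def build_train_command(data_path, output_dir, vocab_size, emb_size,
                        l2_value, learning_rate):
    """Assemble the argument list for the train subcommand shown above."""
    return [
        "./bin/bigcode-embeddings", "train",
        "-o", output_dir,
        "--vocab-size", str(vocab_size),
        "--emb-size", str(emb_size),
        "--l2-value", str(l2_value),
        "--learning-rate", str(learning_rate),
        data_path,
    ]


def run_training(**kwargs):
    """Run the train subcommand and raise if it exits with an error."""
    subprocess.run(build_train_command(**kwargs), check=True)
```

For example, `run_training(data_path="data.txt.gz", output_dir="embeddings/", vocab_size=30000, emb_size=100, l2_value=0.05, learning_rate=0.01)` would run the command above, assuming it is called from the repository root.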
[TensorBoard][2] can be used to visualize training progress:
` tensorboard --logdir embeddings/ `
After the first epoch, the embedding visualization becomes available in TensorBoard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded there to label the embeddings.
### Visualizing the embeddings
Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the embeddings trained above can be visualized with the following command:
` ./bin/data-explorer visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv `
where STEP should be the largest value found in the embeddings/ directory.
The -i flag can be passed to generate an interactive plot.
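Finding the largest STEP by hand is easy to get wrong when many checkpoints exist. The sketch below scans a directory for it, assuming checkpoint files are named `embeddings.bin-STEP`, optionally followed by a suffix such as `.index` or `.meta`; the helper `latest_step` is hypothetical and not part of bigcode-embeddings.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: files look like "embeddings.bin-STEP" or "embeddings.bin-STEP.<suffix>".
_STEP_PATTERN = re.compile(r"^embeddings\.bin-(\d+)(?:\.|$)")


def latest_step(directory: str) -> Optional[int]:
    """Return the largest STEP found in `directory`, or None if there is none."""
    steps = []
    for path in Path(directory).iterdir():
        match = _STEP_PATTERN.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

With `step = latest_step("embeddings/")`, the value for `-m` would then be `f"embeddings/embeddings.bin-{step}"`.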
[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard