Tool to generate and visualize embeddings from bigcode
# bigcode-embeddings
NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.
bigcode-embeddings generates and visualizes embeddings for AST nodes.
## Install
This project should be used with Python 3.
To install the package, either run
` pip install bigcode-embeddings `
or clone the repository and run
```
cd bigcode-embeddings
pip install -r requirements.txt
python setup.py install
```
NOTE: tensorflow needs to be installed separately.
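Since tensorflow is not pulled in automatically, it can help to check for it before training. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of bigcode-embeddings.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Warn early instead of failing in the middle of a training run.
if not is_installed("tensorflow"):
    print("tensorflow was not found; install it separately before training")
```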
## Usage
### Training embeddings
Training data can be generated using [bigcode-ast-tools][1].
Given a `data.txt.gz` file generated from a vocabulary of size 30000, 100-dimensional embeddings can be trained using
` ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz `
[TensorBoard][2] can be used to monitor training progress
` tensorboard --logdir embeddings/ `
After the first epoch, the embedding visualization becomes available in TensorBoard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded there to label the embeddings.
### Visualizing the embeddings
Trained embeddings can be visualized using the `visualize` subcommand. If the generated vocabulary file is `vocab.tsv`, the embeddings trained above can be visualized with the following command
` ./bin/data-explorer visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv `
where STEP should be the largest value found in the embeddings/ directory.
The -i flag can be passed to generate an interactive plot.
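Picking the largest STEP by hand is easy to get wrong when many checkpoints have accumulated. The following is a minimal sketch that scans a directory for file names starting with `embeddings.bin-STEP` and returns the largest step. It assumes the step is a plain integer directly after the dash and that extra suffixes may follow it; the helper name `latest_step` is hypothetical and not part of bigcode-embeddings.

```python
import re
from pathlib import Path

# Assumed naming scheme: "embeddings.bin-<STEP>", optionally followed by a suffix.
_STEP_PATTERN = re.compile(r"^embeddings\.bin-(\d+)")


def latest_step(directory):
    """Return the largest STEP among embeddings.bin-STEP* files, or None if none exist."""
    steps = []
    for path in Path(directory).iterdir():
        match = _STEP_PATTERN.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

For example, `latest_step("embeddings")` returns an integer that can be substituted for STEP in the command above, or `None` when no matching file is found.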
[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard