Tool to generate and visualize embeddings from bigcode

# bigcode-embeddings

NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.

bigcode-embeddings generates and visualizes embeddings for AST nodes.

## Install

This project should be used with Python 3.

To install the package either run

` pip install bigcode-embeddings `

or clone the repository and run

` cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install `

NOTE: tensorflow needs to be installed separately.

## Usage

### Training embeddings

Training data can be generated using [bigcode-ast-tools][1].

Given a data.txt.gz file generated with a vocabulary of size 30000, 100-dimensional embeddings can be trained using:

` ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz `
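When several training runs are scripted, it can help to assemble the command from named parameters. The following is a minimal sketch, not part of bigcode-embeddings: `build_train_command` is a hypothetical helper, and it only uses the flags shown in the command above.

```python
import shlex


def build_train_command(data_path, output_dir, vocab_size, emb_size,
                        l2_value, learning_rate):
    """Build the argument list for the train subcommand.

    Hypothetical helper: it only mirrors the flags from the README command
    and assumes the script is run from the repository root.
    """
    return [
        "./bin/bigcode-embeddings", "train",
        "-o", output_dir,
        "--vocab-size", str(vocab_size),
        "--emb-size", str(emb_size),
        "--l2-value", str(l2_value),
        "--learning-rate", str(learning_rate),
        data_path,
    ]


command = build_train_command(
    data_path="data.txt.gz",
    output_dir="embeddings/",
    vocab_size=30000,
    emb_size=100,
    l2_value=0.05,
    learning_rate=0.01,
)
# Print the command in shell form; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

The argument list can be handed to `subprocess.run(command, check=True)` as-is, which avoids shell quoting issues.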

[TensorBoard][2] can be used to monitor training progress:

` tensorboard --logdir embeddings/ `

After the first epoch, the embeddings visualization becomes available in TensorBoard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded to label the embeddings.

### Visualizing the embeddings

Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the embeddings trained above can be visualized with the following command:

` ./bin/bigcode-embeddings visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv `

where STEP should be the largest value found in the embeddings/ directory.
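Finding the largest STEP by hand is easy to get wrong when many checkpoints exist. The following is a minimal sketch of how it could be looked up, assuming the checkpoint files in the output directory are named `embeddings.bin-STEP`, possibly followed by a suffix such as `.index` or `.meta`; `latest_step` is a hypothetical helper, not part of bigcode-embeddings.

```python
import re
from pathlib import Path

# Assumption: checkpoint file names start with "embeddings.bin-" followed by
# the numeric step; anything after the digits (e.g. ".index") is ignored.
_STEP_PATTERN = re.compile(r"^embeddings\.bin-(\d+)")


def latest_step(directory):
    """Return the largest STEP found in `directory`, or None if there is none."""
    steps = [
        int(match.group(1))
        for path in Path(directory).iterdir()
        if (match := _STEP_PATTERN.match(path.name))
    ]
    return max(steps, default=None)
```

For example, `latest_step("embeddings/")` returns an integer that can be substituted for STEP in the `-m` argument, or `None` if no checkpoint has been written yet.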

The -i flag can be passed to generate an interactive plot.

[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard

Download files

| Filename | Size | File type | Python version | Upload date |
|---|---|---|---|---|
| bigcode_embeddings-0.1.2-py3-none-any.whl | 10.4 kB | Wheel | py3 | Oct 14, 2017 |
| bigcode-embeddings-0.1.2.tar.gz | 7.0 kB | Source | None | Oct 14, 2017 |
