Tool to generate and visualize embeddings from bigcode

# bigcode-embeddings

NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.

bigcode-embeddings generates and visualizes embeddings for AST nodes.

## Install

This project should be used with Python 3.

To install the package either run

` pip install bigcode-embeddings `

or clone the repository and run

` cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install `

NOTE: tensorflow needs to be installed separately.
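
Because tensorflow is not installed with the package, it can be worth checking that it is importable before starting a training run. The following is a minimal sketch that uses only the Python standard library; the helper name `is_installed` is hypothetical and not part of bigcode-embeddings.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("tensorflow"):
        print("tensorflow found")
    else:
        print("tensorflow missing: it has to be installed separately")
```

The check only confirms that the module can be located. It does not verify that the installed version is compatible with bigcode-embeddings.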

## Usage

### Training embeddings

Training data can be generated using [bigcode-ast-tools][1].

Given a data.txt.gz file generated from a vocabulary of size 30000, 100-dimensional embeddings can be trained using

` ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz `

[Tensorboard][2] can be used to monitor training progress

` tensorboard --logdir embeddings/ `

After the first epoch, the embeddings visualization becomes available in Tensorboard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded there to label the embeddings.

### Visualizing the embeddings

Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the embeddings trained above can be visualized with the following command

` ./bin/data-explorer visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv `

where STEP should be the largest value found in the embeddings/ directory.
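
Picking the largest STEP by hand is easy to get wrong when many checkpoints have been written. The sketch below scans a directory and returns the highest step number. It assumes the files are named `embeddings.bin-STEP`, as in the command above, optionally followed by an extension such as `.index` or `.meta`; that extension handling is an assumption about how the checkpoints are written, and `latest_step` is a hypothetical helper, not part of bigcode-embeddings.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: embeddings.bin-STEP, optionally followed by an
# extension (for example ".index" or ".meta").
CHECKPOINT_PATTERN = re.compile(r"^embeddings\.bin-(\d+)(?:\..+)?$")


def latest_step(directory: str) -> Optional[int]:
    """Return the largest STEP found in `directory`, or None if there is none."""
    root = Path(directory)
    if not root.is_dir():
        return None
    steps = []
    for path in root.iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

With the layout used above, `latest_step("embeddings")` gives the number to substitute for STEP in the `-m embeddings/embeddings.bin-STEP` argument.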

The -i flag can be passed to generate an interactive plot.

[1]: ../bigcode-ast-tools/
[2]:
