Tool to generate and visualize embeddings from bigcode

# bigcode-embeddings

NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.

bigcode-embeddings lets you generate and visualize embeddings for AST nodes.

## Install

This project should be used with Python 3.

To install the package either run

` pip install bigcode-embeddings `

or clone the repository and run

` cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install `

NOTE: tensorflow needs to be installed separately.

## Usage

### Training embeddings

Training data can be generated using [bigcode-ast-tools][1].

Given a data.txt.gz generated from a vocabulary of size 30000, 100D embeddings can be trained using

` ./bin/bigcode-embeddings train -i data.txt.gz -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 `
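When several training runs with different hyperparameters are needed, it can help to assemble the command from a script. The following is a minimal sketch, not part of bigcode-embeddings: the helper name `build_train_command` is hypothetical, and it only uses the flags shown in the command above.

```python
import shlex
from typing import List


def build_train_command(
    input_path: str,
    output_dir: str,
    vocab_size: int,
    emb_size: int,
    l2_value: float,
    learning_rate: float,
    executable: str = "./bin/bigcode-embeddings",
) -> List[str]:
    """Return the argument list for the `train` subcommand.

    Hypothetical helper: it mirrors the flags documented above and
    assumes nothing else about the command-line interface.
    """
    return [
        executable, "train",
        "-i", input_path,
        "-o", output_dir,
        "--vocab-size", str(vocab_size),
        "--emb-size", str(emb_size),
        "--l2-value", str(l2_value),
        "--learning-rate", str(learning_rate),
    ]


if __name__ == "__main__":
    cmd = build_train_command("data.txt.gz", "embeddings/", 30000, 100, 0.05, 0.01)
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually launch the training.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting issues when the command is later handed to `subprocess.run`.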

[TensorBoard][2] can be used to monitor training progress:

` tensorboard --logdir embeddings/ `

After the first epoch, the embedding visualization becomes available in TensorBoard. Load the vocabulary TSV file generated by bigcode-ast-tools to label the embeddings.
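TensorBoard's projector expects one label row per embedding vector, and treats the first row of a multi-column TSV file as a header. Before loading the vocabulary file, the number of labels can be checked with a small script. This is a rough sketch with a hypothetical helper name (`count_labels`); the exact column layout of the file produced by bigcode-ast-tools is not assumed, only that it is tab-separated.

```python
import csv


def count_labels(tsv_path: str) -> int:
    """Count label rows in a tab-separated metadata file.

    Assumption: as in TensorBoard's projector, a file with more than
    one column starts with a header row, while a single-column file
    has no header.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return 0
    has_header = len(rows[0]) > 1
    return len(rows) - 1 if has_header else len(rows)


if __name__ == "__main__":
    # Compare this number with the --vocab-size used for training.
    print(count_labels("vocab.tsv"))
```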

### Visualizing the embeddings

Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the embeddings trained above can be visualized with the following command:

` ./bin/bigcode-embeddings visualize clusters -m embeddings/w2v.bin-STEP -l vocab.tsv `

where STEP should be the largest value found in the embeddings/ directory.

The -i flag can be passed to generate an interactive plot.
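Finding the largest STEP by eye is error-prone because file names sort lexicographically (for example, `w2v.bin-900` sorts after `w2v.bin-2500`). The sketch below picks the largest step numerically. The helper `latest_step` is hypothetical, and it assumes only that checkpoint files in the output directory start with `w2v.bin-STEP`, possibly followed by a suffix.

```python
import re
from pathlib import Path
from typing import Optional


def latest_step(directory: str, prefix: str = "w2v.bin") -> Optional[int]:
    """Return the largest STEP among files named `<prefix>-STEP[...]`.

    Returns None when no matching file is found. The default prefix
    follows the file name used in the command above.
    """
    pattern = re.compile(r"^" + re.escape(prefix) + r"-(\d+)")
    steps = set()
    for entry in Path(directory).iterdir():
        match = pattern.match(entry.name)
        if match:
            steps.add(int(match.group(1)))  # compare as integers, not strings
    return max(steps) if steps else None


if __name__ == "__main__":
    step = latest_step("embeddings/")
    if step is None:
        print("no checkpoint found")
    else:
        print(f"embeddings/w2v.bin-{step}")
```

The printed path can be passed as the value of the -m option.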

[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard
