
Tool to generate and visualize embeddings from bigcode


# bigcode-embeddings

NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.

bigcode-embeddings generates and visualizes embeddings for AST nodes.

## Install

This project should be used with Python 3.

To install the package either run

` pip install bigcode-embeddings `

or clone the repository and run

` cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install `

NOTE: tensorflow needs to be installed separately.
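Because tensorflow is not pulled in by the install step, it can be useful to confirm it is available before training. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of bigcode-embeddings.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("tensorflow"):
    print("tensorflow is missing; install it separately before training")
```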

## Usage

### Training embeddings

Training data can be generated using [bigcode-ast-tools][1].

Given a `data.txt.gz` file generated from a vocabulary of size 30000, 100-dimensional embeddings can be trained using

` ./bin/bigcode-embeddings train -i data.txt.gz -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 `

[Tensorboard][2] can be used to visualize the training progress

` tensorboard --logdir embeddings/ `

After the first epoch, the embeddings visualization becomes available in Tensorboard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded to label the embeddings.
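The training command above passes `--vocab-size 30000` for data generated from a vocabulary of that size, so a quick sanity check is to count the entries in the vocabulary TSV file. This is only a sketch: the function name `count_vocab_entries` is hypothetical, and it assumes one entry per row with a single header row, which is not confirmed here. Pass `has_header=False` if the file has no header.

```python
import csv


def count_vocab_entries(path: str, has_header: bool = True) -> int:
    """Count non-empty rows in a TSV file, optionally skipping one header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters as ordinary text, which matters
        # when vocabulary values contain quotes (for example string literals).
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        rows = [row for row in reader if row]
    if has_header and rows:
        return len(rows) - 1
    return len(rows)
```

For example, `count_vocab_entries("vocab.tsv")` can be compared with the value given to `--vocab-size`.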

### Visualizing the embeddings

Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the above embeddings can be visualized with the following command

` ./bin/bigcode-embeddings visualize clusters -m embeddings/w2v.bin-STEP -l vocab.tsv `

where STEP should be the largest value found in the embeddings/ directory.

The -i flag can be passed to generate an interactive plot.
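Picking the largest STEP by eye is error-prone because a plain alphabetical listing puts `w2v.bin-200` after `w2v.bin-1000`. The sketch below compares the steps numerically. The helper name `latest_step` is hypothetical, and it assumes the files in embeddings/ are named `w2v.bin-STEP`, optionally followed by a dotted suffix such as `.index`.

```python
import re
from pathlib import Path
from typing import Optional


def latest_step(directory: str, prefix: str = "w2v.bin") -> Optional[int]:
    """Return the largest STEP among files named '<prefix>-STEP[.suffix]'."""
    pattern = re.compile(re.escape(prefix) + r"-(\d+)(?:\..*)?$")
    steps = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    # None signals that no matching checkpoint file was found.
    return max(steps) if steps else None
```

The result of `latest_step("embeddings/")` can then be substituted for STEP in the `-m` argument.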

[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard
