
Tool to generate and visualize embeddings from bigcode


# bigcode-embeddings

NOTE: data must be generated with [bigcode-ast-tools][1] before this tool can be used.

bigcode-embeddings generates and visualizes embeddings for AST nodes.

## Install

This project should be used with Python 3.

To install the package, either run

` pip install bigcode-embeddings `

or clone the repository and run

` cd bigcode-embeddings && pip install -r requirements.txt && python setup.py install `

NOTE: tensorflow needs to be installed separately.
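Since tensorflow is not pulled in automatically, it can help to check that it is importable before training. The following is a minimal sketch using only the Python standard library; the helper name `module_available` is hypothetical and not part of bigcode-embeddings.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper for illustration; not part of bigcode-embeddings.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("tensorflow"):
        print("tensorflow found")
    else:
        print("tensorflow not found: install it separately before training")
```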

## Usage

### Training embeddings

Training data can be generated using [bigcode-ast-tools][1].

Given a data.txt.gz file generated from a vocabulary of size 30000, 100-dimensional embeddings can be trained using

` ./bin/bigcode-embeddings train -o embeddings/ --vocab-size 30000 --emb-size 100 --l2-value 0.05 --learning-rate 0.01 data.txt.gz `

[TensorBoard][2] can be used to monitor the training progress

` tensorboard --logdir embeddings/ `

After the first epoch, the embeddings visualization becomes available in TensorBoard. The vocabulary TSV file generated by bigcode-ast-tools can be loaded to label the embeddings.

### Visualizing the embeddings

Trained embeddings can be visualized using the visualize subcommand. If the generated vocabulary file is vocab.tsv, the above embeddings can be visualized with the following command

` ./bin/data-explorer visualize clusters -m embeddings/embeddings.bin-STEP -l vocab.tsv `

where STEP should be the largest value found in the embeddings/ directory.
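Picking the largest STEP by hand is error-prone because file names sort lexically (900 sorts after 2000). The sketch below finds the largest step numerically. It assumes the files in the output directory are named `embeddings.bin-STEP`, optionally followed by a suffix such as `.index` or `.meta`; that suffix handling and the helper name `latest_step` are assumptions, not something this README specifies.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoint files are named "embeddings.bin-STEP", optionally
# followed by a dotted suffix (for example ".index" or ".meta").
STEP_PATTERN = re.compile(r"^embeddings\.bin-(\d+)(?:\..*)?$")


def latest_step(directory: str) -> Optional[int]:
    """Return the largest STEP found in `directory`, or None if there is none.

    Hypothetical helper for illustration; not part of bigcode-embeddings.
    """
    steps = []
    for path in Path(directory).iterdir():
        match = STEP_PATTERN.match(path.name)
        if match:
            # Compare steps as integers, not as strings.
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

The value returned for `embeddings/` can then be substituted for STEP in the `-m embeddings/embeddings.bin-STEP` argument above.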

The -i flag can be passed to generate an interactive plot.

[1]: ../bigcode-ast-tools/README.md
[2]: https://github.com/tensorflow/tensorboard


