Inference and training for multiple languages of code2seq

pycode2seq

Pure Python library for code2seq embeddings.

Supports extending existing pretrained code2seq embeddings to multilingual models. We provide an example that extends the Java model with Kotlin. The pretrained model and a usage example are given below.

Installation

pip install pycode2seq

Inference

File embeddings example

from pycode2seq import Code2Seq

model = Code2Seq.load("kt_java")
method_embeddings = model.methods_embeddings("File.kt")

The pretrained common Java and Kotlin model is downloaded automatically.

Full functionality

import sys
from pycode2seq import Code2Seq

def main(argv):
    model = Code2Seq.load("kt_java")

    # Dictionary of method names with their embeddings
    method_embeddings = model.methods_embeddings("File.kt", "kt")

    # Code2seq predictions
    predictions = model.run_on_file(argv[1], "kt")

    # Predicted method names
    names = [model.prediction_to_text(prediction) for prediction in predictions]

if __name__ == "__main__":
    main(sys.argv)
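
To embed a whole project rather than a single file, the call above can be wrapped in a small directory walker. This is a minimal sketch, not part of pycode2seq: embed_directory is a hypothetical helper, and it assumes only what the example above shows, namely that methods_embeddings(path, language) returns a dictionary of method names with their embeddings.

```python
from pathlib import Path


def embed_directory(model, root, extension=".kt", language="kt"):
    """Collect method embeddings for every matching file under root.

    Hypothetical helper: `model` is any object exposing
    methods_embeddings(path, language) as in the example above.
    Returns {file path: {method name: embedding}}.
    """
    results = {}
    # rglob walks the directory tree recursively; sorting keeps the order stable.
    for path in sorted(Path(root).rglob(f"*{extension}")):
        if path.is_file():
            results[str(path)] = model.methods_embeddings(str(path), language)
    return results
```

With the library installed, this would be called as embed_directory(Code2Seq.load("kt_java"), "path/to/sources"), where the path is a placeholder.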

Available models

Java (java)
Kotlin (kt or kotlin)
Java & Kotlin (kt_java)

kt_java is compatible with the java model and should produce the same embeddings. The kotlin model is part of the kt_java model, so they are compatible too.

You can therefore use the common kt_java model to get embeddings in one vector space for both languages.
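
Because both languages share one vector space, the embedding of a Kotlin method can be compared directly with that of a Java method, for example by cosine similarity. The following is a minimal sketch that assumes each embedding can be treated as a flat sequence of floats (if the library returns tensors, they would need converting first). cosine_similarity is a hypothetical helper, not a pycode2seq function, and the vectors are toy values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for a Kotlin and a Java method embedding.
kotlin_vec = [1.0, 0.0, 1.0]
java_vec = [1.0, 0.0, 0.0]
similarity = cosine_similarity(kotlin_vec, java_vec)
```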

Training

Download astminer and run:

./gradlew shadowJar

Mine projects for paths:

python training/mine_projects.py <data folder> <output folder> <path to astminer's cli.sh>

Combine mined paths:

python training/astminer_to_code2seq.py <data folder/holdout> <output folder> <holdout>

Build the vocabulary with build_vocabulary.py from the code2seq module.

Combine vocabularies:

python training/combine_vocabularies.py

Expand weights:

python training/expand_weights.py
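
The last two steps are not described further here. As an illustration of the general idea only, extending a pretrained embedding table to a combined vocabulary usually means keeping the existing rows and adding freshly initialized rows for the new tokens. The sketch below is an assumption about the technique, not a description of what combine_vocabularies.py or expand_weights.py actually do. All names in it are hypothetical, and it assumes the base vocabulary uses contiguous indices starting at 0.

```python
import random


def combine_vocabularies(base_vocab, extra_tokens):
    """Return a token -> index dict: base indices kept, new tokens appended."""
    combined = dict(base_vocab)
    for token in extra_tokens:
        if token not in combined:
            combined[token] = len(combined)
    return combined


def expand_weights(base_weights, base_vocab, combined_vocab, seed=0):
    """Build an embedding table for combined_vocab.

    Rows of tokens already in base_vocab are copied unchanged; rows of new
    tokens get small random values (an arbitrary initialization choice).
    """
    rng = random.Random(seed)
    dim = len(base_weights[0])
    expanded = [None] * len(combined_vocab)
    for token, index in combined_vocab.items():
        if token in base_vocab:
            expanded[index] = list(base_weights[base_vocab[token]])
        else:
            expanded[index] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return expanded


# Toy example: a two-token "Java" vocabulary extended with Kotlin tokens.
java_vocab = {"int": 0, "return": 1}
java_weights = [[0.5, -0.5], [0.25, 0.75]]
vocab = combine_vocabularies(java_vocab, ["return", "fun", "val"])
weights = expand_weights(java_weights, java_vocab, vocab)
```

In this sketch, tokens from the base vocabulary keep their original vectors, so anything built only from those tokens is embedded exactly as before.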

Using speedy-antlr-tool

You can use speedy-antlr to speed up file parsing.

Clone and install the modified example.

Replace parser call with:

stream = antlr4.FileStream(input_file)
tree = sa_kotlin.parse(stream, "kotlinFile", sa_kotlin.SA_ErrorListener())

You still need the lexer to recover token values, though.

Note that to build a Java parser you will need to follow the speedy-antlr tutorial and make another package.

Using astminer to parse files

Clone the astminer fork with Kotlin support and run:

./gradlew shadowJar

Extract methods with cli.sh; its arguments and usage can be found in training/mine_projects.py.

Pass the path to the folder with the CSV files to run_model_on_astminer_csv().

Release history

0.0.6 (Aug 26, 2021, this version)
0.0.5 (Aug 26, 2021)
0.0.4 (Jun 17, 2021)
0.0.3 (Jun 17, 2021)
0.0.2 (Jun 17, 2021)
0.0.1 (Jun 17, 2021)

Download files

Source Distribution

pycode2seq-0.0.6.tar.gz (164.3 kB)

Uploaded Aug 26, 2021 Source

Built Distribution

pycode2seq-0.0.6-py3-none-any.whl (177.6 kB)

Uploaded Aug 26, 2021 Python 3

File details

Details for the file pycode2seq-0.0.6.tar.gz.

File metadata

Download URL: pycode2seq-0.0.6.tar.gz
Upload date: Aug 26, 2021
Size: 164.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.3

File hashes

Hashes for pycode2seq-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`60abc27d29fcdbd8b9abe7dc9c64e0aac6b79b59ffe401b3cf9387eeae8aa319`
MD5	`08850cae86573d37ccfc456abb763570`
BLAKE2b-256	`a4e459955dd096015b4256171b857953d686ea9ea1be7d7054e1f5e833dfda1d`

File details

Details for the file pycode2seq-0.0.6-py3-none-any.whl.

File metadata

Download URL: pycode2seq-0.0.6-py3-none-any.whl
Upload date: Aug 26, 2021
Size: 177.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0.post20210125 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.7.3

File hashes

Hashes for pycode2seq-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dca1d7ee15555e2fbaf08427905ad654296a6780968126b27bd6ac890d336d8f`
MD5	`c8db2af9b225de346d20df757f1843bb`
BLAKE2b-256	`7a364b8a8d726bbdc75ed0bd069e02cfb57c5da6e801a1a6610e3c4b051c8973`
