Code AutoComplete

code-autocomplete is a code completion plugin for Python.

It completes code automatically at line granularity and block granularity.

Guide

Feature
Install
Usage
Contact
Citation
Reference

Feature

GPT2-based code completion
Code completion for Python; support for other languages is coming soon
Line and block code completion
Train (fine-tune GPT2) and predict with your own data

Install

pip3 install -U code-autocomplete

or install from source:

git clone https://github.com/shibing624/code-autocomplete.git
cd code-autocomplete
python3 setup.py install

Usage

Code Completion

The model is uploaded to the Hugging Face model hub: shibing624/code-autocomplete-gpt2-base

Use with code-autocomplete

example: base_demo.py

from autocomplete.gpt2_coder import GPT2Coder

m = GPT2Coder("shibing624/code-autocomplete-gpt2-base")
print(m.generate('import torch.nn as')[0])

output:

import torch.nn as nn
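Line granularity versus block granularity can be illustrated with a small post-processing step. The sketch below is not part of code-autocomplete: `first_line_completion` is a hypothetical helper, and it assumes the generated text may or may not begin with the prompt. It trims a multi-line (block) completion down to the rest of the current line.

```python
def first_line_completion(prompt: str, generated: str) -> str:
    """Return the prompt plus only the remainder of the current line.

    Hypothetical helper for illustration; not an API of code-autocomplete.
    """
    # If the generated text repeats the prompt, keep only the continuation.
    if generated.startswith(prompt):
        continuation = generated[len(prompt):]
    else:
        continuation = generated
    # Line granularity: cut the continuation at the first newline.
    line_rest = continuation.split("\n", 1)[0]
    return prompt + line_rest


# A made-up block-level completion, used only to exercise the helper.
block = "import torch.nn as nn\nimport torch.nn.functional as F\n"
print(first_line_completion("import torch.nn as", block))
```

Keeping the whole `generated` string instead corresponds to block-level completion.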

Use with huggingface/transformers:

example: use_transformers_gpt2.py

Load this model with the GPT2 classes (GPT2Tokenizer and GPT2LMHeadModel).

import os
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model.to(device)
# Multi-line prompts start at column 0 so the model sees correctly indented code.
prompts = [
    """from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):""",
    """import numpy as np
import torch
import torch.nn as""",
    "import java.util.ArrayList;",
    "def factorial(n):",
]
for prompt in prompts:
    # Tokenize the prompt and move the tensors to the model's device.
    input_ids = tokenizer(prompt, return_tensors='pt').to(device).input_ids
    # Sample a continuation of up to 64 tokens beyond the prompt.
    # length_penalty and early_stopping are omitted: they only affect beam search.
    outputs = model.generate(input_ids=input_ids,
                             max_length=64 + len(input_ids[0]),
                             temperature=1.0,
                             top_k=50,
                             top_p=0.95,
                             repetition_penalty=1.0,
                             do_sample=True,
                             num_return_sequences=1,
                             pad_token_id=tokenizer.eos_token_id,
                             eos_token_id=tokenizer.eos_token_id,
                             )
    # Decode the first (and only) returned sequence back to text.
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Input :", prompt)
    print("Output:", decoded)
    print("=" * 20)

output:

from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):
        self.hidden_size = hidden_size
        self.embedding_size = embedding_size

====================

import numpy as np
import torch
import torch.nn as nn

====================
...
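The `top_k=50` and `top_p=0.95` arguments in the example restrict which tokens can be sampled at each step. The following is a simplified, pure-Python sketch of that idea on a toy probability list. It is not the transformers implementation (which works on logits and differs in detail), and `top_k_top_p_filter` is a name made up for this illustration.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return {token_index: renormalized probability} after top-k, then top-p.

    Illustrative sketch only; not the code used by transformers.
    """
    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top-k: keep only the k most likely tokens and renormalize their mass.
    top = ranked[:top_k]
    mass = sum(probs[i] for i in top)
    # top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in top:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


# Toy distribution over five tokens (made-up numbers).
toy_probs = [0.5, 0.3, 0.1, 0.06, 0.04]
filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.9)
print(sorted(filtered))  # indices of the tokens that remain candidates
```

Sampling then draws the next token from the filtered distribution, so lower values of `top_k` or `top_p` make completions more conservative.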

Train your own model with Dataset

Build dataset from scratch

The dataset build can be customized. Below is an example of the build process.

Let's use Python code from Awesome-pytorch-list and TheAlgorithms/Python as the dataset.

We want the model to auto-complete code at a general level. The code in TheAlgorithms/Python suits this need because it is well written and of high quality.

Automatically download the source code and build the dataset:

prepare_data.py

cd examples
python prepare_data.py
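For a custom corpus, the core of the build step can be sketched as follows. This is an assumption about the general approach, not the contents of prepare_data.py: `collect_python_sources` and the paths in the usage comment are hypothetical, and the sketch simply concatenates every readable `.py` file under a directory into one training text file.

```python
from pathlib import Path


def collect_python_sources(repo_dir, out_file):
    """Concatenate all .py files under repo_dir into out_file.

    Hypothetical helper for illustration. Returns the number of files written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        # Sort paths so the output is deterministic across runs.
        for path in sorted(Path(repo_dir).rglob("*.py")):
            try:
                code = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip files that cannot be read as UTF-8
            if not code.strip():
                continue  # skip empty files
            # Separate files with one blank line.
            out.write(code.rstrip("\n") + "\n\n")
            count += 1
    return count


# Usage (hypothetical paths):
# collect_python_sources("download/Python", "train.txt")
```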

Train and predict model

example: train_gpt2.py

cd examples
python train_gpt2.py --do_train --do_predict --num_epochs 15 --model_dir outputs-fine-tuned --model_name gpt2
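To make the flags in the command easier to read, here is a sketch of how such options are commonly declared with argparse. It is not the actual argument parser of train_gpt2.py: the defaults and help strings are assumptions, and the prediction flag is spelled `--do_predict` here.

```python
import argparse


def build_parser():
    """Build an illustrative parser; the real train_gpt2.py may differ."""
    parser = argparse.ArgumentParser(description="Fine-tune GPT2 (illustrative sketch)")
    parser.add_argument("--model_name", default="gpt2",
                        help="base model to fine-tune")
    parser.add_argument("--model_dir", default="outputs-fine-tuned",
                        help="directory for the fine-tuned model")
    parser.add_argument("--num_epochs", type=int, default=15,
                        help="number of training epochs")
    # Boolean switches: present on the command line means True.
    parser.add_argument("--do_train", action="store_true", help="run training")
    parser.add_argument("--do_predict", action="store_true", help="run prediction")
    return parser


args = build_parser().parse_args(
    ["--do_train", "--do_predict", "--num_epochs", "15",
     "--model_dir", "outputs-fine-tuned", "--model_name", "gpt2"]
)
print(args.do_train, args.do_predict, args.num_epochs, args.model_dir, args.model_name)
```

Under this reading, passing `--do_train` without `--do_predict` would fine-tune the model and skip the prediction step.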

Server

Start the FastAPI server:

example: server.py

cd examples
python server.py

Then open http://0.0.0.0:8001/docs in a browser.
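The routes exposed by server.py are not listed here, so the sketch below discovers them instead of guessing. It relies on FastAPI's default behavior of serving the OpenAPI schema at `/openapi.json`. The base URL `http://127.0.0.1:8001` assumes the server runs locally on the port shown above, and `list_endpoints` and `fetch_schema` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def list_endpoints(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "delete", "patch", "head", "options"}
    return sorted(
        (method.upper(), path)
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in http_methods
    )


def fetch_schema(base_url="http://127.0.0.1:8001"):
    """Download the OpenAPI schema from a running server (assumed local)."""
    with urlopen(base_url + "/openapi.json", timeout=5) as response:
        return json.load(response)


# Offline demonstration with a made-up schema; "/hypothetical" is not a real route.
sample_schema = {"paths": {"/hypothetical": {"get": {}}}}
print(list_endpoints(sample_schema))

# With the server running:
# print(list_endpoints(fetch_schema()))
```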

Contact

Issues (suggestions):
Email me: xuming (xuming624@qq.com)
WeChat: add my WeChat ID xuming624 with the note "name-company-NLP" to join the NLP discussion group.

Citation

If you use code-autocomplete in your research, please cite it in the following format:

@misc{code-autocomplete,
  author = {Xu Ming},
  title = {code-autocomplete: Code AutoComplete with GPT2 model},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/shibing624/code-autocomplete},
}

License

The project is licensed under The Apache License 2.0 and is free for commercial use. Please include a link to code-autocomplete and the license in your product description.

Contribute

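The project code is still rough. If you improve it, you are welcome to contribute your changes back to this project. Before submitting, note the following two points:

Add corresponding unit tests in tests
Run all unit tests with python setup.py test and make sure they all pass

After that, you can submit a PR.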

Reference

Release history

0.0.4: Mar 1, 2022
0.0.3: Feb 15, 2022
0.0.2: Feb 13, 2022 (this version)
0.0.1: Feb 11, 2022

Source Distribution

code-autocomplete-0.0.2.tar.gz (18.8 kB), uploaded Feb 13, 2022

Hashes for code-autocomplete-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`0762b0abb13ff9d18d8f76a50df9caba8f22899a1944aabb24ba057cbe5bee99`
MD5	`e001ac7124f61849a00bbe0ede1f2146`
BLAKE2b-256	`f13f8daa04030a0e2a923ae7f042c1c69a67398aa1873791bd2721699a26a4a2`
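
A downloaded source distribution can be checked against the SHA256 digest listed above. This is a minimal sketch using the standard library's hashlib. The file name in the usage comment assumes the archive was saved to the current directory, and `sha256_of_file` and `verify_download` are helper names introduced for this example.

```python
import hashlib

# SHA256 digest listed above for code-autocomplete-0.0.2.tar.gz.
EXPECTED_SHA256 = "0762b0abb13ff9d18d8f76a50df9caba8f22899a1944aabb24ba057cbe5bee99"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of_file(path) == expected.lower()


# Usage (assumes the archive is in the current directory):
# print(verify_download("code-autocomplete-0.0.2.tar.gz"))
```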