Code AutoComplete
code-autocomplete is a code completion plugin for Python.
It can automatically complete code at line granularity and at block granularity.
Feature
- GPT2-based code completion
- Code completion for Python; other languages are coming soon
- Line and block code completion
- Train (fine-tune GPT2) and run prediction with your own data
Install
pip3 install -U code-autocomplete
or
git clone https://github.com/shibing624/code-autocomplete.git
cd code-autocomplete
python3 setup.py install
Usage
Code Completion
The model is uploaded to the Hugging Face model hub: shibing624/code-autocomplete-gpt2-base
Use with code-autocomplete
example: base_demo.py
from autocomplete.gpt2_coder import GPT2Coder
m = GPT2Coder("shibing624/code-autocomplete-gpt2-base")
print(m.generate('import torch.nn as')[0])
output:
import torch.nn as nn
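The example above completes a single line. The difference between line and block granularity can be pictured as a post-processing step over generated text. The sketch below is an illustration only, not the library's implementation: `split_completion`, `line_completion`, and `block_completion` are hypothetical helper names, and it assumes the generated text echoes the prompt at its start.

```python
def split_completion(prompt: str, generated: str) -> str:
    """Return only the newly generated text, without the echoed prompt."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated


def line_completion(prompt: str, generated: str) -> str:
    """Line granularity: keep the prompt plus the continuation up to the first newline."""
    continuation = split_completion(prompt, generated)
    first_line = continuation.split("\n", 1)[0]
    return prompt + first_line


def block_completion(prompt: str, generated: str, max_lines: int = 5) -> str:
    """Block granularity: keep the prompt plus at most `max_lines` lines of continuation."""
    continuation = split_completion(prompt, generated)
    lines = continuation.split("\n")[:max_lines]
    return prompt + "\n".join(lines)


if __name__ == "__main__":
    prompt = "import torch.nn as"
    # Hand-written stand-in for model output (not a real generation).
    generated = "import torch.nn as nn\nimport torch.nn.functional as F\n"
    print(line_completion(prompt, generated))
    print(block_completion(prompt, generated, max_lines=2))
```

In practice the string passed as `generated` would be whatever the model returns for the prompt.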
Use with huggingface/transformers:
example: use_transformers_gpt2.py
Please use 'GPT2' related functions to load this model!
import os
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = GPT2Tokenizer.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model = GPT2LMHeadModel.from_pretrained("shibing624/code-autocomplete-gpt2-base")
model.to(device)
prompts = [
    """from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):""",
    """import numpy as np
import torch
import torch.nn as""",
    "import java.util.ArrayList;",
    "def factorial(n):",
]
for prompt in prompts:
    # Tokenize the prompt and move the tensors to the selected device
    input_ids = tokenizer(prompt, return_tensors='pt').to(device).input_ids
    # Sample up to 64 new tokens on top of the prompt length
    outputs = model.generate(input_ids=input_ids,
                             max_length=64 + len(input_ids[0]),
                             temperature=1.0,
                             top_k=50,
                             top_p=0.95,
                             repetition_penalty=1.0,
                             do_sample=True,
                             num_return_sequences=1,
                             length_penalty=2.0,
                             early_stopping=True,
                             pad_token_id=tokenizer.eos_token_id,
                             eos_token_id=tokenizer.eos_token_id,
                             )
    decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print("Input :", prompt)
    print("Output:", decoded)
    print("=" * 20)
output:
from torch import nn
class LSTM(Module):
    def __init__(self, *,
                 n_tokens: int,
                 embedding_size: int,
                 hidden_size: int,
                 n_layers: int):
        self.hidden_size = hidden_size
        self.embedding_size = embedding_size
====================
import numpy as np
import torch
import torch.nn as nn
====================
...
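The `temperature`, `top_k`, and `top_p` arguments used above control how the next token is sampled. The following is a simplified pure-Python sketch of the idea over a toy vocabulary, not the transformers implementation. The helper names `softmax`, `filter_top_k_top_p`, and `sample_token` are made up for illustration, and the logits are invented values.

```python
import math
import random
from typing import Dict


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: val / temperature for tok, val in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(val - peak) for tok, val in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filter_top_k_top_p(probs: Dict[str, float], top_k: int = 50,
                       top_p: float = 0.95) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest prefix whose
    renormalized cumulative probability reaches top_p, and renormalize again."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_total = sum(p for _, p in ranked)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / k_total
        if cumulative >= top_p:
            break
    kept_total = sum(p for _, p in kept)
    return {token: p / kept_total for token, p in kept}


def sample_token(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token according to the filtered probabilities."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Invented scores for candidate tokens after "import torch.nn as".
    toy_logits = {" nn": 4.0, " F": 1.5, " np": 0.5, " tf": -1.0}
    filtered = filter_top_k_top_p(softmax(toy_logits), top_k=3, top_p=0.95)
    print(filtered)
    print(sample_token(filtered, random.Random(0)))
```

Smaller `top_k` or `top_p` values restrict sampling to the most likely tokens, which tends to give more conservative completions.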
Train your own model with Dataset
Build dataset from scratch
You can customize how the dataset is built. Below is an example of the building process.
Let's use Python code from Awesome-pytorch-list and TheAlgorithms/Python as the dataset.
- We want the model to help auto-complete code at a general level. The code in TheAlgorithms suits this need.
- The code in these projects is well written (high quality).
Automatically download the source code and build the dataset:
cd examples
python prepare_data.py
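To build a dataset from your own repositories instead, the general idea is to gather `.py` files and write them into one training text file. The sketch below shows that idea under stated assumptions. It is not the logic of prepare_data.py, and `collect_python_files`, `build_corpus`, and the `train.txt` file name are hypothetical.

```python
import os
import tempfile
from typing import List


def collect_python_files(root_dir: str) -> List[str]:
    """Recursively list all .py files under root_dir in a stable order."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if name.endswith(".py"):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def build_corpus(root_dir: str, output_path: str, min_chars: int = 1) -> int:
    """Concatenate source files into one text file, separated by blank lines.

    Returns the number of files written. Files that cannot be decoded as
    UTF-8 or that are shorter than min_chars are skipped.
    """
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in collect_python_files(root_dir):
            try:
                with open(path, encoding="utf-8") as src:
                    code = src.read()
            except (UnicodeDecodeError, OSError):
                continue
            if len(code.strip()) < min_chars:
                continue
            out.write(code.rstrip("\n") + "\n\n")
            count += 1
    return count


if __name__ == "__main__":
    # Demo on a throwaway directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = os.path.join(tmp, "src")
        os.makedirs(src_dir)
        with open(os.path.join(src_dir, "demo.py"), "w", encoding="utf-8") as f:
            f.write("def factorial(n):\n    return 1 if n < 2 else n * factorial(n - 1)\n")
        written = build_corpus(src_dir, os.path.join(tmp, "train.txt"))
        print("files written:", written)
```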
Train and predict model
example: train_gpt2.py
cd examples
python train_gpt2.py --do_train --do_predict --num_epochs 15 --model_dir outputs-fine-tuned --model_name gpt2
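Fine-tuning a causal language model such as GPT2 commonly feeds it fixed-length blocks of token ids, with the labels equal to the inputs (GPT2LMHeadModel shifts the labels internally when computing the loss). The sketch below illustrates that common preparation step. train_gpt2.py may prepare its data differently, and `group_into_blocks` and `make_examples` are hypothetical names.

```python
from typing import Dict, List, Sequence


def group_into_blocks(token_ids: Sequence[int], block_size: int) -> List[List[int]]:
    """Split a long token sequence into full blocks of block_size, dropping the remainder."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    usable = (len(token_ids) // block_size) * block_size
    return [list(token_ids[i:i + block_size]) for i in range(0, usable, block_size)]


def make_examples(token_ids: Sequence[int], block_size: int) -> List[Dict[str, List[int]]]:
    """Build training examples where labels are a copy of the input ids."""
    return [
        {"input_ids": block, "labels": list(block)}
        for block in group_into_blocks(token_ids, block_size)
    ]


if __name__ == "__main__":
    # Stand-in token ids; a real run would use the tokenizer's output.
    ids = list(range(10))
    for example in make_examples(ids, block_size=4):
        print(example)
```

Dropping the incomplete final block keeps every example the same length, which avoids padding at the cost of a few tokens per corpus.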
Server
Start the FastAPI server:
example: server.py
cd examples
python server.py
Open the URL: http://0.0.0.0:8001/docs
Contact
- Issues (suggestions):
- Email me: xuming: xuming624@qq.com
- WeChat me: add my WeChat ID xuming624 with the note "name-company-NLP" to join the NLP discussion group.
Citation
If you use code-autocomplete in your research, please cite it in the following format:
@misc{code-autocomplete,
author = {Xu Ming},
title = {code-autocomplete: Code AutoComplete with GPT2 model},
year = {2022},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/shibing624/code-autocomplete},
}
License
The project is licensed under The Apache License 2.0 and is free for commercial use. Please include a link to code-autocomplete and the license in your product description.
Contribute
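The project code is still rough. If you have improved the code, you are welcome to contribute your changes back to this project. Before submitting, please note the following two points:
- Add corresponding unit tests in tests
- Run all unit tests with python setup.py test and make sure they all pass
After that, you can submit a PR.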