A simple model that learns to predict Python source code
Python Autocomplete
This project tries autocompleting Python source code using LSTM or Transformer models.
It gives quite decent results, saving more than 30% of keystrokes in most files and close to 50% in some. We calculated keystrokes saved by making a single (best) prediction and selecting it with a single key press.
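The keystroke measure above can be sketched roughly as follows. This is a minimal illustration, not the project's evaluation code: `keystrokes_saved` and `toy_predict` are hypothetical names, and we assume a suggestion is only accepted when it matches the upcoming text exactly.

```python
from typing import Callable


def keystrokes_saved(text: str, predict: Callable[[str], str]) -> float:
    """Return the fraction of key presses saved while typing `text`.

    `predict(prefix)` is a hypothetical stand-in for the model: it returns
    the single best suggested continuation for the text typed so far.
    """
    if not text:
        return 0.0
    presses = 0  # every typed character or accepted suggestion costs one key
    pos = 0
    while pos < len(text):
        suggestion = predict(text[:pos])
        if suggestion and text.startswith(suggestion, pos):
            # Correct suggestion: one key press inserts all of it.
            pos += len(suggestion)
        else:
            # Wrong or empty suggestion: the user types the next character.
            pos += 1
        presses += 1
    return 1 - presses / len(text)


def toy_predict(prefix: str) -> str:
    # Toy predictor (assumption): suggests "port " right after "im".
    return "port " if prefix.endswith("im") else ""


saved = keystrokes_saved("import os", toy_predict)
print(f"{saved:.0%} of keystrokes saved")
```

Here typing the 9 characters of `import os` takes 5 key presses, because one press accepts the 5-character suggestion.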
The dataset we use is the Python code found in the repos linked in Awesome-pytorch-list. We download all the repositories as zip files, extract them, remove non-Python files, and split the remaining files randomly to build training and validation datasets.
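The filtering and random split can be sketched like this. It is only an illustration under assumptions: `split_python_files` is a hypothetical helper, and the validation fraction and seed are made-up values, since the split ratio is not stated here.

```python
import random
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def split_python_files(
    paths: Iterable[str], eval_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Keep only .py files and split them randomly into (train, eval)."""
    python_files = sorted(p for p in paths if PurePosixPath(p).suffix == ".py")
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(python_files)
    n_eval = int(len(python_files) * eval_fraction)
    return python_files[n_eval:], python_files[:n_eval]


# Hypothetical file listing from a few extracted repositories.
paths = [
    "repo_a/model.py",
    "repo_a/README.md",
    "repo_b/train.py",
    "repo_b/setup.cfg",
    "repo_b/utils.py",
    "repo_c/data.py",
]
train_files, eval_files = split_python_files(paths, eval_fraction=0.25)
print(len(train_files), len(eval_files))
```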
We train a character-level model without any tokenization of the source code, since it's the simplest approach.
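To make "character level" concrete, here is a small sketch of how source text can be mapped to integer ids without a tokenizer. The helper names (`build_vocab`, `encode`, `decode`) are hypothetical and not taken from this repo.

```python
from typing import Dict, List, Tuple


def build_vocab(text: str) -> Tuple[Dict[str, int], List[str]]:
    """Map every distinct character to an integer id, and back."""
    itos = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(itos)}
    return stoi, itos


def encode(text: str, stoi: Dict[str, int]) -> List[int]:
    # Each character becomes one model input; no tokenization is involved.
    return [stoi[ch] for ch in text]


def decode(ids: List[int], itos: List[str]) -> str:
    return "".join(itos[i] for i in ids)


source = "def f(x):\n    return x\n"
stoi, itos = build_vocab(source)
ids = encode(source, stoi)
print(len(itos), "distinct characters,", len(ids), "ids")
```

The model then predicts the next id in the sequence, and predicted ids are decoded back to characters for the suggestion.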
Try it yourself
- Clone this repo
- Install requirements from requirements.txt
- Run python_autocomplete/create_dataset.py. It:
  - Collects the repos mentioned in the PyTorch awesome list
  - Downloads the zip files of the repos
  - Extracts the zips
  - Removes non-Python files
  - Collects all Python code into data/train.py and data/eval.py
- Run python_autocomplete/train.py to train the model. Try changing hyper-parameters like model dimensions and number of layers.
- Run evaluate.py to evaluate the model.
You can also run the training notebook on Google Colab.
VSCode extension
- Install npm packages
  cd vscode_extension
  npm install
- Open the project in VSCode
  cd vscode_extension
  code .
- Start the server
  python_autocomplete/serve.py
- Run the extension
  Run -> Start Debugging
  This will open another VSCode editor window with the extension loaded.
- Create or open a Python file and start editing!
Sample
Here's a sample evaluation of a trained transformer model.
Colors:
- yellow: the token predicted is wrong and the user needs to type that character.
- blue: the token predicted is correct and the user selects it with a special key press, such as TAB or ENTER.
- green: autocompleted characters based on the prediction
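The color scheme can be reproduced with a short sketch like the one below. It is our reading of the legend rather than the project's evaluation code: we assume the first character of an accepted suggestion is blue (the key press) and the rest of that suggestion is green. `color_characters` and `toy_predict` are hypothetical names.

```python
from typing import Callable, List, Tuple

YELLOW, BLUE, GREEN = "yellow", "blue", "green"


def color_characters(
    text: str, predict: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Label every character of `text` with the color it would get."""
    labels: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        suggestion = predict(text[:pos])
        if suggestion and text.startswith(suggestion, pos):
            # Correct prediction: one key press, shown in blue ...
            labels.append((text[pos], BLUE))
            # ... and the remaining characters are filled in, shown in green.
            labels.extend((ch, GREEN) for ch in suggestion[1:])
            pos += len(suggestion)
        else:
            # Wrong prediction: the user types this character, shown in yellow.
            labels.append((text[pos], YELLOW))
            pos += 1
    return labels


def toy_predict(prefix: str) -> str:
    # Toy predictor (assumption): suggests "print(" at the start of a new line.
    return "print(" if prefix.endswith("\n") else ""


sample = "a\nprint(a)"
for ch, color in color_characters(sample, toy_predict):
    print(repr(ch), color)
```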
Hashes for labml_python_autocomplete-0.0.4.tar.gz
Algorithm | Hash digest
---|---
SHA256 | b78b4c77381a7d58e751e6119b1a4cff7c6af3430f9519b7efc535e39082076a
MD5 | 2cb589512fa4b784fea09ac1fbd63d77
BLAKE2b-256 | 4171e14e7acc1276ed1089fa19541f8aad9c01e175c1a0824c169c41c093fc91
Hashes for labml_python_autocomplete-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 94a6fbdaadbe2077ef32082815f73bc3299f6059018efd4daca30ae0f02aa6d5
MD5 | d6576035dc21399380a77388186efaa1
BLAKE2b-256 | dd2ff504325dc539c2b0458b815762d98bd532816f61b88cbc83b3d9a5eb996f