
Locally generate commit messages using CodeTrans and FastT5

Project description

commit5

Automatically generate commit messages locally using T5. Currently uses "SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune", as it seems to offer the best quality-to-performance ratio (best T5-base model). Based on the work of https://github.com/agemagician/CodeTrans, which uses data from https://github.com/epochx/commitgen.

Installation and Usage

Ensure you have Docker installed. Then run:

pip install commit5
commit5 download # Pulls the docker image
commit5 start # Starts the docker container

Wait around 10 seconds for the container to spin up and load the model into memory. Then run:

commit5 test
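Rather than sleeping a fixed ten seconds, a script can poll until the container responds. A generic readiness loop is sketched below; the `ready` callable is a placeholder you supply (for example, a wrapper that shells out to `commit5 test` and checks its exit code), since commit5's internal startup signal isn't documented here.

```python
import time

def wait_until(ready, timeout=30.0, interval=0.5):
    """Poll `ready()` until it returns True or `timeout` seconds elapse.

    Returns True if the service came up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ready():
            return True
        time.sleep(interval)
    return False
```

For commit5, `ready` could be something like `lambda: subprocess.run(["commit5", "test"]).returncode == 0`.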

Then, to automatically commit:

commit5 commit

Alternatively, to only generate messages, run

commit5 generate <diff_string>

where <diff_string> is the text of the diff.
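From a script, it is easiest to pass the raw diff as a single argument via `subprocess`, which sidesteps shell-quoting problems with the quotes and newlines that diffs contain. A hedged sketch (it assumes `commit5` is on PATH and prints the message to stdout; the `command` parameter exists only so the helper can be exercised against a stand-in binary):

```python
import subprocess

def generate_message(diff, command=("commit5", "generate")):
    """Run `commit5 generate <diff>` and return its stdout, stripped."""
    result = subprocess.run(
        [*command, diff], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

# e.g. with the staged diff:
#   staged = subprocess.run(["git", "diff", "--staged"],
#                           capture_output=True, text=True).stdout
#   print(generate_message(staged))
```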

Optimizations

FastT5 reduced the file size from 900 MB (a single Torch file) to a total of only 200 MB (three ONNX models) and the memory footprint to only 90 MB. Execution time also dropped from 1.3 s to 0.5 s.

diff --git a/grunt/tasks/release.js b/grunt/tasks/release.js
index efbb2e7..d377ee4 100644
--- a/grunt/tasks/release.js
+++ b/grunt/tasks/release.js
@@ -7,7 +7,7 @@ var grunt = require('grunt');
 
 var BOWER_PATH = '../react-bower/';
 var BOWER_GLOB = [BOWER_PATH + '*'];
-var BOWER_FILES = ['React.js', 'React.min.js', 'JSXTransformer.js'];
+var BOWER_FILES = ['react.js', 'react.min.js', 'JSXTransformer.js'];
 var GH_PAGES_PATH = '../react-gh-pages/';
 var GH_PAGES_GLOB = [GH_PAGES_PATH + '*'];
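The FastT5 conversion described above takes a single call. A hedged sketch using fastT5's `export_and_get_onnx_model` (the call downloads the checkpoint and writes the ONNX files, so it is wrapped in a function rather than run on import; int8 quantization is fastT5's default):

```python
def export_quantized(
    model_name="SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune",
):
    """Export the T5 checkpoint to three quantized ONNX graphs
    (encoder, decoder, decoder-with-past) and return a model
    that supports .generate(), like the original."""
    from fastT5 import export_and_get_onnx_model  # pip install fastt5
    return export_and_get_onnx_model(model_name)
```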

Vision

I think there's a huge space for locally running LM-based devtools in general. Corporations have major privacy concerns around developer productivity tools, which hinders adoption of tools like Copilot/Codeium/Tabnine, and I think powerful LMs can be built smaller while laptops are becoming beefier. There's also an assortment of optimizations (distillation, ONNX, quantization, llama.cpp) for running LLMs on lower-power machines.

Archive

ONNX Runtime improves performance from 1.3 s to 670 ms per iteration, but the resulting model is bigger (1.7 GB across three files vs. 800 MB originally). The model can be found at https://huggingface.co/kevinlu1248/ct-base-commits-onnx/. Tests were conducted on the example diff above.
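The per-iteration timings above can be reproduced with a simple wall-clock average; `fn` is any callable, e.g. a closure around the model's `generate` on the example diff (a generic sketch, not commit5's actual benchmark harness):

```python
import time

def mean_latency(fn, runs=5):
    """Average wall-clock seconds per call of `fn` over `runs` invocations."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```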

TODO

  • Figure out how to load in 16-bit or 8-bit properly
  • Turn into a Git tool
  • Cleanup codebase
  • Upload to PyPI
  • Migrate to CodeTrans-small
  • Add a grammar checker
  • Use CodeT5+
