
Locally generate commit messages using CodeTrans and FastT5

Project description

commit5

Automatically generate commit messages locally using T5. Currently uses "SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune", as it seems to offer the best quality-to-performance ratio (best T5-base model). Based on the work of https://github.com/agemagician/CodeTrans, which uses data from https://github.com/epochx/commitgen.

Installation and Usage

Ensure you have Docker installed. Then run:

pip install commit5
commit5 download # Pulls the docker image
commit5 start # Starts the docker container

Wait around 10 seconds for the container to spin up and load the model into memory. Then run:

commit5 test
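Rather than sleeping a fixed ten seconds, a script can poll until the container responds. A generic readiness loop is sketched below; the `ready` callable is a placeholder you supply (for example, a wrapper that shells out to `commit5 test` and checks its exit code), since commit5's internal startup signal isn't documented here.

```python
import time

def wait_until(ready, timeout=30.0, interval=0.5):
    """Poll `ready()` until it returns True or `timeout` seconds elapse.

    Returns True if the service came up in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if ready():
            return True
        time.sleep(interval)
    return False
```

For commit5, `ready` could be something like `lambda: subprocess.run(["commit5", "test"]).returncode == 0`.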

Then, to automatically commit:

commit5 commit

Alternatively, to only generate messages, run

commit5 generate <diff_string>

where <diff_string> is the text of the diff.
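From a script, it is easiest to pass the raw diff as a single argument via `subprocess`, which sidesteps shell-quoting problems with the quotes and newlines that diffs contain. A hedged sketch (it assumes `commit5` is on PATH and prints the message to stdout; the `command` parameter exists only so the helper can be exercised against a stand-in binary):

```python
import subprocess

def generate_message(diff, command=("commit5", "generate")):
    """Run `commit5 generate <diff>` and return its stdout, stripped."""
    result = subprocess.run(
        [*command, diff], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()

# e.g. with the staged diff:
#   staged = subprocess.run(["git", "diff", "--staged"],
#                           capture_output=True, text=True).stdout
#   print(generate_message(staged))
```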

Optimizations

FastT5 reduced the file size from 900 MB (a single Torch file) to a total of only 200 MB (three ONNX models) and the memory footprint to only 90 MB. Execution time also dropped from 1.3 s to 0.5 s.

diff --git a/grunt/tasks/release.js b/grunt/tasks/release.js
index efbb2e7..d377ee4 100644
--- a/grunt/tasks/release.js
+++ b/grunt/tasks/release.js
@@ -7,7 +7,7 @@ var grunt = require('grunt');
 
 var BOWER_PATH = '../react-bower/';
 var BOWER_GLOB = [BOWER_PATH + '*'];
-var BOWER_FILES = ['React.js', 'React.min.js', 'JSXTransformer.js'];
+var BOWER_FILES = ['react.js', 'react.min.js', 'JSXTransformer.js'];
 var GH_PAGES_PATH = '../react-gh-pages/';
 var GH_PAGES_GLOB = [GH_PAGES_PATH + '*'];
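The FastT5 conversion described above takes a single call. A hedged sketch using fastT5's `export_and_get_onnx_model` (the call downloads the checkpoint and writes the ONNX files, so it is wrapped in a function rather than run on import; int8 quantization is fastT5's default):

```python
def export_quantized(
    model_name="SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune",
):
    """Export the T5 checkpoint to three quantized ONNX graphs
    (encoder, decoder, decoder-with-past) and return a model
    that supports .generate(), like the original."""
    from fastT5 import export_and_get_onnx_model  # pip install fastt5
    return export_and_get_onnx_model(model_name)
```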

Vision

I think there's a huge space for locally running LM-based devtools in general. Corporations have major privacy concerns around developer productivity tools, which hinders adoption of tools like Copilot/Codeium/Tabnine, and I think powerful LMs can be built smaller while laptops are becoming beefier. There's also an assortment of optimizations (distillation, ONNX, quantization, llama.cpp) for running LLMs on lower-power machines.

Archive

ONNX Runtime improves performance from 1.3 s to 670 ms per iteration, but the resulting model is bigger (1.7 GB across three files vs. 800 MB originally). The model can be found at https://huggingface.co/kevinlu1248/ct-base-commits-onnx/. Tests were conducted on the example diff above.
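The per-iteration timings above can be reproduced with a simple wall-clock average; `fn` is any callable, e.g. a closure around the model's `generate` on the example diff (a generic sketch, not commit5's actual benchmark harness):

```python
import time

def mean_latency(fn, runs=5):
    """Average wall-clock seconds per call of `fn` over `runs` invocations."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```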

TODO

  • Figure out how to load in 16-bit or 8-bit properly
  • Turn into a Git tool
  • Cleanup codebase
  • Upload to PyPI
  • Migrate to CodeTrans-small
  • Add a grammar checker
  • Use CodeT5+
