Locally generate commit messages using CodeTrans and FastT5
commit5
Automatically generate commit messages locally using T5. Currently uses "SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune", as it seems to offer the best quality-to-performance ratio (the best of the T5-based models). Based on the work of https://github.com/agemagician/CodeTrans, which uses data from https://github.com/epochx/commitgen.
Installation and Usage
Ensure you have Docker installed. Then run:
pip install commit5
commit5 download # Pulls the docker image
commit5 start # Starts the docker container
Wait around 10 seconds for the container to spin up and load the model into memory. Then run:
commit5 test
Then, to automatically commit:
commit5 commit
Alternatively, to only generate messages, run
commit5 generate <diff_string>
where <diff_string> is the text of the diff to describe.
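The CLI takes the diff text as a single argument. A minimal sketch of assembling that argument from the staged changes — note that the `staged_diff` helper and its truncation limit are assumptions for illustration, not part of commit5:

```python
import shlex
import subprocess

def staged_diff(max_chars: int = 4000) -> str:
    """Return the staged git diff, truncated so very large diffs
    don't overflow the model's input window (the limit is a guess)."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout[:max_chars]

def generate_command(diff: str) -> str:
    """Build a shell-safe `commit5 generate` invocation for a diff string."""
    return f"commit5 generate {shlex.quote(diff)}"
```

Quoting the diff with `shlex.quote` keeps newlines and special characters intact when the command is passed to a shell.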
Optimizations
FastT5 reduced the file size from 900 MB (a single torch file) to a total of only 200 MB (3 ONNX models) and the memory footprint to only 90 MB. Further, per-generation time dropped from 1.3 s to 0.5 s. Measurements used the example diff below:
diff --git a/grunt/tasks/release.js b/grunt/tasks/release.js
index efbb2e7..d377ee4 100644
--- a/grunt/tasks/release.js
+++ b/grunt/tasks/release.js
@@ -7,7 +7,7 @@ var grunt = require('grunt');
var BOWER_PATH = '../react-bower/';
var BOWER_GLOB = [BOWER_PATH + '*'];
-var BOWER_FILES = ['React.js', 'React.min.js', 'JSXTransformer.js'];
+var BOWER_FILES = ['react.js', 'react.min.js', 'JSXTransformer.js'];
var GH_PAGES_PATH = '../react-gh-pages/';
var GH_PAGES_GLOB = [GH_PAGES_PATH + '*'];
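The FastT5 conversion behind these numbers can be sketched as follows. This is a hedged sketch using fastT5's `export_and_get_onnx_model` API (which exports and quantizes the three ONNX models on first use); the heavy imports are deferred so defining the helper costs nothing:

```python
MODEL = "SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune"

def generate_message(diff: str, max_length: int = 32) -> str:
    """Generate a commit message from a diff string using the
    ONNX-exported model. The export and quantization happen on the
    first call and are cached by fastT5 afterwards."""
    # Deferred imports: transformers and fastT5 are large dependencies.
    from transformers import AutoTokenizer
    from fastT5 import export_and_get_onnx_model

    model = export_and_get_onnx_model(MODEL)
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    inputs = tokenizer(diff, return_tensors="pt")
    output = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

The first invocation downloads the checkpoint and writes the exported encoder/decoder/init-decoder ONNX files; subsequent calls reuse them.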
Vision
I think there's a huge space for locally running LM-based devtools in general. Corporations have significant privacy concerns around developer productivity tools, which is a major obstacle to the adoption of tools like Copilot/Codeium/Tabnine in companies, and I think powerful LMs can be made smaller even as laptops become beefier. There's also an assortment of optimizations (distillation, ONNX, quantization, llama.cpp) for running LLMs on lower-powered machines.
Archive
ONNX Runtime improved per-iteration time from 1.3 s to 670 ms, but the resulting model is bigger (1.7 GB across three files vs. 800 MB for the original). The model can be found at https://huggingface.co/kevinlu1248/ct-base-commits-onnx/. Tests were conducted on the example diff shown above.
TODO
- Figure out how to load in 16-bit or 8-bit properly
- Turn into a Git tool
- Cleanup codebase
- Upload to PyPI
- Migrate to CodeTrans-small
- Add a grammar checker
- Use CodeT5+