Locally generate commit messages using CodeTrans and FastT5

commit5

Automatically generate commit messages locally using T5. Currently using "SEBIS/code_trans_t5_small_commit_generation_transfer_learning_finetune", as it seems to offer the best quality-to-performance ratio (best T5-base model). Based on the work of https://github.com/agemagician/CodeTrans, which uses data from https://github.com/epochx/commitgen.

Installation and Usage

Ensure you have Docker installed. Then run:

pip install commit5
commit5 download # Pulls the docker image
commit5 start # Starts the docker container

Wait around 10 seconds for the container to spin up and load the model into memory. Then run:

commit5 test
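
Instead of waiting a fixed 10 seconds, the readiness check could be polled. The following is a minimal sketch, not part of commit5 itself: the helper names are hypothetical, and it assumes (unverified) that `commit5 test` exits with status 0 once the model is loaded.

```python
import subprocess
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` repeatedly until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def commit5_test_passes() -> bool:
    """Run `commit5 test` once.

    Assumption: a zero exit status means the container is up and the
    model is loaded. This is not confirmed by the project description.
    """
    try:
        result = subprocess.run(
            ["commit5", "test"], capture_output=True, text=True
        )
    except FileNotFoundError:
        # The commit5 executable is not on PATH.
        return False
    return result.returncode == 0
```

Calling `wait_until_ready(commit5_test_passes)` after `commit5 start` would then return True as soon as the check succeeds, or False after 30 seconds.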

Then, to automatically commit:

commit5 commit

Alternatively, to only generate messages, run:

commit5 generate <diff_string>

Here, diff_string is the full text of the diff, passed as a single argument.
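
Since a diff spans many lines, quoting it by hand on the command line is error-prone. The sketch below shows one way to feed the staged diff to `commit5 generate` from Python. The function names are hypothetical, and it assumes (unverified) that `commit5 generate` prints the message to standard output.

```python
import subprocess
from typing import List


def staged_diff() -> str:
    """Return the diff of staged changes in the current Git repository."""
    return subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True, text=True, check=True,
    ).stdout


def generate_command(diff_string: str) -> List[str]:
    """Build the argv for `commit5 generate`, with the diff as one argument."""
    if not diff_string.strip():
        raise ValueError("empty diff: nothing to describe")
    # Passing a list (no shell) avoids quoting problems with multi-line diffs.
    return ["commit5", "generate", diff_string]


def generate_message(diff_string: str) -> str:
    """Run `commit5 generate` on a diff.

    Assumption: the generated message is written to stdout.
    """
    result = subprocess.run(
        generate_command(diff_string),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

With the container running, `generate_message(staged_diff())` would return a suggested message for the staged changes. Very large diffs may exceed the operating system's limit on argument length.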

Optimizations

FastT5 reduced the total file size from 900 MB (torch file) to only 200 MB (3 ONNX models) and the memory footprint to only 90 MB. Further, the execution time dropped from 1.3s to 0.5s.
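
To put those figures in relative terms, the short calculation below derives the size reduction and speedup from the numbers quoted above. Only the arithmetic is new; the helper names are illustrative.

```python
def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new timing is."""
    return before_s / after_s


# Figures quoted in the text: 900 MB -> 200 MB on disk, 1.3s -> 0.5s per run.
size_reduction = reduction_percent(900, 200)
latency_speedup = speedup(1.3, 0.5)

print(f"File size reduced by about {size_reduction:.0f}%")    # about 78%
print(f"Execution about {latency_speedup:.1f}x faster")       # about 2.6x
```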

diff --git a/grunt/tasks/release.js b/grunt/tasks/release.js
index efbb2e7..d377ee4 100644
--- a/grunt/tasks/release.js
+++ b/grunt/tasks/release.js
@@ -7,7 +7,7 @@ var grunt = require('grunt');
 
 var BOWER_PATH = '../react-bower/';
 var BOWER_GLOB = [BOWER_PATH + '*'];
-var BOWER_FILES = ['React.js', 'React.min.js', 'JSXTransformer.js'];
+var BOWER_FILES = ['react.js', 'react.min.js', 'JSXTransformer.js'];
 var GH_PAGES_PATH = '../react-gh-pages/';
 var GH_PAGES_GLOB = [GH_PAGES_PATH + '*'];

Vision

I think there's a huge space for locally running LM-based devtools. Corporations generally have serious privacy concerns around developer productivity tools, which hinders the adoption of tools like Copilot/Codeium/Tabnine in companies. Meanwhile, I think powerful LMs can be built smaller while laptops are becoming beefier. There's also an assortment of optimizations (distillation + ONNX + quantization + llama.cpp) for running LLMs on lower-power machines.

Archive

ONNX Runtime reduces latency from 1.3s to 670ms per iteration, but the resulting model is bigger (1.7 GB across three files vs. 800 MB for the original). The model can be found at https://huggingface.co/kevinlu1248/ct-base-commits-onnx/. Tests were conducted on the example diff shown above.
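
Per-iteration timings like the ones above can be measured with a small harness. This is a generic sketch, not the author's benchmark: `mean_latency_s` is a hypothetical helper, and the callable passed in stands for whatever runs one inference.

```python
import time
from typing import Callable


def mean_latency_s(
    fn: Callable[[], object], iterations: int = 10, warmup: int = 2
) -> float:
    """Average wall-clock seconds per call of `fn`, after warm-up calls."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    # Warm-up runs are excluded so one-time setup costs do not skew the mean.
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations
```

Timing both the original and the ONNX model on the same input, with the same iteration count, keeps the comparison fair.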

TODO

  • Figure out how to load in 16-bit or 8-bit properly
  • Turn into a Git tool
  • Clean up codebase
  • Upload to PyPI
  • Migrate to CodeTrans-small
  • Add a grammar checker
  • Use CodeT5+
