Custom code assistant: collect the data and train your own code assistant
Project description
enigma_ai
enigma_ai is a Python package for efficiently finetuning large language models (LLMs) such as GPT-3 for code generation.
Features
- GitHub code scraper
  - Scrape high-quality code from GitHub, filtered by parameters such as stars, size, and topics, to build a clean dataset for finetuning
  - Can be customized to scrape code in specific languages, code styles, and so on
- Optimization tools
  - Find hyperparameters such as learning rate and batch size that give the best results for your model and dataset
  - Supports major LLMs like GPT-3, Codex and more
  - Tunes hyperparameters based on compute constraints and the desired loss function
- Easy finetuning
  - Simple wrapper around HuggingFace and LoRA to finetune LLMs on your dataset
  - Seamless integration, trains models out-of-the-box on your GPU/TPU
Installation
pip install enigma_ai
Usage
Scrape GitHub
from enigma_ai import GitHubScraper
scraper = GitHubScraper()
data = scraper.scrape(topics=['machine-learning'], stars=100, max_size=1000)
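How GitHubScraper turns these arguments into a GitHub request is not documented here. As a rough sketch only, the same filters could be expressed with GitHub's public repository search endpoint and its `topic:`, `stars:` and `size:` qualifiers. The function name `build_search_url` and its parameters are hypothetical, and treating `stars` as a minimum and `max_size` as kilobytes is an assumption, not confirmed behaviour of enigma_ai.

```python
from urllib.parse import urlencode

# Hypothetical helper, not part of enigma_ai: builds a GitHub repository
# search URL from the same kind of filters passed to scraper.scrape().
def build_search_url(topics, min_stars, max_size_kb, language=None):
    # One "topic:" qualifier per requested topic
    qualifiers = [f"topic:{t}" for t in topics]
    # Assumption: stars is a lower bound, size is an upper bound in KB
    qualifiers.append(f"stars:>={min_stars}")
    qualifiers.append(f"size:<={max_size_kb}")
    if language:
        qualifiers.append(f"language:{language}")
    params = {"q": " ".join(qualifiers), "sort": "stars", "order": "desc"}
    return "https://api.github.com/search/repositories?" + urlencode(params)

url = build_search_url(["machine-learning"], min_stars=100, max_size_kb=1000)
print(url)
```

The resulting URL can be fetched with any HTTP client. Unauthenticated search requests are rate limited, so a real scraper would normally send a token and page through the results.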
Optimize hyperparameters
from enigma_ai import HyperparamOptimizer
optimizer = HyperparamOptimizer(model='code-davinci-002')
params = optimizer.tune(data, max_epochs=10, target_loss=0.2)
print(params)
# Prints the tuned hyperparameters, e.g. learning rate (LR) and batch size (BS), for the target loss
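The search strategy behind HyperparamOptimizer is not described, so the following is only a sketch of one common approach: a plain grid search over learning rate and batch size that stops once a target loss is reached. The names `grid_search` and `toy_validation_loss` are hypothetical, and the loss function is a made-up stand-in for a real training run.

```python
import itertools
import math

# Stand-in for training and evaluating a model. It is a smooth bowl whose
# minimum (0.15) sits at lr=2e-5, batch_size=32. Purely illustrative.
def toy_validation_loss(lr, batch_size):
    lr_term = (math.log10(lr) - math.log10(2e-5)) ** 2
    bs_term = 0.1 * (math.log2(batch_size) - 5) ** 2
    return lr_term + bs_term + 0.15

# Hypothetical helper: tries every (lr, batch_size) pair, keeps the best one,
# and stops early once the best loss is at or below target_loss.
def grid_search(loss_fn, learning_rates, batch_sizes, target_loss=None):
    best = None
    for lr, bs in itertools.product(learning_rates, batch_sizes):
        loss = loss_fn(lr, bs)
        if best is None or loss < best["loss"]:
            best = {"lr": lr, "batch_size": bs, "loss": loss}
        if target_loss is not None and best["loss"] <= target_loss:
            break
    return best

best = grid_search(toy_validation_loss, [1e-5, 2e-5, 5e-5], [16, 32, 64], target_loss=0.2)
print(best)
```

Grid search is the simplest option and scales poorly with the number of hyperparameters. Random or Bayesian search are common alternatives when each trial is an expensive finetuning run.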
Finetune model
from enigma_ai import Finetuner
finetuner = Finetuner(model='code-davinci-002')
finetuner.fit(data, epochs=10, batch_size=32, lr=2e-5)
finetuner.save('my_model.pkl')
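The feature list says the finetuner wraps HuggingFace and LoRA. As a back-of-the-envelope illustration of why LoRA keeps finetuning cheap, the sketch below compares the trainable parameters of one full weight matrix with those of its low-rank update. The layer size 4096 and rank 8 are example values chosen here, not defaults of enigma_ai, and both function names are hypothetical.

```python
# LoRA freezes the original weight W (d_out x d_in) and trains two small
# factors B (d_out x rank) and A (rank x d_in), so the update is B @ A.
def full_params(d_in, d_out):
    return d_in * d_out

def lora_trainable_params(d_in, d_out, rank):
    return rank * (d_in + d_out)

d_in = d_out = 4096   # example layer size (assumption)
rank = 8              # example LoRA rank (assumption)

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For this example layer, the low-rank factors hold well under one percent of the parameters of the full matrix, which is what makes finetuning on a single GPU practical.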
And more!
Contributing
Contributions to enigma_ai are welcome...
Download files
Source Distribution
enigma_ai-0.1.1.tar.gz (6.0 kB)
Built Distribution
Hashes for enigma_ai-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8b16c9e296bb9445d279da15784d61dae81a2dd0a34846c7fb0affe9b6551a5d
MD5 | 6cf66d5c696e3c5c614494d3a54d286e
BLAKE2b-256 | 071a51a175595c67081ccd197efefbd88f9268baf9504df0bcbe72bd3adeabd9